Numpy arrays

Setup

In [ ]:

pip install ydf -U

In [1]:

import ydf
import numpy as np

Numpy

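Numpy arrays are well suited for training and using YDF models. YDF does not accept a Numpy array directly; instead, it accepts a dictionary of Numpy arrays, where each key is a column name (a feature or the label) and each value holds that column's data. Using a dictionary makes your features easier to manage.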
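One property of such a dictionary is worth making explicit: every array describes the same examples, so all values share the same size along their first dimension. The helper `count_examples` below is hypothetical (it is not part of YDF); it is a minimal sketch of that consistency check using plain Numpy.

```python
import numpy as np


def count_examples(dataset):
    """Hypothetical helper: returns the number of examples in a dict of Numpy arrays.

    Every value must have the same size along its first dimension
    (one entry, or one row, per example).
    """
    sizes = {name: np.asarray(values).shape[0] for name, values in dataset.items()}
    if len(set(sizes.values())) != 1:
        raise ValueError(f"Columns have different numbers of examples: {sizes}")
    return next(iter(sizes.values()))


example_dataset = {
    "f1": np.random.uniform(size=10),
    "f2": np.random.uniform(size=(10, 3)),  # a 2-D value still has 10 rows
    "l": np.random.randint(0, 2, size=10),
}
print(count_examples(example_dataset))  # 10
```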

Let's define a dataset:

In [2]:

number_of_examples = 10
dataset = {
    "f1": np.random.uniform(size=number_of_examples),
    "f2": np.random.uniform(size=number_of_examples),
    "l": np.random.randint(0, 2, size=number_of_examples),
}

dataset

Out[2]:

{'f1': array([0.8408175 , 0.23268677, 0.97215838, 0.06059025, 0.43041995,
        0.2838354 , 0.54476241, 0.68916471, 0.15604299, 0.38484593]),
 'f2': array([0.53119829, 0.07066887, 0.367039  , 0.88090998, 0.76215773,
        0.11381487, 0.84171988, 0.34631154, 0.04948825, 0.56829104]),
 'l': array([0, 1, 1, 1, 1, 1, 0, 0, 1, 0])}

Next, let's train a model and generate predictions.

In [3]:

model = ydf.RandomForestLearner(label="l").train(dataset)

Train model on 10 examples
Model trained in 0:00:00.006883

In [4]:

model.predict(dataset)

Out[4]:

array([0.37999973, 0.8599993 , 0.4866663 , 0.6733328 , 0.48999962,
       0.836666  , 0.3866664 , 0.48999962, 0.8633326 , 0.5699996 ],
      dtype=float32)
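Here the label `l` only takes the values 0 and 1, so the learner trains a binary classifier and `predict` returns one float per example. Assuming each value is the predicted probability of class 1 (check the YDF documentation for your version before relying on this), the predictions can be turned into hard labels with a threshold. The values below are copied from Out[4], and the 0.5 threshold is an illustrative choice, not something YDF prescribes.

```python
import numpy as np

# Predictions copied from Out[4] above.
predictions = np.array(
    [0.37999973, 0.8599993, 0.4866663, 0.6733328, 0.48999962,
     0.836666, 0.3866664, 0.48999962, 0.8633326, 0.5699996],
    dtype=np.float32,
)

# Assumption: each value is the probability of class 1.
# Values at or above 0.5 become label 1, the rest become label 0.
predicted_labels = (predictions >= 0.5).astype(int)
print(predicted_labels)  # [0 1 0 1 0 1 0 0 1 1]
```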

If your input data is a single Numpy array, simply wrap it in a dictionary :).
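As a sketch of that wrapping, suppose (hypothetically) your data arrives as one array of shape [num_examples, 4] whose last column is the label. Slicing it into a feature block and a label column gives a dictionary in the expected form; the key names `"features"` and `"label"` are arbitrary choices for illustration.

```python
import numpy as np

number_of_examples = 10

# Hypothetical input: columns 0-2 are features, column 3 is a 0/1 label.
data = np.random.uniform(size=(number_of_examples, 4))
data[:, -1] = np.random.randint(0, 2, size=number_of_examples)

# Wrap it into a dictionary: the feature block stays two-dimensional,
# and the label becomes its own one-dimensional integer column.
dataset = {
    "features": data[:, :-1],
    "label": data[:, -1].astype(int),
}
print({name: values.shape for name, values in dataset.items()})
# {'features': (10, 3), 'label': (10,)}
```

The resulting dictionary can then be trained on as before, e.g. `ydf.RandomForestLearner(label="label").train(dataset)`.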

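The feature arrays can be one- or two-dimensional Numpy arrays. If an array is two-dimensional, its second dimension indexes different features. This is similar to feeding each column as a separate feature.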
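To make the comparison concrete, the sketch below builds the same values in both layouts: one two-dimensional feature, and three one-dimensional features obtained by slicing out each column. The names `f1_0`, `f1_1` and `f1_2` are hypothetical and are not necessarily the names YDF uses internally for the dimensions of a multi-dimensional feature.

```python
import numpy as np

number_of_examples = 10
f1 = np.random.uniform(size=(number_of_examples, 3))

# Layout A: a single two-dimensional feature of shape [num_examples, 3].
dataset_2d = {"f1": f1}

# Layout B: the same values as three separate one-dimensional features,
# one per column of the original array (hypothetical names).
dataset_split = {f"f1_{i}": f1[:, i] for i in range(f1.shape[1])}

print(sorted(dataset_split))  # ['f1_0', 'f1_1', 'f1_2']
```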

In [5]:

number_of_examples = 10

# "f1" is an array of shape [num_examples, 3]. YDF sees it as a feature with 3 dimensions.
# "f2" is still a one-dimensional feature.
dataset = {
    "f1": np.random.uniform(size=(number_of_examples, 3)),
    "f2": np.random.uniform(size=number_of_examples),
    "l": np.random.randint(0, 2, size=number_of_examples),
}
dataset

Out[5]:

{'f1': array([[0.77831876, 0.44491803, 0.06950368],
        [0.51402546, 0.35996753, 0.75910236],
        [0.35404616, 0.30025651, 0.50369477],
        [0.83403873, 0.61047313, 0.07814819],
        [0.38385037, 0.40671211, 0.47912743],
        [0.99550808, 0.93747089, 0.74900908],
        [0.13106712, 0.48648687, 0.77925262],
        [0.25118286, 0.34226331, 0.03312203],
        [0.5772139 , 0.03045939, 0.81802417],
        [0.27276707, 0.24643098, 0.62696742]]),
 'f2': array([0.65184742, 0.14970149, 0.16338311, 0.01975033, 0.43429271,
        0.1691804 , 0.14664926, 0.90239627, 0.35412598, 0.31156112]),
 'l': array([0, 1, 1, 0, 1, 0, 0, 0, 1, 0])}

In [6]:

model = ydf.RandomForestLearner(label="l").train(dataset)
model.predict(dataset)

Train model on 10 examples
Model trained in 0:00:00.003045

Out[6]:

array([0.27333316, 0.59999955, 0.5633329 , 0.25333318, 0.46999964,
       0.31333312, 0.34999976, 0.38999972, 0.6199995 , 0.47999963],
      dtype=float32)