NumPy arrays
Setup
In [ ]:
pip install ydf -U
In [1]:
import ydf
import numpy as np
In [2]:
number_of_examples = 10
dataset = {
"f1": np.random.uniform(size=number_of_examples),
"f2": np.random.uniform(size=number_of_examples),
"l": np.random.randint(0, 2, size=number_of_examples),
}
dataset
Out[2]:
{'f1': array([0.8408175 , 0.23268677, 0.97215838, 0.06059025, 0.43041995,
0.2838354 , 0.54476241, 0.68916471, 0.15604299, 0.38484593]),
'f2': array([0.53119829, 0.07066887, 0.367039 , 0.88090998, 0.76215773,
0.11381487, 0.84171988, 0.34631154, 0.04948825, 0.56829104]),
'l': array([0, 1, 1, 1, 1, 1, 0, 0, 1, 0])}
Next, let's train a model and generate predictions.
In [3]:
model = ydf.RandomForestLearner(label="l").train(dataset)
Train model on 10 examples
Model trained in 0:00:00.006883
In [4]:
model.predict(dataset)
Out[4]:
array([0.37999973, 0.8599993 , 0.4866663 , 0.6733328 , 0.48999962,
0.836666 , 0.3866664 , 0.48999962, 0.8633326 , 0.5699996 ],
dtype=float32)
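If your input data is a single NumPy array, just wrap it in a dictionary :)
Each feature can be a one- or two-dimensional NumPy array. In a two-dimensional array, the first dimension indexes the examples and the second dimension defines separate features. This is similar to feeding each column separately.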
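As a minimal sketch of that wrapping step, assume the data arrives as one array of shape [num_examples, 3] whose columns are two features followed by a binary label. The array `data` and its column layout are hypothetical and chosen only for illustration. The keys `f1`, `f2`, and `l` mirror the names used in the cells above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input: a single array whose columns are two features
# followed by a binary label.
data = np.column_stack([
    rng.uniform(size=10),
    rng.uniform(size=10),
    rng.integers(0, 2, size=10),
])

# Wrap the columns in a dictionary keyed by column name.
# column_stack promotes everything to float, so the label is cast back to int.
dataset = {
    "f1": data[:, 0],
    "f2": data[:, 1],
    "l": data[:, 2].astype(np.int64),
}

# The dictionary can then be passed to a learner as in the cells above, e.g.
# ydf.RandomForestLearner(label="l").train(dataset)
```

The training call is left as a comment so that the snippet depends only on NumPy.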
In [5]:
number_of_examples = 10
# "f1" is an array of size [num_examples, 3]. YDF sees it as a feature with 3 dimensions.
# "f2" is still a single dimensional feature.
dataset = {
"f1": np.random.uniform(size=(number_of_examples, 3)),
"f2": np.random.uniform(size=number_of_examples),
"l": np.random.randint(0, 2, size=number_of_examples),
}
dataset
Out[5]:
{'f1': array([[0.77831876, 0.44491803, 0.06950368],
[0.51402546, 0.35996753, 0.75910236],
[0.35404616, 0.30025651, 0.50369477],
[0.83403873, 0.61047313, 0.07814819],
[0.38385037, 0.40671211, 0.47912743],
[0.99550808, 0.93747089, 0.74900908],
[0.13106712, 0.48648687, 0.77925262],
[0.25118286, 0.34226331, 0.03312203],
[0.5772139 , 0.03045939, 0.81802417],
[0.27276707, 0.24643098, 0.62696742]]),
'f2': array([0.65184742, 0.14970149, 0.16338311, 0.01975033, 0.43429271,
0.1691804 , 0.14664926, 0.90239627, 0.35412598, 0.31156112]),
'l': array([0, 1, 1, 0, 1, 0, 0, 0, 1, 0])}
In [6]:
model = ydf.RandomForestLearner(label="l").train(dataset)
model.predict(dataset)
Train model on 10 examples
Model trained in 0:00:00.003045
Out[6]:
array([0.27333316, 0.59999955, 0.5633329 , 0.25333318, 0.46999964,
0.31333312, 0.34999976, 0.38999972, 0.6199995 , 0.47999963],
dtype=float32)
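The text above says that a two-dimensional feature is similar to feeding each dimension separately. A rough sketch of that manual split follows. The key names `f1_0`, `f1_1`, and `f1_2` are hypothetical. The names YDF assigns internally to the unrolled dimensions may differ, and a model trained on the split dictionary is not guaranteed to be identical to one trained on the two-dimensional feature.

```python
import numpy as np

rng = np.random.default_rng(0)
number_of_examples = 10

# A multi-dimensional feature of shape [num_examples, 3].
f1 = rng.uniform(size=(number_of_examples, 3))

# Split the 2D array into one 1D array per column.
# The "f1_<i>" names are illustrative only.
split_dataset = {f"f1_{i}": f1[:, i] for i in range(f1.shape[1])}
split_dataset["f2"] = rng.uniform(size=number_of_examples)
split_dataset["l"] = rng.integers(0, 2, size=number_of_examples)

# split_dataset can be passed to a learner in the same way as before, e.g.
# ydf.RandomForestLearner(label="l").train(split_dataset)
```

Keeping the feature as a single two-dimensional array is usually more convenient, since the columns travel together under one key.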