TabularPredictor.predict

TabularPredictor.predict(data: DataFrame | str, model: str | None = None, as_pandas: bool = True, transform_features: bool = True, *, decision_threshold: float | None = None) → Series | ndarray

Use the trained models to predict the label column values for new data.

Parameters:

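data (pd.DataFrame or str) – The data to make predictions for. It should contain the same column names as the training data and follow the same format (it may contain extra columns that the Predictor will not use, including the label column itself). If a str is passed, data is loaded using the str value as the file path.
model (str (optional)) – The name of the model to get predictions from. Defaults to None, which uses the model with the highest score on the validation set. The list of valid model names for this predictor can be obtained by calling predictor.model_names().
as_pandas (bool, default = True) – Whether to return the output as a pd.Series (True) or an np.ndarray (False).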
transform_features (bool, default = True) –
If True, preprocesses data before predicting with models. If False, skips global feature preprocessing.

This is useful to save on inference time if you have already called data = predictor.transform_features(data).
decision_threshold (float, default = None) – The decision threshold used to convert prediction probabilities to predictions. Only relevant for binary classification; otherwise ignored. If None, defaults to predictor.decision_threshold. Valid values are in the range [0.0, 1.0]. You can obtain an optimized decision_threshold by first calling predictor.calibrate_decision_threshold(). Setting it is useful for metrics such as balanced_accuracy and f1, because 0.5 is often not the optimal threshold for them. Predictions are computed from the predicted probability of the positive class (pred) using the following logic: 1 if pred > decision_threshold else 0
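The thresholding rule above can be sketched in plain Python as follows. This is a minimal illustration, not AutoGluon's code: apply_threshold, f1_score, and best_threshold_for_f1 are hypothetical helpers, and the grid search is a simple stand-in chosen here to show why a threshold other than 0.5 can score better on f1. It is not necessarily how predictor.calibrate_decision_threshold() works internally.

```python
from typing import List, Sequence, Tuple


def apply_threshold(pos_proba: Sequence[float], decision_threshold: float = 0.5) -> List[int]:
    """Convert positive-class probabilities to 0/1 labels with the documented rule."""
    if not 0.0 <= decision_threshold <= 1.0:
        raise ValueError("decision_threshold must be in [0.0, 1.0]")
    # Strict inequality: a probability exactly equal to the threshold maps to 0.
    return [1 if p > decision_threshold else 0 for p in pos_proba]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1), computed as 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def best_threshold_for_f1(
    y_true: Sequence[int], pos_proba: Sequence[float], steps: int = 100
) -> Tuple[float, float]:
    """Hypothetical grid search over thresholds in [0, 1]; keeps the first best one."""
    best_t = 0.5
    best_f1 = f1_score(y_true, apply_threshold(pos_proba, best_t))
    for i in range(steps + 1):
        t = i / steps
        score = f1_score(y_true, apply_threshold(pos_proba, t))
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


if __name__ == "__main__":
    # Toy validation data (made up for illustration): positives get low probabilities.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    pos_proba = [0.45, 0.35, 0.30, 0.20, 0.10, 0.05, 0.40, 0.02]
    print(apply_threshold(pos_proba))  # [0, 0, 0, 0, 0, 0, 0, 0]
    print(best_threshold_for_f1(y_true, pos_proba))  # (0.2, 0.8571428571428571)
```

On this toy data every positive-class probability is below 0.5, so the default threshold predicts no positives and f1 is 0, while a lower threshold recovers all three positives at the cost of one false positive.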

Returns:

Array of predictions, one for each row in the given dataset. Either an np.ndarray or a pd.Series, depending on the as_pandas argument.
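To make the return contract concrete, the sketch below shows one way an as_pandas switch can package one prediction per input row. format_predictions is a hypothetical helper, not part of AutoGluon; reusing the input's index and naming the Series after the label column are assumptions of this sketch rather than documented behavior.

```python
import numpy as np
import pandas as pd


def format_predictions(
    data: pd.DataFrame, preds: np.ndarray, as_pandas: bool = True, label: str = "label"
):
    """Hypothetical helper: return one prediction per row of `data`."""
    if len(preds) != len(data):
        raise ValueError("expected exactly one prediction per input row")
    if as_pandas:
        # Assumption: reuse the input index so each prediction lines up with its row.
        return pd.Series(preds, index=data.index, name=label)
    # Plain array: positional alignment only, the input index is dropped.
    return np.asarray(preds)


if __name__ == "__main__":
    data = pd.DataFrame({"x": [1, 2, 3]}, index=[10, 11, 12])
    preds = np.array([0, 1, 1])
    series = format_predictions(data, preds)
    print(list(series.index))  # [10, 11, 12]
    print(format_predictions(data, preds, as_pandas=False))  # [0 1 1]
```

Keeping the index on the Series makes it straightforward to join predictions back onto the original DataFrame, whereas the ndarray form relies on row order alone.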