TabularPredictor.predict

TabularPredictor.predict(data: DataFrame | str, model: str | None = None, as_pandas: bool = True, transform_features: bool = True, *, decision_threshold: float | None = None) → Series | ndarray

Use the trained models to produce predictions of the label column's values for new data.

Parameters:
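  • data (pd.DataFrame or str) – The data to make predictions for. Should contain the same column names as the training data and follow the same format (may contain extra columns that the Predictor will not use, including the label column itself). If a str is passed, data will be loaded using the str value as the file path.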

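  • model (str (optional)) – The name of the model to get predictions from. Defaults to None, which uses the model with the highest score on the validation set. The valid models of this predictor are listed by calling predictor.model_names().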

  • as_pandas (bool, default = True) – Whether to return the output as a pd.Series (True) or an np.ndarray (False).

  • transform_features (bool, default = True) –

    If True, preprocesses data before predicting with models. If False, skips global feature preprocessing.

    This is useful to save on inference time if you have already called data = predictor.transform_features(data).

  • decision_threshold (float, default = None) – The decision threshold used to convert prediction probabilities to predictions. Only relevant for binary classification; otherwise ignored. If None, defaults to predictor.decision_threshold. Valid values are in the range [0.0, 1.0]. You can obtain an optimized decision_threshold by first calling predictor.calibrate_decision_threshold(). Setting it is useful for metrics such as balanced_accuracy and f1, since 0.5 is often not an optimal threshold. Predictions are calculated via the following logic on the positive class: 1 if pred > decision_threshold else 0
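The thresholding rule described for decision_threshold can be illustrated with a minimal, dependency-free sketch. The helper name apply_decision_threshold and the sample probabilities are hypothetical and are not part of the AutoGluon API; the sketch only mirrors the documented logic "1 if pred > decision_threshold else 0" on positive-class probabilities.

```python
def apply_decision_threshold(positive_probs, decision_threshold=0.5):
    """Hypothetical helper (not an AutoGluon function).

    Maps positive-class probabilities to 0/1 predictions using the
    documented rule: 1 if pred > decision_threshold else 0.
    """
    if not 0.0 <= decision_threshold <= 1.0:
        raise ValueError("decision_threshold must be in the range [0.0, 1.0]")
    # Strict comparison: a probability equal to the threshold maps to 0.
    return [1 if p > decision_threshold else 0 for p in positive_probs]


# Made-up positive-class probabilities, for illustration only.
positive_probs = [0.10, 0.35, 0.50, 0.80]

default_preds = apply_decision_threshold(positive_probs)
lowered_preds = apply_decision_threshold(positive_probs, decision_threshold=0.3)

print(default_preds)  # [0, 0, 0, 1]
print(lowered_preds)  # [0, 1, 1, 1]
```

Lowering the threshold turns more rows into positive predictions, which is why a calibrated value may suit metrics such as balanced_accuracy or f1 better than 0.5.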

Return type:

Array of predictions, one for each row of the given dataset. Either an np.ndarray or a pd.Series, depending on the as_pandas argument.