Calibrating a Classifier

A minimal example showing how to calibrate a HiClass LCN model. The calibration method is selected with the calibration_method parameter, for example:

  • Isotonic Regression
  • Platt scaling
  • Beta scaling
  • IVAP
  • CVAP

from sklearn.ensemble import RandomForestClassifier
from hiclass import LocalClassifierPerNode

# Isotonic Regression
rf = RandomForestClassifier()
classifier = LocalClassifierPerNode(
    local_classifier=rf,
    calibration_method='isotonic'
)

# Platt scaling
rf = RandomForestClassifier()
classifier = LocalClassifierPerNode(
    local_classifier=rf,
    calibration_method='platt'
)

# Beta scaling
rf = RandomForestClassifier()
classifier = LocalClassifierPerNode(
    local_classifier=rf,
    calibration_method='beta'
)

# IVAP
rf = RandomForestClassifier()
classifier = LocalClassifierPerNode(
    local_classifier=rf,
    calibration_method='ivap'
)

# CVAP
rf = RandomForestClassifier()
classifier = LocalClassifierPerNode(
    local_classifier=rf,
    calibration_method='cvap'
)
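
For intuition about what an isotonic calibrator does, here is a standalone sketch of the pool-adjacent-violators idea: it fits a non-decreasing step function that maps raw scores to calibrated probabilities. It is written in plain Python, is independent of HiClass, and should not be read as HiClass's implementation. The function name pav_fit and the toy scores are hypothetical.

```python
def pav_fit(scores, labels):
    """Fit a non-decreasing step function to (score, label) pairs.

    Hypothetical helper for illustration only, not part of HiClass.
    Returns the sorted scores and one fitted value per sorted score.
    """
    pairs = sorted(zip(scores, labels))
    # Each block stores [sum of labels, number of points].
    blocks = []
    for _, label in pairs:
        blocks.append([float(label), 1])
        # Merge backwards while the previous block's mean exceeds the current one.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [score for score, _ in pairs], fitted


# Toy data: raw classifier scores and the true binary outcomes.
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
outcomes = [0, 1, 0, 0, 1, 1]
sorted_scores, calibrated = pav_fit(raw_scores, outcomes)
for score, prob in zip(sorted_scores, calibrated):
    print(f"score {score:.1f} -> calibrated {prob:.3f}")
```

In this toy run the scores 0.2, 0.3 and 0.4 are pooled into one block with value 1/3, because their outcomes (1, 0, 0) would otherwise break monotonicity.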

In addition, probabilities from multiple levels of the hierarchy can be aggregated by defining a probability combiner:

  • Multiply (Default)
  • Geometric Mean
  • Arithmetic Mean
  • No Aggregation

# Multiply (Default)
rf = RandomForestClassifier()
classifier = LocalClassifierPerNode(
    local_classifier=rf,
    calibration_method='isotonic',
    probability_combiner='multiply'
)

# Geometric Mean
rf = RandomForestClassifier()
classifier = LocalClassifierPerNode(
    local_classifier=rf,
    calibration_method='isotonic',
    probability_combiner='geometric'
)

# Arithmetic Mean
rf = RandomForestClassifier()
classifier = LocalClassifierPerNode(
    local_classifier=rf,
    calibration_method='isotonic',
    probability_combiner='arithmetic'
)

# No Aggregation
rf = RandomForestClassifier()
classifier = LocalClassifierPerNode(
    local_classifier=rf,
    calibration_method='isotonic',
    probability_combiner=None
)
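
The arithmetic behind the combiner names can be sketched in plain Python. The sketch assumes that each combiner aggregates the per-level probabilities along the path from the root to a node, and that no aggregation leaves the node's own probability unchanged. This reading is inferred from the option names and may differ from what HiClass does internally. The function combine_path and the example probabilities are hypothetical.

```python
import math


def combine_path(path_probs, combiner="multiply"):
    """Aggregate per-level probabilities along one root-to-node path.

    Hypothetical illustration of the combiner names; not the HiClass code.
    path_probs[-1] is the probability predicted at the node's own level.
    """
    if combiner is None:
        # No aggregation: keep the probability of the node's own level.
        return path_probs[-1]
    if combiner == "multiply":
        return math.prod(path_probs)
    if combiner == "geometric":
        return math.prod(path_probs) ** (1 / len(path_probs))
    if combiner == "arithmetic":
        return sum(path_probs) / len(path_probs)
    raise ValueError(f"unknown combiner: {combiner!r}")


# Example path: P(Animal) = 0.9, P(Mammal) = 0.8, P(Cow) = 0.5 (made-up values).
path = [0.9, 0.8, 0.5]
for name in ("multiply", "geometric", "arithmetic", None):
    print(name, round(combine_path(path, name), 3))
```

For this path the product is about 0.36, the geometric mean about 0.711 and the arithmetic mean about 0.733, so multiplying penalises deep paths most strongly while the two means stay on the scale of a single level's probability.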

A hierarchical classifier can be calibrated either by calling calibrate on the model itself or by using a Pipeline:

  • Default
  • Pipeline

# Default: fit, then calibrate on a separate calibration set
# (X_train, Y_train, X_cal, Y_cal and X_test are assumed to be defined)
rf = RandomForestClassifier()
classifier = LocalClassifierPerNode(
    local_classifier=rf,
    calibration_method='isotonic'
)

classifier.fit(X_train, Y_train)
classifier.calibrate(X_cal, Y_cal)
classifier.predict_proba(X_test)

# Pipeline: the same steps, with the classifier wrapped in a HiClass Pipeline
from hiclass import Pipeline

rf = RandomForestClassifier()
classifier = LocalClassifierPerNode(
    local_classifier=rf,
    calibration_method='isotonic'
)

pipeline = Pipeline([
    ('classifier', classifier),
])

pipeline.fit(X_train, Y_train)
pipeline.calibrate(X_cal, Y_cal)
pipeline.predict_proba(X_test)
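
Both variants expect three separate sets: one for fitting, one for calibration and one for prediction. One possible way to obtain them from a single labelled dataset is sketched below in plain Python. The helper three_way_split, its default fractions and the toy data are assumptions made for illustration and are not part of HiClass.

```python
import random


def three_way_split(X, Y, cal_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once and split into train, calibration and test sets.

    Hypothetical helper; the fractions are arbitrary defaults.
    """
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(X) * test_fraction)
    n_cal = int(len(X) * cal_fraction)
    test_idx = indices[:n_test]
    cal_idx = indices[n_test:n_test + n_cal]
    train_idx = indices[n_test + n_cal:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (
        (pick(X, train_idx), pick(Y, train_idx)),
        (pick(X, cal_idx), pick(Y, cal_idx)),
        (pick(X, test_idx), pick(Y, test_idx)),
    )


# Toy dataset with ten samples and two-level hierarchical labels.
X = [[i] for i in range(10)]
Y = [["Animal", "Mammal"] if i % 2 == 0 else ["Animal", "Reptile"] for i in range(10)]
(X_train, Y_train), (X_cal, Y_cal), (X_test, Y_test) = three_way_split(X, Y)
print(len(X_train), len(X_cal), len(X_test))
```

Keeping the calibration samples out of the training set matters because the calibrator is meant to correct the scores the model produces on data it has not seen during fitting.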

In the code below, isotonic regression is used to calibrate the model.

Output:

['Cow' 'Lizard' 'Sheep' 'Snake']
[[0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]]

from sklearn.ensemble import RandomForestClassifier

from hiclass import LocalClassifierPerNode

# Define data
X_train = [[1], [2], [3], [4]]
X_test = [[4], [3], [2], [1]]
X_cal = [[5], [6], [7], [8]]
Y_train = [
    ["Animal", "Mammal", "Sheep"],
    ["Animal", "Mammal", "Cow"],
    ["Animal", "Reptile", "Snake"],
    ["Animal", "Reptile", "Lizard"],
]

Y_cal = [
    ["Animal", "Mammal", "Cow"],
    ["Animal", "Mammal", "Sheep"],
    ["Animal", "Reptile", "Lizard"],
    ["Animal", "Reptile", "Snake"],
]

# Use random forest classifiers for every node
rf = RandomForestClassifier()

# Use local classifier per node with isotonic regression as calibration method
classifier = LocalClassifierPerNode(
    local_classifier=rf, calibration_method="isotonic", probability_combiner="multiply"
)

# Train local classifier per node
classifier.fit(X_train, Y_train)

# Calibrate local classifier per node
classifier.calibrate(X_cal, Y_cal)

# Predict probabilities
probabilities = classifier.predict_proba(X_test)

# Print probabilities and labels for the last level
print(classifier.classes_[2])
print(probabilities)
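
To make the printed matrix easier to read, each column can be paired with its label. The sketch below assumes that the columns of the probability matrix follow the order of classifier.classes_[2], and it hard-codes the values from the output shown in this example so that it runs without HiClass. The variable names are hypothetical.

```python
# Values copied from the example output; in a real run they would come from
# classifier.classes_[2] and classifier.predict_proba(X_test).
last_level_labels = ["Cow", "Lizard", "Sheep", "Snake"]
probability_rows = [
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
]

for sample_index, row in enumerate(probability_rows):
    # Pair each probability with the label of its column.
    by_label = dict(zip(last_level_labels, row))
    print(f"sample {sample_index}: {by_label} (sum = {sum(row)})")
```

With only four training samples and four calibration samples, the uniform value of 0.25 per class in this output says little about calibration quality; the example is meant to show the API calls rather than a realistic result.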

Total running time of the script: (0 minutes 0.882 seconds)
