Two moons classification

In this notebook we show how to use Fortuna to obtain calibrated estimates of predictive uncertainty in a two moons classification task.

Generate the two moons data with scikit-learn

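First, let us generate the two moons data with scikit-learn: a training, a validation and a test set of 500 points each, drawn with different random seeds.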

[1]:

from sklearn.datasets import make_moons

train_data = make_moons(n_samples=500, noise=0.07, random_state=0)
val_data = make_moons(n_samples=500, noise=0.07, random_state=1)
test_data = make_moons(n_samples=500, noise=0.07, random_state=2)
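Each call to make_moons returns a tuple (X, y), where X has shape (n_samples, 2) and y holds the 0/1 labels, which is why the tuples can be passed straight to a data loader below. The following is a minimal numpy sketch of the same geometric construction (two interleaving half circles plus Gaussian noise). It uses its own random generator and skips shuffling, so its points will not match scikit-learn's output; the function name two_moons is hypothetical.

```python
import numpy as np

def two_moons(n_samples=500, noise=0.07, seed=0):
    """Hypothetical re-implementation of the two moons construction (no shuffling)."""
    rng = np.random.default_rng(seed)
    n_out = n_samples // 2          # points on the upper moon (label 0)
    n_in = n_samples - n_out        # points on the lower moon (label 1)
    t_out = np.linspace(0, np.pi, n_out)
    t_in = np.linspace(0, np.pi, n_in)
    # Upper half circle centred at (0, 0).
    outer = np.stack([np.cos(t_out), np.sin(t_out)], axis=1)
    # Lower half circle centred at (1, 0.5).
    inner = np.stack([1 - np.cos(t_in), 1 - np.sin(t_in) - 0.5], axis=1)
    X = np.vstack([outer, inner])
    y = np.concatenate([np.zeros(n_out, dtype=int), np.ones(n_in, dtype=int)])
    X = X + rng.normal(scale=noise, size=X.shape)  # isotropic Gaussian noise
    return X, y

X, y = two_moons()
print(X.shape, y.shape, np.bincount(y))
```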

Convert data to a compatible data loader

Fortuna helps you convert data and data loaders into a data loader that Fortuna can digest.

[2]:

from fortuna.data import DataLoader

train_data_loader = DataLoader.from_array_data(
    train_data, batch_size=128, shuffle=True, prefetch=True
)
val_data_loader = DataLoader.from_array_data(val_data, batch_size=128, prefetch=True)
test_data_loader = DataLoader.from_array_data(test_data, batch_size=128, prefetch=True)
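Conceptually, a loader built from array data yields mini-batches of (inputs, targets), optionally in shuffled order. The sketch below uses a hypothetical helper, iterate_batches, to illustrate what batch_size=128 and shuffle=True mean for 500 points. It keeps the final partial batch; whether Fortuna's DataLoader does the same, and how prefetch is implemented, is not covered here.

```python
import numpy as np

def iterate_batches(data, batch_size, shuffle=False, seed=0):
    """Hypothetical helper: yield (inputs, targets) mini-batches from an (X, y) tuple."""
    X, y = data
    idx = np.arange(len(X))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)  # new order of the data points
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

X = np.zeros((500, 2))
y = np.zeros(500, dtype=int)
sizes = [len(xb) for xb, _ in iterate_batches((X, y), batch_size=128, shuffle=True)]
print(sizes)  # [128, 128, 128, 116]
```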

Build a probabilistic classifier

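Let us build a probabilistic classifier. This is an interface object containing several attributes that you can configure, namely model, prior, posterior_approximator, and output_calibrator. In this example, we use an MLP model, an Automatic Differentiation Variational Inference (ADVI) posterior approximator, and the default temperature scaling output calibrator.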

[3]:

from fortuna.prob_model import ProbClassifier, ADVIPosteriorApproximator
from fortuna.model import MLP
import flax.linen as nn

output_dim = 2
prob_model = ProbClassifier(
    model=MLP(output_dim=output_dim, activations=(nn.tanh, nn.tanh)),
    posterior_approximator=ADVIPosteriorApproximator(),
)
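Temperature scaling, as commonly defined, divides the model's logits by a single scalar temperature T before the softmax: T > 1 softens the probabilities, T < 1 sharpens them, and the argmax never changes, so it affects calibration but not accuracy. The numpy sketch below shows that effect on one input; it assumes Fortuna's default output calibrator follows this standard form, which is not spelled out in this notebook.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def temperature_scale(logits, temperature):
    """Divide logits by a scalar temperature, then apply the softmax."""
    return softmax(logits / temperature)

logits = np.array([[2.0, 0.0]])
for t in (0.5, 1.0, 2.0):
    print(t, temperature_scale(logits, t).round(3))
```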


Train the probabilistic model: posterior fitting and calibration

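We can now train the probabilistic model. This includes fitting the posterior distribution and calibrating the probabilistic model.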

[4]:

from fortuna.prob_model import (
    FitConfig,
    FitMonitor,
    FitOptimizer,
    CalibConfig,
    CalibMonitor,
)
from fortuna.metric.classification import accuracy
import optax

status = prob_model.train(
    train_data_loader=train_data_loader,
    val_data_loader=val_data_loader,
    calib_data_loader=val_data_loader,
    fit_config=FitConfig(
        monitor=FitMonitor(metrics=(accuracy,), early_stopping_patience=10),
        optimizer=FitOptimizer(method=optax.adam(1e-1)),
    ),
    calib_config=CalibConfig(monitor=CalibMonitor(early_stopping_patience=2)),
)

Epoch: 53 | loss: -537.0011 | accuracy: 0.99713:  52%|█████▏    | 52/100 [00:04<00:04, 11.30it/s]
Epoch: 100 | loss: 1.51912: 100%|██████████| 100/100 [00:01<00:00, 52.92it/s]
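The early_stopping_patience arguments control when training halts before the maximum number of epochs. The sketch below implements the usual patience rule (stop once the monitored validation loss has not improved for a given number of consecutive epochs), which is presumably why the first progress bar ends before 100 epochs. The helper name and the loss values are made up for illustration; exactly which quantity Fortuna's monitors track is an assumption here.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never stops.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Made-up loss curve: improves for 5 epochs, then plateaus.
losses = [1.0, 0.8, 0.6, 0.5, 0.45] + [0.46] * 20
print(early_stopping_epoch(losses, patience=10))
```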

Estimate predictive statistics

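We can now compute some predictive statistics by invoking the predictive attribute of the probabilistic classifier and the method of interest. Most predictive statistics, e.g. the mean or the mode, require a loader of input data points. You can easily get one by calling the data loader's to_inputs_loader method.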

[5]:

test_log_probs = prob_model.predictive.log_prob(data_loader=test_data_loader)
test_inputs_loader = test_data_loader.to_inputs_loader()
test_means = prob_model.predictive.mean(inputs_loader=test_inputs_loader)
test_modes = prob_model.predictive.mode(
    inputs_loader=test_inputs_loader, means=test_means
)
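For a Bayesian classifier, the predictive mean is typically a Monte Carlo average of the class probabilities over parameter draws from the posterior approximation, and the mode is the most probable label under that mean; passing means=test_means to mode above is consistent with this. The sketch assumes an array of sampled probabilities with shape (n_posterior_samples, n_inputs, n_classes) and illustrative numbers; it is not Fortuna's internal implementation.

```python
import numpy as np

def predictive_mean_and_mode(sampled_probs):
    """sampled_probs: (n_posterior_samples, n_inputs, n_classes) class probabilities,
    one slice per parameter draw from the posterior approximation."""
    mean = sampled_probs.mean(axis=0)  # Monte Carlo estimate of p(y | x, data)
    mode = mean.argmax(axis=-1)        # most probable label under the predictive mean
    return mean, mode

# Two posterior samples, three inputs, two classes (illustrative numbers).
sampled_probs = np.array([
    [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]],
    [[0.7, 0.3], [0.2, 0.8], [0.4, 0.6]],
])
mean, mode = predictive_mean_and_mode(sampled_probs)
print(mean)
print(mode)
```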

[6]:

import matplotlib.pyplot as plt
from fortuna.data import InputsLoader
import numpy as np

fig = plt.figure(figsize=(6, 3))
size = 150
xx = np.linspace(-4, 4, size)
yy = np.linspace(-4, 4, size)
grid = np.array([[_xx, _yy] for _xx in xx for _yy in yy])
grid_loader = InputsLoader.from_array_inputs(grid)
grid_entropies = prob_model.predictive.entropy(grid_loader).reshape(size, size)
grid = grid.reshape(size, size, 2)
plt.title("Predictions and entropy", fontsize=12)
im = plt.pcolor(grid[:, :, 0], grid[:, :, 1], grid_entropies)
plt.scatter(
    test_data[0][:, 0],
    test_data[0][:, 1],
    s=1,
    c=["C0" if i == 1 else "C1" for i in test_modes],
)
plt.colorbar()
plt.show()


[Figure: Predictions and entropy. Test points coloured by predicted label, over a colour map of the predictive entropy on a 150 × 150 grid]
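The colour map in the figure is the predictive entropy, H(p) = -Σ_k p_k log p_k, which is 0 for a fully confident prediction and reaches log 2 for a 50/50 split between the two classes. The sketch below applies this formula to an array of predictive probabilities; Fortuna's predictive.entropy may estimate it differently (for example from posterior samples), so treat this only as an illustration of the quantity being plotted.

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of each row of an (n_inputs, n_classes) probability array."""
    return -np.sum(probs * np.log(probs + eps), axis=-1)

probs = np.array([[1.0, 0.0], [0.9, 0.1], [0.5, 0.5]])
print(predictive_entropy(probs))
```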

Compute metrics

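In classification, the predictive mode is a prediction for the labels, while the predictive mean is a prediction for the probability of each label. We can therefore use them to compute several metrics, e.g. accuracy, the Brier score, and the expected calibration error (ECE).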
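For reference, here is a numpy sketch of common definitions of the three metrics: accuracy, the multiclass Brier score (mean squared distance between the probability vector and the one-hot target), and ECE with 10 equal-width confidence bins. The function names carry a _np suffix to avoid shadowing Fortuna's metrics. Fortuna's versions may use a different normalisation or bin count, so numbers need not match exactly.

```python
import numpy as np

def accuracy_np(preds, targets):
    return np.mean(preds == targets)

def brier_score_np(probs, targets):
    """Mean over inputs of the squared distance to the one-hot target."""
    one_hot = np.eye(probs.shape[-1])[targets]
    return np.mean(np.sum((probs - one_hot) ** 2, axis=-1))

def ece_np(probs, targets, n_bins=10):
    """Bin inputs by confidence (max probability); average |accuracy - confidence|,
    weighted by the fraction of inputs in each bin."""
    conf = probs.max(axis=-1)
    correct = (probs.argmax(axis=-1) == targets).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return ece

probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
targets = np.array([0, 0, 1, 1])
preds = probs.argmax(axis=-1)
acc_val = accuracy_np(preds, targets)
brier_val = brier_score_np(probs, targets)
ece_val = ece_np(probs, targets)
print(acc_val, brier_val, ece_val)
```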

[7]:

from fortuna.metric.classification import (
    accuracy,
    expected_calibration_error,
    brier_score,
)

test_targets = test_data_loader.to_array_targets()
acc = accuracy(preds=test_modes, targets=test_targets)
brier = brier_score(probs=test_means, targets=test_targets)
ece = expected_calibration_error(
    preds=test_modes,
    probs=test_means,
    targets=test_targets,
    plot=True,
    plot_options=dict(figsize=(10, 2)),
)
print(f"Test accuracy: {acc}")
print(f"Brier score: {brier}")
print(f"ECE: {ece}")

Test accuracy: 0.9980000257492065
Brier score: 0.02082330361008644
ECE: 0.06903287768363953

[Figure: expected calibration error plot]

What if we have model outputs to start from?

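If you have already trained a model and obtained model outputs, you can still use Fortuna to calibrate them and estimate uncertainty. For educational purposes only, let us take the logarithm of the predictive mean above as model outputs, and pretend these were generated with some other framework. Furthermore, we store arrays of validation and test target variables, and assume these are also given.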

[8]:

import numpy as np

calib_outputs = np.log(
    1e-6 + prob_model.predictive.mean(inputs_loader=val_data_loader.to_inputs_loader())
)
test_outputs = np.log(1e-6 + test_means)

calib_targets = val_data_loader.to_array_targets()
test_targets = test_data_loader.to_array_targets()
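The 1e-6 offset avoids log(0) = -inf for probabilities that are exactly zero. Taking the log turns probabilities into logit-like outputs: applying the softmax to log(p + ε) gives back p up to the small offset, as the sketch below checks. It assumes the output calibrator treats the outputs as logits passed through a softmax.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

probs = np.array([[1.0, 0.0], [0.7, 0.3]])
outputs = np.log(1e-6 + probs)  # same transformation as in the cell above
recovered = softmax(outputs)
print(recovered)
```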

We now invoke a calibration classifier with the default temperature scaling output calibrator, and calibrate the model outputs.

[9]:

from fortuna.output_calib_model import OutputCalibClassifier, Config, Monitor

calib_model = OutputCalibClassifier()
calib_status = calib_model.calibrate(
    calib_outputs=calib_outputs,
    calib_targets=calib_targets,
    config=Config(monitor=Monitor(early_stopping_patience=2)),
)

Epoch: 100 | loss: 0.00095: 100%|██████████| 100/100 [00:00<00:00, 209.06it/s]
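Temperature scaling is usually fitted by choosing the temperature that minimises the negative log-likelihood (NLL) of the calibration targets. The progress bar suggests Fortuna does this iteratively over epochs; the sketch below swaps in a simple grid search for clarity and uses synthetic, over-confident logits (correct about 75% of the time but about 99% confident). After fitting, the model's confidence should roughly match its accuracy. The names nll and fit_temperature are hypothetical.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def nll(outputs, targets, temperature):
    """Mean negative log-likelihood of the targets after dividing outputs by `temperature`."""
    logp = log_softmax(outputs / temperature)
    return -np.mean(logp[np.arange(len(targets)), targets])

def fit_temperature(outputs, targets, grid=np.linspace(0.05, 10.0, 400)):
    """Pick the temperature on a grid that minimises the calibration-set NLL."""
    losses = [nll(outputs, targets, t) for t in grid]
    return grid[int(np.argmin(losses))]

# Over-confident synthetic logits: [5, 0] for the predicted class, i.e. ~99% confidence.
rng = np.random.default_rng(0)
targets = rng.integers(0, 2, size=1000)
flip = rng.random(1000) < 0.25             # ~25% of predictions are wrong
preds = np.where(flip, 1 - targets, targets)
outputs = 5.0 * np.eye(2)[preds]
t = fit_temperature(outputs, targets)
print(t)
```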

Similarly to above, we can now compute predictive statistics.

[10]:

test_log_probs = calib_model.predictive.log_prob(
    outputs=test_outputs, targets=test_targets
)
test_means = calib_model.predictive.mean(outputs=test_outputs)
test_modes = calib_model.predictive.mode(outputs=test_outputs)
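Given a fitted temperature, the three statistics above can be understood as follows: log_prob is the log-softmax of the scaled outputs evaluated at each target, mean is the vector of class probabilities, and mode is its argmax. The sketch below assumes the calibrated predictive follows this standard temperature-scaling form; the helper calibrated_predictive is hypothetical and not part of Fortuna.

```python
import numpy as np

def calibrated_predictive(outputs, targets, temperature):
    """Hypothetical helper returning (log_probs, means, modes) for temperature-scaled outputs."""
    z = outputs / temperature
    z = z - z.max(axis=-1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))  # log-softmax
    log_probs = log_p[np.arange(len(targets)), targets]          # log p(target | output)
    means = np.exp(log_p)                                        # class probabilities
    modes = means.argmax(axis=-1)                                # predicted labels
    return log_probs, means, modes

outputs = np.array([[2.0, 0.0], [0.0, 1.0]])
targets = np.array([0, 0])
log_probs, means, modes = calibrated_predictive(outputs, targets, temperature=2.0)
print(log_probs, means, modes)
```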

Then one can compute metrics, exactly as done above.