MNIST Classification¶
In this notebook we show how to use Fortuna to obtain calibrated estimates of predictive uncertainty in an MNIST classification task, starting from scratch. In the last section of this example we show how to do the same starting directly from the outputs of a pre-trained model.
Download the MNIST data from TensorFlow¶
First, let us download the MNIST data from TensorFlow Datasets. Other sources would work as well.
[1]:
import tensorflow as tf
import tensorflow_datasets as tfds
def download(split_range, shuffle=False):
    ds = tfds.load(
        name="mnist",
        split=f"train[{split_range}]",
        as_supervised=True,
        shuffle_files=True,
    ).map(lambda x, y: (tf.cast(x, tf.float32) / 255.0, y))
    if shuffle:
        ds = ds.shuffle(10, reshuffle_each_iteration=True)
    return ds.batch(128).prefetch(1)
train_data_loader, val_data_loader, test_data_loader = (
    download(":80%", shuffle=True),
    download("80%:90%"),
    download("90%:"),
)
Downloading and preparing dataset 11.06 MiB (download: 11.06 MiB, generated: 21.00 MiB, total: 32.06 MiB) to /home/docs/tensorflow_datasets/mnist/3.0.1...
Dl Completed...: 100%|██████████| 5/5 [00:00<00:00, 9.39 file/s]
Dataset mnist downloaded and prepared to /home/docs/tensorflow_datasets/mnist/3.0.1. Subsequent calls will reuse this data.
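As a quick sanity check (not part of the original notebook), you can peek at one batch to confirm shapes and normalization; at this point the loaders are still plain tf.data.Dataset objects:

# Inspect a single batch: images should be float32 in [0, 1] with shape (128, 28, 28, 1).
for batch_inputs, batch_targets in train_data_loader.take(1):
    print(batch_inputs.shape, float(tf.reduce_min(batch_inputs)), float(tf.reduce_max(batch_inputs)))
    print(batch_targets.shape, batch_targets.dtype)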
Convert the data into a compatible data loader¶
Fortuna helps you convert data and data loaders into a data loader that Fortuna can digest.
[2]:
from fortuna.data import DataLoader
train_data_loader = DataLoader.from_tensorflow_data_loader(train_data_loader)
val_data_loader = DataLoader.from_tensorflow_data_loader(val_data_loader)
test_data_loader = DataLoader.from_tensorflow_data_loader(test_data_loader)
Build a probabilistic classifier¶
Let us build a probabilistic classifier. This is an interface object containing several configurable attributes, namely model, prior, posterior_approximator and output_calibrator. In this example we use a LeNet5 model, a Laplace posterior approximator acting on the last layer of the model, and the default temperature-scaling output calibrator.
[3]:
from fortuna.prob_model import ProbClassifier, LaplacePosteriorApproximator
from fortuna.model import LeNet5
output_dim = 10
prob_model = ProbClassifier(
    model=LeNet5(output_dim=output_dim),
    posterior_approximator=LaplacePosteriorApproximator(),
)
WARNING:root:No module named 'transformer' is installed. If you are not working with models from the `transformers` library ignore this warning, otherwise install the optional 'transformers' dependency of Fortuna using poetry. You can do so by entering: `poetry install --extras 'transformers'`.
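The posterior_approximator attribute is easy to swap. As a minimal sketch (not part of the original notebook), one could use an ADVI approximation instead; ADVIPosteriorApproximator is assumed to be importable from fortuna.prob_model in your installed version of Fortuna:

# Hypothetical alternative: swap the Laplace approximation for ADVI.
# ADVIPosteriorApproximator is assumed to be available in your Fortuna version.
from fortuna.prob_model import ADVIPosteriorApproximator
advi_prob_model = ProbClassifier(
    model=LeNet5(output_dim=output_dim),
    posterior_approximator=ADVIPosteriorApproximator(),
)

The rest of the workflow (training, predictive statistics, conformal sets) would stay unchanged.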
Train the probabilistic model: posterior fitting and calibration¶
We can now train the probabilistic model. This includes fitting the posterior distribution and calibrating the probabilistic model. Since we use a Laplace approximation starting from a maximum-a-posteriori (MAP) approximation, we configure the MAP step via the argument map_fit_config.
[4]:
from fortuna.prob_model import (
    FitConfig,
    FitMonitor,
    FitOptimizer,
    CalibConfig,
    CalibMonitor,
)
from fortuna.metric.classification import accuracy
status = prob_model.train(
    train_data_loader=train_data_loader,
    val_data_loader=val_data_loader,
    calib_data_loader=val_data_loader,
    fit_config=FitConfig(
        optimizer=FitOptimizer(
            freeze_fun=lambda path, val: "trainable"
            if "output_subnet" in path
            else "frozen"
        )
    ),
    map_fit_config=FitConfig(
        monitor=FitMonitor(early_stopping_patience=2, metrics=(accuracy,)),
        optimizer=FitOptimizer(),
    ),
    calib_config=CalibConfig(monitor=CalibMonitor(early_stopping_patience=2)),
)
Epoch: 10 | loss: 41673.14844 | accuracy: 0.99219: 9%|▉ | 9/100 [05:29<55:29, 36.58s/it]
Tuning prior log-var: 100%|██████████| 21/21 [00:32<00:00, 1.57s/it]
Epoch: 36 | loss: 142.10379: 35%|███▌ | 35/100 [00:18<00:33, 1.93it/s]
Estimate predictive statistics¶
We can now compute some predictive statistics by invoking the predictive attribute of the probabilistic classifier and the method of interest. Most predictive statistics, e.g. mean or mode, require a loader of input data points. You can easily get one by calling to_inputs_loader on the data loader.
[5]:
test_log_probs = prob_model.predictive.log_prob(data_loader=test_data_loader)
test_inputs_loader = test_data_loader.to_inputs_loader()
test_means = prob_model.predictive.mean(inputs_loader=test_inputs_loader)
test_modes = prob_model.predictive.mode(
    inputs_loader=test_inputs_loader, means=test_means
)
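The predictive mean can also serve as a simple per-input uncertainty score. As a minimal sketch (not part of the original notebook), here is the predictive entropy computed directly from test_means with NumPy:

import numpy as np
# Per-input predictive entropy from the estimated class probabilities;
# larger values indicate more uncertain predictions.
test_entropies = -np.sum(test_means * np.log(test_means + 1e-12), axis=1)
print(test_entropies.shape, test_entropies.mean())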
Compute metrics¶
In classification, the predictive mode is a prediction for the label, while the predictive mean is a prediction for the probability of each label. As such, we can use these to compute several metrics, e.g. the accuracy, the Brier score, the expected calibration error (ECE), etc.
[6]:
from fortuna.metric.classification import (
    accuracy,
    expected_calibration_error,
    brier_score,
)
test_targets = test_data_loader.to_array_targets()
acc = accuracy(preds=test_modes, targets=test_targets)
brier = brier_score(probs=test_means, targets=test_targets)
ece = expected_calibration_error(
    preds=test_modes,
    probs=test_means,
    targets=test_targets,
    plot=True,
    plot_options=dict(figsize=(10, 2)),
)
print(f"Test accuracy: {acc}")
print(f"Brier score: {brier}")
print(f"ECE: {ece}")
Test accuracy: 0.9806666374206543
Brier score: 0.028491374105215073
ECE: 0.005060586612671614
Conformal prediction sets¶
Fortuna allows producing conformal prediction sets, i.e. sets of likely labels up to some coverage probability threshold. These can be computed starting from probability estimates obtained with or without Fortuna.
[7]:
from fortuna.conformal import AdaptivePredictionConformalClassifier
val_means = prob_model.predictive.mean(inputs_loader=val_data_loader.to_inputs_loader())
conformal_sets = AdaptivePredictionConformalClassifier().conformal_set(
    val_probs=val_means,
    test_probs=test_means,
    val_targets=val_data_loader.to_array_targets(),
    error=0.05,
)
We can check that, on average, conformal sets for misclassified inputs are larger than for well-classified ones. This confirms the intuition that the model should be more uncertain when it is wrong.
[8]:
import numpy as np
avg_size = np.mean([len(s) for s in np.array(conformal_sets, dtype="object")])
avg_size_wellclassified = np.mean(
    [
        len(s)
        for s in np.array(conformal_sets, dtype="object")[test_modes == test_targets]
    ]
)
avg_size_misclassified = np.mean(
    [
        len(s)
        for s in np.array(conformal_sets, dtype="object")[test_modes != test_targets]
    ]
)
print(f"Average conformal set size: {avg_size}")
print(
    f"Average conformal set size over well classified input: {avg_size_wellclassified}"
)
print(f"Average conformal set size over misclassified input: {avg_size_misclassified}")
Average conformal set size: 9.938666666666666
Average conformal set size over well classified input: 9.95819170632223
Average conformal set size over misclassified input: 8.948275862068966
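As an additional sanity check (not part of the original notebook), one can also look at the empirical coverage, i.e. the fraction of test inputs whose conformal set contains the true label; with error=0.05 it should be roughly at least 95%:

# Empirical marginal coverage of the conformal sets on the test data.
coverage = np.mean(
    [int(test_targets[i]) in np.array(s) for i, s in enumerate(conformal_sets)]
)
print(f"Empirical coverage of conformal sets: {coverage}")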
What if we have model outputs to start from?¶
If you have already trained an MNIST model and obtained model outputs, you can still use Fortuna to calibrate them and estimate uncertainty. Purely for educational purposes, let us take the logarithm of the predictive means estimated above as model outputs, and pretend these were generated by some other framework. Furthermore, we store arrays of validation and test target variables, and pretend these are also given.
[9]:
import numpy as np
calib_outputs = np.log(val_means)
test_outputs = np.log(test_means)
calib_targets = val_data_loader.to_array_targets()
test_targets = test_data_loader.to_array_targets()
We now invoke a calibration classifier, with a default temperature-scaling output calibrator, and calibrate the model outputs.
[10]:
from fortuna.output_calib_model import OutputCalibClassifier
calib_model = OutputCalibClassifier()
calib_status = calib_model.calibrate(
    calib_outputs=calib_outputs, calib_targets=calib_targets
)
Epoch: 100 | loss: 0.02432: 100%|██████████| 100/100 [00:00<00:00, 174.34it/s]
Similarly to the above, we can now compute predictive statistics.
[11]:
test_log_probs = calib_model.predictive.log_prob(
    outputs=test_outputs, targets=test_targets
)
test_means = calib_model.predictive.mean(outputs=test_outputs)
test_modes = calib_model.predictive.mode(outputs=test_outputs)
Metrics and conformal sets can then be computed exactly as done above.
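For instance, reusing the functions already imported in this notebook, a minimal sketch could look as follows; the variable calib_means is introduced here just for illustration:

# Calibrated probability estimates for the validation (calibration) outputs.
calib_means = calib_model.predictive.mean(outputs=calib_outputs)
# Metrics from the calibrated test predictions.
acc = accuracy(preds=test_modes, targets=test_targets)
brier = brier_score(probs=test_means, targets=test_targets)
ece = expected_calibration_error(preds=test_modes, probs=test_means, targets=test_targets)
# Conformal sets from the calibrated probabilities, as in the previous section.
conformal_sets = AdaptivePredictionConformalClassifier().conformal_set(
    val_probs=calib_means,
    test_probs=test_means,
    val_targets=calib_targets,
    error=0.05,
)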