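Hyperparameter Optimization in AutoMM¶
Hyperparameter optimization (HPO) is a method for tackling the challenge of tuning the hyperparameters of machine learning models. ML algorithms have multiple complex hyperparameters that together span a very large search space, and the search space of deep learning methods is even larger than that of traditional ML algorithms. Tuning over such a space is hard, but AutoMM provides several options that let you guide the fitting process based on your domain knowledge and your compute budget.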
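Create Image Dataset¶
In this tutorial, we again use a subset of the Shopee-IET dataset from Kaggle for demonstration. Each image contains one clothing item, and the corresponding label specifies its clothing category. Our subset of the data contains the following possible labels: `BabyPants`, `BabyShirt`, `womencasualshoes`, `womenchiffontop`.
We can load the dataset by downloading it automatically from a URL: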
import warnings
warnings.filterwarnings('ignore')
from datetime import datetime
from autogluon.multimodal.utils.misc import shopee_dataset
download_dir = './ag_automm_tutorial_hpo'
train_data, test_data = shopee_dataset(download_dir)
train_data = train_data.sample(frac=0.5)
print(train_data)
Downloading ./ag_automm_tutorial_hpo/file.zip from https://automl-mm-bench.s3.amazonaws.com/vision_datasets/shopee.zip...
image label
640 /home/ci/autogluon/docs/tutorials/multimodal/a... 3
354 /home/ci/autogluon/docs/tutorials/multimodal/a... 1
434 /home/ci/autogluon/docs/tutorials/multimodal/a... 2
785 /home/ci/autogluon/docs/tutorials/multimodal/a... 3
433 /home/ci/autogluon/docs/tutorials/multimodal/a... 2
.. ... ...
661 /home/ci/autogluon/docs/tutorials/multimodal/a... 3
754 /home/ci/autogluon/docs/tutorials/multimodal/a... 3
349 /home/ci/autogluon/docs/tutorials/multimodal/a... 1
727 /home/ci/autogluon/docs/tutorials/multimodal/a... 3
168 /home/ci/autogluon/docs/tutorials/multimodal/a... 0
[400 rows x 2 columns]
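After subsampling, the training set contains 400 data points. The `image` column stores the path to each image, and the `label` column holds the label class.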
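The subsampling step above relies on pandas' `DataFrame.sample`. As a minimal sketch, the toy table below (hypothetical paths and labels, not the real Shopee data) shows that `frac=0.5` keeps half of the rows, and that passing `random_state` makes the draw repeatable, which can help when comparing runs.

```python
import pandas as pd

# Hypothetical stand-in for the training table: image paths plus integer labels.
toy_train = pd.DataFrame({
    "image": [f"images/item_{i}.jpg" for i in range(8)],
    "label": [0, 0, 1, 1, 2, 2, 3, 3],
})

# sample(frac=0.5) keeps half of the rows; random_state makes the draw repeatable.
subset = toy_train.sample(frac=0.5, random_state=0)
subset_again = toy_train.sample(frac=0.5, random_state=0)

print(len(toy_train), "->", len(subset))
# Class balance of the subsample, ordered by label id.
print(subset["label"].value_counts().sort_index())
```

The tutorial code calls `sample` without a seed, so the exact rows it selects can differ from run to run.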
Regular Model Fitting¶
Recall that with the default settings predefined by AutoGluon, we can fit a model using `MultiModalPredictor` in just three lines of code:
from autogluon.multimodal import MultiModalPredictor
predictor_regular = MultiModalPredictor(label="label")
start_time = datetime.now()
predictor_regular.fit(
train_data=train_data,
hyperparameters = {"model.timm_image.checkpoint_name": "ghostnet_100"}
)
end_time = datetime.now()
elapsed_seconds = (end_time - start_time).total_seconds()
elapsed_min = divmod(elapsed_seconds, 60)
print("Total fitting time: ", f"{int(elapsed_min[0])}m{int(elapsed_min[1])}s")
Total fitting time: 0m55s
No path specified. Models will be saved in: "AutogluonModels/ag-20241127_103022"
=================== System Info ===================
AutoGluon Version: 1.2b20241127
Python Version: 3.11.9
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Tue Sep 24 10:00:37 UTC 2024
CPU Count: 8
Pytorch Version: 2.5.1+cu124
CUDA Version: 12.4
Memory Avail: 28.43 GB / 30.95 GB (91.9%)
Disk Space Avail: 174.90 GB / 255.99 GB (68.3%)
===================================================
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
4 unique label values: [3, 1, 2, 0]
If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
AutoMM starts to create your model. ✨✨✨
To track the learning progress, you can open a terminal and launch Tensorboard:
```shell
# Assume you have installed tensorboard
tensorboard --logdir /home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022
```
Seed set to 0
GPU Count: 1
GPU Count to be Used: 1
GPU 0 Name: Tesla T4
GPU 0 Memory: 0.43GB/15.0GB (Used/Total)
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params | Mode
------------------------------------------------------------------------------
0 | model | TimmAutoModelForImagePrediction | 3.9 M | train
1 | validation_metric | MulticlassAccuracy | 0 | train
2 | loss_func | CrossEntropyLoss | 0 | train
------------------------------------------------------------------------------
3.9 M Trainable params
0 Non-trainable params
3.9 M Total params
15.627 Total estimated model params size (MB)
418 Modules in train mode
0 Modules in eval mode
Epoch 0, global step 1: 'val_accuracy' reached 0.17500 (best 0.17500), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=0-step=1.ckpt' as top 3
Epoch 0, global step 3: 'val_accuracy' reached 0.23750 (best 0.23750), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=0-step=3.ckpt' as top 3
Epoch 1, global step 4: 'val_accuracy' reached 0.25000 (best 0.25000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=1-step=4.ckpt' as top 3
Epoch 1, global step 6: 'val_accuracy' reached 0.30000 (best 0.30000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=1-step=6.ckpt' as top 3
Epoch 2, global step 7: 'val_accuracy' reached 0.35000 (best 0.35000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=2-step=7.ckpt' as top 3
Epoch 2, global step 9: 'val_accuracy' reached 0.42500 (best 0.42500), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=2-step=9.ckpt' as top 3
Epoch 3, global step 10: 'val_accuracy' reached 0.46250 (best 0.46250), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=3-step=10.ckpt' as top 3
Epoch 3, global step 12: 'val_accuracy' reached 0.61250 (best 0.61250), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=3-step=12.ckpt' as top 3
Epoch 4, global step 13: 'val_accuracy' reached 0.60000 (best 0.61250), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=4-step=13.ckpt' as top 3
Epoch 4, global step 15: 'val_accuracy' reached 0.61250 (best 0.61250), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=4-step=15.ckpt' as top 3
Epoch 5, global step 16: 'val_accuracy' reached 0.61250 (best 0.61250), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=5-step=16.ckpt' as top 3
Epoch 5, global step 18: 'val_accuracy' reached 0.66250 (best 0.66250), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=5-step=18.ckpt' as top 3
Epoch 6, global step 19: 'val_accuracy' reached 0.66250 (best 0.66250), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=6-step=19.ckpt' as top 3
Epoch 6, global step 21: 'val_accuracy' reached 0.66250 (best 0.66250), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=6-step=21.ckpt' as top 3
Epoch 7, global step 22: 'val_accuracy' reached 0.70000 (best 0.70000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=7-step=22.ckpt' as top 3
Epoch 7, global step 24: 'val_accuracy' was not in top 3
Epoch 8, global step 25: 'val_accuracy' reached 0.67500 (best 0.70000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=8-step=25.ckpt' as top 3
Epoch 8, global step 27: 'val_accuracy' reached 0.70000 (best 0.70000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=8-step=27.ckpt' as top 3
Epoch 9, global step 28: 'val_accuracy' reached 0.68750 (best 0.70000), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=9-step=28.ckpt' as top 3
Epoch 9, global step 30: 'val_accuracy' reached 0.72500 (best 0.72500), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=9-step=30.ckpt' as top 3
Epoch 10, global step 31: 'val_accuracy' was not in top 3
Epoch 10, global step 33: 'val_accuracy' reached 0.72500 (best 0.72500), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=10-step=33.ckpt' as top 3
Epoch 11, global step 34: 'val_accuracy' reached 0.71250 (best 0.72500), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=11-step=34.ckpt' as top 3
Epoch 11, global step 36: 'val_accuracy' was not in top 3
Epoch 12, global step 37: 'val_accuracy' reached 0.72500 (best 0.72500), saving model to '/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022/epoch=12-step=37.ckpt' as top 3
Epoch 12, global step 39: 'val_accuracy' was not in top 3
Epoch 13, global step 40: 'val_accuracy' was not in top 3
Epoch 13, global step 42: 'val_accuracy' was not in top 3
Epoch 14, global step 43: 'val_accuracy' was not in top 3
Epoch 14, global step 45: 'val_accuracy' was not in top 3
Start to fuse 3 checkpoints via the greedy soup algorithm.
AutoMM has created your model. 🎉🎉🎉
To load the model, use the code below:
```python
from autogluon.multimodal import MultiModalPredictor
predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103022")
```
If you are not satisfied with the model, try to increase the training time,
adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).
Let's check the test accuracy of the fitted model:
scores = predictor_regular.evaluate(test_data, metrics=["accuracy"])
print('Top-1 test acc: %.3f' % scores["accuracy"])
Top-1 test acc: 0.738
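The fit above is timed with `datetime.now()` and `divmod`. A minimal sketch of the same pattern as a reusable helper follows; `format_elapsed` is a hypothetical name and is not part of AutoGluon, and the timestamps are made up so the example runs without training anything.

```python
from datetime import datetime, timedelta


def format_elapsed(start, end):
    """Format the wall-clock time between two datetimes as '<minutes>m<seconds>s'."""
    minutes, seconds = divmod((end - start).total_seconds(), 60)
    return f"{int(minutes)}m{int(seconds)}s"


# Illustrative timestamps only; in the tutorial they come from datetime.now().
start = datetime(2024, 1, 1, 12, 0, 0)
end = start + timedelta(seconds=55)
print("Total fitting time: ", format_elapsed(start, end))
```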
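HPO during Model Fitting¶
If you would like more control over the fitting process, you can configure hyperparameter optimization (HPO) in `MultiModalPredictor` by passing additional options through `hyperparameters` and `hyperparameter_tune_kwargs`.
`MultiModalPredictor` offers a few options. It uses the Ray Tune (`tune`) library in the backend, so we need to pass in either a Tune search space or an AutoGluon search space, which is then converted to a Tune search space.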
Define the search space of hyperparameter values for neural network training:
from ray import tune

hyperparameters = {
    "optimization.learning_rate": tune.uniform(0.00005, 0.005),
    "optimization.optim_type": tune.choice(["adamw", "sgd"]),
    # max_epochs is an integer setting, so the candidates are ints rather than strings
    "optimization.max_epochs": tune.choice([10, 20]),
    "model.timm_image.checkpoint_name": tune.choice(["swin_base_patch4_window7_224", "convnext_base_in22ft1k"]),
}
This is an example, not an exhaustive list. You can find the full list of supported options in Customize AutoMM.
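To make the idea of a search space concrete, here is a plain-Python sketch of what a random searcher does with it: each trial draws one value per hyperparameter. It does not use Ray. The `uniform`, `choice`, and `sample_config` helpers are hypothetical stand-ins that only mimic the roles of `tune.uniform` and `tune.choice`.

```python
import random


def uniform(low, high):
    """Stand-in for a continuous range: returns a function that draws one value."""
    return lambda rng: rng.uniform(low, high)


def choice(options):
    """Stand-in for a categorical set: returns a function that picks one option."""
    return lambda rng: rng.choice(options)


# Same keys and ranges as the learning-rate and optimizer entries above.
search_space = {
    "optimization.learning_rate": uniform(0.00005, 0.005),
    "optimization.optim_type": choice(["adamw", "sgd"]),
}


def sample_config(space, rng):
    """Draw one concrete configuration (one trial) from the search space."""
    return {name: draw(rng) for name, draw in space.items()}


rng = random.Random(0)  # fixed seed so the draws are repeatable
configs = [sample_config(search_space, rng) for _ in range(3)]
for cfg in configs:
    print(cfg)
```

A Bayesian searcher differs in that later draws are informed by the results of earlier trials rather than being independent.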
Define the search strategy for HPO with `hyperparameter_tune_kwargs`. You can pass in a string or initialize a `ray.tune.schedulers.TrialScheduler` object.
a. Specify how to search through your chosen hyperparameter space (supports `random` and `bayes`):
"searcher": "bayes"
b. Specify how to schedule the jobs that train a network under a particular hyperparameter configuration (supports `FIFO` and `ASHA`):
"scheduler": "ASHA"
c. Set the number of HPO trials to run:
"num_trials": 20
d. Set the number of checkpoints to keep on disk per trial; see the Ray documentation for more details. Must be >= 1 (default is 3):
"num_to_keep": 3
Let's run HPO over combinations of different learning rates and backbone models:
from ray import tune
predictor_hpo = MultiModalPredictor(label="label")
hyperparameters = {
"optimization.learning_rate": tune.uniform(0.00005, 0.001),
"model.timm_image.checkpoint_name": tune.choice(["ghostnet_100",
"mobilenetv3_large_100"])
}
hyperparameter_tune_kwargs = {
"searcher": "bayes",  # alternatively "random"
"scheduler": "ASHA",
"num_trials": 2,
"num_to_keep": 3,
}
start_time_hpo = datetime.now()
predictor_hpo.fit(
train_data=train_data,
hyperparameters=hyperparameters,
hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
)
end_time_hpo = datetime.now()
elapsed_seconds_hpo = (end_time_hpo - start_time_hpo).total_seconds()
elapsed_min_hpo = divmod(elapsed_seconds_hpo, 60)
print("Total fitting time: ", f"{int(elapsed_min_hpo[0])}m{int(elapsed_min_hpo[1])}s")
No path specified. Models will be saved in: "AutogluonModels/ag-20241127_103119"
=================== System Info ===================
AutoGluon Version: 1.2b20241127
Python Version: 3.11.9
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Tue Sep 24 10:00:37 UTC 2024
CPU Count: 8
Pytorch Version: 2.5.1+cu124
CUDA Version: 12.4
Memory Avail: 27.49 GB / 30.95 GB (88.8%)
Disk Space Avail: 174.88 GB / 255.99 GB (68.3%)
===================================================
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == int, but few unique label-values observed).
4 unique label values: [3, 1, 2, 0]
If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Removing non-optimal trials and only keep the best one.
Start to fuse 3 checkpoints via the greedy soup algorithm.
AutoMM has created your model. 🎉🎉🎉
To load the model, use the code below:
```python
from autogluon.multimodal import MultiModalPredictor
predictor = MultiModalPredictor.load("/home/ci/autogluon/docs/tutorials/multimodal/advanced_topics/AutogluonModels/ag-20241127_103119")
```
If you are not satisfied with the model, try to increase the training time,
adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).
Tune Status
Current time: | 2024-11-27 10:33:02 |
Running for: | 00:01:37.90 |
Memory: | 5.2/30.9 GiB |
System Info
Using AsyncHyperBand: num_stopped=0Bracket: Iter 4096.000: None | Iter 1024.000: None | Iter 256.000: None | Iter 64.000: None | Iter 16.000: 0.871874988079071 | Iter 4.000: 0.6593749821186066 | Iter 1.000: 0.32499998807907104
Logical resource usage: 8.0/8 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:T4)
Trial Status
Trial name | status | loc | model.names | model.timm_image.checkpoint_name | optimization.learning_rate | iter | total time (s) | val_accuracy |
---|---|---|---|---|---|---|---|---|
64b09fc1 | TERMINATED | 10.0.0.57:6256 | ('timm_image', _6b80 | mobilenetv3_lar_18e0 | 0.000156298 | 28 | 37.1824 | 0.875 |
0fdb8f17 | TERMINATED | 10.0.0.57:6456 | ('timm_image', _d6c0 | mobilenetv3_lar_18e0 | 0.000934244 | 32 | 42.2214 | 0.9125 |
Trial Progress
Trial name | should_checkpoint | val_accuracy |
---|---|---|
0fdb8f17 | True | 0.9125 |
64b09fc1 | True | 0.875 |
Total fitting time: 1m46s
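The Tune status output above reports `AsyncHyperBand` milestones at iterations 1, 4, 16, and so on up to 4096. The sketch below illustrates the successive-halving idea behind the `ASHA` scheduler under stated assumptions: a grace period of 1 and a reduction factor of 4 (inferred from those milestones), and a simplified synchronous keep rule. The function names and rung scores are hypothetical, and Ray Tune's actual asynchronous implementation differs in its details.

```python
import math


def rung_milestones(grace_period, reduction_factor, max_t):
    """Iterations at which trials are compared: grace_period * reduction_factor**k."""
    milestones = []
    t = grace_period
    while t <= max_t:
        milestones.append(t)
        t *= reduction_factor
    return milestones


def survivors(rung_scores, reduction_factor):
    """Simplified rule: keep the top 1/reduction_factor of trials (higher is better)."""
    keep = max(1, math.ceil(len(rung_scores) / reduction_factor))
    ranked = sorted(rung_scores, key=rung_scores.get, reverse=True)
    return ranked[:keep]


# Assumed settings that reproduce the milestones listed in the log.
print(rung_milestones(grace_period=1, reduction_factor=4, max_t=4096))

# Made-up validation accuracies of four trials at one rung.
scores_at_rung = {"trial_a": 0.31, "trial_b": 0.42, "trial_c": 0.28, "trial_d": 0.39}
print(survivors(scores_at_rung, reduction_factor=4))
```

With only two trials, as in this tutorial, early stopping has little to prune; the scheduler matters more as `num_trials` grows.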
Let's check the test accuracy of the model fitted with HPO:
scores_hpo = predictor_hpo.evaluate(test_data, metrics=["accuracy"])
print('Top-1 test acc: %.3f' % scores_hpo["accuracy"])
Top-1 test acc: 0.912
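Both `evaluate` calls in this tutorial report top-1 accuracy, the fraction of test samples whose predicted class equals the true class. As a minimal sketch of that metric on made-up labels (`top1_accuracy` is a hypothetical helper, not an AutoGluon function):

```python
def top1_accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty inputs")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration; the classes are the integers 0..3 as in the dataset.
y_true = [3, 1, 2, 0, 3, 1, 2, 0]
y_pred = [3, 1, 2, 0, 3, 2, 2, 1]
acc = top1_accuracy(y_true, y_pred)
print("Top-1 test acc: %.3f" % acc)
```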
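In the training log, you should see a line reporting the current best trial, similar to the following (the trial ID and values differ from run to run):
Current best trial: 47aef96a with val_accuracy=0.862500011920929 and parameters={'optimization.learning_rate': 0.0007195214018085505, 'model.timm_image.checkpoint_name': 'ghostnet_100'}
After our simple 2-trial HPO run, which searched over different learning rates and backbone models, we obtained a better test accuracy than the out-of-the-box solution from the previous section (0.912 vs. 0.738). HPO selects the hyperparameter combination with the highest validation accuracy.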
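The selection step can be sketched in a few lines. The trial IDs, learning rates, and validation accuracies below are copied from the trial status table of this run, and the two test accuracies come from the `evaluate` outputs; the data structure itself is only an illustration, not how Ray Tune stores results.

```python
# Values copied from the trial status table of this run.
trials = [
    {"trial": "64b09fc1", "learning_rate": 0.000156298, "val_accuracy": 0.875},
    {"trial": "0fdb8f17", "learning_rate": 0.000934244, "val_accuracy": 0.9125},
]

# HPO keeps the configuration with the highest validation accuracy.
best = max(trials, key=lambda t: t["val_accuracy"])
print("Best trial:", best["trial"], "val_accuracy =", best["val_accuracy"])

# Test accuracies reported by evaluate() for the regular fit and the HPO fit.
regular_test_acc = 0.738
hpo_test_acc = 0.912
gain = round(hpo_test_acc - regular_test_acc, 3)
print("Test accuracy gain from HPO:", gain)
```

Note that the selection uses validation accuracy only; the test set is used afterwards to check that the gain carries over to unseen data.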
Other Examples¶
You may go to AutoMM Examples to explore other examples about AutoMM.
Customization¶
To learn how to customize AutoMM, please refer to Customize AutoMM.