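Quickstart

Autodistill lets you use large foundation vision models to automatically label data for training small, fine-tuned vision models. This process is called distillation.

Your fine-tuned model will be smaller and run faster, making it better suited to deployment on edge devices.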

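How Autodistill Works

Autodistill has two main concepts:

A base model, used to automatically label data. Examples include Grounding DINO, Grounded SAM, and CLIP.
A target model, which is trained on the automatically labeled data. Examples include YOLOv5, YOLOv8, and DETR.

If you want to label data and run your own training, you can use Autodistill with only a base model.

You can also use Autodistill with both a base model and a target model to build an end-to-end labeling and training pipeline.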

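Distill a Model (Tutorial)

Tip

See the demo notebook for a quick introduction to autodistill. The notebook walks through building a milk container detection model with no labeling.

If you would rather skip the tutorial and go straight to the full code, go to the Code Summary section.

Let's distill a model to see how Autodistill works. We will use Autodistill to automatically label a dataset of milk bottles.

In this example, we show how to distill GroundedSAM into a small YOLOv8 model using autodistill-grounded-sam and autodistill-yolov8.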

Step #1: Install Autodistill and Models

First, install the required dependencies:

pip install autodistill autodistill-grounded-sam autodistill-yolov8

Tip

See the list of models supported by Autodistill for the full set of available models.

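Step #2: Set an Ontology

Every base model needs an ontology. An ontology tells Autodistill what you want to identify and which labels to use in your dataset.

For example, if you want to identify milk bottles, you could use the following ontology: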

{
    "milk bottle": "bottle",
    "milk bottle cap": "bottle cap"
}

This ontology tells Autodistill to identify milk bottles and milk bottle caps, and to save the labels as bottle and bottle cap in your dataset.
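To make the caption-to-label mapping concrete, here is a minimal plain-Python sketch. It only mirrors the {caption: class} idea described above; it is not Autodistill's internal implementation, and the `label_for` helper is hypothetical.

```python
# Illustrative only: the ontology as a plain dictionary of {caption: class}.
ontology = {
    "milk bottle": "bottle",
    "milk bottle cap": "bottle cap",
}

# Captions are the prompts sent to the base model.
prompts = list(ontology.keys())

# Classes are the labels saved in the dataset (deduplicated, order preserved).
classes = list(dict.fromkeys(ontology.values()))


def label_for(caption: str) -> str:
    """Return the dataset label saved for a caption (hypothetical helper)."""
    return ontology[caption]


print(prompts)
print(classes)
print(label_for("milk bottle cap"))
```

Keeping captions and labels separate lets you phrase the prompt descriptively for the base model while keeping short, stable class names in the dataset.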

Step #3: Set Up the Model

Let's set up our model. Create a new Python file and add the following code:

from autodistill_grounded_sam import GroundedSAM
from autodistill.detection import CaptionOntology
from autodistill_yolov8 import YOLOv8
from autodistill.utils import plot
import cv2

# define an ontology to map class names to our GroundingDINO prompt
# the ontology dictionary has the format {caption: class}
# where caption is the prompt sent to the base model, and class is the label that will
# be saved for that caption in the generated annotations
base_model = GroundedSAM(ontology=CaptionOntology({"milk bottle": "bottle", "milk bottle cap": "bottle cap"}))

Step #4: Test the Base Model

We can test our base model with the predict function:

results = base_model.predict("milk.jpg")

plot(
    image=cv2.imread("milk.jpg"),
    classes=base_model.ontology.classes(),
    detections=results
)

Step #5: Label a Dataset

Now that we have a base model, we can use it to label a dataset. You can label a dataset with the following code:

base_model.label_folder(
    input_folder="./images",
    output_folder="./labeled-images"
)
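After labeling, a quick sanity check on the output folder can catch an empty or misnamed input directory before you spend time on training. The sketch below uses only the standard library. The `summarize_labeled_folder` helper and the list of image suffixes are assumptions for illustration; the only detail taken from this guide is that a data.yaml file is expected at the top level of the output folder.

```python
from pathlib import Path

# Assumption: adjust this set to match the image formats in your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def summarize_labeled_folder(folder: str) -> dict:
    """Count images anywhere under `folder` and check for a top-level data.yaml."""
    root = Path(folder)
    if not root.is_dir():
        return {"image_count": 0, "has_data_yaml": False}
    images = [
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    ]
    return {
        "image_count": len(images),
        "has_data_yaml": (root / "data.yaml").is_file(),
    }


if __name__ == "__main__":
    print(summarize_labeled_folder("./labeled-images"))
```

If the image count is zero or data.yaml is missing, it is worth re-checking the input_folder and output_folder paths before moving on.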

Step #6: Train the Target Model

We can use a target model such as YOLOv8 to train on our labeled dataset. You can train the target model with the following code:

target_model = YOLOv8("yolov8n.pt")
target_model.train("./labeled-images/data.yaml", epochs=200)

Your model weights will be saved in a folder called runs.

For YOLOv8 models, you can run inference locally with the ultralytics Python package, or deploy your model to Roboflow.
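As a rough sketch of local inference, the snippet below searches the runs folder for a weights file and loads it with the ultralytics package. The helper names are hypothetical, and picking the most recently modified .pt file is only a heuristic: it may not be the best checkpoint from your run, so point the code at a specific file if you know which one you want.

```python
from pathlib import Path
from typing import Optional


def find_latest_weights(runs_dir: str = "runs") -> Optional[Path]:
    """Return the most recently modified .pt file under runs_dir, or None."""
    root = Path(runs_dir)
    if not root.is_dir():
        return None
    candidates = [p for p in root.rglob("*.pt") if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def run_local_inference(weights: Path, image_path: str):
    """Load the weights with ultralytics and run prediction on one image."""
    # Imported lazily so find_latest_weights works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(str(weights))
    return model.predict(image_path)


if __name__ == "__main__":
    weights = find_latest_weights()
    if weights is None:
        print("No .pt files found under ./runs; train a model first.")
    else:
        print(run_local_inference(weights, "milk.jpg"))
```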

Code Summary

Here is all the code used above, combined into one snippet:

from autodistill_grounded_sam import GroundedSAM
from autodistill.detection import CaptionOntology
from autodistill_yolov8 import YOLOv8

# define an ontology to map class names to our GroundingDINO prompt
# the ontology dictionary has the format {caption: class}
# where caption is the prompt sent to the base model, and class is the label that will
# be saved for that caption in the generated annotations

base_model = GroundedSAM(ontology=CaptionOntology({"milk bottle": "bottle", "milk bottle cap": "bottle cap"}))

results = base_model.predict("milk.jpg")

base_model.label_folder(
    input_folder="./images",
    output_folder="./labeled-images"
)

target_model = YOLOv8("yolov8n.pt")
target_model.train("./labeled-images/data.yaml", epochs=200)

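Next Steps

Above, we used Autodistill to label a dataset and train a target model on it. Next, explore the Autodistill model ecosystem with the help of the "What Model Should I Use?" guide. This site contains documentation for all Autodistill models, as well as utilities you can use with each model.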