Using MLflow

MLflow is a graphical tool for tracking the results of machine learning experiments. PyKEEN integrates MLflow into both the pipeline and the HPO pipeline.

To use it, first install MLflow with pip install mlflow and run mlflow ui in the background. More information can be found in the MLflow Quickstart. By default, the UI runs at http://localhost:5000.
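Before starting a run, it can help to confirm that something is actually listening at the tracking address. The following is a minimal sketch using only the Python standard library. The helper name tracking_server_is_reachable is hypothetical (it is not part of PyKEEN or MLflow), and it only checks that a TCP connection can be opened, not that the server behind it is MLflow.

```python
import socket
from urllib.parse import urlsplit


def tracking_server_is_reachable(
    tracking_uri: str = "http://localhost:5000",
    timeout: float = 2.0,
) -> bool:
    """Return True if a TCP connection to the host/port in ``tracking_uri`` succeeds.

    Hypothetical helper for illustration; it does not verify that the
    listening service is an MLflow server.
    """
    parts = urlsplit(tracking_uri)
    host = parts.hostname
    if host is None:
        # No host could be parsed, e.g. the scheme is missing.
        return False
    # Fall back to the scheme's conventional port if none is given.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, DNS failure, and similar.
        return False
```

If this returns False for http://localhost:5000, the likely cause is that mlflow ui has not been started or is running on a different port.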

Pipeline Example

This example shows how to use MLflow with the pykeen.pipeline.pipeline() function. At a minimum, tracking_uri and experiment_name are required in result_tracker_kwargs.

from pykeen.pipeline import pipeline

pipeline_result = pipeline(
    model='RotatE',
    dataset='Kinships',
    result_tracker='mlflow',
    result_tracker_kwargs=dict(
        tracking_uri='http://localhost:5000',
        experiment_name='Tutorial Training of RotatE on Kinships',
    ),
)

If you navigate to the MLflow UI at http://localhost:5000, you will see the experiment appear in the left sidebar.

Clicking the experiment opens a page listing its runs.

HPO Example

This example shows how to use MLflow with the pykeen.hpo.hpo_pipeline() function.

from pykeen.hpo import hpo_pipeline

pipeline_result = hpo_pipeline(
    model='RotatE',
    dataset='Kinships',
    result_tracker='mlflow',
    result_tracker_kwargs=dict(
        tracking_uri='http://localhost:5000',
        experiment_name='Tutorial HPO Training of RotatE on Kinships',
    ),
)

The same navigation through the MLflow UI works for this example.

Reusing Experiments

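In the MLflow UI, you will see that each experiment is assigned an ID. This means you can pass the experiment_id keyword argument instead of experiment_name to reuse the same ID and group different sub-experiments together.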

from pykeen.pipeline import pipeline

experiment_id = 4  # must already exist in MLflow, otherwise an error is raised
pipeline_result = pipeline(
    model='RotatE',
    dataset='Kinships',
    result_tracker='mlflow',
    result_tracker_kwargs=dict(
        tracking_uri='http://localhost:5000',
        experiment_id=experiment_id,
    ),
)
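Since a run is addressed either by experiment_name or by experiment_id, the keyword arguments can be assembled by a small helper. The sketch below is illustrative only: make_mlflow_kwargs is a hypothetical name, and the rule that exactly one of the two selectors must be given is a choice made in this sketch, not a statement about how PyKEEN validates its arguments.

```python
from typing import Any, Dict, Optional


def make_mlflow_kwargs(
    tracking_uri: str,
    experiment_name: Optional[str] = None,
    experiment_id: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a ``result_tracker_kwargs`` dict with exactly one experiment selector.

    Hypothetical helper; the either/or check is this sketch's own convention.
    """
    if (experiment_name is None) == (experiment_id is None):
        raise ValueError("Pass exactly one of experiment_name or experiment_id")
    kwargs: Dict[str, Any] = {"tracking_uri": tracking_uri}
    if experiment_name is not None:
        kwargs["experiment_name"] = experiment_name
    else:
        kwargs["experiment_id"] = experiment_id
    return kwargs
```

The returned dictionary can then be passed as result_tracker_kwargs to pipeline() or hpo_pipeline(), for example make_mlflow_kwargs('http://localhost:5000', experiment_id=4).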

Adding Tags

Tags are additional key/value pairs that you may want to attach to an experiment and store in MLflow. By default, MLflow adds the tags listed at https://www.mlflow.org/docs/latest/tracking.html#id41.

For example, if you are using custom input, you may want to record which version of the input file produced the results, as follows:

from pykeen.pipeline import pipeline

data_version = ...

pipeline_result = pipeline(
    model='RotatE',
    training=...,
    testing=...,
    validation=...,
    result_tracker='mlflow',
    result_tracker_kwargs=dict(
        tracking_uri='http://localhost:5000',
        experiment_name='Tutorial Training of RotatE on Kinships',
        tags={
            "data_version": data_version,
        },
    ),
)
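One possible way to obtain a value for data_version is to hash the input file, so that any change to the data yields a different tag. This is a minimal sketch under that assumption. The helper name file_md5 is hypothetical, and the original example does not say how data_version should be computed.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of the file at ``path``.

    Hypothetical helper; the file is read in chunks so that large
    training files do not have to fit in memory.
    """
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result could then be assigned to data_version before calling pipeline(). MD5 serves here only as a content fingerprint, not as a security measure.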

Additional documentation of the valid keyword arguments can be found under pykeen.trackers.MLFlowResultTracker.