Compile a Pipeline

Define and compile a basic pipeline using the KFP SDK.

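Overview

To submit a pipeline for execution, you must first compile it to YAML with the KFP SDK compiler.

In the following example, the compiler creates a file named pipeline.yaml that contains a hermetic representation of your pipeline. The output is called Intermediate Representation (IR) YAML, which is a serialized PipelineSpec protocol buffer message.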

from kfp import compiler, dsl

@dsl.component
def comp(message: str) -> str:
    print(message)
    return message

@dsl.pipeline
def my_pipeline(message: str) -> str:
    """My ML pipeline."""
    return comp(message=message).output

compiler.Compiler().compile(my_pipeline, package_path='pipeline.yaml')

Because components are effectively pipelines, you can also compile them to IR YAML:

from kfp import compiler, dsl

@dsl.component
def comp(message: str) -> str:
    print(message)
    return message

compiler.Compiler().compile(comp, package_path='component.yaml')

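You can view an example IR YAML on GitHub. The contents of the file are not intended to be human-readable, but the comments at the top of the file provide a summary of the pipeline: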

# PIPELINE DEFINITION
# Name: my-pipeline
# Description: My ML pipeline.
# Inputs:
#    message: str
# Outputs:
#    Output: str
...
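Because the summary follows a simple "# Key: value" layout, it can be read without a YAML parser. The following is a minimal sketch, assuming the header always has the shape shown above. The names parse_header_summary and SAMPLE_HEADER are hypothetical and are not part of the KFP SDK.

```python
# Hypothetical helper: reads the comment summary at the top of an IR YAML file.
# Assumes top-level fields are indented by one space after '#', and that
# nested entries (inputs and outputs) are indented further.

SAMPLE_HEADER = """\
# PIPELINE DEFINITION
# Name: my-pipeline
# Description: My ML pipeline.
# Inputs:
#    message: str
# Outputs:
#    Output: str
components: {}
"""


def parse_header_summary(ir_yaml_text: str) -> dict:
    """Return the leading comment block as a nested dictionary."""
    summary = {}
    current_group = None
    for line in ir_yaml_text.splitlines():
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        body = line[1:]
        if ":" not in body:
            continue  # banner line such as "PIPELINE DEFINITION"
        key, _, value = body.partition(":")
        indent = len(key) - len(key.lstrip())
        key, value = key.strip(), value.strip()
        if indent > 1:
            # Nested entry: belongs to the most recent group (Inputs/Outputs).
            if current_group is not None:
                summary[current_group][key] = value
        elif value:
            summary[key] = value
            current_group = None
        else:
            summary[key] = {}
            current_group = key
    return summary


summary = parse_header_summary(SAMPLE_HEADER)
print(summary)
```

A helper like this can be handy for listing the names and inputs of many compiled files at once, but the comment header is informational only; the YAML body remains the authoritative definition.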

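Type checking

By default, the DSL compiler statically type checks your pipeline to ensure type consistency between components that pass data to one another. Static type checking helps you identify component input/output inconsistencies without running the pipeline, which shortens development iterations.

Specifically, the type checker checks for type equality between the data type a component input expects and the type of the data provided. For more information about KFP data types, see Data Types.

For example, for parameters, a list input may only be passed to a parameter with a typing.List annotation. Similarly, a float may only be passed to a parameter with a float annotation.

Input data types and annotations must also match for artifacts, with one exception: the Artifact type is compatible with all other artifact types. In this sense, the Artifact type is both the default artifact type and an artifact "any" type.

You can disable type checking with the type_check argument described in the following section.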
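The compatibility rules in this section can be modeled in a few lines. This is a toy sketch for illustration only, not the checker that the KFP SDK actually runs: type names are plain strings, is_compatible is a hypothetical function, and the set of artifact type names is an assumption chosen for the example. It also assumes that the Artifact exception applies in both directions.

```python
# Toy model of the type-checking rules (not the KFP implementation).
# Artifact type names below are assumed for illustration.
ARTIFACT_TYPES = {"Artifact", "Dataset", "Model", "Metrics"}


def is_compatible(provided: str, expected: str) -> bool:
    """Return True if data of type `provided` may feed an input of type `expected`."""
    both_artifacts = provided in ARTIFACT_TYPES and expected in ARTIFACT_TYPES
    if both_artifacts and "Artifact" in (provided, expected):
        # The generic Artifact type acts as an "any" type among artifacts.
        return True
    # Everything else requires exact type equality.
    return provided == expected


print(is_compatible("List", "List"))         # parameters: equal types
print(is_compatible("List", "float"))        # parameters: mismatched types
print(is_compatible("Dataset", "Artifact"))  # artifacts: generic type accepts any
print(is_compatible("Dataset", "Model"))     # artifacts: mismatched specific types
```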

Compiler arguments

The Compiler.compile method accepts the following arguments:

| Name | Type | Description |
| --- | --- | --- |
| pipeline_func | function | Required. Pipeline function constructed with the @dsl.pipeline decorator, or component constructed with the @dsl.component decorator. |
| package_path | string | Required. Output YAML file path. For example, ~/my_pipeline.yaml or ~/my_component.yaml. |
| pipeline_name | string | Optional. If specified, sets the name of the pipeline template in the pipelineInfo.name field of the compiled IR YAML output. Overrides the name of the pipeline or component specified by the name parameter in the @dsl.pipeline decorator. |
| pipeline_parameters | Dict[str, Any] | Optional. Map of parameter names to argument values. This lets you provide default values for pipeline or component parameters. You can override these default values during pipeline submission. |
| type_check | bool | Optional. Indicates whether static type checking is enabled during compilation. Type checking is enabled by default. |

IR YAML

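The IR YAML is an intermediate representation of a compiled pipeline or component. It is an instance of the PipelineSpec protocol buffer message type, which is a platform-agnostic pipeline representation protocol. It is considered an intermediate representation because the KFP backend compiles the PipelineSpec to Argo Workflow YAML, which serves as the final pipeline definition for execution.

Unlike v1 component YAML, the IR YAML is not intended to be written directly. Although IR YAML is not designed to be easy for humans to read, you can still inspect it if you know a bit about its contents: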

| Section | Description | Example |
| --- | --- | --- |
| components | This section is a map of the names of all components used in the pipeline to ComponentSpec. ComponentSpec defines the interface, including inputs and outputs, of a component. For primitive components, ComponentSpec contains a reference to the executor containing the component implementation. For pipelines used as components, ComponentSpec contains a DagSpec instance, which includes references to the underlying primitive components. | View on GitHub |
| deployment_spec | This section contains a map of executor name to ExecutorSpec. ExecutorSpec contains the implementation for a primitive component. | View on GitHub |
| root | This section defines the steps of the outermost pipeline definition, also called the pipeline root definition. The root definition is the workflow executed when you submit the IR YAML. It is an instance of ComponentSpec. | View on GitHub |
| pipeline_info | This section contains pipeline metadata, including the pipelineInfo.name field. This field contains the name of your pipeline template. When you upload your pipeline, a pipeline context name is created based on this template name. The pipeline context lets the backend and the dashboard associate artifacts and executions from pipeline runs using the pipeline template. You can use a pipeline context to determine the best model by comparing metrics and artifacts from multiple pipeline runs based on the same training pipeline. | View on GitHub |
| sdk_version | This section records the version of the KFP SDK used to compile the pipeline. | View on GitHub |
| schema_version | This section records the version of the PipelineSpec schema used for the IR YAML. | View on GitHub |
| default_pipeline_root | This section records the remote storage root path, such as a MinIO URI or Google Cloud Storage URI, where the pipeline output is written. | View on GitHub |
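To see which of these sections a given file contains, you can scan for its top-level keys. The sketch below uses only the standard library and a hand-abbreviated stand-in for compiler output; SAMPLE_IR_YAML, its placeholder values, and top_level_sections are hypothetical. Note that the serialized YAML spells section names in camelCase (for example pipelineInfo, as in the pipelineInfo.name field mentioned above), while the table lists the corresponding snake_case field names.

```python
# Hand-written, heavily abbreviated stand-in for a compiled file.
# All values are placeholders, not real compiler output.
SAMPLE_IR_YAML = """\
# PIPELINE DEFINITION
# Name: my-pipeline
components:
  comp-comp:
    executorLabel: exec-comp
deploymentSpec:
  executors:
    exec-comp: {}
pipelineInfo:
  name: my-pipeline
root:
  dag:
    tasks: {}
schemaVersion: 0.0.0
sdkVersion: kfp-0.0.0
"""


def top_level_sections(ir_yaml_text: str) -> list:
    """List unindented mapping keys, in order of appearance."""
    sections = []
    for line in ir_yaml_text.splitlines():
        # Skip blank lines, nested (indented) lines, comments,
        # and document markers such as '---'.
        if not line or line[0] in " \t#-":
            continue
        key, separator, _ = line.partition(":")
        if separator:
            sections.append(key)
    return sections


sections = top_level_sections(SAMPLE_IR_YAML)
print(sections)
```

For anything beyond a quick look, a real YAML parser is the safer choice; this line scan only works because top-level keys in block-style YAML start in the first column.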
