Configuration Reference

`[evaluations.evaluation_name]`

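The `evaluations` sub-section of the config file defines the behavior of an evaluation in TensorZero. You can define multiple evaluations by including multiple `[evaluations.evaluation_name]` sections.

If your `evaluation_name` is not a basic string (that is, it cannot be written as a bare TOML key), it can be escaped with quotation marks. For example, periods are not allowed in basic strings, so you can define an evaluation named `foo.bar` as `[evaluations."foo.bar"]`.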

[evaluations.email-guardrails]
# ...

`type`

Type: Literal `"static"` (other options may be added here later)
Required: yes

`function_name`

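Type: string
Required: yes

This should be the name of a function defined in the `[functions]` section of the gateway config. The value determines which function is evaluated when this evaluation runs.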
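
As a minimal sketch of how the two fields above fit together, the following block defines a static evaluation for a single function. The function name `draft_email` is hypothetical; it is assumed to be defined elsewhere in the same config under `[functions]`.

```toml
# Hypothetical function name; assumed to be defined as [functions.draft_email]
[evaluations.email-guardrails]
type = "static"
function_name = "draft_email"
```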

`[evaluations.evaluation_name.evaluators.evaluator_name]`

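The `evaluators` sub-section defines the behavior of a particular evaluator that runs as part of its parent evaluation. You can define multiple evaluators by including multiple `[evaluations.evaluation_name.evaluators.evaluator_name]` sections.

If your `evaluator_name` is not a basic string, it can be escaped with quotation marks. For example, periods are not allowed in basic strings, so you can define `includes.jpg` as `[evaluations.evaluation_name.evaluators."includes.jpg"]`.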

[evaluations.email-guardrails]
# ...

[evaluations.email-guardrails.evaluators."includes.jpg"]
# ...

[evaluations.email-guardrails.evaluators.check-signature]
# ...

`type`

Type: string
Required: yes

Defines the type of the evaluator.

TensorZero currently supports the following evaluator types:

Type	Description
`llm_judge`	Uses a TensorZero function as a judge.
`exact_match`	Evaluates whether the generated output exactly matches the reference output (skips the datapoint if no reference output is available).

[evaluations.email-guardrails.evaluators.check-signature]
# ...
type = "llm_judge"
# ...

`type: "exact_match"`

`cutoff`

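Type: float
Required: no

Sets a user-defined threshold at which the test counts as passing. This is useful for applications that run evaluations as automated tests. If the average value of this evaluator falls below the cutoff, the evaluations binary returns a nonzero status code.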
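
For illustration, an `exact_match` evaluator with a cutoff might be configured as sketched below. The evaluator name `matches-reference` and the value `0.9` are hypothetical and chosen only for the example.

```toml
# Hypothetical evaluator name and threshold, for illustration only
[evaluations.email-guardrails.evaluators.matches-reference]
type = "exact_match"
cutoff = 0.9 # the run fails if the average exact-match rate is below 0.9
```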

`type: "llm_judge"`

`input_format`

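Type: string
Required: no (default: `serialized`)

Defines the format of the input provided to the LLM judge.

`serialized`: Passes the input messages, the generated output, and the reference output (if included) as a single serialized string.
`messages`: Passes the input messages, the generated output, and the reference output (if included) as distinct messages in the conversation history.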

[evaluations.email-guardrails.evaluators.check-signature]
# ...
type = "llm_judge"
input_format = "messages"
# ...

`output_type`

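Type: string
Required: yes

Defines the expected data type of the LLM judge's evaluation result.

`float`: The judge is expected to return a floating-point number.
`boolean`: The judge is expected to return a boolean value.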

[evaluations.email-guardrails.evaluators.check-signature]
# ...
type = "llm_judge"
output_type = "float"
# ...

`include.reference_output`

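Type: boolean
Required: no (default: `false`)

If set to `true`, the reference output associated with the evaluation datapoint is included in the input provided to the LLM judge. In this case, the evaluation run skips this evaluator for datapoints that have no reference output.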

[evaluations.email-guardrails.evaluators.check-signature]
# ...
type = "llm_judge"
include = { reference_output = true }
# ...

`optimize`

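Type: string
Required: yes

Defines whether the metric produced by the LLM judge should be maximized or minimized.

`max`: Higher values are better.
`min`: Lower values are better.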

[evaluations.email-guardrails.evaluators.check-signature]
# ...
type = "llm_judge"
optimize = "max"
# ...

`cutoff`

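Type: float
Required: no

Sets a user-defined threshold for the test to pass. This is useful for applications that run evaluations as automated tests. If the average value of this evaluator is below the cutoff (when `optimize` is `max`) or above the cutoff (when `optimize` is `min`), the evaluations binary returns a nonzero status code.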

[evaluations.email-guardrails.evaluators.check-signature]
# ...
type = "llm_judge"
optimize = "max" # Example: Maximize score
cutoff = 0.8 # Example: Consider passing if average score is >= 0.8
# ...
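
The mirrored case, where lower scores are better, could look like the sketch below. The evaluator name `count-policy-violations` and the cutoff value are hypothetical; the pass/fail direction follows the rule described above for `optimize = "min"`.

```toml
# Hypothetical evaluator name and threshold, for illustration only
[evaluations.email-guardrails.evaluators.count-policy-violations]
type = "llm_judge"
output_type = "float"
optimize = "min" # lower scores are better
cutoff = 0.2     # the run fails if the average score is above 0.2
```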

`[evaluations.evaluation_name.evaluators.evaluator_name.variants.variant_name]`

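An LLM judge evaluator defines a TensorZero function that judges the output of another TensorZero function. As a result, every variant type supported by a regular TensorZero function is also available for LLM judges, including all of our inference-time optimizations.

You can include a standard variant configuration in this block, with two modifications:

Instead of assigning a `weight` to each variant, you mark a single variant as `active`.
For `chat_completion` variants, we require `system_instructions` provided as a text file instead of a `system_template`, and we accept no other templates.

Here we list only the variant configuration options that differ from the configuration of a standard TensorZero function. For the remaining options, see the variant configuration reference.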

[evaluations.email-guardrails.evaluators.check-signature]
# ...
type = "llm_judge"
optimize = "max"

[evaluations.email-guardrails.evaluators.check-signature.variants."claude3.5sonnet"]
type = "chat_completion"
model = "anthropic::claude-3-5-sonnet-20241022"
temperature = 0.1
system_instructions = "./evaluations/email-guardrails/check-signature/system_instructions.txt"
# ... other chat completion configuration ...

[evaluations.email-guardrails.evaluators.check-signature.variants."mix3claude3.5sonnet"]
active = true  # if we run the `email-guardrails` evaluation, this is the variant we'll use for the check-signature evaluator
type = "experimental_mixture_of_n"
candidates = ["claude3.5sonnet", "claude3.5sonnet", "claude3.5sonnet"]

`active`

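Type: boolean
Required: defaults to `true` if only one variant is configured. Otherwise, exactly one variant must set this field to `true`.

Sets which variant is used when the evaluation runs.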

[evaluations.email-guardrails.evaluators.check-signature]
# ...

[evaluations.email-guardrails.evaluators.check-signature.variants."mix3claude3.5sonnet"]
active = true # if we run the `email-guardrails` evaluation, this is the variant we'll use for the check-signature evaluator
type = "experimental_mixture_of_n"

`system_instructions`

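Type: string (path)
Required: yes

Defines the path to the system instructions file. This path is relative to the configuration file.

This should be a text file containing the system instructions for the LLM judge. The instructions should direct the judge to output a float or a boolean value. We use JSON mode to ensure that the judge returns a JSON object with the fields `thinking` and `score`, where the type of `score` is configured by the evaluator's `output_type`.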

Evaluate if the text follows the haiku structure of exactly three lines with a 5-7-5 syllable pattern, totaling 17 syllables. Verify only this specific syllable structure of a haiku without making content assumptions.

[evaluations.email-guardrails.evaluators.check-signature]
# ...
system_instructions = "./evaluations/email-guardrails/check-signature/claude_35_sonnet/system_instructions.txt"
# ...
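
To show how the sections above combine, here is a sketch of a complete evaluation with one LLM judge evaluator and a single variant. It reuses the names and paths from the examples above; the function name `draft_email` is hypothetical, and the specific values (`0.8`, `0.1`) are illustrative rather than recommended settings.

```toml
[evaluations.email-guardrails]
type = "static"
function_name = "draft_email" # hypothetical; assumed to be defined under [functions]

[evaluations.email-guardrails.evaluators.check-signature]
type = "llm_judge"
input_format = "serialized"           # the default, shown for clarity
output_type = "float"
include = { reference_output = true } # datapoints without a reference output are skipped
optimize = "max"
cutoff = 0.8

# Only one variant is configured, so `active` defaults to true
[evaluations.email-guardrails.evaluators.check-signature.variants."claude3.5sonnet"]
type = "chat_completion"
model = "anthropic::claude-3-5-sonnet-20241022"
temperature = 0.1
system_instructions = "./evaluations/email-guardrails/check-signature/system_instructions.txt"
```

Because the variant name contains periods, it is quoted, following the same escaping rule that applies to evaluation and evaluator names.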
