RewardFunctionsConfig (GRPO)
The reward function configuration is defined by the following classes:
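from pydantic import BaseModel, Field

class RewardFunctionsRuntimeConfig(BaseModel):
    # Packages to install in the environment that runs the reward functions.
    packages: list[str] | None = Field(default=None)

class RewardFunction(BaseModel):
    # Encoded form of the reward function; serialized under the key "encodedFn".
    encoded_fn: str = Field(serialization_alias="encodedFn")
    # Other members (e.g. the from_callable constructor used in the example) are not shown.

class RewardFunctionsConfig(BaseModel):
    runtime: RewardFunctionsRuntimeConfig | None = Field(default=None)
    # Maps a name to each reward function.
    functions: dict[str, RewardFunction] = Field(default_factory=dict)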
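This configuration sets up the reward functions used in GRPO training and specifies the package dependencies that must be installed to run them. For example: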
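def my_reward_function(prompt, completion, example) -> float:
    # my_pkg is imported inside the function; it is declared in `packages`
    # below so that it is installed for the reward runtime.
    import my_pkg
    return my_pkg.score(...)

cfg = RewardFunctionsConfig(
    runtime=RewardFunctionsRuntimeConfig(
        packages=[
            "my_pkg",
        ]
    ),
    functions={
        "my_reward": RewardFunction.from_callable(my_reward_function)
    },
)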
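A reward function that needs no extra packages can be plain Python. The sketch below is a hypothetical exact-match reward; it assumes `completion` is a string and `example` is a dict with an `"answer"` key, neither of which this page specifies.

```python
def exact_match_reward(prompt: str, completion: str, example: dict) -> float:
    """Hypothetical reward: 1.0 if the completion equals example["answer"], else 0.0."""
    # Assumption: example is a dict carrying the reference answer under "answer".
    expected = str(example.get("answer", "")).strip()
    # Compare after trimming surrounding whitespace; no partial credit.
    return 1.0 if completion.strip() == expected else 0.0


print(exact_match_reward("2 + 2 = ?", " 4 ", {"answer": "4"}))  # 1.0
print(exact_match_reward("2 + 2 = ?", "5", {"answer": "4"}))    # 0.0
```

Since `runtime` defaults to `None`, a config for such a function could presumably omit it, e.g. `RewardFunctionsConfig(functions={"exact_match": RewardFunction.from_callable(exact_match_reward)})`.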