
SamplingParamsConfig (GRPO)

Below is the class definition for Sampling Params Config.

from pydantic import BaseModel, Field

class SamplingParamsConfig(BaseModel):
    temperature: float | None = Field(default=None)  # non-negative float, default is 0.9
    top_p: float | None = Field(default=None)  # value in (0, 1], default is 1.0
    top_k: int | None = Field(default=None)  # non-negative integer or -1 to consider all tokens, default is -1
    max_tokens: int | None = Field(default=None)  # non-negative integer, default is 1024
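
Every field is declared with `default=None`, while the comments list an effective default and a valid range for each. The sketch below shows one way that fallback could work, assuming an unset field simply takes the value named in its comment. `resolve_sampling_params` and `DEFAULTS` are hypothetical names used for illustration only; they are not part of the library, and the real resolution logic may differ.

```python
# Hypothetical helper, not part of any library: fills unset (None) fields
# with the defaults listed in the class comments and checks the stated ranges.
DEFAULTS = {"temperature": 0.9, "top_p": 1.0, "top_k": -1, "max_tokens": 1024}


def resolve_sampling_params(temperature=None, top_p=None, top_k=None, max_tokens=None):
    given = {
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "max_tokens": max_tokens,
    }
    # Assumption: None means "use the default named in the comment".
    params = {
        name: DEFAULTS[name] if value is None else value
        for name, value in given.items()
    }

    # Range checks taken from the comments in the class definition.
    if params["temperature"] < 0:
        raise ValueError("temperature must be non-negative")
    if not 0 < params["top_p"] <= 1:
        raise ValueError("top_p must be in (0, 1]")
    if params["top_k"] < -1:
        raise ValueError("top_k must be a non-negative integer or -1")
    if params["max_tokens"] < 0:
        raise ValueError("max_tokens must be non-negative")
    return params


# Only temperature and top_k are overridden; the rest fall back to defaults.
resolved = resolve_sampling_params(temperature=0.7, top_k=50)
print(resolved)
```

Keeping `None` as the declared default lets a config distinguish "not set" from "explicitly set to the default value", which can matter when configs are merged or serialized.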

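These parameters control the sampling process during GRPO fine-tuning, in which the model generates candidate completions for a given prompt.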
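
To make the role of each parameter concrete, here is a minimal, generic sketch of how `temperature`, `top_k`, and `top_p` are commonly applied when picking a single token from raw logits. It is not the library's implementation: `sample_token` is a hypothetical function, the order of the filters (top-k, then top-p) is an assumption, and treating `temperature == 0` as greedy decoding is a convention chosen for this sketch. `max_tokens` is not shown; it would cap how many times such a step is repeated per completion.

```python
import math
import random


def sample_token(logits, temperature=0.9, top_p=1.0, top_k=-1, rng=random):
    """Pick one token index from raw logits using temperature, top-k, and top-p."""
    if temperature == 0:
        # Assumption for this sketch: temperature 0 means greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Rank token indices from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top_k: keep only the k most likely tokens (-1 or 0 keeps all here).
    if top_k > 0:
        ranked = ranked[:top_k]

    # top_p: keep the smallest prefix whose renormalized cumulative
    # probability reaches top_p.
    mass = sum(probs[i] for i in ranked)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    # random.choices normalizes the weights of the surviving tokens itself.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(0)  # seeded so repeated runs draw the same tokens
print(sample_token(logits, temperature=0.9, top_p=0.9, top_k=3, rng=rng))
```

Lower `temperature`, smaller `top_k`, and smaller `top_p` all concentrate sampling on the most likely tokens, which makes the candidate completions for a prompt more alike; higher values make them more varied.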