ray.rllib.policy.sample_batch.SampleBatch.get_single_step_input_dict

SampleBatch.get_single_step_input_dict(view_requirements: Dict[str, ViewRequirement], index: str | int = 'last') → SampleBatch

Creates a single-timestep SampleBatch from self at the given index.

The result is intended as the input dict for model calls (action computation or the value function).

Parameters:

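view_requirements – The view requirements dict obtained from the model, used to build the input_dict.
index – An integer index indicating the position in the trajectory at which to generate the compute_actions input dict. Set to "last" to generate the dict at the very end of the trajectory (e.g. for value estimation). Note that "last" is not the same as -1: "last" uses the final NEXT_OBS as the observation input.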

Returns:

The (single-timestep) input dict for ModelV2 calls.
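The difference between index="last" and index=-1 can be illustrated with a toy trajectory. The following is a minimal, hypothetical sketch in plain Python, not RLlib's implementation. `single_step_input` is an invented helper, the trajectory is a plain dict of lists rather than a SampleBatch, view requirements are ignored, and only the observation column is handled. It assumes the column names "obs" and "new_obs" for OBS and NEXT_OBS.

```python
from typing import Dict, List, Union

# Assumed column names, standing in for SampleBatch.OBS / SampleBatch.NEXT_OBS.
OBS = "obs"
NEXT_OBS = "new_obs"


def single_step_input(
    batch: Dict[str, List], index: Union[int, str] = "last"
) -> Dict[str, List]:
    """Hypothetical helper: build a one-timestep input dict from a trajectory.

    Illustrates only the "last" vs -1 distinction; view requirements and
    all columns other than the observation are ignored.
    """
    if index == "last":
        # One step past the end of the trajectory: the model should see the
        # final NEXT_OBS, i.e. the observation reached after the last action
        # (useful e.g. for bootstrapping a value estimate).
        return {OBS: [batch[NEXT_OBS][-1]]}
    # Integer index (negative values allowed): read the stored OBS at that
    # timestep. The single-element list stands in for a batch of size 1.
    return {OBS: [batch[OBS][index]]}


# A three-step trajectory: NEXT_OBS is OBS shifted by one timestep.
trajectory = {OBS: ["o0", "o1", "o2"], NEXT_OBS: ["o1", "o2", "o3"]}

print(single_step_input(trajectory, -1))      # {'obs': ['o2']}
print(single_step_input(trajectory, "last"))  # {'obs': ['o3']}
```

With index=-1 the model sees the observation it acted on at the last recorded step ("o2"), whereas "last" yields the observation that followed that action ("o3"), which is the state whose value is needed when estimating returns past the end of a truncated trajectory.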