VideoDecoder

class torchcodec.decoders.VideoDecoder(source: Union[str, Path, bytes, Tensor], *, stream_index: Optional[int] = None, dimension_order: Literal['NCHW', 'NHWC'] = 'NCHW', num_ffmpeg_threads: int = 1, device: Optional[Union[str, device]] = 'cpu')

A single-stream video decoder.

This decoder always performs a scan of the video.

Parameters:

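source (str, pathlib.Path, torch.Tensor, or bytes) –
The source of the video.
- If str or pathlib.Path: a path to a local video file.
- If bytes object or torch.Tensor: the raw encoded video data.
stream_index (int, optional) – Specifies which stream in the video to decode frames from. Note that this index is absolute across all media types. If left unspecified, the best stream is used.
dimension_order (str, optional) –
The dimension order of the decoded frames. This can be either "NCHW" (default) or "NHWC", where N is the batch size, C is the number of channels, H is the height, and W is the width of the frames.
Note: Frames are natively decoded in NHWC format by the underlying FFmpeg implementation. Converting them to NCHW format is a cheap no-copy operation that allows the frames to be transformed using the torchvision transforms.
num_ffmpeg_threads (int, optional) – The number of threads to use for decoding. Use 1 for single-threaded decoding, which may be best if you are running multiple instances of VideoDecoder in parallel. Use a higher number for multi-threaded decoding, which is best if you are running a single instance of VideoDecoder. Passing 0 lets FFmpeg decide on the number of threads. Default: 1.
device (str or torch.device, optional) – The device to use for decoding. Default: "cpu".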

Variables:

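metadata (VideoStreamMetadata) – The metadata of the video stream.
stream_index (int) – The stream index that this decoder is retrieving frames from. If a stream index was provided at initialization, this is the same value. If it was left unspecified, this is the best stream.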
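
The constructor arguments above can be combined as in the following minimal sketch. The helper names `choose_num_ffmpeg_threads` and `open_decoder` are hypothetical; only `VideoDecoder` and its keyword arguments come from the signature above. The thread heuristic merely restates the guidance given for `num_ffmpeg_threads` and is an assumption, not library behavior.

```python
from pathlib import Path
from typing import Union


def choose_num_ffmpeg_threads(num_parallel_decoders: int) -> int:
    """Hypothetical helper that restates the num_ffmpeg_threads guidance."""
    if num_parallel_decoders < 1:
        raise ValueError("num_parallel_decoders must be >= 1")
    # Several decoders running in parallel: single-threaded decoding may be best.
    if num_parallel_decoders > 1:
        return 1
    # A single decoder: passing 0 lets FFmpeg decide on the number of threads.
    return 0


def open_decoder(source: Union[str, Path], num_parallel_decoders: int = 1):
    """Hypothetical wrapper; calling it requires torchcodec to be installed."""
    # Imported lazily so the helper above stays usable without torchcodec.
    from torchcodec.decoders import VideoDecoder

    return VideoDecoder(
        source,
        dimension_order="NCHW",  # the default; "NHWC" is the other option
        num_ffmpeg_threads=choose_num_ffmpeg_threads(num_parallel_decoders),
        device="cpu",
    )
```

Calling `open_decoder` with a path to a local video file would return a decoder whose `metadata` and `stream_index` attributes are the ones listed under Variables.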

Examples using VideoDecoder:

Accelerated video decoding on GPUs with CUDA and NVDEC

Decoding a video with VideoDecoder

How to sample video clips

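__getitem__(key: Union[Integral, slice]) → Tensor

Return the frame or frames at the given index or range, as tensors.

Parameters: key (int or slice) – The index or range of the frame(s) to retrieve.
Returns: The frame or frames at the given index or range.
Return type: torch.Tensor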
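
As an illustration of how an `int` or `slice` key maps to frame indices, the sketch below resolves a key against a frame count using standard Python sequence semantics. `resolve_key` and `num_frames` are hypothetical names, and whether the decoder handles negative or out-of-range keys in exactly this way is an assumption, not something this page states.

```python
from typing import List, Union


def resolve_key(key: Union[int, slice], num_frames: int) -> List[int]:
    """Return the frame indices that a key would select (hypothetical helper)."""
    if isinstance(key, slice):
        # slice.indices() clamps start/stop/step to the sequence length.
        return list(range(*key.indices(num_frames)))
    if not -num_frames <= key < num_frames:
        raise IndexError(f"frame index {key} out of range")
    # Assumption: negative indices count from the end, as in Python lists.
    return [key % num_frames]
```

For a 10-frame video, the key `3` selects frame 3 and the key `slice(2, 8, 3)` (written `2:8:3` inside brackets) selects frames 2 and 5.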

get_frame_at(index: int) → Frame

Return a single frame at the given index.

Parameters: index (int) – The index of the frame to retrieve.
Returns: The frame at the given index.
Return type: Frame

get_frame_played_at(seconds: float) → Frame

Return a single frame played at the given timestamp in seconds.

Parameters: seconds (float) – The time stamp, in seconds, at which the frame is played.
Returns: The frame that is played at seconds.
Return type: Frame
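
One way to picture "the frame played at a timestamp" is as the frame whose display interval contains that time. The sketch below assumes each frame is displayed from its presentation timestamp until the next frame's timestamp. The function name and the `pts_seconds` and `end_seconds` inputs are hypothetical, and this is an illustration of the idea rather than the library's implementation.

```python
from bisect import bisect_right
from typing import Sequence


def index_of_frame_played_at(
    seconds: float, pts_seconds: Sequence[float], end_seconds: float
) -> int:
    """Index of the frame on screen at `seconds` (hypothetical helper).

    pts_seconds must be sorted in ascending order; end_seconds is the time
    at which the last frame stops being displayed.
    """
    if not pts_seconds or not pts_seconds[0] <= seconds < end_seconds:
        raise ValueError(f"no frame is played at {seconds} seconds")
    # The frame being played is the last one whose pts is <= seconds.
    return bisect_right(pts_seconds, seconds) - 1
```

With timestamps `[0.0, 0.04, 0.08, 0.12]`, a query at 0.05 seconds falls inside the display interval of the frame at index 1.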

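get_frames_at(indices: list[int]) → FrameBatch

Return frames at the given indices.

Note

Calling this method is more efficient than repeated individual calls to get_frame_at(). This method makes sure not to decode the same frame twice, and it avoids "backwards seek" operations, which are slow.

Parameters: indices (list of int) – The indices of the frames to retrieve.
Returns: The frames at the given indices.
Return type: FrameBatch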
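
The note for this method describes two savings: no frame is decoded twice, and no backward seek is needed. A minimal sketch of how a batch request could achieve both is to visit the unique indices in ascending order and then map the results back to the requested order. `plan_batch_decode` is a hypothetical helper written for illustration; it does not claim to mirror torchcodec's internals.

```python
from typing import Dict, List, Tuple


def plan_batch_decode(indices: List[int]) -> Tuple[List[int], List[int]]:
    """Return (decode_order, gather) for a batch request (hypothetical helper).

    decode_order: the unique frame indices in ascending order, so the video
        is only ever traversed forwards.
    gather: for each requested index, its position within decode_order.
    """
    decode_order = sorted(set(indices))
    position: Dict[int, int] = {idx: pos for pos, idx in enumerate(decode_order)}
    gather = [position[idx] for idx in indices]
    return decode_order, gather
```

For the request `[30, 5, 30, 12]`, only frames 5, 12 and 30 need decoding, in that order, and frame 30 is reused for both positions where it was requested.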

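get_frames_in_range(start: int, stop: int, step: int = 1) → FrameBatch

Return multiple frames in the given index range.

Frames are in [start, stop).

Parameters:

start (int) – Index of the first frame to retrieve.
stop (int) – End of the index range (exclusive, as per Python conventions).
step (int, optional) – Step size between frames. Default: 1.

Returns:

The frames within the specified range.

Return type:

FrameBatch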
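
The index range follows Python's `range` convention, with `stop` excluded. As a quick illustration, the indices covered by a call can be listed as below. The helper name is hypothetical, and rejecting non-positive steps is an assumption made for this sketch.

```python
from typing import List


def indices_in_range(start: int, stop: int, step: int = 1) -> List[int]:
    """Frame indices covered by [start, stop) with the given step."""
    # Assumption: only positive steps are meaningful for this sketch.
    if step <= 0:
        raise ValueError("step must be a positive integer")
    return list(range(start, stop, step))
```

For example, `start=0, stop=10, step=3` covers indices 0, 3, 6 and 9; index 10 is never included.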

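get_frames_played_at(seconds: list[float]) → FrameBatch

Return frames played at the given timestamps in seconds.

Note

Calling this method is more efficient than repeated individual calls to get_frame_played_at(). This method makes sure not to decode the same frame twice, and it avoids "backwards seek" operations, which are slow.

Parameters: seconds (list of float) – The timestamps, in seconds, at which the frames are played.
Returns: The frames that are played at seconds.
Return type: FrameBatch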
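
One possible way to build the `seconds` list is to sample timestamps at a fixed interval. The sketch below is a hypothetical helper, not part of torchcodec; it generates timestamps in the half-open range [start, stop) that could then be passed to `get_frames_played_at()`.

```python
from typing import List


def evenly_spaced_seconds(start: float, stop: float, interval: float) -> List[float]:
    """Timestamps start, start + interval, ... strictly below stop."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    timestamps: List[float] = []
    count = 0
    # Derive each timestamp from its index to avoid accumulating float error.
    while start + count * interval < stop:
        timestamps.append(start + count * interval)
        count += 1
    return timestamps
```

Sampling every half second over the first two seconds gives the four timestamps 0.0, 0.5, 1.0 and 1.5.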

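get_frames_played_in_range(start_seconds: float, stop_seconds: float) → FrameBatch

Return multiple frames in the given time range.

Frames are in the half-open range [start_seconds, stop_seconds). The pts of each returned frame, in seconds, lies inside that half-open range.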
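
The half-open rule can be sketched as a selection over presentation timestamps: a frame qualifies when its pts is at least `start_seconds` and strictly below `stop_seconds`. The helper and its `pts_seconds` input are hypothetical and follow only the statement above; they are not taken from the library.

```python
from bisect import bisect_left
from typing import List, Sequence


def indices_played_in_range(
    start_seconds: float, stop_seconds: float, pts_seconds: Sequence[float]
) -> List[int]:
    """Indices of frames whose pts lies in [start_seconds, stop_seconds).

    pts_seconds must be sorted in ascending order.
    """
    if start_seconds > stop_seconds:
        raise ValueError("start_seconds must not exceed stop_seconds")
    # First frame with pts >= start_seconds (start is included).
    first = bisect_left(pts_seconds, start_seconds)
    # First frame with pts >= stop_seconds (stop is excluded).
    last = bisect_left(pts_seconds, stop_seconds)
    return list(range(first, last))
```

With timestamps `[0.0, 0.25, 0.5, 0.75, 1.0]`, the range [0.25, 0.75) selects the frames at indices 1 and 2; the frame whose pts equals 0.75 is excluded.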

Parameters:

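start_seconds (float) – Time, in seconds, of the start of the range.
stop_seconds (float) – Time, in seconds, of the end of the range. Because the range is half-open, the end is excluded.

Returns:

The frames within the specified range.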

Return type: