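StreamingMediaDecoder

class torio.io.StreamingMediaDecoder(src: Union[str, Path, BinaryIO], format: Optional[str] = None, option: Optional[Dict[str, str]] = None, buffer_size: int = 4096)

Deprecated

Warning

This class is deprecated as of version 2.8 and will be removed in version 2.9. The deprecation is part of a larger refactoring effort to move TorchAudio into a maintenance phase. PyTorch's audio and video decoding and encoding capabilities are being consolidated into TorchCodec. For more information, see https://github.com/pytorch/audio/issues/3902.

Fetch and decode audio/video streams chunk by chunk.

For detailed usage of this class, please refer to the tutorial.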

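Args

src (str, path-like, bytes or file-like object): The media source.
If a string, it must be a resource indicator that FFmpeg can handle. This includes a file path, URL, device identifier or filter expression. The supported values depend on the FFmpeg found in the system.

If bytes, it must be encoded media data in contiguous memory.

If a file-like object, it must have a read method with the signature read(size: int) -> bytes. Additionally, if the file-like object has a seek method, that method is used when parsing the media metadata, which improves the reliability of codec detection. The signature of the seek method must be seek(offset: int, whence: int) -> int.

Please refer to the following for the expected signatures and behavior of the read and seek methods.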

https://docs.pythonlang.cn/3/library/io.html#io.BufferedIOBase.read

https://docs.pythonlang.cn/3/library/io.html#io.IOBase.seek
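As an illustration of the minimal file-like interface described above, the sketch below wraps an iterable of byte strings (for example, pieces arriving over a network connection) and exposes only read(size). The class name IterReader is hypothetical and not part of torio. Because it has no seek method, the decoder would have to parse the metadata without seeking; for in-memory data, io.BytesIO already provides both read and seek.

```python
from typing import Iterable, Iterator


class IterReader:
    """Hypothetical read-only file-like object (not part of torio).

    Implements read(size: int) -> bytes on top of an iterable of byte
    strings, and deliberately has no seek() method.
    """

    def __init__(self, pieces: Iterable[bytes]) -> None:
        self._pieces: Iterator[bytes] = iter(pieces)
        self._buf = b""

    def read(self, size: int = -1) -> bytes:
        # Pull pieces until the request can be satisfied or the input ends.
        while size < 0 or len(self._buf) < size:
            try:
                self._buf += next(self._pieces)
            except StopIteration:
                break
        if size < 0:
            # A negative size means "read everything that is left".
            data, self._buf = self._buf, b""
        else:
            data, self._buf = self._buf[:size], self._buf[size:]
        # As with io.BufferedIOBase.read, an empty result signals EOF.
        return data
```

Under these assumptions, such an object could be passed as src, for example StreamingMediaDecoder(IterReader(pieces)), where pieces is a hypothetical iterable of bytes.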

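format (str or None, optional)
Override the input format, or specify the source sound device. Default: None (no override and no device input).

This argument serves two different use cases.

Overriding the source format. This is useful when the input data do not contain a header.

Specifying the input source device. This allows loading media streams from hardware devices, such as a microphone, camera or screen, or from a virtual device.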

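Note

This option roughly corresponds to the -f option of the ffmpeg command. Please refer to the FFmpeg documentation for the possible values.

https://ffmpeg.net.cn/ffmpeg-formats.html#Demuxers

Please use get_demuxers() to list the demultiplexers available in the current environment.

For device access, the available values vary depending on the hardware (AV devices) and the software configuration (ffmpeg build).

https://ffmpeg.net.cn/ffmpeg-devices.html#Input-Devices

Please use get_input_devices() to list the input devices available in the current environment.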

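option (dict of str to str, optional)
Custom options passed when initializing the format context (opening the source).

You can use this argument to change the input source before it is passed to the decoder.

Default: None.

buffer_size (int)
The internal buffer size in bytes. Used only when src is a file-like object.

Default: 4096.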

Properties

default_audio_stream

property StreamingMediaDecoder.default_audio_stream

The index of the default audio stream. None if there is no audio stream.

Type: Optional[int]

default_video_stream

property StreamingMediaDecoder.default_video_stream

The index of the default video stream. None if there is no video stream.

Type: Optional[int]

num_out_streams

property StreamingMediaDecoder.num_out_streams

The number of output streams configured by client code.

Type: int

num_src_streams

property StreamingMediaDecoder.num_src_streams

The number of streams found in the provided media source.

Type: int

Methods

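add_audio_stream

StreamingMediaDecoder.add_audio_stream(frames_per_chunk: int, buffer_chunk_size: int = 3, *, stream_index: Optional[int] = None, decoder: Optional[str] = None, decoder_option: Optional[Dict[str, str]] = None, filter_desc: Optional[str] = None)

Add an output audio stream.

Parameters

frames_per_chunk (int) –
The number of frames returned per chunk. If the source stream is exhausted before enough frames are buffered, the chunk is returned as-is.

Providing -1 disables chunking, in which case the pop_chunks() method concatenates all the buffered frames and returns them.
buffer_chunk_size (int, optional) –
Internal buffer size. When the number of buffered chunks exceeds this number, old frames are dropped. For example, if frames_per_chunk is 5 and buffer_chunk_size is 3, then frames older than the most recent 15 are dropped. Providing -1 disables this behavior.

Default: 3.
stream_index (int or None, optional) – The source audio stream index. If omitted, default_audio_stream is used.
decoder (str or None, optional) –
The name of the decoder to use. When provided, the specified decoder is used instead of the default one.

To list the available decoders, use get_audio_decoders() for audio and get_video_decoders() for video.

Default: None.
decoder_option (dict or None, optional) –
Options passed to the decoder, as a mapping from str to str. (Default: None)

To list the options of a decoder, you can use the ffmpeg -h decoder=<DECODER> command.

In addition to decoder-specific options, you can also pass options related to multithreading. They are effective only if the decoder supports them. If neither is provided, StreamingMediaDecoder defaults to single-threaded decoding.

"threads": The number of threads (as a str). Providing the value "0" lets FFmpeg decide based on its heuristics.

"thread_type": The multithreading method to use. Valid values are "frame" or "slice". Note that each decoder supports a different set of methods. If not provided, a default value is used.
- "frame": Decode more than one frame at once. Each thread handles one frame. This increases the decoding delay by one frame per thread.
- "slice": Decode more than one part of a single frame at once.
filter_desc (str or None, optional) – Filter description. The list of available filters can be found at https://ffmpeg.net.cn/ffmpeg-filters.html. Note that complex filters are not supported.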
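To make the buffer_chunk_size arithmetic above concrete, the sketch below models the buffer as a bounded queue that holds at most frames_per_chunk * buffer_chunk_size frames, with -1 meaning unbounded. This is a simplified, hypothetical model (simulate_buffer is not a torio function); the real buffer may discard frames in chunk-sized units rather than one frame at a time.

```python
from collections import deque
from typing import Deque, List, Optional


def simulate_buffer(num_frames: int, frames_per_chunk: int, buffer_chunk_size: int) -> List[int]:
    """Hypothetical model: which frame indices survive if nothing is popped."""
    capacity: Optional[int]
    if buffer_chunk_size == -1:
        capacity = None  # -1 disables dropping: the deque grows without bound
    else:
        capacity = frames_per_chunk * buffer_chunk_size
    buf: Deque[int] = deque(maxlen=capacity)
    for frame_index in range(num_frames):
        # With maxlen set, appending to a full deque evicts the oldest item.
        buf.append(frame_index)
    return list(buf)


# 20 decoded frames, chunks of 5, at most 3 chunks buffered:
# only the most recent 15 frames remain, so frames 0-4 are dropped.
print(simulate_buffer(20, 5, 3))
```

In practice this means that if client code pops chunks more slowly than frames are decoded, the oldest frames are silently lost unless buffer_chunk_size is raised or set to -1.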

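add_basic_audio_stream

StreamingMediaDecoder.add_basic_audio_stream(frames_per_chunk: int, buffer_chunk_size: int = 3, *, stream_index: Optional[int] = None, decoder: Optional[str] = None, decoder_option: Optional[Dict[str, str]] = None, format: Optional[str] = 'fltp', sample_rate: Optional[int] = None, num_channels: Optional[int] = None)

Add an output audio stream.

Parameters

frames_per_chunk (int) –
The number of frames returned per chunk. If the source stream is exhausted before enough frames are buffered, the chunk is returned as-is.

Providing -1 disables chunking, in which case the pop_chunks() method concatenates all the buffered frames and returns them.
buffer_chunk_size (int, optional) –
Internal buffer size. When the number of buffered chunks exceeds this number, old frames are dropped. For example, if frames_per_chunk is 5 and buffer_chunk_size is 3, then frames older than the most recent 15 are dropped. Providing -1 disables this behavior.

Default: 3.
stream_index (int or None, optional) – The source audio stream index. If omitted, default_audio_stream is used.
decoder (str or None, optional) –
The name of the decoder to use. When provided, the specified decoder is used instead of the default one.

To list the available decoders, use get_audio_decoders() for audio and get_video_decoders() for video.

Default: None.
decoder_option (dict or None, optional) –
Options passed to the decoder, as a mapping from str to str. (Default: None)

To list the options of a decoder, you can use the ffmpeg -h decoder=<DECODER> command.

In addition to decoder-specific options, you can also pass options related to multithreading. They are effective only if the decoder supports them. If neither is provided, StreamingMediaDecoder defaults to single-threaded decoding.

"threads": The number of threads (as a str). Providing the value "0" lets FFmpeg decide based on its heuristics.

"thread_type": The multithreading method to use. Valid values are "frame" or "slice". Note that each decoder supports a different set of methods. If not provided, a default value is used.
- "frame": Decode more than one frame at once. Each thread handles one frame. This increases the decoding delay by one frame per thread.
- "slice": Decode more than one part of a single frame at once.
format (str, optional) –
Output sample format (precision).

If None, the dtype of the output chunk corresponds to the precision of the source audio.

Otherwise, the samples are converted and the output dtype changes as follows.
- "u8p": The output is of type torch.uint8.
- "s16p": The output is of type torch.int16.
- "s32p": The output is of type torch.int32.
- "s64p": The output is of type torch.int64.
- "fltp": The output is of type torch.float32.
- "dblp": The output is of type torch.float64.
Default: "fltp".
sample_rate (int or None, optional) – If provided, resample the audio.
num_channels (int or None, optional) – If provided, change the number of channels.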
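add_basic_audio_stream covers common conversions, while add_audio_stream instead takes a raw FFmpeg filter_desc. The sketch below builds a comparable filter chain by hand, assuming FFmpeg's aresample and aformat filters, where filters in a linear chain are joined with commas and options within one filter with colons. The helper name audio_filter_desc is hypothetical, and the string it produces is not guaranteed to match what add_basic_audio_stream uses internally.

```python
from typing import List, Optional


def audio_filter_desc(
    sample_rate: Optional[int] = None,
    fmt: Optional[str] = None,
    num_channels: Optional[int] = None,
) -> Optional[str]:
    """Hypothetical helper: build an FFmpeg audio filter chain string."""
    filters: List[str] = []
    if sample_rate is not None:
        # aresample converts the stream to the requested sample rate.
        filters.append(f"aresample={sample_rate}")
    aformat_opts: List[str] = []
    if fmt is not None:
        aformat_opts.append(f"sample_fmts={fmt}")
    if num_channels is not None:
        # "<N>c" asks FFmpeg for the default channel layout with N channels.
        aformat_opts.append(f"channel_layouts={num_channels}c")
    if aformat_opts:
        filters.append("aformat=" + ":".join(aformat_opts))
    # None means "no filtering", matching filter_desc's default.
    return ",".join(filters) if filters else None


desc = audio_filter_desc(sample_rate=16000, fmt="fltp", num_channels=1)
# desc == "aresample=16000,aformat=sample_fmts=fltp:channel_layouts=1c"
```

Under these assumptions, the result could be passed as filter_desc=desc to add_audio_stream() to request 16 kHz mono float32 output.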

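add_basic_video_stream

StreamingMediaDecoder.add_basic_video_stream(frames_per_chunk: int, buffer_chunk_size: int = 3, *, stream_index: Optional[int] = None, decoder: Optional[str] = None, decoder_option: Optional[Dict[str, str]] = None, format: Optional[str] = 'rgb24', frame_rate: Optional[int] = None, width: Optional[int] = None, height: Optional[int] = None, hw_accel: Optional[str] = None)

Add an output video stream.

Parameters

frames_per_chunk (int) –
The number of frames returned per chunk. If the source stream is exhausted before enough frames are buffered, the chunk is returned as-is.

Providing -1 disables chunking, in which case the pop_chunks() method concatenates all the buffered frames and returns them.
buffer_chunk_size (int, optional) –
Internal buffer size. When the number of buffered chunks exceeds this number, old frames are dropped. For example, if frames_per_chunk is 5 and buffer_chunk_size is 3, then frames older than the most recent 15 are dropped. Providing -1 disables this behavior.

Default: 3.
stream_index (int or None, optional) – The source video stream index. If omitted, default_video_stream is used.
decoder (str or None, optional) –
The name of the decoder to use. When provided, the specified decoder is used instead of the default one.

To list the available decoders, use get_audio_decoders() for audio and get_video_decoders() for video.

Default: None.
decoder_option (dict or None, optional) –
Options passed to the decoder, as a mapping from str to str. (Default: None)

To list the options of a decoder, you can use the ffmpeg -h decoder=<DECODER> command.

In addition to decoder-specific options, you can also pass options related to multithreading. They are effective only if the decoder supports them. If neither is provided, StreamingMediaDecoder defaults to single-threaded decoding.

"threads": The number of threads (as a str). Providing the value "0" lets FFmpeg decide based on its heuristics.

"thread_type": The multithreading method to use. Valid values are "frame" or "slice". Note that each decoder supports a different set of methods. If not provided, a default value is used.
- "frame": Decode more than one frame at once. Each thread handles one frame. This increases the decoding delay by one frame per thread.
- "slice": Decode more than one part of a single frame at once.
format (str, optional) –
Change the format of the image channels. Valid values are:
- "rgb24": 8 bits * 3 channels (R, G, B)
- "bgr24": 8 bits * 3 channels (B, G, R)
- "yuv420p": 8 bits * 3 channels (Y, U, V)
- "gray": 8 bits * 1 channel
Default: "rgb24".
frame_rate (int or None, optional) – If provided, change the frame rate.
width (int or None, optional) – If provided, change the image width. Unit: pixels.
height (int or None, optional) – If provided, change the image height. Unit: pixels.
hw_accel (str or None, optional) –
Enable hardware acceleration.

When the video is decoded on CUDA hardware, for example with decoder="h264_cuvid", passing a CUDA device indicator to hw_accel (e.g. hw_accel="cuda:0") makes StreamingMediaDecoder place the resulting frames directly on the specified CUDA device as CUDA tensors.

If None, the frames are moved to CPU memory. Default: None.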
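Because frames are 8 bits per channel, the memory held by a video output buffer grows quickly with resolution, frames_per_chunk and buffer_chunk_size. The sketch below estimates the worst-case size of one buffer. It assumes each chunk has a (frames, channels, height, width) uint8 layout and that every listed format stores its channels at full resolution, as the format list above suggests; both the layout and the helper names (video_chunk_shape, buffer_bytes) are illustrative assumptions, not torio APIs.

```python
from typing import Dict, Tuple

# Channels per pixel for the format values listed above (assumption: one
# uint8 value per channel per pixel, all channels at full resolution).
CHANNELS: Dict[str, int] = {"rgb24": 3, "bgr24": 3, "yuv420p": 3, "gray": 1}


def video_chunk_shape(frames_per_chunk: int, fmt: str, height: int, width: int) -> Tuple[int, int, int, int]:
    """Hypothetical helper: assumed (frames, channels, height, width) shape."""
    return (frames_per_chunk, CHANNELS[fmt], height, width)


def buffer_bytes(frames_per_chunk: int, buffer_chunk_size: int, fmt: str, height: int, width: int) -> int:
    """Hypothetical helper: worst-case bytes held by one full output buffer."""
    frames, channels, h, w = video_chunk_shape(frames_per_chunk, fmt, height, width)
    # uint8 means one byte per element.
    return buffer_chunk_size * frames * channels * h * w


# 30-frame chunks of 1080p RGB, with up to 3 chunks buffered.
print(buffer_bytes(30, 3, "rgb24", 1080, 1920))  # 559872000
```

Roughly 534 MiB for a single full buffer in this example; switching to "gray" or a smaller width/height reduces it proportionally.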

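add_video_stream

StreamingMediaDecoder.add_video_stream(frames_per_chunk: int, buffer_chunk_size: int = 3, *, stream_index: Optional[int] = None, decoder: Optional[str] = None, decoder_option: Optional[Dict[str, str]] = None, filter_desc: Optional[str] = None, hw_accel: Optional[str] = None)

Add an output video stream.

Parameters

frames_per_chunk (int) –
The number of frames returned per chunk. If the source stream is exhausted before enough frames are buffered, the chunk is returned as-is.

Providing -1 disables chunking, in which case the pop_chunks() method concatenates all the buffered frames and returns them.
buffer_chunk_size (int, optional) –
Internal buffer size. When the number of buffered chunks exceeds this number, old frames are dropped. For example, if frames_per_chunk is 5 and buffer_chunk_size is 3, then frames older than the most recent 15 are dropped. Providing -1 disables this behavior.

Default: 3.
stream_index (int or None, optional) – The source video stream index. If omitted, default_video_stream is used.
decoder (str or None, optional) –
The name of the decoder to use. When provided, the specified decoder is used instead of the default one.

To list the available decoders, use get_audio_decoders() for audio and get_video_decoders() for video.

Default: None.
decoder_option (dict or None, optional) –
Options passed to the decoder, as a mapping from str to str. (Default: None)

To list the options of a decoder, you can use the ffmpeg -h decoder=<DECODER> command.

In addition to decoder-specific options, you can also pass options related to multithreading. They are effective only if the decoder supports them. If neither is provided, StreamingMediaDecoder defaults to single-threaded decoding.

"threads": The number of threads (as a str). Providing the value "0" lets FFmpeg decide based on its heuristics.

"thread_type": The multithreading method to use. Valid values are "frame" or "slice". Note that each decoder supports a different set of methods. If not provided, a default value is used.
- "frame": Decode more than one frame at once. Each thread handles one frame. This increases the decoding delay by one frame per thread.
- "slice": Decode more than one part of a single frame at once.
hw_accel (str or None, optional) –
Enable hardware acceleration.

When the video is decoded on CUDA hardware, for example with decoder="h264_cuvid", passing a CUDA device indicator to hw_accel (e.g. hw_accel="cuda:0") makes StreamingMediaDecoder place the resulting frames directly on the specified CUDA device as CUDA tensors.

If None, the frames are moved to CPU memory. Default: None.
filter_desc (str or None, optional) – Filter description. The list of available filters can be found at https://ffmpeg.net.cn/ffmpeg-filters.html. Note that complex filters are not supported.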
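The multithreading options described above are plain string entries in the decoder_option mapping. The small helper below builds and validates such a mapping before it is handed to the decoder; thread_options is a hypothetical name, not a torio function, and whether the options actually take effect still depends on the decoder.

```python
from typing import Dict, Optional


def thread_options(threads: int = 0, thread_type: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: build the threading entries of decoder_option."""
    if threads < 0:
        raise ValueError("threads must be >= 0 (0 lets FFmpeg decide)")
    # decoder_option maps str to str, so the count is converted with str().
    options = {"threads": str(threads)}
    if thread_type is not None:
        if thread_type not in ("frame", "slice"):
            raise ValueError('thread_type must be "frame" or "slice"')
        options["thread_type"] = thread_type
    return options


opts = thread_options(0, "frame")
# opts == {"threads": "0", "thread_type": "frame"}
```

The result could then be passed as decoder_option=opts to add_video_stream(); decoder-specific options can be merged into the same dict.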

fill_buffer

StreamingMediaDecoder.fill_buffer(timeout: Optional[float] = None, backoff: float = 10.0) → int

Keep processing packets until all buffers have at least one chunk.

Parameters

timeout (float or None, optional) – See process_packet(). (Default: None)
backoff (float, optional) – See process_packet(). (Default: 10.0)

Returns

0: Packets were processed properly and the buffers are ready to be popped once.

1: The streamer reached EOF. All the output stream processors flushed the pending frames. The caller should stop calling this method.

Return type

int

get_metadata

StreamingMediaDecoder.get_metadata() → Dict[str, str]

Get the metadata of the source media.

Returns: dict

get_out_stream_info

StreamingMediaDecoder.get_out_stream_info(i: int) → OutputStream

Get the metadata of an output stream.

Parameters

i (int) – Stream index.

Returns

OutputStreamTypes: Information about the output stream. If the output stream is of audio type, OutputAudioStream is returned. If it is of video type, OutputVideoStream is returned.

get_src_stream_info

StreamingMediaDecoder.get_src_stream_info(i: int) → InputStream

Get the metadata of a source stream.

Parameters: i (int) – Stream index.
Returns: Information about the source stream. If the source stream is of audio type, SourceAudioStream is returned. If it is of video type, SourceVideoStream is returned. Otherwise, SourceStream is returned.
Return type: InputStreamTypes

is_buffer_ready

StreamingMediaDecoder.is_buffer_ready() → bool

Returns True if all the output streams have at least one chunk filled.

pop_chunks

StreamingMediaDecoder.pop_chunks() → Tuple[Optional[ChunkTensor]]

Pop one chunk from each of the output stream buffers.

Returns: Buffer contents. If a buffer does not contain any frame, None is returned in its place.
Return type: Tuple[Optional[ChunkTensor]]

process_all_packets

StreamingMediaDecoder.process_all_packets()

Process packets until EOF is reached.

process_packet

StreamingMediaDecoder.process_packet(timeout: Optional[float] = None, backoff: float = 10.0) → int

Read the source media and process one packet.

If a packet is read successfully, the data in the packet are decoded and passed to the corresponding output stream processors.

If the packet belongs to a source stream that is not connected to an output stream, the data are discarded.

When the source reaches EOF, all the output stream processors are switched to drain mode and flush their pending frames.

Parameters

timeout (float or None, optional) –
Timeout in milliseconds.

This argument changes the retry behavior when processing a packet fails because the underlying media resource is temporarily unavailable.

When using a media device such as a microphone, there are cases where the underlying buffer is not ready. Calling this function in such a case causes the system to report EAGAIN (resource temporarily unavailable).
- >= 0: Keep retrying until the given time passes.
- < 0: Keep retrying forever.
- None: Do not retry, and raise an exception immediately.
Default: None.

Note

The retry behavior applies only when the failure is caused by the resource being unavailable. It is not triggered by any other kind of failure.
backoff (float, optional) –
Time to wait before retrying, in milliseconds.

This option is effective only when timeout is effective (not None).

When timeout is effective, backoff controls how long the function waits before retrying. Default: 10.0.

Returns

0: A packet was processed properly. The caller can keep calling this function to buffer more frames.

1: The streamer reached EOF. All the output stream processors flushed the pending frames. The caller should stop calling this method.

Return type

int
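The timeout/backoff rules above can be modeled in plain Python. The function below is a hypothetical sketch, not torio's implementation: it treats BlockingIOError (Python's exception for EAGAIN) as "resource temporarily unavailable", interprets both arguments in milliseconds as documented, and lets any other exception propagate immediately without retrying.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(op: Callable[[], T], timeout: Optional[float] = None, backoff: float = 10.0) -> T:
    """Hypothetical model of process_packet's retry policy (times in ms)."""
    start = time.monotonic()
    while True:
        try:
            return op()
        except BlockingIOError:
            # timeout=None: no retrying, re-raise immediately.
            if timeout is None:
                raise
            elapsed_ms = (time.monotonic() - start) * 1000.0
            # timeout >= 0: give up once the time budget is spent.
            # timeout < 0: keep retrying forever.
            if timeout >= 0 and elapsed_ms >= timeout:
                raise
            # Wait `backoff` milliseconds before the next attempt.
            time.sleep(backoff / 1000.0)
```

With a negative timeout, an operation that fails with EAGAIN twice and then succeeds returns normally after two backoff sleeps; with timeout=None, the first EAGAIN is raised to the caller.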

remove_stream

StreamingMediaDecoder.remove_stream(i: int)

Remove an output stream.

Parameters: i (int) – Index of the output stream to be removed.

seek

StreamingMediaDecoder.seek(timestamp: float, mode: str = 'precise')

Seek the stream to the given timestamp (in seconds).

Parameters

timestamp (float) – Target time in seconds.
mode (str) –
Controls how the seek is done. Valid choices are:
- "key": Seek to the nearest key frame before the given timestamp.
- "any": Seek to any frame (including non-key frames) before the given timestamp.
- "precise": First seek to the nearest key frame before the given timestamp, then decode frames until reaching the frame closest to the given timestamp.
Note

All the modes invalidate and reset the internal state of the decoder. When using "any" mode, if the seek lands on a non-key frame, the decoded image may be invalid due to the lack of a key frame. Using "precise" works around this issue by decoding frames from the previous key frame, but it is slower.
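To illustrate how the three modes differ, the sketch below works on a hypothetical list of frame timestamps and key-frame flags and reports, for each mode, the frame where decoding would start and the first frame that would be returned. It is a simplified model, not how FFmpeg or torio implement seeking, and it assumes: at least one key frame at or before the target, "before" meaning "at or before", and "closest" breaking ties toward the earlier frame. The function name plan_seek is hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def plan_seek(pts: List[float], is_key: List[bool], timestamp: float, mode: str) -> Tuple[int, int]:
    """Hypothetical model: return (decode_start_index, first_output_index)."""
    # Index of the last frame whose pts is at or before the timestamp.
    last_before = max(bisect_right(pts, timestamp) - 1, 0)
    # Nearest key frame at or before that frame (assumed to exist).
    key_idx = max(i for i in range(last_before + 1) if is_key[i])
    if mode == "key":
        return key_idx, key_idx
    if mode == "any":
        # Decoding starts mid-GOP, so the output image may be corrupted.
        return last_before, last_before
    if mode == "precise":
        # Decode from the key frame, but emit the frame closest to the target.
        closest = min(range(key_idx, len(pts)), key=lambda i: abs(pts[i] - timestamp))
        return key_idx, closest
    raise ValueError(f"unknown mode: {mode!r}")


pts = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
is_key = [True, False, False, False, True, False]
for mode in ("key", "any", "precise"):
    print(mode, plan_seek(pts, is_key, 1.7, mode))
```

In this model, seeking to 1.7 s returns frame 0 for "key", starts decoding at the non-key frame 3 for "any", and decodes frames 0 to 3 to return frame 3 for "precise", which is why "precise" is correct but slower.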

stream

StreamingMediaDecoder.stream(timeout: Optional[float] = None, backoff: float = 10.0) → Iterator[Tuple[Optional[ChunkTensor], ...]]

Return an iterator that generates output tensors.

Parameters

timeout (float or None, optional) – See process_packet(). (Default: None)
backoff (float, optional) – See process_packet(). (Default: 10.0)

Returns

An iterator that yields tuples of chunks corresponding to the output streams defined by client code. If an output stream is exhausted, its chunk Tensor is replaced with None. The iterator stops when all the output streams are exhausted.

Return type

Iterator[Tuple[Optional[ChunkTensor], ...]]
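The documented contract of stream() can be expressed in terms of fill_buffer() and pop_chunks(). The sketch below is a hypothetical re-implementation written only against those documented return values, together with a toy FakeDecoder that stands in for a real StreamingMediaDecoder so the loop can run without FFmpeg. Neither iterate_chunks nor FakeDecoder is part of torio, and the real stream() may differ in details.

```python
from collections import deque
from typing import Deque, Iterator, List, Optional, Tuple


class FakeDecoder:
    """Hypothetical stand-in: each output stream is a queue of ready chunks."""

    def __init__(self, *streams: List[str]) -> None:
        self._queues: List[Deque[str]] = [deque(s) for s in streams]

    def fill_buffer(self) -> int:
        # 0: every buffer holds at least one chunk; 1: EOF was reached first.
        return 0 if all(self._queues) else 1

    def pop_chunks(self) -> Tuple[Optional[str], ...]:
        # An empty buffer yields None in its slot, as documented for pop_chunks().
        return tuple(q.popleft() if q else None for q in self._queues)


def iterate_chunks(decoder) -> Iterator[Tuple[Optional[str], ...]]:
    """Hypothetical sketch of stream() built on fill_buffer()/pop_chunks()."""
    # Phase 1: while EOF has not been reached, fill and pop one set of chunks.
    while decoder.fill_buffer() == 0:
        yield decoder.pop_chunks()
    # Phase 2: after EOF, drain leftovers until every stream is exhausted.
    while True:
        chunks = decoder.pop_chunks()
        if all(c is None for c in chunks):
            return
        yield chunks


for chunks in iterate_chunks(FakeDecoder(["a0", "a1", "a2"], ["v0", "v1"])):
    print(chunks)
```

Here the video stream runs out first, so the last tuple is ("a2", None), and iteration ends once both streams are exhausted.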

Support Structures

ChunkTensor

class torio.io._streaming_media_decoder.ChunkTensor

Decoded media frames with metadata.

An instance of this class represents decoded video/audio frames with metadata, and the instance itself behaves like a Tensor.

Client code can pass an instance of this class wherever a Tensor is expected, or call the methods defined on the Tensor class.

Example

>>> import torch
>>> from torio.io import StreamingMediaDecoder
>>> # Define output streams (the basic variants accept sample_rate/frame_rate)
>>> reader = StreamingMediaDecoder(...)
>>> reader.add_basic_audio_stream(frames_per_chunk=4000, sample_rate=8000)
>>> reader.add_basic_video_stream(frames_per_chunk=7, frame_rate=28)
>>> # Decode the streams and fetch frames
>>> reader.fill_buffer()
0
>>> audio_chunk, video_chunk = reader.pop_chunks()

>>> # Access metadata
>>> (audio_chunk.pts, video_chunk.pts)
(0.0, 0.0)
>>>
>>> # The second time the PTS is different
>>> reader.fill_buffer()
0
>>> audio_chunk, video_chunk = reader.pop_chunks()
>>> (audio_chunk.pts, video_chunk.pts)
(0.5, 0.25)

>>> # Call PyTorch ops on chunk
>>> audio_chunk.shape
torch.Size([4000, 2])
>>> power = torch.pow(video_chunk, 2)
>>>
>>> # the result is a plain torch.Tensor class
>>> type(power)
<class 'torch.Tensor'>
>>>
>>> # Metadata is not available on the result
>>> power.pts
Traceback (most recent call last):
  ...
AttributeError: 'Tensor' object has no attribute 'pts'

pts: float

Presentation time stamp of the first frame in the chunk.

Unit: second.
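The PTS values in the example above follow from simple arithmetic, assuming the stream starts at time 0, has a constant rate, and no frames are dropped: chunk k (counting from 0) starts at k * frames_per_chunk / rate seconds. The helper below is hypothetical and only reproduces that arithmetic; real streams may start at a non-zero PTS or have variable frame rates.

```python
def chunk_pts(k: int, frames_per_chunk: int, rate: float) -> float:
    """Start time in seconds of chunk k (hypothetical constant-rate model)."""
    return k * frames_per_chunk / rate


# Audio: 4000 samples per chunk at 8000 Hz; video: 7 frames per chunk at 28 fps.
print(chunk_pts(1, 4000, 8000), chunk_pts(1, 7, 28))  # 0.5 0.25
```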

SourceStream

class torio.io._streaming_media_decoder.SourceStream

The metadata of a source stream, returned by get_src_stream_info().

This class is used to represent streams of media types other than audio or video.

When the source stream is of audio or video type, SourceAudioStream or SourceVideoStream, respectively, is used instead; these report additional media-specific attributes.

media_type: str: The type of the stream. One of "audio", "video", "data", "subtitle", "attachment" and the empty string.

Note

Only audio and video streams are supported for output.

Note

Still images, such as PNG and JPEG formats, are reported as video.

codec: str: Short name of the codec, such as "pcm_s16le" and "h264".

codec_long_name: str

Detailed name of the codec.

For example, "PCM signed 16-bit little-endian" and "H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10".

format: Optional[str]

Media format, such as "s16" and "yuv420p".

Commonly found audio values are:

"u8", "u8p": Unsigned 8-bit integer.
"s16", "s16p": 16-bit signed integer.
"s32", "s32p": 32-bit signed integer.
"flt", "fltp": 32-bit floating-point.

Note

A trailing "p" indicates that the format is planar: the samples of each channel are grouped together in memory instead of being interleaved.
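To illustrate the note above: in an interleaved (packed) format such as "s16", the samples of all channels alternate in memory, whereas in a planar format such as "s16p" each channel occupies its own contiguous block. The hypothetical helpers below convert between the two layouts on a flat Python list, purely for illustration; they are not part of torio.

```python
from typing import List, Sequence


def interleaved_to_planar(samples: Sequence[int], num_channels: int) -> List[List[int]]:
    """[L0, R0, L1, R1, ...] -> [[L0, L1, ...], [R0, R1, ...]]"""
    # Every num_channels-th sample, starting at offset c, belongs to channel c.
    return [list(samples[c::num_channels]) for c in range(num_channels)]


def planar_to_interleaved(planes: Sequence[Sequence[int]]) -> List[int]:
    """[[L0, L1, ...], [R0, R1, ...]] -> [L0, R0, L1, R1, ...]"""
    # zip walks the planes in lockstep, yielding one frame (all channels) at a time.
    return [sample for frame in zip(*planes) for sample in frame]


packed = [10, -10, 20, -20, 30, -30]  # stereo: left, right, left, right, ...
planes = interleaved_to_planar(packed, 2)
# planes == [[10, 20, 30], [-10, -20, -30]]
assert planar_to_interleaved(planes) == packed
```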

bit_rate: Optional[int]: Bit rate of the stream in bits per second. This is an estimate based on the first few frames of the stream. For container formats and variable bit rate, it can be 0.

num_frames: Optional[int]: The number of frames in the stream.

bits_per_sample: Optional[int]: The number of valid bits in each output sample. For compressed formats, it can be 0.

metadata: Dict[str, str]: Metadata attached to the source stream.
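For uncompressed PCM, the bit rate follows directly from the stream parameters: bit_rate = sample_rate × num_channels × bits_per_sample, where sample_rate and num_channels are reported by SourceAudioStream. The hypothetical helper below applies that formula, which can be handy for sanity-checking reported values; it does not apply to compressed codecs, where bit_rate is only an estimate and bits_per_sample may be 0.

```python
def pcm_bit_rate(sample_rate: int, num_channels: int, bits_per_sample: int) -> int:
    """Bits per second of uncompressed PCM audio (hypothetical helper)."""
    return sample_rate * num_channels * bits_per_sample


# CD-quality audio: 44.1 kHz, stereo, 16-bit samples (e.g. pcm_s16le).
print(pcm_bit_rate(44100, 2, 16))  # 1411200
```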

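SourceAudioStream

class torio.io._streaming_media_decoder.SourceAudioStream

The metadata of an audio source stream, returned by get_src_stream_info().

This class is used to represent audio streams.

In addition to the attributes reported by SourceStream, the following attributes are reported.

sample_rate: float: Sample rate of the audio.

num_channels: int: The number of channels.

SourceVideoStream

class torio.io._streaming_media_decoder.SourceVideoStream

The metadata of a video source stream, returned by get_src_stream_info().

This class is used to represent video streams.

In addition to the attributes reported by SourceStream, the following attributes are reported.

width: int: Width of the video frame in pixels.

height: int: Height of the video frame in pixels.

frame_rate: float: Frame rate.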

OutputStream

class torio.io._streaming_media_decoder.OutputStream

An output stream configured on StreamingMediaDecoder, returned by get_out_stream_info().

source_index: int: Index of the source stream that this output stream is connected to.

filter_description: str: Description of the filter graph applied to the source stream.

media_type: str: The type of the stream. "audio" or "video".

format: str

Media format, such as "s16" and "yuv420p".

Commonly found audio values are:

"u8", "u8p": Unsigned 8-bit integer.
"s16", "s16p": 16-bit signed integer.
"s32", "s32p": 32-bit signed integer.
"flt", "fltp": 32-bit floating-point.

Note

A trailing "p" indicates that the format is planar: the samples of each channel are grouped together in memory instead of being interleaved.

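OutputAudioStream

class torio.io._streaming_media_decoder.OutputAudioStream

Information about an audio output stream configured with add_audio_stream() or add_basic_audio_stream().

In addition to the attributes reported by OutputStream, the following attributes are reported.

sample_rate: float: Sample rate of the audio.

num_channels: int: The number of channels.

OutputVideoStream

class torio.io._streaming_media_decoder.OutputVideoStream

Information about a video output stream configured with add_video_stream() or add_basic_video_stream().

In addition to the attributes reported by OutputStream, the following attributes are reported.

width: int: Width of the video frame in pixels.

height: int: Height of the video frame in pixels.

frame_rate: float: Frame rate.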