Exact vs Approximate seek mode: Performance and accuracy comparison
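In this example, we will describe the seek_mode parameter of the VideoDecoder class. This parameter offers a trade-off between the speed of VideoDecoder creation and the seeking accuracy of the retrieved frames (i.e. in approximate mode, requesting the i-th frame may not necessarily return frame i).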
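First, a bit of boilerplate: we will download a short video from the web and use the ffmpeg CLI to repeat it 100 times. We end up with two videos: a short one of roughly 14 seconds and a long one of roughly 23 minutes. You can skip this part and jump straight to Performance: VideoDecoder creation.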
import torch
import requests
import tempfile
from pathlib import Path
import shutil
import subprocess
from time import perf_counter_ns
# Video source: https://www.pexels.com/video/dog-eating-854132/
# License: CC0. Author: Coverr.
url = "https://videos.pexels.com/video-files/854132/854132-sd_640_360_25fps.mp4"
response = requests.get(url, headers={"User-Agent": ""})
if response.status_code != 200:
    raise RuntimeError(f"Failed to download video. {response.status_code = }.")
temp_dir = tempfile.mkdtemp()
short_video_path = Path(temp_dir) / "short_video.mp4"
with open(short_video_path, 'wb') as f:
    for chunk in response.iter_content():
        f.write(chunk)
long_video_path = Path(temp_dir) / "long_video.mp4"
ffmpeg_command = [
    "ffmpeg",
    "-stream_loop", "99",  # repeat video 100 times
    "-i", f"{short_video_path}",
    "-c", "copy",
    f"{long_video_path}"
]
subprocess.run(ffmpeg_command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
from torchcodec.decoders import VideoDecoder
print(f"Short video duration: {VideoDecoder(short_video_path).metadata.duration_seconds} seconds")
print(f"Long video duration: {VideoDecoder(long_video_path).metadata.duration_seconds / 60} minutes")
Short video duration: 13.8 seconds
Long video duration: 23.0 minutes
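As a quick sanity check on these durations, the arithmetic can be sketched as follows. It assumes, as the comment in the ffmpeg command states, that -stream_loop 99 plays the input once and then repeats it 99 more times. The variable names are illustrative only and are not part of TorchCodec.

```python
# Illustrative arithmetic only; none of these names come from TorchCodec.
short_duration_seconds = 13.8   # printed above for the short video
extra_loops = 99                # value passed to -stream_loop
total_plays = 1 + extra_loops   # the original play plus 99 repeats

long_duration_minutes = short_duration_seconds * total_plays / 60
print(f"Expected long video duration: {long_duration_minutes:.1f} minutes")
```

The result agrees with the 23.0 minutes reported for the long video.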
Performance: VideoDecoder creation
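In terms of performance, the seek_mode parameter ultimately affects the **creation** of a VideoDecoder object. The longer the video, the higher the performance gain.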
def bench(f, average_over=50, warmup=2, **f_kwargs):

    # Warm-up calls are not timed.
    for _ in range(warmup):
        f(**f_kwargs)

    times = []
    for _ in range(average_over):
        start = perf_counter_ns()
        f(**f_kwargs)
        end = perf_counter_ns()
        times.append(end - start)

    times = torch.tensor(times) * 1e-6  # ns to ms
    std = times.std().item()
    med = times.median().item()
    print(f"{med = :.2f}ms +- {std:.2f}")
print("Creating a VideoDecoder object with seek_mode='exact' on a short video:")
bench(VideoDecoder, source=short_video_path, seek_mode="exact")
print("Creating a VideoDecoder object with seek_mode='approximate' on a short video:")
bench(VideoDecoder, source=short_video_path, seek_mode="approximate")
print()
print("Creating a VideoDecoder object with seek_mode='exact' on a long video:")
bench(VideoDecoder, source=long_video_path, seek_mode="exact")
print("Creating a VideoDecoder object with seek_mode='approximate' on a long video:")
bench(VideoDecoder, source=long_video_path, seek_mode="approximate")
Creating a VideoDecoder object with seek_mode='exact' on a short video:
med = 8.06ms +- 0.02
Creating a VideoDecoder object with seek_mode='approximate' on a short video:
med = 7.08ms +- 0.02
Creating a VideoDecoder object with seek_mode='exact' on a long video:
med = 114.17ms +- 1.21
Creating a VideoDecoder object with seek_mode='approximate' on a long video:
med = 10.50ms +- 0.03
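To make the claim about longer videos concrete, here is a minimal sketch that turns the medians printed above into speedup factors. The numbers are copied from this particular run and will differ on other machines; the dictionary and the helper function are illustrative names only.

```python
# Median creation times in ms, copied from the benchmark output above.
creation_ms = {
    "short": {"exact": 8.06, "approximate": 7.08},
    "long": {"exact": 114.17, "approximate": 10.50},
}


def speedup(timings):
    """Return how many times faster approximate mode is than exact mode."""
    return timings["exact"] / timings["approximate"]


speedups = {name: speedup(t) for name, t in creation_ms.items()}
for name, value in speedups.items():
    print(f"{name} video: approximate is {value:.2f}x faster to create")
```

On these measurements the gain is about 1.1x for the short video and about 10.9x for the long one, which is consistent with the cost of exact mode growing with video length.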
Performance: frame decoding and clip sampling
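Strictly speaking, the seek_mode parameter only affects the performance of VideoDecoder creation. It has no direct effect on the performance of frame decoding or sampling. **However**, because frame decoding and sampling workflows typically involve creating a VideoDecoder (one per video), seek_mode may very well end up affecting the performance of decoders and samplers. For example: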
from torchcodec import samplers
def sample_clips(seek_mode):
    return samplers.clips_at_random_indices(
        decoder=VideoDecoder(
            source=long_video_path,
            seek_mode=seek_mode
        ),
        num_clips=5,
        num_frames_per_clip=2,
    )
print("Sampling clips with seek_mode='exact':")
bench(sample_clips, seek_mode="exact")
print("Sampling clips with seek_mode='approximate':")
bench(sample_clips, seek_mode="approximate")
Sampling clips with seek_mode='exact':
med = 302.87ms +- 35.44
Sampling clips with seek_mode='approximate':
med = 182.62ms +- 54.41
Accuracy: Metadata and frame retrieval
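We have seen that using seek_mode="approximate" can significantly speed up VideoDecoder creation. The price to pay is that seeking will not always be as accurate as with seek_mode="exact". It can also affect the exactness of the metadata.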
However, in many cases you will find that there is no accuracy difference between the two modes, which means that seek_mode="approximate" is a net win:
print("Metadata of short video with seek_mode='exact':")
print(VideoDecoder(short_video_path, seek_mode="exact").metadata)
print("Metadata of short video with seek_mode='approximate':")
print(VideoDecoder(short_video_path, seek_mode="approximate").metadata)
exact_decoder = VideoDecoder(short_video_path, seek_mode="exact")
approx_decoder = VideoDecoder(short_video_path, seek_mode="approximate")
for i in range(len(exact_decoder)):
    torch.testing.assert_close(
        exact_decoder.get_frame_at(i).data,
        approx_decoder.get_frame_at(i).data,
        atol=0, rtol=0,
    )
print("Frame seeking is the same for this video!")
Metadata of short video with seek_mode='exact':
VideoStreamMetadata:
duration_seconds_from_header: 13.8
begin_stream_seconds_from_header: 0.0
bit_rate: 505790.0
codec: h264
stream_index: 0
begin_stream_seconds_from_content: 0.0
end_stream_seconds_from_content: 13.8
width: 640
height: 360
num_frames_from_header: 345
num_frames_from_content: 345
average_fps_from_header: 25.0
pixel_aspect_ratio: 1
duration_seconds: 13.8
begin_stream_seconds: 0.0
end_stream_seconds: 13.8
num_frames: 345
average_fps: 25.0
Metadata of short video with seek_mode='approximate':
VideoStreamMetadata:
duration_seconds_from_header: 13.8
begin_stream_seconds_from_header: 0.0
bit_rate: 505790.0
codec: h264
stream_index: 0
begin_stream_seconds_from_content: None
end_stream_seconds_from_content: None
width: 640
height: 360
num_frames_from_header: 345
num_frames_from_content: None
average_fps_from_header: 25.0
pixel_aspect_ratio: 1
duration_seconds: 13.8
begin_stream_seconds: 0
end_stream_seconds: 13.8
num_frames: 345
average_fps: 25.0
Frame seeking is the same for this video!
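In the output above, the fields ending in _from_content are None in approximate mode, while the resolved fields (for example num_frames) still carry the header values. One plausible reading is a simple fallback from scanned values to header values. The sketch below models that reading with a hypothetical ToyStreamMetadata class; it is inferred from the printed output only and is not the TorchCodec implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyStreamMetadata:
    """Hypothetical stand-in for the printed metadata; not the TorchCodec class."""

    num_frames_from_header: Optional[int]
    num_frames_from_content: Optional[int]

    @property
    def num_frames(self) -> Optional[int]:
        # Assumption: prefer the value obtained by scanning the file,
        # and fall back to the header value when no scan was performed.
        if self.num_frames_from_content is not None:
            return self.num_frames_from_content
        return self.num_frames_from_header


exact_like = ToyStreamMetadata(num_frames_from_header=345, num_frames_from_content=345)
approx_like = ToyStreamMetadata(num_frames_from_header=345, num_frames_from_content=None)
print(exact_like.num_frames, approx_like.num_frames)
```

For this video both toy objects report 345 frames, matching the printed metadata, because the header value happens to be correct.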
What is this doing under the hood?
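With seek_mode="exact", the VideoDecoder performs a scan when it is instantiated. The scan does not involve decoding, but it processes the entire file to infer more accurate metadata (such as duration) and to build an internal index of frames and key frames. This internal index can be more accurate than the one in the file's headers, which leads to more accurate seeking behavior. Without the scan, TorchCodec relies only on the metadata contained in the file, which may not always be as accurate.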
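To illustrate why a scanned index can seek more accurately than header metadata, here is a toy model in plain Python. It is a sketch under stated assumptions, not how TorchCodec is implemented: the timestamps, the header frame rate, and both lookup functions are hypothetical. The "exact" lookup uses a per-frame timestamp index such as a scan could build, while the "approximate" lookup only knows an average frame rate.

```python
from bisect import bisect_right

# Hypothetical presentation timestamps (seconds) of a variable-frame-rate
# stream: three frames spaced 0.04 s apart, then frames spaced 0.1 s apart.
frame_pts = [0.0, 0.04, 0.08, 0.18, 0.28, 0.38]


def exact_index_at(seconds, pts):
    """Find the frame displayed at `seconds` using a per-frame index."""
    return bisect_right(pts, seconds) - 1


def approximate_index_at(seconds, average_fps):
    """Estimate the frame index from an average frame rate alone."""
    return int(seconds * average_fps)


header_average_fps = 25.0  # assumed header value; wrong for the later frames

for query in (0.05, 0.30):
    print(
        query,
        exact_index_at(query, frame_pts),
        approximate_index_at(query, header_average_fps),
    )
```

In the constant-rate part of this toy stream both lookups agree. Once the frame spacing changes, the estimate based on the average frame rate drifts away from the true index, and here it even points past the last frame.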
Which mode should I use?
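The general rule of thumb is as follows:
If you really care about the exactness of frame seeking, use "exact".
If you can sacrifice seeking exactness for speed, which is usually the case when doing clip sampling, use "approximate".
If your videos do not have a variable frame rate and their metadata is correct, then "approximate" mode is a net win: it will be just as accurate as "exact" mode while being faster.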
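The rule of thumb above can be written down as a small decision helper. This is only a sketch: choose_seek_mode and its parameters are hypothetical names introduced for illustration, and the only thing taken from TorchCodec is the pair of strings accepted by seek_mode.

```python
def choose_seek_mode(
    needs_exact_seeking: bool,
    constant_frame_rate: bool = False,
    metadata_is_correct: bool = False,
) -> str:
    """Hypothetical helper encoding the rule of thumb above."""
    if constant_frame_rate and metadata_is_correct:
        # Approximate mode should be as accurate as exact mode, and faster.
        return "approximate"
    if needs_exact_seeking:
        return "exact"
    # Seek accuracy can be traded for speed, e.g. when sampling clips.
    return "approximate"


print(choose_seek_mode(needs_exact_seeking=True))
print(choose_seek_mode(needs_exact_seeking=False))
```

The returned string could then be passed as the seek_mode argument when creating a VideoDecoder. The two optional flags default to False because, in practice, you often do not know whether a file's metadata can be trusted.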