HDemucs

class torchaudio.models.HDemucs(sources: List[str], audio_channels: int = 2, channels: int = 48, growth: int = 2, nfft: int = 4096, depth: int = 6, freq_emb: float = 0.2, emb_scale: int = 10, emb_smooth: bool = True, kernel_size: int = 8, time_stride: int = 2, stride: int = 4, context: int = 1, context_enc: int = 0, norm_starts: int = 4, norm_groups: int = 4, dconv_depth: int = 2, dconv_comp: int = 4, dconv_attn: int = 4, dconv_lstm: int = 4, dconv_init: float = 0.0001)[source]

Hybrid Demucs model from Hybrid Spectrogram and Waveform Source Separation [Défossez, 2021].

See also

torchaudio.pipelines.SourceSeparationBundle: Source separation pipeline with pre-trained models.

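Parameters

sources (List[str]) – List of source names. The list can contain the following source options: ["bass", "drums", "other", "mixture", "vocals"].
audio_channels (int, optional) – Number of input/output audio channels. (Default: 2)
channels (int, optional) – Number of initial hidden channels. (Default: 48)
growth (int, optional) – Factor by which the number of hidden channels grows at each layer. (Default: 2)
nfft (int, optional) – Number of FFT bins. Note that changing this value requires careful computation of various shape parameters and will not work out of the box for hybrid models. (Default: 4096)
depth (int, optional) – Number of layers in the encoder and in the decoder. (Default: 6)
freq_emb (float, optional) – If greater than 0, a frequency embedding is added after the first frequency layer; the value controls the weight of the embedding. (Default: 0.2)
emb_scale (int, optional) – Equivalent to scaling the learning rate of the embedding. (Default: 10)
emb_smooth (bool, optional) – Initialize the embedding so that it is smooth with respect to frequency. (Default: True)
kernel_size (int, optional) – Kernel size for the encoder and decoder layers. (Default: 8)
time_stride (int, optional) – Stride of the final time layer, after the merge. (Default: 2)
stride (int, optional) – Stride for the encoder and decoder layers. (Default: 4)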
context (int, optional) – Context for the 1x1 convolution in the decoder. (Default: 1)
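context_enc (int, optional) – Context for the 1x1 convolution in the encoder. (Default: 0)
norm_starts (int, optional) – Layer at which group normalization starts being used. Decoder layers are numbered in reverse order. (Default: 4)
norm_groups (int, optional) – Number of groups for group normalization. (Default: 4)
dconv_depth (int, optional) – Depth of the residual DConv branch. (Default: 2)
dconv_comp (int, optional) – Compression of the DConv branch. (Default: 4)
dconv_attn (int, optional) – Add attention layers in the DConv branch starting at this layer. (Default: 4)
dconv_lstm (int, optional) – Add an LSTM layer in the DConv branch starting at this layer. (Default: 4)
dconv_init (float, optional) – Initial scale for the DConv branch LayerScale. (Default: 1e-4)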

Tutorials using HDemucs: Music Source Separation with Hybrid Demucs


Methods

forward

HDemucs.forward(input: Tensor)[source]

HDemucs forward call.

Parameters

input (torch.Tensor) – Input mixture tensor of shape (batch_size, channel, num_frames)

Returns

Tensor: Output tensor split into sources, with shape (batch_size, num_sources, channel, num_frames)

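Factory Functions

`hdemucs_low`	Builds the low nfft (1024) version of `HDemucs`, suitable for sample rates around 8 kHz.
`hdemucs_medium`	Builds the medium nfft (2048) version of `HDemucs`, suitable for sample rates of 16-32 kHz.
`hdemucs_high`	Builds the high nfft (4096) version of `HDemucs`, suitable for sample rates of 44.1-48 kHz.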
