SpeedPerturbation

class torchaudio.transforms.SpeedPerturbation(orig_freq: int, factors: Sequence[float])

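Applies the speed perturbation augmentation introduced in Audio augmentation for speech recognition [Ko et al., 2015]. For a given input, the module samples a speed-up factor from factors uniformly at random and adjusts the speed of the input by that factor.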
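
Because the factor is drawn uniformly from the entries of factors, repeating a value in the list raises its probability. The following is a minimal sketch, using only the Python standard library, of how the per-factor probabilities follow from the list. The helper name factor_probabilities is hypothetical and is not part of torchaudio.

```python
from collections import Counter
from fractions import Fraction


def factor_probabilities(factors):
    """Return the probability of each distinct factor under uniform sampling.

    Hypothetical helper for illustration; not part of torchaudio.
    """
    counts = Counter(factors)
    total = len(factors)
    # Each list entry is equally likely, so a value's probability is
    # its number of occurrences divided by the list length.
    return {factor: Fraction(count, total) for factor, count in counts.items()}


probs = factor_probabilities([0.9, 1.1, 1.0, 1.0, 1.0])
for factor, prob in sorted(probs.items()):
    print(f"factor {factor}: probability {float(prob):.2f}")
```

Listing 1.0 several times is therefore a simple way to leave most inputs unchanged while still perturbing a fraction of them.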

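Parameters

orig_freq (int) – Original frequency (sample rate) of the signals in waveform.
factors (Sequence[float]) – Factors by which to adjust the speed of the input. Values greater than 1.0 compress waveform in time, whereas values less than 1.0 stretch waveform in time.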
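
As a rough guide to what a factor does to duration, the output length is approximately the input length divided by the factor. The sketch below assumes this simple relation and rounds up; the exact rounding torchaudio applies may differ. approx_output_length is a hypothetical helper, not a library function.

```python
import math


def approx_output_length(num_samples: int, factor: float) -> int:
    """Estimate the number of samples after changing speed by `factor`.

    Assumption for illustration: length scales as 1 / factor, rounded up.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return math.ceil(num_samples / factor)


one_second = 16000  # one second of audio at 16 kHz
for factor in (0.5, 1.0, 2.0):
    # Factors above 1.0 shorten the signal; factors below 1.0 lengthen it.
    print(factor, approx_output_length(one_second, factor))
```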

Example

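>>> import torch
>>> from torchaudio.transforms import SpeedPerturbation
>>> waveform = torch.randn(2, 16000)  # placeholder batch: 2 signals at 16 kHz
>>> lengths = torch.tensor([16000, 12000])  # placeholder valid lengths per signal
>>> speed_perturb = SpeedPerturbation(16000, [0.9, 1.1, 1.0, 1.0, 1.0])
>>> # waveform speed will be adjusted by factor 0.9 with 20% probability,
>>> # 1.1 with 20% probability, and 1.0 (i.e. kept the same) with 60% probability.
>>> speed_perturbed_waveform, speed_perturbed_lengths = speed_perturb(waveform, lengths)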

forward(waveform: Tensor, lengths: Optional[Tensor] = None) → Tuple[Tensor, Optional[Tensor]]

Parameters

waveform (torch.Tensor) – Input signals, with shape (…, time).
lengths (torch.Tensor or None, optional) – Valid lengths of the signals in waveform, with shape (…). If None, all elements in waveform are treated as valid. (Default: None)

Returns

torch.Tensor: Speed-adjusted waveform, with shape (…, new_time).
torch.Tensor or None: If lengths is not None, the valid lengths of the signals in the speed-adjusted waveform, with shape (…); otherwise None.

Return type

(torch.Tensor, torch.Tensor or None)
