TACOTRON2_WAVERNN_PHONE_LJSPEECH

torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH

Phoneme-based TTS pipeline that combines a Tacotron2 model trained on LJSpeech [Ito and Johnson, 2017] for 1,500 epochs with a WaveRNN vocoder trained on 8-bit-depth LJSpeech waveforms [Ito and Johnson, 2017] for 10,000 epochs.

The text processor encodes the input text as phonemes. It uses DeepPhonemizer to convert graphemes to phonemes. The model (*en_us_cmudict_forward*) was trained on CMUDict.
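The three components described above (text processor, Tacotron2, WaveRNN vocoder) can be chained as in the following minimal sketch. It relies on the bundle's `get_text_processor()`, `get_tacotron2()`, and `get_vocoder()` accessors; the helper name `synthesize` and the default output path are hypothetical and chosen only for illustration. It assumes network access for the first-time weight download and an audio backend that `torchaudio.save` can use.

```python
def synthesize(text, output_path="output.wav"):
    """Synthesize `text` to a WAV file and return the sample rate used.

    `synthesize` and `output_path` are illustrative names, not part of torchaudio.
    """
    # Heavy dependencies are imported lazily so that merely defining this
    # function does not require torch/torchaudio to be installed.
    import torch
    import torchaudio

    bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH

    # Each accessor fetches pretrained weights the first time it is called.
    processor = bundle.get_text_processor()  # text -> phoneme token IDs
    tacotron2 = bundle.get_tacotron2()       # token IDs -> mel spectrogram
    vocoder = bundle.get_vocoder()           # mel spectrogram -> waveform

    with torch.inference_mode():
        tokens, token_lengths = processor(text)
        spec, spec_lengths, _ = tacotron2.infer(tokens, token_lengths)
        waveforms, _ = vocoder(spec, spec_lengths)

    # For a single input string the batch dimension doubles as the channel dimension.
    torchaudio.save(output_path, waveforms, vocoder.sample_rate)
    return vocoder.sample_rate
```

Calling `synthesize("Hello world! T T S stands for Text to Speech!")` would write `output.wav` to the current directory. The phoneme-based text processor additionally needs the DeepPhonemizer package to be installed.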

The training script for Tacotron2 can be found here. The following parameters were used: win_length=1100, hop_length=275, n_fft=2048, mel_fmin=40, and mel_fmax=11025.
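To make these spectrogram parameters more concrete, the short calculation below converts them into durations and frequency-bin counts. It assumes a sample rate of 22,050 Hz, the native rate of LJSpeech; the sample rate is not stated in the parameter list itself, so treat it as an assumption.

```python
# Assumption: LJSpeech audio is sampled at 22,050 Hz (not listed among the parameters).
SAMPLE_RATE = 22050

# Parameters quoted in the text above.
win_length = 1100
hop_length = 275
n_fft = 2048
mel_fmin = 40.0
mel_fmax = 11025.0

# Duration of one analysis window and of one hop, in milliseconds.
window_ms = 1000 * win_length / SAMPLE_RATE
hop_ms = 1000 * hop_length / SAMPLE_RATE

# Number of spectrogram frames produced per second of audio.
frames_per_second = SAMPLE_RATE / hop_length

# A one-sided FFT of size n_fft yields n_fft // 2 + 1 frequency bins.
n_freq_bins = n_fft // 2 + 1

# Fraction of each window shared with the next one.
overlap = 1 - hop_length / win_length

# Highest frequency representable at this sample rate.
nyquist = SAMPLE_RATE / 2

print(f"window: {window_ms:.2f} ms, hop: {hop_ms:.2f} ms")
print(f"frames per second: {frames_per_second:.2f}")
print(f"frequency bins: {n_freq_bins}, overlap: {overlap:.0%}")
print(f"mel range: {mel_fmin} Hz to {mel_fmax} Hz (Nyquist: {nyquist} Hz)")
```

Under that assumption, the window is four hops long (75% overlap) and mel_fmax coincides with the Nyquist frequency, so the mel filter bank spans the full usable band above 40 Hz.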

The training script for WaveRNN can be found here.

Example - “Hello world! T T S stands for Text to Speech!”

Example - “The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired,”
