speechbrain.inference.TTS module
Specifies the inference interfaces for Text-To-Speech (TTS) modules.
- Authors:
Aku Rouhe 2021
Peter Plantinga 2021
Loren Lugosch 2020
Mirco Ravanelli 2020
Titouan Parcollet 2021
Abdel Heba 2021
Andreas Nautsch 2022, 2023
Pooneh Mousavi 2023
Sylvain de Langen 2023
Adel Moumen 2023
Pradnya Kandarkar 2023
Summary
Classes:
- FastSpeech2: A ready-to-use wrapper for FastSpeech2 (text -> mel_spec).
- FastSpeech2InternalAlignment: A ready-to-use wrapper for FastSpeech2 with internal alignment (text -> mel_spec).
- MSTacotron2: A ready-to-use wrapper for Zero-Shot Multi-Speaker Tacotron2.
- Tacotron2: A ready-to-use wrapper for Tacotron2 (text -> mel_spec).
Reference
- class speechbrain.inference.TTS.Tacotron2(*args, **kwargs)[source]
Bases: Pretrained
A ready-to-use wrapper for Tacotron2 (text -> mel_spec).
Example
>>> tmpdir_tts = getfixture('tmpdir') / "tts"
>>> tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech", savedir=tmpdir_tts)
>>> mel_output, mel_length, alignment = tacotron2.encode_text("Mary had a little lamb")
>>> items = [
...     "A quick brown fox jumped over the lazy dog",
...     "How much wood would a woodchuck chuck?",
...     "Never odd or even"
... ]
>>> mel_outputs, mel_lengths, alignments = tacotron2.encode_batch(items)
>>> # One can combine the TTS model with a vocoder (that generates the final waveform)
>>> # Initialize the Vocoder (HiFIGAN)
>>> tmpdir_vocoder = getfixture('tmpdir') / "vocoder"
>>> from speechbrain.inference.vocoders import HIFIGAN
>>> hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir=tmpdir_vocoder)
>>> # Running the TTS
>>> mel_output, mel_length, alignment = tacotron2.encode_text("Mary had a little lamb")
>>> # Running Vocoder (spectrogram-to-waveform)
>>> waveforms = hifi_gan.decode_batch(mel_output)
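To listen to the result, the decoded waveform can be written to disk. A minimal sketch, assuming torchaudio is installed and relying on the 22050 Hz output rate of the LJSpeech HiFi-GAN vocoder (verify the rate for other checkpoints):

>>> import torchaudio
>>> # decode_batch returns a [batch, 1, time] tensor; squeezing the channel
>>> # dim leaves [batch, time], which for a single item is the [channels, time]
>>> # layout that torchaudio.save expects
>>> torchaudio.save("tts_output.wav", waveforms.squeeze(1), 22050)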
- HPARAMS_NEEDED = ['model', 'text_to_sequence']
- class speechbrain.inference.TTS.MSTacotron2(*args, **kwargs)[source]
Bases: Pretrained
A ready-to-use wrapper for Zero-Shot Multi-Speaker Tacotron2. For voice cloning: (text, reference_audio) -> (mel_spec). For generating a random speaker voice: (text) -> (mel_spec).
Example
>>> tmpdir_tts = getfixture('tmpdir') / "tts"
>>> mstacotron2 = MSTacotron2.from_hparams(source="speechbrain/tts-mstacotron2-libritts", savedir=tmpdir_tts)
>>> # Sample rate of the reference audio must be greater or equal to the sample rate of the speaker embedding model
>>> reference_audio_path = "tests/samples/single-mic/example1.wav"
>>> input_text = "Mary had a little lamb."
>>> mel_output, mel_length, alignment = mstacotron2.clone_voice(input_text, reference_audio_path)
>>> # One can combine the TTS model with a vocoder (that generates the final waveform)
>>> # Initialize the Vocoder (HiFIGAN)
>>> tmpdir_vocoder = getfixture('tmpdir') / "vocoder"
>>> from speechbrain.inference.vocoders import HIFIGAN
>>> hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-libritts-22050Hz", savedir=tmpdir_vocoder)
>>> # Running the TTS
>>> mel_output, mel_length, alignment = mstacotron2.clone_voice(input_text, reference_audio_path)
>>> # Running Vocoder (spectrogram-to-waveform)
>>> waveforms = hifi_gan.decode_batch(mel_output)
>>> # For generating a random speaker voice, use the following
>>> mel_output, mel_length, alignment = mstacotron2.generate_random_voice(input_text)
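Since clone_voice requires the reference recording's sample rate to meet the speaker embedding model's requirement, it can help to inspect the file up front. A minimal sketch, assuming torchaudio is installed; the exact required rate depends on the embedding model bundled with the checkpoint:

>>> import torchaudio
>>> # read the file header without loading the audio
>>> sr = torchaudio.info(reference_audio_path).sample_rate
>>> # compare sr against the embedding model's expected rate before cloning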
- HPARAMS_NEEDED = ['model']
- class speechbrain.inference.TTS.FastSpeech2(*args, **kwargs)[source]
Bases: Pretrained
A ready-to-use wrapper for FastSpeech2 (text -> mel_spec).
Example
>>> tmpdir_tts = getfixture('tmpdir') / "tts"
>>> fastspeech2 = FastSpeech2.from_hparams(source="speechbrain/tts-fastspeech2-ljspeech", savedir=tmpdir_tts)
>>> mel_outputs, durations, pitch, energy = fastspeech2.encode_text(["Mary had a little lamb."])
>>> items = [
...     "A quick brown fox jumped over the lazy dog",
...     "How much wood would a woodchuck chuck?",
...     "Never odd or even"
... ]
>>> mel_outputs, durations, pitch, energy = fastspeech2.encode_text(items)
>>> # One can combine the TTS model with a vocoder (that generates the final waveform)
>>> # Initialize the Vocoder (HiFIGAN)
>>> tmpdir_vocoder = getfixture('tmpdir') / "vocoder"
>>> from speechbrain.inference.vocoders import HIFIGAN
>>> hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir=tmpdir_vocoder)
>>> # Running the TTS
>>> mel_outputs, durations, pitch, energy = fastspeech2.encode_text(["Mary had a little lamb."])
>>> # Running Vocoder (spectrogram-to-waveform)
>>> waveforms = hifi_gan.decode_batch(mel_outputs)
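Because FastSpeech2 predicts durations, pitch, and energy explicitly, these can typically be rescaled at synthesis time to control prosody. A hedged sketch; the pace, pitch_rate, and energy_rate keyword names are an assumption about the encode_text signature, so verify them against the installed SpeechBrain version:

>>> # scale the predicted prosody values (keyword names assumed, not verified)
>>> mel_outputs, durations, pitch, energy = fastspeech2.encode_text(
...     ["Mary had a little lamb."], pace=1.1, pitch_rate=1.0, energy_rate=1.0
... )
>>> waveforms = hifi_gan.decode_batch(mel_outputs)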
- HPARAMS_NEEDED = ['spn_predictor', 'model', 'input_encoder']
- class speechbrain.inference.TTS.FastSpeech2InternalAlignment(*args, **kwargs)[source]
Bases: Pretrained
A ready-to-use wrapper for FastSpeech2 with internal alignment (text -> mel_spec).
Example
>>> tmpdir_tts = getfixture('tmpdir') / "tts"
>>> fastspeech2 = FastSpeech2InternalAlignment.from_hparams(source="speechbrain/tts-fastspeech2-internal-alignment-ljspeech", savedir=tmpdir_tts)
>>> mel_outputs, durations, pitch, energy = fastspeech2.encode_text(["Mary had a little lamb."])
>>> items = [
...     "A quick brown fox jumped over the lazy dog",
...     "How much wood would a woodchuck chuck?",
...     "Never odd or even"
... ]
>>> mel_outputs, durations, pitch, energy = fastspeech2.encode_text(items)
>>> # One can combine the TTS model with a vocoder (that generates the final waveform)
>>> # Initialize the Vocoder (HiFIGAN)
>>> tmpdir_vocoder = getfixture('tmpdir') / "vocoder"
>>> from speechbrain.inference.vocoders import HIFIGAN
>>> hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir=tmpdir_vocoder)
>>> # Running the TTS
>>> mel_outputs, durations, pitch, energy = fastspeech2.encode_text(["Mary had a little lamb."])
>>> # Running Vocoder (spectrogram-to-waveform)
>>> waveforms = hifi_gan.decode_batch(mel_outputs)
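For batched synthesis, each item in the decoded batch can be written to its own file. A minimal sketch, assuming torchaudio is installed and the vocoder's 22050 Hz output rate; note that items in a batch are padded to the longest sequence, so shorter clips may end with trailing padding:

>>> import torchaudio
>>> # waveforms has shape [batch, 1, time]; each wav below is [1, time]
>>> for i, wav in enumerate(waveforms):
...     torchaudio.save(f"utterance_{i}.wav", wav, 22050)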
- HPARAMS_NEEDED = ['model', 'input_encoder']