Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Name: Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Rating: 4.5 (206 reviews)
Author: scramble6163

Uploader: scramble6163 | Uploaded: 2021-01-24 04:02:20 | PDF file | 1.16 MB | Popularity: 206

Recently, text-to-speech (TTS) models such as FastSpeech and ParaNet have been proposed to generate mel-spectrograms from text in parallel. Despite their advantages, these parallel TTS models cannot be trained without guidance from autoregressive TTS models, which serve as their external aligners. In this work, we propose Glow-TTS, a flow-based generative model for parallel TTS that does not require any external aligner. We introduce Monotonic Alignment Search (MAS), an internal alignment search algorithm for training Glow-TTS. By leveraging the properties of flows, MAS searches for the most probable monotonic alignment between text and the latent representation of speech. Glow-TTS obtains an order-of-magnitude speed-up at synthesis over the autoregressive TTS model Tacotron 2, with comparable speech quality, requiring only 1.5 seconds to synthesize one minute of speech end to end. We further show that our model can be easily extended to a multi-speaker setting. Our demo page and code are publicly available.
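The abstract describes MAS only as a search for the most probable monotonic alignment between text and latent speech frames. As a rough illustration, the sketch below shows how such a search could be done with Viterbi-style dynamic programming. It is not the authors' implementation. The function name `monotonic_alignment_search`, the `log_lik` matrix layout, and the unit-variance Gaussian toy data are all assumptions made for this example.

```python
import numpy as np

def monotonic_alignment_search(log_lik):
    """Illustrative sketch (assumed formulation, not the paper's code).

    log_lik[i, j] is the log-likelihood of latent frame j under text token i.
    Returns an array giving, for each frame, the index of its aligned token.
    The alignment starts at token 0, ends at the last token, never moves
    backwards, and never skips a token.
    """
    n_tokens, n_frames = log_lik.shape
    if n_frames < n_tokens:
        raise ValueError("need at least one frame per token")

    # Q[i, j]: best total log-likelihood of aligning frames 0..j
    # such that frame j is assigned to token i.
    Q = np.full((n_tokens, n_frames), -np.inf)
    Q[0, 0] = log_lik[0, 0]
    for j in range(1, n_frames):
        for i in range(min(j + 1, n_tokens)):
            stay = Q[i, j - 1]                                 # same token as previous frame
            advance = Q[i - 1, j - 1] if i > 0 else -np.inf    # move to the next token
            Q[i, j] = max(stay, advance) + log_lik[i, j]

    # Backtrack from the last token at the last frame.
    alignment = np.zeros(n_frames, dtype=int)
    i = n_tokens - 1
    for j in range(n_frames - 1, -1, -1):
        alignment[j] = i
        if j > 0 and i > 0 and Q[i - 1, j - 1] >= Q[i, j - 1]:
            i -= 1
    return alignment

# Toy data (hypothetical): three tokens with scalar means, six scalar frames,
# unit-variance Gaussian log-likelihood with the constant term dropped.
mu = np.array([0.0, 5.0, 10.0])
z = np.array([0.1, -0.2, 5.1, 4.9, 5.0, 10.2])
log_lik = -0.5 * (z[None, :] - mu[:, None]) ** 2
alignment = monotonic_alignment_search(log_lik)
print(alignment)
```

In this toy setup each frame sits closest to exactly one token mean, so the recovered alignment simply groups consecutive frames under their nearest token. In a real model the likelihoods would come from the learned prior over latent frames rather than from hand-picked scalars.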

More downloads

Glow_TTS A Generative Flow for Text_to_Speech via Monotonic Alignment Search

Recently, text-to-speech (TTS) models such as FastSpeech and ParaNet have been proposed for parallel generation from text...

Size: 1.16 MB | 2021-01-24 04:02:20

Text-to-Speech (hereafter abbreviated as TTS)

Size: 0 B | 2019-01-22 01:21:39

Text to Speech with the Microsoft Speech Library and SDK version5.1_TTS

Text to Speech with the Microsoft Speech Library and SDK

Size: 0 B | 2020-05-13 22:14:33

glow_tts

A Generative Flow for Text-to-Speech via Monotonic...

Size: 1.62 MB | 2021-01-24 04:02:38

delphi_xe5_android_tts Text_To_Speech.ZIP

delphi_xe5_android_tts(Text_To_Speech).ZIP

Size: 0 B | 2019-09-09 22:11:15

speech to text

Simple speech-to-text example code

Size: 1.16 MB | 2020-08-08 19:30:15

Text to speech

Text-to-speech is a technology for taking written ...

Size: 50 KB | 2020-07-19 22:21:04

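A study of how to use the TTS (Text To Speech) feature of the Microsoft Speech SDK in a program

An introduction to the SDK's TTS feature: applications use the ISpVoice interface to control TTS, by calling its Speak...

Size: 0 B | 2018-12-21 13:44:14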

Scene Alignment by SIFT Flow for Video

Abstract—Video summarization is an efficient and f...

Size: 617 KB | 2021-04-19 22:19:42

text_to_speech

Do not repost without permission

Size: 0 B | 2019-09-09 22:11:16

Labview Text to Speech

LabVIEW: implementing TTS with the Microsoft speech API

Size: 0 B | 2018-12-28 19:38:13

Text to Speech source code

Text to speech. Projeto: please add content on a website that uses general-purpose video, playback, and recorded content...

Size: 503 KB | 2021-04-01 11:57:31

gcp tts.cr Crystal Text To Speech API client source code

gcp-tts.cr: Crystal Text-To-Speech API client

Size: 61 KB | 2021-02-21 01:18:35

text to speech2

texttospeech converts text into WAV and MP3 files

Size: 0 B | 2019-07-27 12:59:22

C# speech to text

Uses speech to access the default voice input device for speech recognition.

Size: 0 B | 2019-06-04 15:30:09

Free Speech Calculus Text

The study of calculus focuses on understanding fun...

Size: 115 B | 2025-05-25 01:39:30