[Submitted on 25 Apr 2022]

SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech

Recent progress in non-autoregressive text-to-speech (NAR-TTS) has made fast, high-quality speech synthesis possible. However, current NAR-TTS models usually take a phoneme sequence as input and therefore cannot capture the tree-structured syntactic information of the input sentence, which hurts prosody modeling. To address this, we propose SyntaSpeech, a syntax-aware and lightweight NAR-TTS model that integrates tree-structured syntactic information into the prosody modeling modules of PortaSpeech (Ren et al., 2021). Specifically, 1) we build a syntactic graph from the dependency tree of the input sentence, then process the text encoding with a syntactic graph encoder to extract syntactic information; 2) we incorporate the extracted syntactic encoding into PortaSpeech to improve prosody prediction; and 3) we introduce a multi-length discriminator to replace the flow-based post-net in PortaSpeech, which simplifies the training pipeline and speeds up inference while preserving the naturalness of the generated audio. Experiments on three datasets show that the tree-structured syntactic information enables SyntaSpeech to synthesize better audio with expressive prosody, and also demonstrate that SyntaSpeech generalizes to multiple languages and to multi-speaker text-to-speech. Ablation studies demonstrate the necessity of each component in SyntaSpeech. Source code and audio samples are available at this https URL
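The abstract names two concrete mechanisms without detailing them: turning a dependency tree into a graph over words, and a discriminator that judges audio segments of several lengths. What follows is a minimal Python illustration under stated assumptions, not the authors' implementation. It assumes a dependency parser supplies one head index per word (-1 for the root), stores every arc in both directions with a hypothetical edge-type label, and draws one random frame window per length as candidate inputs for a multi-length discriminator. The names `build_syntactic_graph`, `sample_windows`, `FORWARD`, `BACKWARD`, and the window lengths 32/64/128 are all hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical edge-type labels for this sketch (not taken from the paper):
# 0 = head -> dependent, 1 = dependent -> head.
FORWARD, BACKWARD = 0, 1


def build_syntactic_graph(heads: List[int]) -> List[Tuple[int, int, int]]:
    """Turn a dependency parse into a typed, directed edge list over words.

    Assumed input format: heads[i] is the index of word i's syntactic head,
    or -1 if word i is the root. Each arc is stored in both directions so a
    graph encoder can pass information both up and down the tree.
    """
    edges = []
    for dep, head in enumerate(heads):
        if head < 0:  # the root has no head, so it contributes no arc
            continue
        if head >= len(heads):
            raise ValueError(f"head index {head} out of range for word {dep}")
        edges.append((head, dep, FORWARD))
        edges.append((dep, head, BACKWARD))
    return edges


def sample_windows(num_frames: int, window_lengths: List[int],
                   rng: random.Random) -> List[Tuple[int, int]]:
    """Pick one random (start, end) frame window per requested length.

    In this sketch, a multi-length discriminator would score the slices
    mel[start:end], e.g. with one sub-discriminator per window length.
    Lengths longer than the utterance are clipped to the full utterance.
    """
    windows = []
    for length in window_lengths:
        length = min(length, num_frames)
        start = rng.randint(0, num_frames - length)  # inclusive upper bound
        windows.append((start, start + length))
    return windows


if __name__ == "__main__":
    # "The cat sat": The -> cat, cat -> sat, sat is the root.
    print(build_syntactic_graph([1, 2, -1]))
    # -> [(1, 0, 0), (0, 1, 1), (2, 1, 0), (1, 2, 1)]
    print(sample_windows(100, [32, 64, 128], random.Random(0)))
```

Storing each arc in both directions is one common choice for graph encoders over trees, since every word can then receive information from both its head and its dependents; the abstract does not state which edge scheme or window lengths SyntaSpeech actually uses.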

Comments:	Accepted by IJCAI-2022. 12 pages
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2204.11792 [cs.SD]
	(or arXiv:2204.11792v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2204.11792
