Computer Science > Sound

[Submitted on 3 Dec 2023 (v1), last revised 21 Dec 2023 (this version, v4)]

OpenVoice: Versatile Instant Voice Cloning

We introduce OpenVoice, a versatile voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages. OpenVoice represents a significant advancement in addressing the following open challenges in the field: 1) Flexible Voice Style Control. OpenVoice enables granular control over voice styles, including emotion, accent, rhythm, pauses, and intonation, in addition to replicating the tone color of the reference speaker. The voice styles are neither directly copied from nor constrained by the style of the reference speaker. Previous approaches lacked the ability to flexibly manipulate voice styles after cloning. 2) Zero-Shot Cross-Lingual Voice Cloning. OpenVoice achieves zero-shot cross-lingual voice cloning for languages not included in the massive-speaker training set. Unlike previous approaches, which typically require an extensive massive-speaker multi-lingual (MSML) dataset for every language, OpenVoice can clone voices into a new language without any massive-speaker training data for that language. OpenVoice is also computationally efficient, costing tens of times less than commercially available APIs that deliver inferior performance. To foster further research in the field, we have made the source code and trained model publicly accessible. We also provide qualitative results on our demo website. Prior to its public release, our internal version of OpenVoice was used tens of millions of times by users worldwide between May and October 2023, serving as the backend of MyShell.

Comments:	Technical Report
Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2312.01479 [cs.SD]
	(or arXiv:2312.01479v4 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2312.01479

Submission history

From: Zengyi Qin
[v1] Sun, 3 Dec 2023 18:41:54 UTC (109 KB)
[v2] Wed, 13 Dec 2023 02:25:42 UTC (110 KB)
[v3] Sat, 16 Dec 2023 17:22:45 UTC (234 KB)
[v4] Thu, 21 Dec 2023 22:56:45 UTC (234 KB)
