ByteDance Launches PolyVoice: A Language Model for Speech-to-Speech Translation

source link: https://fanyi.news/tiktok-parent-unveils-polyvoice-speech-to-speech-translation-language-models
Jun 26, 2023

ByteDance is moving into speech-to-speech translation (S2ST) with PolyVoice, a newly proposed language-model framework.

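According to a research paper dated June 13, 2023, ByteDance uses a decoder-only framework for direct speech translation. This departs from the conventional two-step encoder-decoder framework in speech modeling: the source language is translated into the target language without an intermediate text representation, in an attempt to simplify the translation process.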
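To make the decoder-only idea concrete, here is a minimal sketch of how a source and target sequence can be packed into one stream for next-token prediction. The article does not describe PolyVoice's actual sequence layout, so the separator token `SEP`, the function name `build_decoder_only_example`, and the choice to compute the loss only on target positions are all assumptions made for illustration.

```python
from typing import List, Tuple

SEP = -1  # hypothetical separator token id between source and target units


def build_decoder_only_example(
    source_units: List[int], target_units: List[int]
) -> Tuple[List[int], List[int], List[bool]]:
    """Return (inputs, labels, loss_mask) for next-token prediction.

    A single decoder reads the source units as a prefix and continues
    the same sequence with the target units; there is no separate encoder.
    """
    sequence = source_units + [SEP] + target_units
    inputs = sequence[:-1]   # what the model sees at each step
    labels = sequence[1:]    # what it should predict at each step
    # The first target unit appears in `labels` at index len(source_units).
    # Assumption: only target positions contribute to the training loss.
    first_target = len(source_units)
    loss_mask = [i >= first_target for i in range(len(labels))]
    return inputs, labels, loss_mask


inputs, labels, loss_mask = build_decoder_only_example([5, 6], [7, 8, 9])
```

An encoder-decoder model would instead feed the source to one network and generate the target with another; in this sketch the same network handles both as one continuous sequence.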

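PolyVoice consists of two modules. The speech-to-unit translation (S2UT) module converts discrete units of source-language speech into discrete units of target-language speech. The unit-to-speech (U2S) synthesis module synthesizes target-language speech while preserving the source speaker's style.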
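The data flow between the two modules can be sketched as follows. This is only an illustration of the wiring described above, not ByteDance's implementation: the class `TwoStagePipeline`, the three callables, and the idea of passing the source units to the synthesizer as a style prompt are assumptions, and the toy functions are placeholders with no resemblance to real models.

```python
from dataclasses import dataclass
from typing import Callable, List

Units = List[int]       # discrete speech units (integer token ids)
Waveform = List[float]  # placeholder for audio samples


@dataclass
class TwoStagePipeline:
    """Hypothetical wiring of an S2UT module followed by a U2S module."""
    extract_units: Callable[[Waveform], Units]      # speech -> source units
    translate_units: Callable[[Units], Units]       # S2UT: source -> target units
    synthesize: Callable[[Units, Units], Waveform]  # U2S: (target units, style prompt) -> speech

    def translate(self, source_speech: Waveform) -> Waveform:
        source_units = self.extract_units(source_speech)
        target_units = self.translate_units(source_units)
        # Assumption: the source units double as a style prompt so the
        # synthesizer can imitate the original speaker.
        return self.synthesize(target_units, source_units)


# Toy stand-ins so the sketch runs end to end.
def toy_extract(wave: Waveform) -> Units:
    return [round(abs(x) * 10) % 100 for x in wave]


def toy_translate(units: Units) -> Units:
    return list(reversed(units))


def toy_synthesize(target: Units, style_prompt: Units) -> Waveform:
    gain = 1.0 + len(style_prompt) / 100.0
    return [u / 100.0 * gain for u in target]


pipeline = TwoStagePipeline(toy_extract, toy_translate, toy_synthesize)
translated = pipeline.translate([0.1, 0.25, 0.9])
```

Keeping the two stages behind separate callables mirrors the article's description: translation operates purely on discrete units, and speaker style is handled only at synthesis time.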

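From a global communication standpoint, PolyVoice's most notable advantage is its support for unwritten languages, which could open new channels of communication for communities that rely primarily on spoken language.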

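In addition, PolyVoice's advanced audio language model can preserve the source speaker's voice and style, making translations feel more natural and personal.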

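From a modeling standpoint, the decoder-only approach could have a lasting impact on the whole speech translation process by addressing problems common to conventional modeling, such as error propagation, latency, and loss of paralinguistic information.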
