GitHub - babysor/Realtime-Voice-Clone-Chinese: AI voice cloning: clone your voice and generate arbitrary speech…
source link: https://github.com/babysor/Realtime-Voice-Clone-Chinese
This repository is forked from Real-Time-Voice-Cloning, which only supports English.
English | 中文
Features
- Chinese: supports Mandarin, tested with multiple datasets (aidatatang_200zh, SLR68)
- PyTorch: tested with version 1.9.0 (latest as of August 2021), on Tesla T4 and GTX 2060 GPUs
- Windows + Linux: tested on both Windows and Linux after fixing minor issues
- Easy & effective: good results with only a newly trained synthesizer, reusing the pretrained encoder and vocoder
DEMO VIDEO
Quick Start
1. Install Requirements
Follow the original repo to verify that your environment is ready. **Python 3.7 or higher** is required to run the toolbox.
- Install PyTorch.
- Install ffmpeg.
- Run `pip install -r requirements.txt` to install the remaining necessary packages.
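As a quick sanity check after installation, a short script like the following can verify the Python version and whether ffmpeg and PyTorch are available (this helper is a sketch of mine, not part of the repository):

```python
import shutil
import sys


def check_prerequisites():
    """Return {check_name: bool} for the toolbox prerequisites.

    A hypothetical helper, not part of the repository.
    """
    checks = {
        "python>=3.7": sys.version_info >= (3, 7),
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
    }
    try:
        import torch  # noqa: F401  # heavy import, so done lazily
        checks["torch importable"] = True
    except ImportError:
        checks["torch importable"] = False
    return checks


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

If any check prints MISSING, revisit the corresponding install step before continuing.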
2. Reuse the pretrained encoder/vocoder
- Download the following models from https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Pretrained-models and extract them to the root directory of this project. Do not use the pretrained synthesizer.
Note that you need the newly trained synthesizer model, since the original one is incompatible with the Chinese symbols. This also means demo_cli does not work at the moment.
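For orientation, after extracting the downloaded archives the model files commonly end up in a layout roughly like this (a sketch only; the exact folder and file names depend on the archives you downloaded, and the synthesizer model comes from your own training in step 3):

```
encoder/saved_models/pretrained.pt
vocoder/saved_models/pretrained/pretrained.pt
synthesizer/saved_models/          (populated by training in step 3)
```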
3. Train synthesizer with aidatatang_200zh
- Download the aidatatang_200zh dataset and unzip it; make sure you can access all .wav files under the train folder.
- Preprocess the audio and the mel spectrograms:
  python synthesizer_preprocess_audio.py <datasets_root>
  Pass the --dataset {dataset} parameter to select aidatatang_200zh or SLR68.
- Preprocess the embeddings:
  python synthesizer_preprocess_embeds.py <datasets_root>/SV2TTS/synthesizer
- Train the synthesizer:
  python synthesizer_train.py mandarin <datasets_root>/SV2TTS/synthesizer
- Go to the next step once the attention line appears and the loss meets your needs; check the training output under synthesizer/saved_models/. FYI, in my run the attention line appeared after 18k steps and the loss dropped below 0.4 after 50k steps.
4. Launch the Toolbox
You can then try the toolbox:
python demo_toolbox.py -d <datasets_root>
or
python demo_toolbox.py
TODO
- Add demo video
- Add support for more dataset
- Upload pretrained model
- More contributions welcome