source link: https://github.com/freewym/espresso
# Espresso
Espresso is an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq. Espresso supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language model fusion, for which a fast, parallelized decoder is implemented.
We provide state-of-the-art training recipes for a number of speech datasets.
## Requirements and Installation
- PyTorch version >= 1.1.0
- Python version >= 3.5
- For training new models, you'll also need an NVIDIA GPU and NCCL
- For faster training, install NVIDIA's apex library with the `--cuda_ext` option
Currently Espresso only supports installation from source.

To install Espresso from source and develop locally:
```shell
git clone https://github.com/freewym/espresso
cd espresso
pip install --editable .
pip install kaldi_io
pip install sentencepiece
cd speech_tools; make KALDI=<path/to/a/compiled/kaldi/directory>
```
Then add your Python path to the `PATH` variable in `examples/asr_<dataset>/path.sh`; the current default is `~/anaconda3/bin`.
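For illustration, the relevant line in `path.sh` might look like the following. This is only a sketch: the exact layout of the recipe's `path.sh` may differ, and `~/anaconda3/bin` is just the default location mentioned above.

```shell
# Hypothetical sketch of the PATH edit in examples/asr_<dataset>/path.sh:
# prepend the directory containing your Python binary to PATH.
# Adjust ~/anaconda3/bin to wherever your Python actually lives.
export PATH=~/anaconda3/bin:$PATH
```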
kaldi_io is required for reading Kaldi scp files. sentencepiece is required for training/encoding subword pieces. Kaldi is required for data preparation, feature extraction and scoring for some datasets (e.g., Switchboard).
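To give a sense of what kaldi_io reads: a Kaldi `.scp` file is a plain-text mapping from an utterance key to an "rspecifier" (an archive path plus a byte offset). The helper below is a hypothetical illustration of that format only, not part of Espresso or kaldi_io; kaldi_io's actual readers (e.g. `kaldi_io.read_mat_scp`) additionally open the referenced archive and yield the feature matrices themselves.

```python
def parse_scp(lines):
    """Parse Kaldi scp lines of the form 'key rspecifier' into a dict.

    Hypothetical helper for illustration; it only parses the text
    mapping and does not open the underlying .ark archives.
    """
    entries = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, rspec = line.split(None, 1)  # split on first whitespace run
        entries[key] = rspec
    return entries


# Example scp content: each rspecifier is an archive path and byte offset.
example = [
    "utt1 /data/feats.ark:12",
    "utt2 /data/feats.ark:3456",
]
print(parse_scp(example))
```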
## License
Espresso is MIT-licensed.
## Citation
Please cite Espresso as:
```bibtex
@inproceedings{wang2019espresso,
  title = {Espresso: A Fast End-to-end Neural Speech Recognition Toolkit},
  author = {Yiming Wang and Tongfei Chen and Hainan Xu and Shuoyang Ding and Hang Lv and Yiwen Shao and Nanyun Peng and Lei Xie and Shinji Watanabe and Sanjeev Khudanpur},
  booktitle = {2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  year = {2019},
}
```