nvidia/parakeet
Overview
- NVIDIA and Suno released Parakeet, a family of ASR (automatic speech recognition) models
- Four models based on the FastConformer architecture: RNNT (1.1B, 0.6B) and CTC (1.1B, 0.6B)
- parakeet-rnnt-1.1b outperforms OpenAI Whisper on the Open ASR Leaderboard
- FastConformer Transducer 1.1B
  - The model was trained on 64K hours of English speech collected and prepared by the NVIDIA NeMo and Suno teams.
  - The training dataset consists of a private subset with 40K hours of English speech plus 24K hours from the following public datasets:
    - Librispeech (960 hours of English speech)
    - Fisher Corpus
    - Switchboard-1 Dataset
    - WSJ-0 and WSJ-1
    - National Speech Corpus (Part 1, Part 6)
    - VCTK
    - VoxPopuli (EN)
    - Europarl-ASR (EN)
    - Multilingual Librispeech (MLS EN), 2,000-hour subset
    - Mozilla Common Voice (v7.0)
    - People's Speech, 12,000-hour subset
- Open ASR Leaderboard: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard

- License: CC-BY-4.0