nvidia/parakeet

Overview

  • NVIDIA and Suno have released an ASR model family called Parakeet
  • Four models based on the FastConformer architecture: RNNT (1.1B, 0.6B) and CTC (1.1B, 0.6B)
  • parakeet-rnnt-1.1b outperforms OpenAI Whisper on the Open ASR Leaderboard
    • FastConformer Transducer, 1.1B parameters
    • The model was trained on 64K hours of English speech collected and prepared by NVIDIA NeMo and Suno teams.
    • The training dataset consists of a private subset with 40K hours of English speech plus 24K hours from the following public datasets:
      • Librispeech 960 hours of English speech
      • Fisher Corpus
      • Switchboard-1 Dataset
      • WSJ-0 and WSJ-1
      • National Speech Corpus (Part 1, Part 6)
      • VCTK
      • VoxPopuli (EN)
      • Europarl-ASR (EN)
      • Multilingual Librispeech (MLS EN) - 2,000 hour subset
      • Mozilla Common Voice (v7.0)
      • People's Speech - 12,000 hour subset
    • https://huggingface.co/spaces/hf-audio/open_asr_leaderboard
  • CC-BY-4.0 license
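
The training-data figures quoted above can be sanity-checked with a little arithmetic. The sketch below is only an illustration. It assumes "K" means 1,000 hours and that the quoted sizes are approximate. The variable names are made up for this example, and only the three public corpora with an explicit size in the list are counted.

```python
# Figures quoted in the notes above, in hours ("K" read as 1,000; all approximate).
TOTAL_HOURS = 64_000
PRIVATE_HOURS = 40_000
PUBLIC_HOURS = 24_000

# Only three public corpora in the list come with an explicit size.
sized_public = {
    "Librispeech": 960,
    "Multilingual Librispeech (MLS EN)": 2_000,
    "People's Speech": 12_000,
}

sized_total = sum(sized_public.values())
# Whatever remains is spread over the corpora listed without a size
# (Fisher, Switchboard-1, WSJ, National Speech Corpus, VCTK, VoxPopuli,
# Europarl-ASR, Common Voice); the notes give no per-corpus breakdown.
unsized_total = PUBLIC_HOURS - sized_total

# The private and public parts should add up to the stated total.
assert PRIVATE_HOURS + PUBLIC_HOURS == TOTAL_HOURS

print(f"private share of training data: {PRIVATE_HOURS / TOTAL_HOURS:.1%}")
print(f"public hours with a stated size: {sized_total}")
print(f"public hours left for the other corpora: {unsized_total}")
```

Under these assumptions the private subset makes up 62.5% of the training data, and the three sized public corpora account for 14,960 of the 24K public hours, leaving roughly 9,040 hours for the remaining corpora.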