In the English speech-to-text (STT) machine learning task, acoustic models are conventionally trained on uncased Latin characters. Here we propose a new STT task: end-to-end neural transcription with fully formatted text for target labels. We present baseline Conformer-based models trained on a corpus of 5,000 hours of professionally transcribed earnings calls, achieving a CER of 1.7. As a contribution to the STT research community, we release the corpus free for non-commercial use.
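
The abstract reports a character error rate (CER) on fully formatted targets. As a rough illustration of why formatting matters for that metric, here is a minimal sketch that computes CER as Levenshtein edit distance divided by reference length. The function names and the example sentences are hypothetical, and expressing the result as a percentage is an assumption; the abstract does not state the exact scoring procedure the authors used.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # cost of deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """CER as a percentage of reference characters (assumed convention)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Hypothetical example: a fully formatted reference versus a hypothesis
# that misses the capital letter and the final period.
reference = "Revenue rose 5% to $1.2 billion."
hypothesis = "revenue rose 5% to $1.2 billion"
print(f"CER: {cer(reference, hypothesis):.2f}")
```

With formatted targets, casing and punctuation mistakes count as character errors, whereas the same hypothesis would score as error-free against an uncased, unpunctuated reference.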

Author(s) : Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko

Keywords : corpus - speech - text - stt - task
