Speech recognition in Indic languages
Vakyansh
Vakyansh is an open-source project under the Sunbird umbrella: a framework to democratise speech recognition in Indic languages. It was open sourced by EkStep Foundation and developed with contributions from a Thoughtworks team. It is modular, well documented and easy to use.
Some of the key features are:
- End-to-end training and experimentation platform built on top of wav2vec 2.0
- State-of-the-art pretrained and fine-tuned models in 8 Indic languages, including some low-resource languages (Hindi, Indian English, Kannada, Marathi, Odia, Tamil, Telugu and Gujarati)
- KenLM-based language models, including text data, for all the above languages
- Intelligent data pipelines to generate training data for any end-to-end speech recognition framework (recipes include language identification, speaker clustering and gender identification)
- Inference service to host wav2vec 2.0 models in real time and in batch mode
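
To give a feel for what sits between a wav2vec 2.0 acoustic model and the final transcript, here is a minimal sketch of greedy CTC decoding, the simplest way to turn per-frame character scores into text. It is an illustration only, not Vakyansh's own code: the function name `ctc_greedy_decode`, the `<blank>` symbol name, the tiny vocabulary and the scores are all hypothetical. It assumes the fine-tuned model emits one score vector per audio frame, as CTC-trained wav2vec 2.0 models generally do. A KenLM language model would typically replace this greedy step with a beam search that also scores candidate word sequences.

```python
from itertools import groupby

BLANK = "<blank>"  # hypothetical name for the CTC blank symbol


def ctc_greedy_decode(frame_scores, vocab, blank=BLANK):
    """Greedy CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    # Pick the best-scoring vocabulary index for each frame.
    best = [max(range(len(scores)), key=scores.__getitem__)
            for scores in frame_scores]
    # Merge runs of the same index into a single occurrence.
    collapsed = [idx for idx, _ in groupby(best)]
    # Remove blank symbols and join the remaining characters.
    return "".join(vocab[i] for i in collapsed if vocab[i] != blank)


# Toy vocabulary and made-up scores (assumptions for illustration only).
vocab = [BLANK, "क", "म", "ल"]
frame_scores = [
    [0.10, 0.70, 0.10, 0.10],  # क
    [0.10, 0.60, 0.20, 0.10],  # क again (repeat, merged)
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.70, 0.10],  # म
    [0.10, 0.10, 0.10, 0.70],  # ल
]

print(ctc_greedy_decode(frame_scores, vocab))  # prints: कमल
```

The blank symbol is what lets the decoder keep genuine double letters: two identical characters separated by a blank frame survive the merge step, while back-to-back repeats of the same character collapse into one.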