Speech recognition in Indic languages

Vakyansh

Vakyansh is an open-source project under the Sunbird umbrella. It is a framework to democratise speech recognition in Indic languages. It was open-sourced by EkStep Foundation and developed with contributions from a team at Thoughtworks. It is modular, well documented and easy to use.
Some of the key features are:

  • End-to-end training and experimentation platform built on top of wav2vec 2.0
  • State-of-the-art pretrained and fine-tuned models in 8 Indic languages, including some low-resource languages (Hindi, Indian English, Kannada, Marathi, Odia, Tamil, Telugu and Gujarati)
  • KenLM-based language models, including text data for all the above languages
  • Intelligent data pipelines to generate training data for any end-to-end speech recognition framework (recipes include language identification, speaker clustering and gender identification)
  • Inference service to host models using wav2vec 2.0 in real time and in batch mode
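
To give a feel for what happens at inference time: fine-tuned wav2vec 2.0 models are typically trained with a CTC objective, so the acoustic model emits one score vector per audio frame and a decoder turns those frames into text. The sketch below shows the simplest such decoder (greedy CTC decoding) in plain Python. It is an illustration only, not Vakyansh's actual code: the vocabulary, the `|` word separator, and the function names are assumptions made for the example, and a KenLM language model would normally be used in a beam search in place of this greedy step.

```python
from itertools import groupby

# Hypothetical toy vocabulary (an assumption for this sketch).
# Index 0 is the CTC blank token; "|" marks a word boundary.
VOCAB = ["<blank>", "|", "a", "h", "i", "n", "d"]
BLANK_ID = 0
WORD_SEP = "|"


def greedy_ctc_decode(frame_scores):
    """Decode per-frame scores into text with greedy CTC decoding.

    frame_scores: list of frames, each a list of scores (one per VOCAB entry).
    """
    # 1. Pick the highest-scoring token for every frame.
    best_ids = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Collapse consecutive repeats of the same token.
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # 3. Drop blanks, map ids to characters, and turn separators into spaces.
    chars = [VOCAB[i] for i in collapsed if i != BLANK_ID]
    return "".join(chars).replace(WORD_SEP, " ").strip()


def one_hot(index, size=len(VOCAB)):
    """Helper for the demo: a score vector that favours a single token."""
    return [1.0 if i == index else 0.0 for i in range(size)]


if __name__ == "__main__":
    # Frames for: h h <blank> i i | <blank> h i
    frames = [one_hot(i) for i in [3, 3, 0, 4, 4, 1, 0, 3, 4]]
    print(greedy_ctc_decode(frames))  # prints "hi hi"
```

The blank token is what lets CTC represent doubled letters: two `a` frames separated by a blank decode to "aa", while two adjacent `a` frames collapse to a single "a".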

Current Team Members

Dr. Vivek Raghavan

EkStep Foundation

Rajat Singhal

ThoughtWorks

Rishabh Gaur

ThoughtWorks

Sreejith V

ThoughtWorks

Harveen Chadha

ThoughtWorks

Priyanshi Shah

ThoughtWorks

Soujyo Sen

ThoughtWorks

Umair Manzoor

ThoughtWorks

Anirudh Gupta

ThoughtWorks

Neeraj Chhimwal

ThoughtWorks

Heera Ballabh

ThoughtWorks

Priyanshu Pal

ThoughtWorks

Nikita Tiwari

ThoughtWorks

Ankur Dhuriya

ThoughtWorks

Niresh Kumar R

ThoughtWorks

Amulya Ahuja

ThoughtWorks

Contributors

Soma Siddhartha

IIT Dharwad

Achala Ramu

ThoughtWorks

Shrish Singhal

IIT Guwahati

Raunak Kothari

ThoughtWorks

Parth Shukla

ThoughtWorks

Dinesh Bharule

ThoughtWorks

Alumni

Ira Kaushik

ThoughtWorks

Astha Agarwal

ThoughtWorks

Navaneeth Krishnan

ThoughtWorks

Avneesh Chadha

ThoughtWorks

Faizan Khan

ThoughtWorks

Gaurav Gupta

ThoughtWorks

Aman Tiwari

ThoughtWorks

Manish Singh

ThoughtWorks