Quality of E2E Audio - Major Indian Languages

Language    Speakers (millions)   Comments
Hindi       530                   Works well on Yella's suite of random prompts
Bengali     100                   Whisper performs very poorly in speech analysis
Marathi     83                    Whisper intermittently emits Hindi text for Marathi speech & fails
Telugu      81                    Whisper performs very poorly in speech analysis
Tamil       69                    Works well on Yella's suite of random prompts
Kannada     44                    Works well on Yella's suite of random prompts
Malayalam   34                    Whisper intermittently translates output text into English

Introducing audio conversations with ChatGPT within WhatsApp

Why type when you can just say what you want from ChatGPT? That is the basic premise of what we are introducing today: an audio interface to ChatGPT in WhatsApp. It's not just about speaking to ChatGPT. We also read the responses aloud in the language you spoke, and show them to you as text too.

  • Start a WhatsApp conversation with ChatGPT by adding 14087570747 as a contact
  • Or click on https://wa.me/14087570747 to converse within WhatsApp
  • A sample Hindi conversation
  • A sample Kannada conversation
  • A sample Tamil conversation

Conclusions from Yella's tests

OpenAI mentions Hindi, Tamil and Kannada as the only three Indian languages with decent (that is, low) error rates. Our internal tests match that conclusion. Note that these three languages not only perform well in speech-to-text conversion; the ChatGPT models also accept text in them and return results in the same language. We have noticed that some other Indian languages are recognized correctly by Whisper, but the final ChatGPT results are emitted in a different language. We classify these as failures, because for our target demographic the results need to be presented in the language that was spoken. That audience may not understand English, for example.
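The failure rule above (a response counts as a failure when it does not come back in the spoken language) can be approximated with a simple script check. The following is only a minimal sketch, not Yella's actual test harness: the names `dominant_script` and `is_failure` are hypothetical, and the check looks only at which Unicode block most characters fall in. That is enough to catch an English response to Malayalam speech, but it cannot separate Hindi from Marathi, since both are written in Devanagari.

```python
# Unicode block ranges for the scripts of the languages listed in the table.
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),  # used by both Hindi and Marathi
    "Bengali": (0x0980, 0x09FF),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Kannada": (0x0C80, 0x0CFF),
    "Malayalam": (0x0D00, 0x0D7F),
}

# Script expected for each spoken language (illustrative mapping).
LANGUAGE_SCRIPT = {
    "Hindi": "Devanagari",
    "Marathi": "Devanagari",
    "Bengali": "Bengali",
    "Tamil": "Tamil",
    "Telugu": "Telugu",
    "Kannada": "Kannada",
    "Malayalam": "Malayalam",
}


def dominant_script(text):
    """Return the script most characters belong to, or None if none match."""
    counts = {}
    for ch in text:
        if ch.isascii() and ch.isalpha():
            counts["Latin"] = counts.get("Latin", 0) + 1
            continue
        code_point = ord(ch)
        for name, (low, high) in SCRIPT_RANGES.items():
            if low <= code_point <= high:
                counts[name] = counts.get(name, 0) + 1
                break
    if not counts:
        return None
    return max(counts, key=counts.get)


def is_failure(spoken_language, response_text):
    """True when the response is not mostly in the spoken language's script."""
    expected = LANGUAGE_SCRIPT[spoken_language]
    return dominant_script(response_text) != expected


# An English reply to Malayalam speech is classified as a failure.
print(is_failure("Malayalam", "The weather is nice today"))
# A Devanagari reply to Hindi speech passes the script check.
print(is_failure("Hindi", "नमस्ते, आप कैसे हैं?"))
```

A production check would likely need a real language identifier on top of this, precisely because of the Hindi and Marathi overlap noted in the table.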

So, what's next?

Given the above test results, our goals as engineers seem clearly spelled out. Yella is committed to Bhashini, the Indian government's initiative for Indian languages, and to enhancing the base Whisper models for a lower word error rate (WER). However, this by itself may not guarantee that the final emitted audio will be in the native language. If ChatGPT emits results in English or any other language, the final text-to-speech (TTS) step needs to be able to translate them into the native language. Closing these gaps is essentially Yella's mission. Yella means "All" in Kannada. By this, we mean all languages.
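For readers unfamiliar with the WER metric mentioned above, it is the number of word substitutions, deletions and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. The sketch below is an illustration only, assuming whitespace-separated words and no text normalization; the function name `word_error_rate` is hypothetical and this is not the evaluation code used by Yella or OpenAI.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] = edits needed to turn the first j hypothesis words
    # into the reference words processed so far.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row.append(
                min(
                    previous_row[j] + 1,  # reference word missing from hypothesis
                    current_row[j - 1] + 1,  # extra word in hypothesis
                    previous_row[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous_row = current_row

    return previous_row[-1] / len(ref_words)


# One substituted word and one dropped word out of four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Lower is better: a perfect transcript scores 0.0, and a transcript with many inserted words can score above 1.0.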