STT models include OpenAI's Whisper, Google Cloud Speech-to-Text, Deepgram, AssemblyAI, and India-focused options such as Krutrim and Reverie. Quality varies by language, accent, and background noise; accuracy on Indian-accented English in particular differs significantly from model to model.
For Indian voice agents: Whisper covers roughly 100 languages, including Hindi, Tamil, Telugu, Bengali, and Marathi, with reasonable accuracy. Deepgram offers models tuned for Indian-accented English. Reverie focuses entirely on Indian languages, with strong dialect support.
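Because quality differs so much by accent and language, it usually helps to score candidate models on your own recordings rather than rely on published numbers. One common metric is word error rate (WER): the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a minimal, provider-agnostic version; the function name and the lowercase-and-split normalization are assumptions for illustration, and a production evaluation would need language-aware normalization (especially for Indic scripts).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Normalization here is deliberately naive (lowercase + whitespace split);
    that is an assumption of this sketch, not a standard.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts; one dropped word out of six reference words.
    ref = "please book a ticket to chennai"
    hyp = "please book ticket to chennai"
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
```

Running the same set of recordings (different accents, languages, and noise levels) through each candidate model and comparing average WER gives a like-for-like view of how each one handles your actual callers.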
Latency matters: sub-200 ms STT enables real-time conversation, while 500 ms or more feels stilted. Streaming STT, which returns partial results while the user is still speaking, dramatically improves perceived latency.
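To see why streaming helps, consider a rough model of the delay between the user finishing a sentence and the final transcript arriving. In batch mode the whole utterance is decoded after speech ends; in streaming mode most of the audio has already been decoded, so only the last chunk remains. The sketch below encodes that idea. The real-time factor, chunk size, and the "borderline" label for the 200 to 500 ms range are all assumptions chosen for illustration, and network overhead is ignored.

```python
REALTIME_MS = 200  # below this, conversation feels real-time (from the text above)
STILTED_MS = 500   # at or above this, conversation feels stilted


def classify_latency(latency_ms: float) -> str:
    """Map a latency to a rough conversational-quality label."""
    if latency_ms < REALTIME_MS:
        return "real-time"
    if latency_ms >= STILTED_MS:
        return "stilted"
    return "borderline"  # assumed label for the in-between range


def batch_latency_ms(speech_ms: float, real_time_factor: float) -> float:
    """Batch STT: all audio is decoded only after the user stops speaking."""
    return speech_ms * real_time_factor


def streaming_latency_ms(chunk_ms: float, real_time_factor: float) -> float:
    """Streaming STT: earlier chunks were decoded during speech,
    so only the final chunk is left when the user stops."""
    return chunk_ms * real_time_factor


if __name__ == "__main__":
    # Hypothetical numbers: a 3-second utterance, a decoder that needs
    # 0.2 s of compute per second of audio, and 250 ms streaming chunks.
    speech_ms, rtf, chunk_ms = 3000, 0.2, 250

    batch = batch_latency_ms(speech_ms, rtf)
    streaming = streaming_latency_ms(chunk_ms, rtf)
    print(f"batch:     {batch:.0f} ms ({classify_latency(batch)})")
    print(f"streaming: {streaming:.0f} ms ({classify_latency(streaming)})")
```

Under this simplified model, batch latency grows with utterance length, while streaming latency depends only on chunk size, which is why the same decoder can feel stilted in one mode and real-time in the other.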