AI & Voice

STT (Speech-to-Text)

Technology that converts spoken audio into written text, also called automatic speech recognition (ASR). It is the foundation of voice agents and call transcription.

STT models and services include OpenAI's Whisper, Google Cloud Speech-to-Text, Deepgram, AssemblyAI, and India-focused providers such as Krutrim and Reverie. Accuracy depends on language, accent, and background noise, and results on Indian-accented English differ significantly from one model to the next.

For Indian voice agents: Whisper covers roughly 100 languages, including Hindi, Tamil, Telugu, Bengali, and Marathi, with reasonable accuracy. Deepgram offers models tuned for Indian-accented English. Reverie focuses entirely on Indian languages, with strong dialect support.

Latency matters: STT that responds in under 200ms enables real-time conversation, while 500ms or more feels stilted. Streaming STT, which returns partial results while the caller is still speaking, dramatically improves perceived latency.
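To make these thresholds concrete, here is a minimal sketch rather than a real STT client. It assumes a hypothetical list of streaming results, each tagged with the milliseconds it took to arrive. The names SttEvent, latency_tier, first_result_ms, and final_result_ms are illustrative and do not come from any vendor SDK. The 200ms and 500ms cut-offs are the ones given in this entry.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SttEvent:
    """One hypothetical streaming result (not a vendor type)."""
    elapsed_ms: float   # assumed: delay between the audio and this result
    text: str           # transcript hypothesis so far
    is_final: bool      # True once the engine commits to the transcript


def latency_tier(ms: float) -> str:
    """Map a latency to the tiers described in this entry."""
    if ms < 200:
        return "real-time"
    if ms < 500:
        return "ivr-acceptable"
    return "stilted"


def first_result_ms(events: List[SttEvent]) -> Optional[float]:
    """Latency of the first result of any kind (what the caller perceives)."""
    return events[0].elapsed_ms if events else None


def final_result_ms(events: List[SttEvent]) -> Optional[float]:
    """Latency of the first committed (final) transcript, if any."""
    for event in events:
        if event.is_final:
            return event.elapsed_ms
    return None


if __name__ == "__main__":
    # Made-up timings for illustration only.
    events = [
        SttEvent(80, "mujhe", False),
        SttEvent(160, "mujhe appointment", False),
        SttEvent(620, "mujhe appointment chahiye", True),
    ]
    print(latency_tier(first_result_ms(events)))  # real-time
    print(latency_tier(final_result_ms(events)))  # stilted
```

In this made-up example the final transcript alone would feel stilted, but the first partial arrives inside the real-time budget, which is why streaming improves perceived latency.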

India context

Indian voice agents must handle code-switching (Hindi and English in the same sentence), tier-2 city accents, and callers who change languages mid-conversation. Single-language STT models miss 15-30% of the meaning in Indian conversations, so multilingual auto-detection is essential.
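As a rough illustration of what a code-switching check might look like on a transcript, the sketch below flags sentences that mix Devanagari and Latin script. It is a simplification under stated assumptions: the function names are hypothetical, it only looks at Unicode script, and it will not catch romanized Hindi ("mujhe kal appointment chahiye"), which needs a proper language-identification model.

```python
from typing import List


def script_of(token: str) -> str:
    """Label a token 'devanagari', 'latin', or 'other' by majority script."""
    devanagari = sum(1 for ch in token if 0x0900 <= ord(ch) <= 0x097F)
    latin = sum(1 for ch in token if ch.isascii() and ch.isalpha())
    if devanagari == 0 and latin == 0:
        return "other"  # digits, punctuation, or other scripts
    return "devanagari" if devanagari >= latin else "latin"


def token_scripts(text: str) -> List[str]:
    """Script label for each whitespace-separated token."""
    return [script_of(token) for token in text.split()]


def is_code_switched(text: str) -> bool:
    """True if the text mixes Devanagari and Latin tokens."""
    scripts = set(token_scripts(text)) - {"other"}
    return len(scripts) > 1


if __name__ == "__main__":
    print(is_code_switched("मुझे कल appointment चाहिए"))  # True
    print(is_code_switched("I need an appointment"))      # False
```

A production pipeline would more likely rely on the STT engine's own language detection; a script check like this is only useful as a cheap sanity test on transcripts.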

Examples

  • A Doggu voice agent uses Whisper plus a custom code-switching layer for Hindi-English caller flows.
  • Deepgram excels at transcribing Indian-accented English calls for clinic and salon use cases.

FAQ

Is Whisper good for Indian languages?

Reasonably good for Hindi, Tamil, Telugu, and Bengali. Weaker for lower-resource languages such as Odia and Punjabi, which are under-represented in its training data. Combining Whisper with India-specific models gives the best coverage.
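One way to combine engines is a simple per-language routing table. The sketch below is illustrative only: the engine labels "whisper" and "indic-specialist" are placeholders rather than real API identifiers, the language groupings simply mirror the claims in this entry, and any language not listed is marked for evaluation instead of being guessed.

```python
# ISO 639-1 codes. The groupings mirror this entry, not a benchmark.
WHISPER_REASONABLE = {"hi", "ta", "te", "bn", "mr"}  # Hindi, Tamil, Telugu, Bengali, Marathi
LOWER_RESOURCE = {"or", "pa"}                        # Odia, Punjabi


def pick_engine(lang_code: str) -> str:
    """Return a placeholder engine label for a language code."""
    code = lang_code.strip().lower()
    if code in LOWER_RESOURCE:
        return "indic-specialist"
    if code in WHISPER_REASONABLE:
        return "whisper"
    return "needs-evaluation"  # no claim is made here for other languages


if __name__ == "__main__":
    for code in ("hi", "pa", "gu"):
        print(code, pick_engine(code))
```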

What's the latency target for STT?

Under 200ms for real-time conversational AI, and under 500ms for IVR-style flows. Streaming STT can deliver partial results in 50-100ms while audio is still arriving.

How does STT handle code-switching?

Better models (Whisper, Deepgram Nova-2) auto-detect a language switch mid-conversation. Older models lock onto one language and miss the switch. Handling this well is critical for Indian SMB use cases.

Related concepts

TTS, voice agent, Whisper, Deepgram, code-switching, ASR

