Enterprise Voice AI · India

Every voice agent
built for India
fails at the seam.

Between hearing and understanding. Between what the caller said and what the system did with it.

Voicema closes that seam — for BFSI, healthcare, and BPO teams running voice operations in Hindi, Tamil, Telugu and 19 more Indian languages. Designed for DPDP compliance. India data residency options available. TTFB under 1.4s on WebSocket — telephony benchmarks in Book 2.

Free PDF. No pitch deck. No demo theatre.

34%

of enterprise voice calls abandoned before resolution

8 min

average handle time. Industry target: 3 minutes.

Indian languages your current agent does not understand

<1.4s

TTFB after optimisation. Below the 3-second Jio threshold.

Built on the Vak Framework

VAK-01 · Perception · Reasoning · Expression

Workflows

Enterprise buyers
think in workflows.

Voicema is not a chatbot with a microphone. It is a voice operations layer for teams running real customer conversations at scale.

BFSI

Insurance Claim Status

Callers ask about claim 7734-B in Hindi. The agent extracts the claim number, queries the database, and responds in under 1.4 seconds — in the caller's language.

claim status · Hindi-English · IRDAI compliant · Deepgram Nova-3
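Here is a minimal sketch of the extraction step in this flow. It assumes the ASR transcript may contain Devanagari numerals, or digits spoken one at a time in Hindi. The claim-ID shape (four digits, an optional hyphen, one letter) is an assumption modelled on the 7734-B example. `normalise_digits`, `extract_claim_id` and the word list are hypothetical names; this is not Voicema's actual Hindi digit normaliser.

```python
import re
from typing import Optional

# Devanagari digits (U+0966 to U+096F) mapped to ASCII digits.
DEVANAGARI_DIGITS = str.maketrans("०१२३४५६७८९", "0123456789")

# Hypothetical, deliberately small lexicon of spoken Hindi digits.
HINDI_DIGIT_WORDS = {
    "शून्य": "0", "एक": "1", "दो": "2", "तीन": "3", "चार": "4",
    "पांच": "5", "पाँच": "5", "छह": "6", "सात": "7", "आठ": "8", "नौ": "9",
}

# Assumed claim-ID shape, modelled on "7734-B": four digits, optional hyphen, one letter.
CLAIM_ID = re.compile(r"(?<!\d)(\d{4})\s*-?\s*([A-Za-z])\b")


def normalise_digits(text: str) -> str:
    """Convert Devanagari numerals and spoken digit words to ASCII digits."""
    text = text.translate(DEVANAGARI_DIGITS)
    out = []
    for token in text.split():
        token = HINDI_DIGIT_WORDS.get(token, token)
        # Merge digits spoken one at a time: "7 7 3 4" becomes "7734".
        if len(token) == 1 and token.isdigit() and out and out[-1].isdigit():
            out[-1] += token
        else:
            out.append(token)
    return " ".join(out)


def extract_claim_id(transcript: str) -> Optional[str]:
    """Return a claim ID such as "7734-B", or None if the transcript contains none."""
    match = CLAIM_ID.search(normalise_digits(transcript))
    if match is None:
        return None
    return f"{match.group(1)}-{match.group(2).upper()}"


print(extract_claim_id("मेरा क्लेम ७७३४-B का स्टेटस क्या है"))  # 7734-B
print(extract_claim_id("क्लेम नंबर सात सात तीन चार B"))  # 7734-B
```

The word list is naive on purpose. दो is also a common verb form ("बता दो"), so a production normaliser needs context. The strict four-digits-plus-letter pattern is one cheap guard against such false digits.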

BFSI

Loan Collection Calls

Outbound calls that speak regional languages, handle objections, and transfer to a human at the right moment — without sounding like a robot reading a script.

outbound · multilingual · human handoff · interruption handling

Healthcare

Appointment Reminders

Patients confirm, reschedule, or cancel in Tamil, Kannada, or Bengali. The system updates the booking automatically and logs the interaction.

vernacular · intent extraction · calendar integration · DPDP-ready

Healthcare

Post-Discharge Follow-up

Structured health check calls that detect when a patient's response needs clinical escalation, and transfer immediately — with the conversation transcript attached.

clinical routing · transcript · escalation · session memory

BPO / Support

Inbound Customer Support

First-call resolution for Tier 1 queries across 22 Indian languages. The agent knows when it cannot answer and transfers — with context, not silence.

22 languages · Tier 1 resolution · noise handling · Meera-tested

Sales

Lead Qualification

Outbound calls that qualify intent, capture product preference, and score leads — before a human sales agent spends 8 minutes on a cold conversation.

outbound · intent classification · CRM integration · latency <1.5s

The Vak Framework

Every voice agent fails at the seam between hearing and understanding.
The Vak Framework closes that seam.

Vak (वाक्) is Sanskrit for speech — the faculty that allows thought to become communication. The framework maps to the three failure points that cause 34% of enterprise calls to end without resolution.

Each layer is independently testable, independently optimisable, and independently deployable. The entire architecture is documented across 8 books — free.

Perception Layer

Hearing the caller correctly — regardless of accent, noise, or language.

Deepgram Nova-3 · Sarvam Saaras V3 (Indic) · hi-en code-switching · 22 Indian languages · RMS noise filter · Hindi digit normaliser

Reasoning Layer

Understanding intent — not just transcribing words. Context, memory, disambiguation.

GPT-4o-mini · Intent classification · Session memory · PostgreSQL state · temp=0 JSON output · IRDAI-safe routing

Expression Layer

Responding in the caller's language, at human speed, with natural prosody.

ElevenLabs Turbo · <1.4s TTFB · Streaming audio · Interruption handling · Hindi-English TTS · Human handoff
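Three narrow interfaces can illustrate the claim that each layer is independently testable and replaceable. This is a sketch under stated assumptions. The `Perception`, `Reasoning` and `Expression` protocols, the `Transcript`/`Reply` types and the stub classes are all hypothetical. They stand in for real adapters (for example, a Deepgram or Sarvam client behind `Perception`) and are not the framework's published API. For clarity the sketch is also turn-based rather than streaming.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Transcript:
    text: str
    language: str  # language tag such as "hi-IN" (format is an assumption)


@dataclass
class Reply:
    text: str
    language: str


class Perception(Protocol):
    def transcribe(self, audio: bytes) -> Transcript: ...


class Reasoning(Protocol):
    def decide(self, transcript: Transcript) -> Reply: ...


class Expression(Protocol):
    def speak(self, reply: Reply) -> bytes: ...


def run_turn(audio: bytes, perception: Perception,
             reasoning: Reasoning, expression: Expression) -> bytes:
    """One caller turn: hear, understand, answer. Each layer sees only its neighbour's output."""
    transcript = perception.transcribe(audio)
    reply = reasoning.decide(transcript)
    return expression.speak(reply)


# Stub layers: each can be unit-tested or swapped without touching the others.
class FakePerception:
    def transcribe(self, audio: bytes) -> Transcript:
        return Transcript(text=audio.decode("utf-8"), language="hi-IN")


class EchoReasoning:
    def decide(self, transcript: Transcript) -> Reply:
        # Carry the caller's language through so the reply is spoken back in it.
        return Reply(text=f"You said: {transcript.text}", language=transcript.language)


class FakeExpression:
    def speak(self, reply: Reply) -> bytes:
        return reply.text.encode("utf-8")


audio_out = run_turn("नमस्ते".encode("utf-8"), FakePerception(), EchoReasoning(), FakeExpression())
print(audio_out.decode("utf-8"))  # You said: नमस्ते
```

`run_turn` depends only on the protocols. In this sketch, replacing one ASR provider with another therefore means passing a different `Perception` object, and each layer can be exercised without audio hardware or network access.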

Production Specifications

Built for production.
Not for demos.

1.4s

Average TTFB

Time to first spoken word from end of caller utterance. Measured on WebSocket (Book 1). Telephony benchmarks in Book 2.
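The TTFB defined here runs from the end of the caller's utterance to the first spoken audio. Below is a minimal way to collect that number. It assumes the pipeline exposes two hooks: one when the endpointer declares the utterance finished, and one when the first TTS audio chunk is sent. The `TtfbMeter` class and its method names are hypothetical. Timing the first chunk *sent* is only a proxy, because buffering in the telephony leg adds to what the caller actually hears. That is one reason WebSocket and telephony figures can differ.

```python
import time
from statistics import mean
from typing import Callable, Optional


class TtfbMeter:
    """Collects per-turn latency from end of caller utterance to first reply audio."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # monotonic clock: unaffected by wall-clock adjustments
        self._utterance_end: Optional[float] = None
        self.samples: list = []

    def utterance_ended(self) -> None:
        """Hook: the endpointer has decided the caller stopped speaking."""
        self._utterance_end = self._clock()

    def first_audio_sent(self) -> None:
        """Hook: a TTS audio chunk was sent; only the first one per turn is timed."""
        if self._utterance_end is not None:
            self.samples.append(self._clock() - self._utterance_end)
            self._utterance_end = None

    def average(self) -> float:
        return mean(self.samples)


# Usage with a fake clock so the example is deterministic.
ticks = iter([10.0, 11.2, 20.0, 21.6])
meter = TtfbMeter(clock=lambda: next(ticks))
for _ in range(2):
    meter.utterance_ended()
    meter.first_audio_sent()
    meter.first_audio_sent()  # later chunks in the same turn are ignored
print(f"{meter.average():.2f}s")  # 1.40s
```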

22

Indian Languages

Hindi, Tamil, Telugu, Kannada, Marathi, Bengali and 16 more. Code-switching handled natively.

94%

Intent Accuracy

LLM classifier at temperature 0. Structured JSON output. Rejects fabrication by design.

WER (Bengaluru)

Word Error Rate after Chapter 7 optimisation. Tamil Nadu: 15.7% — fine-tuning in Book 6.
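For reference, WER is the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words: (substitutions + deletions + insertions) / N. The sketch below assumes plain whitespace tokenisation with no text normalisation. Real evaluations usually normalise case, punctuation and numerals first. For Hindi-English code-switched speech, script and transliteration choices can also move the score, as the example shows.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: ref word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# "status" written in Devanagari counts as a substitution without normalisation.
print(wer("मेरा claim status बताइए", "मेरा claim स्टेटस बताइए"))  # 0.25
```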

75%

Noisy Environment

Share of transcriptions correct at 60 dB ambient noise, after a two-stage VAD plus a client-side RMS filter.
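The RMS stage of the client filter can be sketched as a frame-level gate: compute each frame's RMS level in dBFS and forward only frames above a threshold. The sketch assumes 16-bit mono little-endian PCM, and the -45 dBFS threshold is purely a placeholder to tune per deployment. The VAD stage is not shown. Note also that dBFS is a digital level, not the same scale as the 60 dB ambient sound pressure quoted here.

```python
import math
import sys
from array import array
from typing import Iterable, Iterator

FULL_SCALE = 32768.0  # magnitude of the most negative 16-bit sample


def frame_dbfs(frame: bytes) -> float:
    """RMS level of one mono 16-bit little-endian PCM frame, in dBFS (0 dBFS = full scale)."""
    samples = array("h")
    samples.frombytes(frame)
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian; match the host byte order
    if len(samples) == 0:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / FULL_SCALE) if rms > 0 else float("-inf")


def rms_gate(frames: Iterable[bytes], threshold_dbfs: float = -45.0) -> Iterator[bytes]:
    """Forward only frames at or above the threshold; quieter frames are dropped client-side."""
    for frame in frames:
        if frame_dbfs(frame) >= threshold_dbfs:
            yield frame


silence = bytes(640)                           # 20 ms at 16 kHz, all zeros
speech_like = array("h", [8000, -8000] * 160)  # hypothetical loud frame
if sys.byteorder == "big":
    speech_like.byteswap()
loud = speech_like.tobytes()
print(round(frame_dbfs(loud), 1))              # -12.2
print(len(list(rms_gate([silence, loud]))))    # 1
```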

₹0.36

Per ASR Minute

Deepgram Nova-3 streaming, converted at ₹83 per US$. Full pipeline: ₹1,55,446/month at 3,400 calls/day.
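The per-minute and per-call figures follow from simple arithmetic on the numbers quoted here. The only added assumption is a 30-day month. The script does not know the average call length, so it makes no attempt to split the pipeline total into components.

```python
INR_PER_USD = 83.0              # conversion rate quoted above
ASR_INR_PER_MIN = 0.36          # Deepgram Nova-3 streaming, as quoted above
PIPELINE_INR_PER_MONTH = 155_446
CALLS_PER_DAY = 3_400
DAYS_PER_MONTH = 30             # assumption: a 30-day month

asr_usd_per_min = ASR_INR_PER_MIN / INR_PER_USD
calls_per_month = CALLS_PER_DAY * DAYS_PER_MONTH
pipeline_inr_per_call = PIPELINE_INR_PER_MONTH / calls_per_month

print(f"ASR: ${asr_usd_per_min:.4f} per minute")                # ASR: $0.0043 per minute
print(f"Calls per month: {calls_per_month:,}")                  # Calls per month: 102,000
print(f"Full pipeline: ₹{pipeline_inr_per_call:.2f} per call")  # Full pipeline: ₹1.52 per call
```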

99.9%

Target Uptime

AWS ap-south-1 (Mumbai). EC2 + Nginx + Let's Encrypt. Docker restart policy. Health-check monitoring.

Fabricated Answers

Intent classification routes to data functions. The system says 'I don't know' rather than inventing.
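One way to get the "I don't know" behaviour is to have the temperature-0 classifier emit only JSON naming an intent, while the spoken answer always comes from a registered data function. Anything malformed or unrecognised then falls through to a fixed fallback. The intents, slot names, handler functions and fallback wording below are hypothetical, and the LLM call that produces the JSON string is left out.

```python
import json
from typing import Callable, Dict

FALLBACK = "I don't know that one. Let me connect you to a colleague."


# Hypothetical data functions; real ones would query the claims system, calendar, etc.
def claim_status(slots: dict) -> str:
    return f"Claim {slots['claim_id']} is under review."


def reschedule_appointment(slots: dict) -> str:
    return f"Your appointment is moved to {slots['date']}."


HANDLERS: Dict[str, Callable[[dict], str]] = {
    "claim_status": claim_status,
    "reschedule_appointment": reschedule_appointment,
}


def respond(classifier_json: str) -> str:
    """Speak only what a data function returns; anything else gets the fallback."""
    try:
        parsed = json.loads(classifier_json)
        handler = HANDLERS[parsed["intent"]]     # unknown intent raises KeyError
        return handler(parsed.get("slots", {}))  # missing slot raises KeyError
    except (json.JSONDecodeError, KeyError, TypeError):
        return FALLBACK


print(respond('{"intent": "claim_status", "slots": {"claim_id": "7734-B"}}'))
print(respond('{"intent": "weather_forecast"}'))
```

Catching `KeyError` raised inside handlers is a blunt instrument. A fuller version would validate slots against a schema before calling the handler, so genuine bugs are not silently turned into fallbacks.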

DPDP Rules 2025

Designed for DPDP compliance. Rules notified November 2025; full compliance deadline May 2027. India-resident ASR available via Sarvam Saaras V3. Compute in ap-south-1.

Telephony Integration

Exotel, LiveKit WebRTC, and standard SIP. Bring your existing telephony stack. Voicema sits between the call and the logic.

Human Handoff

Every call has an escalation path. After two failed intent classifications — or on explicit request — the call transfers with the full conversation transcript attached.
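The escalation rule reads naturally as a small per-call state machine. The sketch makes two assumptions. First, failures are counted cumulatively over the call, because the text does not say whether they must be consecutive. Second, a failed classification is signalled by the classifier returning no intent. `CallState`, `Handoff` and `on_turn` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_FAILED_CLASSIFICATIONS = 2  # "after two failed intent classifications"


@dataclass
class CallState:
    transcript: List[str] = field(default_factory=list)
    failed_classifications: int = 0  # cumulative over the call (an assumption)


@dataclass
class Handoff:
    reason: str
    transcript: List[str]  # full conversation so far, passed to the human agent


def on_turn(state: CallState, caller_text: str, intent: Optional[str],
            wants_human: bool) -> Optional[Handoff]:
    """Record the caller's turn and decide whether to transfer to a human."""
    state.transcript.append(f"caller: {caller_text}")
    if wants_human:
        return Handoff("explicit request", list(state.transcript))
    if intent is None:  # the classifier could not assign a known intent
        state.failed_classifications += 1
        if state.failed_classifications >= MAX_FAILED_CLASSIFICATIONS:
            return Handoff("intent not understood", list(state.transcript))
    return None  # keep the call with the agent


state = CallState()
print(on_turn(state, "hmm, woh wala...", None, False))  # None
print(on_turn(state, "pata nahi", None, False).reason)  # intent not understood
```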

8 Books · 1 Platform

The Vak Framework
is open knowledge.

Every architectural decision is documented. The difference from API platforms: when your BFSI compliance team asks why, you have an answer. Read the books. Build it yourself. Or deploy Voicema when you cannot afford an 18-month build.

Book 1 of 8 · Free

The Listening Machine

Build a complete voice agent from scratch — streaming ASR, real-time TTS, and the first call that answers back. Every architectural decision explained. No black boxes, no hand-waving.

Get Book 1 Free →

Get Series Updates

Coming Next — Books 2 through 8

Book 2

Faster Than Silence

INTERMEDIATE

Handle interruption, stream everything, and speak Hindi and Tamil before your caller hangs up.

Book 3

The Orchestrator

ADVANCED

Multi-agent routing, live tool calls, persistent memory, and the voice graph that never loses the thread.

Book 4

One Platform, Many Voices

EXPERT

Multi-tenant architecture, Exotel telephony, DPDP Act compliance, and SLAs built for Indian enterprise.

Book 5

Bharat Speaks

SPECIALIST

Vernacular-first voice AI for 900 million Indians who do not speak English at home.

Book 6

Your Words, Your Model

SPECIALIST

Fine-tune ASR, LLM, and TTS for your domain — from 34% WER to 6% in one training run.

Book 7

What the Dashboard Missed

CRAFT

Voice persona design, dialog architecture, and the nine rasas that determine whether callers trust you.

Book 8

Now Sell It

FOUNDER

Price, contract, and sell your voice AI to Indian enterprises — and build the platform that becomes someone else's Book 1.
