
Ever wished your AI assistant could speak like your local chai-wallah or sound just like your Gujarati aunt? The gap between robotic AI voices and authentic Indian speech has finally been bridged!
Sarvam AI's Bulbul V2 is making waves across India's tech scene with its remarkable ability to generate natural-sounding speech in 11 Indian languages.
This breakthrough TTS system isn't just another tech toy: it's bringing AI closer to India's diverse linguistic landscape and creating exciting opportunities for developers, content creators, and businesses nationwide.
Let us explore how Bulbul V2 works, test its capabilities across different languages, examine practical applications, and see how it stacks up against global competitors.
What’s Bulbul V2?
Bulbul V2 is Sarvam AI’s flagship text-to-speech model, built specifically for the Indian market. Unlike the usual robotic-sounding TTS tools, Bulbul V2 delivers speech that’s natural, expressive, and (here’s the clincher) regionally authentic. We’re talking about voices that sound like your next-door neighbour, not a machine from Silicon Valley.
Key Features at a Glance:
- Supports 11 languages: Hindi, Tamil, Telugu, Marathi, Bengali, Punjabi, Odia, Kannada, Malayalam, Gujarati, and Indian English
- Authentic regional accents: Not just the language, but the flavour of the region
- Lightning-fast performance: P90 latency of just 0.398 seconds (that’s more than twice as fast as ElevenLabs)
- Affordable pricing: ₹15 per 10,000 characters, as little as one-fifth of what global rivals charge
- Customisable voice options: Six distinct personalities for different industries and vibes
- Fine-grained control: Tweak pitch, pace, loudness, and sample rate
- Smart text processing: Handles numbers, dates, code-mixed text, and more
Why Bulbul V2 Is a Big Deal for India
India has 22 officially recognised languages and hundreds of dialects. Most global TTS models, like ElevenLabs, barely scratch the surface, usually offering generic Hindi or, at best, a couple of regional variants. Bulbul V2 flips the script by:
- Covering more Indian languages than any major competitor
- Delivering voices that feel local, not just “Indian”
- Making voice tech affordable and accessible for startups, enterprises, and indie devs alike
The Brains Behind the Bird: Sarvam AI

Sarvam AI isn’t just another AI startup. Founded in Bengaluru by Vivek Raghavan and Pratyush Kumar (ex-AI4Bharat), Sarvam’s mission is bold: build AI that speaks India’s languages, for India’s people. And they’re not just talking the talk: Sarvam was picked by the Indian government to build the country’s first homegrown AI foundation model. That’s a serious vote of confidence.
Backed by the Big Guns
In December 2023, Sarvam AI raised a whopping $41 million in Series A funding, led by Lightspeed, with Peak XV Partners and Khosla Ventures jumping in. This isn’t just hype; it’s a sign that investors see real potential in India-centric AI solutions.
How Bulbul V2 Works: Under the Hood
Training Data That Gets India
Bulbul V2 was trained on diverse, high-quality audio datasets, featuring multiple speakers, code-mixed inputs, proper nouns, abbreviations, and a mix of conversational and professional tones. This means the model doesn’t just “read” text: it picks up on the context, the emotion, and the quirks of Indian speech.
Voice Personalities for Every Need
Sarvam AI offers six unique voice personas.
You can also create custom voices for your brand: think consistent auditory branding across all your platforms.
API and Developer Goodies
- Python SDK: Easy integration for devs
- API access: Fast, reliable, and comes with free credits for new users
- Control parameters: Adjust pitch, pace, loudness, and sample rate (8 kHz to 24 kHz)
- Smart preprocessing: Auto-normalises numbers, dates, and mixed-language text
Sample Code to Get You Started
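```python
from sarvamai import SarvamAI
from sarvamai.play import play, save

# Create a client with your Sarvam AI subscription key
client = SarvamAI(
    api_subscription_key="YOUR_API_SUBSCRIPTION_KEY"
)

# Convert Indian-English text to speech, with preprocessing enabled
response = client.text_to_speech.convert(
    inputs=["Hello, how are you today?"],
    target_language_code="en-IN",
    enable_preprocessing=True
)

play(response)                # play the audio locally
save(response, "output.wav")  # write the same audio to a WAV file
```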
Save the output as a WAV file for your app, bot, or IVR system.
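The control parameters listed earlier (pitch, pace, loudness, and sample rate) can be bundled into a request before it is sent. The snippet below is a minimal sketch, not Sarvam's documented interface: `VoiceSettings`, `build_tts_payload`, the default values, the field names `pitch`, `pace`, `loudness`, and `speech_sample_rate`, and the list of allowed sample rates are all assumptions made for illustration. Check the official API reference for the real names and ranges.

```python
from dataclasses import dataclass, asdict

# Assumed set of sample rates inside the 8 kHz to 24 kHz range quoted in this article.
ALLOWED_SAMPLE_RATES = (8000, 16000, 22050, 24000)


@dataclass
class VoiceSettings:
    """Hypothetical container for the tunable controls named in this article."""
    pitch: float = 0.0               # assumed: 0.0 leaves the pitch unchanged
    pace: float = 1.0                # assumed: 1.0 is normal speaking speed
    loudness: float = 1.0            # assumed: 1.0 is normal volume
    speech_sample_rate: int = 22050  # assumed field name and default


def build_tts_payload(text, language_code, settings=None):
    """Return keyword arguments for a TTS request (field names are assumptions)."""
    settings = settings or VoiceSettings()
    if not text.strip():
        raise ValueError("text must not be empty")
    if settings.speech_sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {settings.speech_sample_rate}")
    payload = {
        "inputs": [text],
        "target_language_code": language_code,
        "enable_preprocessing": True,
    }
    payload.update(asdict(settings))  # merge pitch, pace, loudness, sample rate
    return payload


# Example: slightly faster speech at telephone quality, e.g. for an IVR line.
payload = build_tts_payload(
    "Namaste", "hi-IN", VoiceSettings(pace=1.2, speech_sample_rate=8000)
)
print(payload)
```

If the SDK accepts these keyword names, the dictionary could be passed along as `client.text_to_speech.convert(**payload)`. Validating the values locally first keeps obviously bad requests from spending API credits.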

Performance: Speed, Quality, and Cost
Let’s get real: no one likes lag or robotic voices. Bulbul V2’s P90 latency (the time within which 90% of requests complete) clocks in at just 0.398 seconds, which is blazing fast compared to ElevenLabs’ 0.945 seconds. For businesses, that means snappier interactions and happier users.
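For readers who want to measure P90 latency on their own traffic, here is a small sketch of the nearest-rank method. The function name `p90` and the ten timings are made up for illustration; they are not benchmark data from Sarvam AI or ElevenLabs.

```python
import math


def p90(latencies_sec):
    """Nearest-rank 90th percentile: the smallest sample with at least 90% of samples at or below it."""
    if not latencies_sec:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_sec)
    rank = math.ceil(9 * len(ordered) / 10)  # 1-based position in the sorted list
    return ordered[rank - 1]


# Hypothetical timings for ten requests, in seconds.
samples = [0.31, 0.29, 0.35, 0.33, 0.30, 0.40, 0.32, 0.36, 0.34, 0.90]
print(p90(samples))
```

P90 is a handy figure for voice apps because it describes what most users experience while ignoring the slowest 10% of requests, so the single 0.90-second outlier in the sample does not set the result.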
Cost Comparison
| Model | Price per 10,000 Characters | Languages Supported | P90 Latency (sec) |
|---|---|---|---|
| Bulbul V2 | ₹15 | 11 (Indian) | 0.398 |
| ElevenLabs | ~₹75 | 2 (Indian) | 0.945 |
Bulbul V2 costs one-fifth as much as its global rival and is more than twice as fast.
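To see what the table means for a real budget, the arithmetic can be sketched in a few lines. The prices and latencies come straight from the table above (the ElevenLabs price is approximate); the monthly volume of two million characters and the helper name `cost_inr` are hypothetical.

```python
# Figures from the comparison table; the ElevenLabs price is approximate.
PRICE_PER_10K_CHARS_INR = {"Bulbul V2": 15, "ElevenLabs": 75}
P90_LATENCY_SEC = {"Bulbul V2": 0.398, "ElevenLabs": 0.945}


def cost_inr(model, characters):
    """Cost in rupees for synthesising the given number of characters."""
    return PRICE_PER_10K_CHARS_INR[model] * characters / 10_000


monthly_chars = 2_000_000  # hypothetical monthly volume

for model in PRICE_PER_10K_CHARS_INR:
    print(f"{model}: ₹{cost_inr(model, monthly_chars):,.0f} per month")

price_ratio = PRICE_PER_10K_CHARS_INR["ElevenLabs"] / PRICE_PER_10K_CHARS_INR["Bulbul V2"]
speed_ratio = P90_LATENCY_SEC["ElevenLabs"] / P90_LATENCY_SEC["Bulbul V2"]
print(f"Price ratio: {price_ratio:.1f}x, P90 latency ratio: {speed_ratio:.2f}x")
```

At that volume the bill works out to ₹3,000 versus roughly ₹15,000 per month, and the latency ratio comes to about 2.37.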
Hands-On: Testing Bulbul V2
1. Humour and Expressiveness
- Prompt: A funny Hindi joke about computers and viruses
- Result: Clear and fluent, but emotional delivery (like laughter) could use a boost. Still, miles ahead of the competition in clarity and naturalness.
2. Multilingual Input
- Prompt: Punjabi text, output in Tamil
- Result: The model reads the text as-is and doesn’t translate it, so for now translation must be handled externally.
3. Code-Mixed and Complex Text
- Prompt: Malayalam text, output in Gujarati
- Result: The model outputs speech in the source language, not the target. Again, translation isn’t built in yet, so combine Bulbul V2 with a translation API for a full workflow.
Pro Tip: For seamless translation + TTS, plug in Google Translate or another translation API before sending text to Bulbul V2.
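The translate-first workflow from the tip can be sketched as a small pipeline. Everything here is illustrative: `translate_then_speak`, `fake_translate`, and `fake_synthesize` are hypothetical names, the language codes `pa-IN` and `ta-IN` are assumed to follow the `en-IN` pattern used earlier, and the two stand-in functions only mimic a translation service and a TTS call so the example runs offline.

```python
from typing import Callable

# Signatures the pipeline expects from the two external services.
Translator = Callable[[str, str, str], str]  # (text, source_lang, target_lang) -> text
Synthesizer = Callable[[str, str], bytes]    # (text, language_code) -> audio bytes


def translate_then_speak(text: str, source_lang: str, target_lang: str,
                         translate: Translator, synthesize: Synthesizer) -> bytes:
    """Translate first (only when the languages differ), then synthesise speech."""
    if source_lang != target_lang:
        text = translate(text, source_lang, target_lang)
    return synthesize(text, target_lang)


# Stand-ins for real services; swap in your translation API and TTS client.
def fake_translate(text, source_lang, target_lang):
    return f"[{source_lang}->{target_lang}] {text}"


def fake_synthesize(text, language_code):
    return f"{language_code}:{text}".encode("utf-8")


audio = translate_then_speak(
    "Sat Sri Akal", "pa-IN", "ta-IN", fake_translate, fake_synthesize
)
print(audio)
```

Passing the two services in as functions keeps the pipeline independent of any one vendor, so the translation step can be changed or dropped without touching the TTS call.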
What Sets Bulbul V2 Apart?
- Regional authenticity: Voices that actually sound like your city or state
- Speed and cost: Faster and cheaper than global TTS leaders
- Developer-friendly: Python SDK, easy API, free trial credits
- Customisation: Build your own brand voice
- India-first approach: Designed with local users, businesses, and content creators in mind
Limitations and What’s Next
- No built-in translation: You’ll need an external tool for language conversion
- Expressiveness: While natural, some emotional tones (like humour) are still a work in progress
- Continuous improvement: Sarvam AI is actively working on making voices more lively and expressive

Why Marketers, Developers, and AI Buffs Should Care
If you’re building for India, you can’t ignore language diversity. Bulbul V2 bridges the gap, letting you reach millions in their own voice, literally. Whether you’re scaling a SaaS platform, launching a regional podcast, or building the next-gen chatbot, this tool is a game-changer.
- For marketers: Localise campaigns, boost engagement, and build trust with authentic voices.
- For developers: Plug-and-play API, fine-tune voices, and deliver fast, natural speech.
- For AI enthusiasts: See Indian AI matching (and beating) global giants on home turf.
Conclusion: Bulbul V2's Place in India's AI Ecosystem
Bulbul V2 marks a significant leap forward in India's AI development journey, particularly in the domain of text-to-speech technology. By delivering fast, natural, and regionally authentic voices, it's helping bridge the linguistic divide that has often made technology less accessible to non-English speakers across the country.

While the system isn't perfect, particularly in handling complex emotions and cross-language translation, its exceptional speed, affordability, and language-specific optimisation make it an impressive achievement and a valuable tool for developers and businesses targeting the Indian market.
For anyone working on applications that target Indian users, this homegrown TTS solution deserves serious consideration as an alternative to Western-focused options that often struggle with Indian languages and contexts.

