VeenaMax TTS
VeenaMAX transforms text into expressive, real-time speech across multiple Indian languages for seamless communication.
API
To call the model through the API, use the example for your preferred programming language.
import requests

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/veena-max-tts"

# Request payload
data = {
    "text": "Segmind lagao, model chalao, itna tez ki result aane se pehle chai bhi tthandi na ho.",
    "speaker_id": "vinaya_assist",
    "normalize": True
}

# Authenticate with the API key header
headers = {'x-api-key': api_key}

response = requests.post(url, json=data, headers=headers)
response.raise_for_status()  # Stop here if the API returned an error status
print(response.content)  # The response body is expected to be the generated audio
Attributes
text: Provide the text to convert into speech. Use greetings or instructions, like 'Welcome to VeenaMAX, your TTS solution.'
speaker_id: Choose a voice for your text. For a calm tone, select 'soumya_calm'; for impact, select 'agastya_impact'.
normalize: Enable text normalization for better pronunciation. Use this for complex texts or mixed languages.
To keep track of your credit usage, inspect the response headers of each API call. The x-remaining-credits header indicates the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
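As a minimal sketch of that check, the helper below reads the header from a response's header mapping. The function name `remaining_credits` is hypothetical, and parsing the value as a number is an assumption, since the format of the header value is not specified here.

```python
from typing import Mapping, Optional


def remaining_credits(headers: Mapping[str, str]) -> Optional[float]:
    """Return the x-remaining-credits value as a number, or None if absent or non-numeric."""
    # Header names are case-insensitive, so compare them in lowercase.
    for name, value in headers.items():
        if name.lower() == "x-remaining-credits":
            try:
                return float(value)  # assumption: the header carries a numeric string
            except (TypeError, ValueError):
                return None
    return None
```

With the requests example above, this could be called as `remaining_credits(response.headers)` after each request, logging or alerting when the value drops below a threshold you choose.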
Resources to get you started
Everything you need to know to get the most out of VeenaMax TTS
VeenaMAX: Effective Usage Guide
VeenaMAX is Maya Research's cutting-edge Text-to-Speech (TTS) solution, purpose-built for Indian languages and Hinglish content. With eight unique voice personalities, real-time streaming, automatic script detection, and domain-specific terminology support, VeenaMAX delivers natural, expressive, studio-quality audio for diverse applications, from IVR systems and customer support to e-learning and healthcare.
1. Quick Setup
- Sign up for the Maya Research TTS API.
- Retrieve your API key and endpoint information.
- Choose between streaming and non-streaming modes:
  - Streaming: Real-time audio for chatbots, voice assistants, and live translations.
  - Non-Streaming: Batch generation of full audio files for downloadable content.
2. Core Parameters
Use the following JSON payload for every request:
{
  "text": "Your content here.",
  "speaker_id": "vinaya_assist",
  "normalize": true
}
- text (string, required): Input text to convert.
- speaker_id (enum, required): Select one of eight voices.
- normalize (bool, optional): Enable for better pronunciation of numbers, acronyms, and mixed languages.
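The parameter rules above can be sketched as a small payload builder. The function name `build_payload` is hypothetical, and defaulting `normalize` to true is an assumption of this sketch that mirrors the example payload; the API's own default is not stated here.

```python
def build_payload(text: str, speaker_id: str, normalize: bool = True) -> dict:
    """Build the request body from the three documented parameters."""
    # text and speaker_id are documented as required, so reject empty values early.
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is required and must be a non-empty string")
    if not isinstance(speaker_id, str) or not speaker_id.strip():
        raise ValueError("speaker_id is required and must be a non-empty string")
    # normalize is optional; the default of True is this sketch's choice.
    return {"text": text, "speaker_id": speaker_id, "normalize": bool(normalize)}
```

The returned dictionary can be passed as the `json` argument of `requests.post`. The sketch does not check `speaker_id` against the eight voices, because the full list is not given in this guide.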
3. Voice Selection by Use Case
IVR & Customer Support
- speaker_id: vinaya_assist (neutral, helpful)
- normalize: true
Ideal for troubleshooting prompts, account inquiries, and telephony menus.
E-Learning & Educational Content
- speaker_id: soumya_calm (steady, clear)
- normalize: true
Ensures attention retention and accurate reading of technical terms.
Marketing & Announcements
- speaker_id: agastya_impact (dynamic, engaging)
- normalize: false (retain branded stylizations)
Delivers high-energy calls to action and promotional scripts.
Conversational Chatbots & Accessibility
- speaker_id: charu_soft or mohini_whispers (soothing, gentle)
- normalize: true
Creates warm, empathetic interactions for visually impaired users.
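One way to keep these recommendations in code is a lookup table of presets. This is only a sketch: the dictionary keys and the `payload_for` helper are hypothetical names, and the chatbot preset picks `charu_soft` where the guide also allows `mohini_whispers`.

```python
# Presets transcribed from the use-case recommendations in this section.
# The keys are illustrative labels, not identifiers defined by the API.
VOICE_PRESETS = {
    "ivr_support": {"speaker_id": "vinaya_assist", "normalize": True},
    "e_learning": {"speaker_id": "soumya_calm", "normalize": True},
    "marketing": {"speaker_id": "agastya_impact", "normalize": False},
    "chatbot_accessibility": {"speaker_id": "charu_soft", "normalize": True},
}


def payload_for(use_case: str, text: str) -> dict:
    """Combine the text with the preset for a use case (KeyError if unknown)."""
    preset = VOICE_PRESETS[use_case]
    return {"text": text, **preset}
```

For example, `payload_for("marketing", "Sale ends tonight!")` yields a payload with `agastya_impact` and normalization disabled, matching the marketing recommendation.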
4. Best Practices & Tips
- Punctuation & Pauses: Use commas, periods, and ellipses to introduce natural breaks.
- Phonetic Spellings: Spell uncommon names or jargon phonetically to improve accuracy.
- Language Mixing: Rely on VeenaMAX's auto script detection; no manual tagging is needed.
- Terminology Handling: Enable normalize for domain-heavy content (finance, healthcare).
5. Optimizing Output Quality
- Streaming Latency: Reduce buffer size in your audio player for faster playback.
- Volume Normalization: Post-process with a limiter for consistent loudness in multi-segment audio.
- Batch Generation: Use non-streaming mode for producing large libraries of pre-recorded voiceovers.
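The batch-generation tip could be sketched as a loop that synthesizes each script and writes the returned bytes to disk. Everything named here is hypothetical: `generate_batch`, the `synth` callable (expected to wrap the non-streaming POST request and return the audio bytes), and the `.bin` suffix, which is a placeholder because the audio format of the response is not stated in this guide.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def generate_batch(
    scripts: Iterable[str],
    out_dir: str,
    synth: Callable[[str], bytes],
    suffix: str = ".bin",  # placeholder; set this to the actual audio format
) -> List[Path]:
    """Synthesize each script in order and save it as a numbered file."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, text in enumerate(scripts, start=1):
        target = out_path / f"clip_{index:03d}{suffix}"
        target.write_bytes(synth(text))  # synth performs the API call
        written.append(target)
    return written
```

Passing the synthesis step in as a callable keeps the loop independent of the HTTP client, so the same function works with a real request wrapper or with a stub during testing.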
By following this guide and selecting the right speaker_id plus the normalize flag, you can harness VeenaMAX's full potential, transforming text into rich, context-aware speech that resonates with Indian language audiences.