Veena TTS
Veena transforms text into high-fidelity, expressive speech in Hindi and English for real-time applications.
API
Veena is available through an HTTP API. The example below uses Python.
import requests

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/veena-tts"

# Request payload
data = {
    "text": "Kya tumne kabhi socha hai... ki hum sab sirf waqt ke musafir hain?",
    "speaker": "kavya",
    "temperature": 0.4,
    "top_p": 0.9,
    "repetition_penalty": 1.05
}

headers = {"x-api-key": api_key}

response = requests.post(url, json=data, headers=headers)
response.raise_for_status()  # Raise an exception on HTTP error responses
print(response.content)  # The response body contains the generated speech
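The request example prints the response body to the console, which is rarely useful for audio. The sketch below writes the bytes to a file instead. The page does not say which audio format the endpoint returns, so the `ASSUMED_EXTENSIONS` mapping, the `save_audio` helper, and the default file name are all illustrative assumptions.

```python
from pathlib import Path

# Assumed mapping from Content-Type to file extension; the page does not
# state which audio format the endpoint returns.
ASSUMED_EXTENSIONS = {
    "audio/wav": ".wav",
    "audio/x-wav": ".wav",
    "audio/mpeg": ".mp3",
}


def save_audio(content: bytes, content_type: str, stem: str = "veena_output") -> Path:
    """Write raw response bytes to disk and return the path that was written."""
    # Drop any parameters such as "; charset=..." and normalise case.
    media_type = content_type.split(";")[0].strip().lower()
    # Fall back to a neutral extension when the type is not recognised.
    extension = ASSUMED_EXTENSIONS.get(media_type, ".bin")
    path = Path(stem + extension)
    path.write_bytes(content)
    return path
```

With the `requests` response from the example, a call might look like `save_audio(response.content, response.headers.get("Content-Type", ""))`.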
Attributes
text: Input text for speech synthesis. Use simple phrases for clarity and more complex sentences for detailed expression.
speaker: Speaker that sets the voice style. Kavya for warmth, Agastya for depth.
temperature (min: 0, max: 2): Speech variation. Use 0.2 for monotone delivery, 0.7 for lively expression.
top_p (min: 0, max: 1): Output randomness. Set 0.5 for focused speech, 0.95 for diverse speech.
repetition_penalty (min: 1, max: 2): Reduces word repetition. Use 1.2 for minimal repeats.
To track your credit usage, inspect the response headers of each API call. The x-remaining-credits header reports the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
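A minimal sketch of reading that header follows. The helper names and the `threshold` default of 10 are hypothetical, and the sketch assumes the header value is a plain number, which the page does not confirm.

```python
from typing import Mapping, Optional


def remaining_credits(headers: Mapping[str, str]) -> Optional[float]:
    """Return the value of x-remaining-credits, or None if absent or non-numeric."""
    # Plain dicts are case-sensitive, so compare header names in lower case.
    for name, value in headers.items():
        if name.lower() == "x-remaining-credits":
            try:
                return float(value)
            except ValueError:
                return None
    return None


def is_running_low(headers: Mapping[str, str], threshold: float = 10.0) -> bool:
    """True when the header is present and its value is below the threshold."""
    left = remaining_credits(headers)
    return left is not None and left < threshold
```

With the `requests` library, `response.headers` can be passed directly to either function.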
Resources to get you started
Everything you need to know to get the most out of Veena TTS
# Veena TTS: Effective Usage Guide
Veena is a high-fidelity, bilingual Hindi–English text-to-speech model with four distinct speaker personas. This guide walks you through getting the best audio quality and response times by tuning core parameters and selecting voice styles for different applications.
## 1. Getting Started
1. **Input Text**
   - Keep sentences clear and declarative for consistent pacing.
   - Inject emotive or domain-specific terms to convey nuance.
2. **Speaker Persona**
   - **kavya**: Warm, friendly; ideal for podcasts, e-learning
   - **agastya**: Deep, authoritative; best for IVR, announcements
   - **maitri**: Neutral, clear; suited for screen readers, accessibility
   - **vinaya**: Bright, youthful; perfect for marketing, tutorials
## 2. Core Parameters
| Parameter              | Range     | Default | Effect                                                 |
|------------------------|-----------|---------|--------------------------------------------------------|
| **temperature**        | 0.0 – 2.0 | 0.4     | Controls expressiveness; lower = flat, higher = lively |
| **top_p**              | 0.0 – 1.0 | 0.9     | Nucleus sampling; lower = focused, higher = diverse    |
| **repetition_penalty** | 1.0 – 2.0 | 1.05    | Penalizes repeats; higher = fewer looped phrases       |
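The ranges and defaults in the table can be checked on the client before a request is sent. The sketch below is one way to do that; `build_payload` is a hypothetical helper, and treating the bounds as inclusive is an assumption.

```python
# Ranges and defaults copied from the parameter table; bounds are treated
# as inclusive, which is an assumption.
PARAMETER_RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "repetition_penalty": (1.0, 2.0),
}
DEFAULTS = {"temperature": 0.4, "top_p": 0.9, "repetition_penalty": 1.05}


def build_payload(text: str, speaker: str, **overrides: float) -> dict:
    """Merge overrides into the table defaults and reject out-of-range values."""
    unknown = set(overrides) - set(PARAMETER_RANGES)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    for name, (low, high) in PARAMETER_RANGES.items():
        if not low <= params[name] <= high:
            raise ValueError(f"{name}={params[name]} is outside {low}..{high}")
    return {"text": text, "speaker": speaker, **params}
```

For example, `build_payload("Namaste", "kavya", temperature=0.6)` keeps the default `top_p` and `repetition_penalty`, while `top_p=1.5` raises a `ValueError`.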
## 3. Use-Case Presets
1. **Interactive Voice Response (IVR)**
   - speaker: **agastya**
   - temperature: **0.2** (steady, formal)
   - top_p: **0.5** (focused clarity)
   - repetition_penalty: **1.1**
2. **Audio Narration / Audiobooks**
   - speaker: **kavya** or **maitri**
   - temperature: **0.6** (natural variation)
   - top_p: **0.95** (rich intonation)
   - repetition_penalty: **1.05**
3. **Accessibility & Screen Readers**
   - speaker: **maitri**
   - temperature: **0.2** (consistent pacing)
   - top_p: **0.6** (reliable phrasing)
   - repetition_penalty: **1.2**
4. **Marketing & E-Learning Clips**
   - speaker: **vinaya**
   - temperature: **0.7** (energetic)
   - top_p: **0.9** (varied tone)
   - repetition_penalty: **1.05**
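The presets above could be kept in a lookup table so application code selects them by name. In this sketch the dictionary keys (`"ivr"`, `"narration"`, and so on) and the `payload_for` helper are hypothetical labels, not API values. The narration preset uses kavya, although maitri is listed as an equally valid choice.

```python
# Values copied from the presets listed above. The dictionary keys are
# hypothetical labels chosen for this sketch, not values the API knows.
PRESETS = {
    "ivr": {"speaker": "agastya", "temperature": 0.2, "top_p": 0.5, "repetition_penalty": 1.1},
    "narration": {"speaker": "kavya", "temperature": 0.6, "top_p": 0.95, "repetition_penalty": 1.05},
    "accessibility": {"speaker": "maitri", "temperature": 0.2, "top_p": 0.6, "repetition_penalty": 1.2},
    "marketing": {"speaker": "vinaya", "temperature": 0.7, "top_p": 0.9, "repetition_penalty": 1.05},
}


def payload_for(use_case: str, text: str) -> dict:
    """Build a request payload from one of the named presets."""
    if use_case not in PRESETS:
        raise KeyError(f"unknown use case {use_case!r}; expected one of {sorted(PRESETS)}")
    # Copy the preset so callers cannot mutate the shared table.
    return {"text": text, **PRESETS[use_case]}
```

The returned dictionary has the same shape as the `data` payload in the request example, so it can be passed as the JSON body.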
## 4. Tips for Optimal Quality
- **Code-Switching**: Embed Hindi and English naturally; Veena handles mixed scripts seamlessly.
- **Sentence Length**: Break very long sentences into shorter ones for stable prosody.
- **Pauses & Punctuation**: Use commas, ellipses, and line breaks to guide breath and rhythm.
- **Latency**: On high-end GPUs, expect <80 ms per synthesis; quantized mode reduces memory footprint.
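The sentence-length tip can be applied before synthesis by splitting long input at sentence boundaries. The sketch below is a simple regex-based splitter, not part of the Veena API. The `max_chars` default of 200 is an arbitrary assumption, and a single sentence longer than the limit is kept whole.

```python
import re
from typing import List

# Split after ".", "?", "!", or the Devanagari danda when followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.?!।])\s+")


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Group whole sentences into chunks of at most max_chars where possible."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit, so start a new chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as the `text` of its own request, and the resulting audio segments joined in order.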
## 5. Troubleshooting
- **Choppy Output**: Lower temperature or top_p.
- **Monotone Speech**: Increase temperature to 0.6–0.8.
- **Repetition Artifacts**: Bump repetition_penalty up to 1.2.
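These adjustments can be scripted when tuning in a loop. The sketch below encodes the three rules under stated assumptions: the symptom labels and the `step` size of 0.1 are hypothetical, the "choppy" branch lowers only temperature (lowering top_p is the other option the list names), and "monotone" raises temperature to the lower end of the suggested 0.6–0.8 band.

```python
def adjust_for_symptom(params: dict, symptom: str, step: float = 0.1) -> dict:
    """Return a copy of params nudged as the troubleshooting list suggests."""
    adjusted = dict(params)
    if symptom == "choppy":
        # Lower temperature, never below the documented minimum of 0.
        adjusted["temperature"] = round(max(0.0, adjusted["temperature"] - step), 2)
    elif symptom == "monotone":
        # Raise temperature to at least 0.6; leave higher values untouched.
        adjusted["temperature"] = max(0.6, adjusted["temperature"])
    elif symptom == "repetition":
        # Raise repetition_penalty to at least 1.2.
        adjusted["repetition_penalty"] = max(1.2, adjusted["repetition_penalty"])
    else:
        raise ValueError(f"unknown symptom: {symptom!r}")
    return adjusted
```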
By tuning these parameters and choosing the right persona, you can deploy Veena in real-time chatbots, immersive audiobooks, or assistive devices and deliver natural, engaging speech.