Dia (Text to Speech)
Dia by Nari Labs is an open-weights TTS model that turns scripts into natural speech, including emotions and nonverbal cues. You can control tone, voice, and delivery, which makes it an alternative to ElevenLabs.
import requests
import json

url = "https://api.segmind.com/v1/dia"
headers = {
    "x-api-key": "YOUR_API_KEY",
    "Content-Type": "application/json"
}

data = {
    "text": "[S1] Segmind lets you build powerful image and video workflows — no code needed. \n [S2] Over 200 open and closed models. Just drag, drop, and deploy. \n [S1] Wait, seriously? Even custom models? \n [S2] Yup. Even fine-tuned ones. (chuckles) \n [S1] That's wild. I’ve spent weeks writing code for this. \n [S2] Now you can do it in minutes. Go try Segmind on the cloud. \n [S1] I'm sold. Let’s go. (laughs)",
    "top_p": 0.95,
    "cfg_scale": 4,
    "temperature": 1.3,
    "input_audio": "https://segmind-resources.s3.amazonaws.com/input/7d11a77b-366c-406e-b6af-eefaec4f8574-fa3123db-56cf-4212-9cc9-ebc49e692202-04fc4d16-25df-44cb-9b7c-37aa7543e6d2.wav",
    "speed_factor": 0.94,
    "max_new_tokens": 3072,
    "cfg_filter_top_k": 35
}

response = requests.post(url, headers=headers, json=data)

if response.status_code == 200:
    content_type = response.headers.get("Content-Type", "")
    if "application/json" in content_type:
        # Some responses may carry JSON; print it as-is.
        print(json.dumps(response.json(), indent=2))
    else:
        # The endpoint returns audio (see Response Type), so save the raw bytes.
        # The output filename and the .wav extension are assumptions.
        with open("output.wav", "wb") as f:
            f.write(response.content)
        print("Saved output.wav")
else:
    print(f"Error: {response.status_code}")
    print(response.text)

API Endpoint
https://api.segmind.com/v1/dia

Parameters
text (required, string)
    Input text for speech generation. Use [S1], [S2] for speakers and ( ) for actions like (laughs) or (whispers). Verbal tags will be recognized, but might result in unexpected output.
    Default: "[S1] Segmind lets you build powerful image and video workflows — no code needed. \n [S2] Over 200 open and closed models. Just drag, drop, and deploy. \n [S1] Wait, seriously? Even custom models? \n [S2] Yup. Even fine-tuned ones. (chuckles) \n [S1] That's wild. I’ve spent weeks writing code for this. \n [S2] Now you can do it in minutes. Go try Segmind on the cloud. \n [S1] I'm sold. Let’s go. (laughs)"

cfg_filter_top_k (optional, integer)
    Filters audio tokens. Higher values give more diverse sounds, lower values give more consistent output.
    Default: 35
    Range: 10 - 100

cfg_scale (optional, number)
    Controls how strictly the audio follows the text. Higher values are more accurate, lower values sound more natural.
    Default: 4
    Range: 1 - 5

input_audio (optional, string (uri))
    Audio file in .wav, .mp3, or .flac format, used for voice cloning. The model will clone this voice style.
    Default: null

max_new_tokens (optional, integer)
    Controls audio length. Higher values produce longer audio (≈86 tokens per second).
    Default: 3072
    Range: 500 - 4096

seed (optional, integer)
    Use a seed for reproducible results. Leave blank for random output.
    Default: null

speed_factor (optional, number)
    Controls playback speed. 1.0 is normal speed, values below 1.0 are slower.
    Default: 0.94
    Range: 0.5 - 1.5

temperature (optional, number)
    Controls randomness. Higher values (1.4–2.0) give more variety, lower values (0.1–1.0) give more consistency.
    Default: 1.3
    Range: 0.1 - 2

top_p (optional, number)
    Controls word variety. Higher values allow rarer words. Most users can leave this as is.
    Default: 0.95
    Range: 0.1 - 1

Response Type
Returns: Audio
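
The ranges listed under Parameters can be checked on the client before a request is sent, and the "≈86 tokens per second" figure can be used to pick a max_new_tokens value for a target duration. The following is a minimal sketch of both. The helper names (validate_params, tokens_for_seconds) are illustrative and not part of the Segmind API, and the duration math is only as accurate as the approximate 86 tokens per second rate.

```python
import math

# Ranges copied from the Parameters section of this page.
RANGES = {
    "cfg_filter_top_k": (10, 100),
    "cfg_scale": (1, 5),
    "max_new_tokens": (500, 4096),
    "speed_factor": (0.5, 1.5),
    "temperature": (0.1, 2),
    "top_p": (0.1, 1),
}
INTEGER_PARAMS = {"cfg_filter_top_k", "max_new_tokens"}
TOKENS_PER_SECOND = 86  # approximate rate quoted for max_new_tokens


def validate_params(payload):
    """Return a list of problems found in the payload (empty if none)."""
    problems = []
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        problems.append("text is required and must be a non-empty string")
    for name, (low, high) in RANGES.items():
        value = payload.get(name)
        if value is None:
            continue  # optional parameter left unset
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be a number")
        elif name in INTEGER_PARAMS and not isinstance(value, int):
            problems.append(f"{name} must be an integer")
        elif not low <= value <= high:
            problems.append(f"{name}={value} is outside {low} - {high}")
    return problems


def tokens_for_seconds(seconds):
    """Estimate max_new_tokens for a target duration, clamped to the allowed range."""
    wanted = math.ceil(seconds * TOKENS_PER_SECOND)
    low, high = RANGES["max_new_tokens"]
    return max(low, min(high, wanted))
```

Calling validate_params(data) before requests.post and aborting when the returned list is non-empty avoids spending a request on a payload that would likely be rejected as a Bad Request.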
Common Error Codes
The API returns standard HTTP status codes. Detailed error messages are provided in the response body.
Bad Request: Invalid parameters or request format
Unauthorized: Missing or invalid API key
Forbidden: Insufficient permissions
Not Found: Model or endpoint not found
Insufficient Credits: Not enough credits to process request
Rate Limited: Too many requests
Server Error: Internal server error
Bad Gateway: Service temporarily unavailable
Timeout: Request timed out
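
Of the errors above, Rate Limited, Server Error, Bad Gateway, and Timeout are usually transient, so a client may want to retry them with a growing delay. The sketch below shows one way to do that. It assumes these errors map to the standard HTTP status codes 429, 500, 502, and 504, which this page does not state explicitly. The function name post_with_retry and its parameters are illustrative, not part of any Segmind SDK.

```python
import time

# Assumed mapping to standard HTTP codes: Rate Limited, Server Error,
# Bad Gateway, Timeout. Adjust if the API reports different codes.
RETRYABLE_STATUS = {429, 500, 502, 504}


def post_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning an object with a
    status_code attribute, such as a requests.Response.
    """
    for attempt in range(1, max_attempts + 1):
        response = send()
        is_last = attempt == max_attempts
        if response.status_code not in RETRYABLE_STATUS or is_last:
            return response
        # Exponential backoff: base_delay, 2x, 4x, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

With the request from the sample at the top of the page, this could be used as post_with_retry(lambda: requests.post(url, headers=headers, json=data)). Errors such as Unauthorized or Insufficient Credits are returned immediately, since retrying them without changing the request or account is unlikely to help.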