Dia (Text to Speech)

Dia by Nari Labs is an open-weights text-to-speech (TTS) model that turns scripts into natural-sounding speech with emotions and nonverbal cues. Tone, voice, and delivery can be controlled through the request parameters, which makes it an alternative to ElevenLabs.

~91.26s
~$0.101
import requests

url = "https://api.segmind.com/v1/dia"
headers = {
    "x-api-key": "YOUR_API_KEY",
    "Content-Type": "application/json"
}

data = {
    "text": "[S1] Segmind lets you build powerful image and video workflows — no code needed. \n [S2] Over 200 open and closed models. Just drag, drop, and deploy. \n [S1] Wait, seriously? Even custom models? \n [S2] Yup. Even fine-tuned ones. (chuckles) \n [S1] That's wild. I’ve spent weeks writing code for this. \n [S2] Now you can do it in minutes. Go try Segmind on the cloud. \n  [S1] I'm sold. Let’s go. (laughs)",
    "top_p": 0.95,
    "cfg_scale": 4,
    "temperature": 1.3,
    "input_audio": "https://segmind-resources.s3.amazonaws.com/input/7d11a77b-366c-406e-b6af-eefaec4f8574-fa3123db-56cf-4212-9cc9-ebc49e692202-04fc4d16-25df-44cb-9b7c-37aa7543e6d2.wav",
    "speed_factor": 0.94,
    "max_new_tokens": 3072,
    "cfg_filter_top_k": 35
}

response = requests.post(url, headers=headers, json=data)

if response.status_code == 200:
    # The response type is audio, so save the raw bytes instead of parsing JSON.
    # The .wav extension is an assumption; check the Content-Type response header.
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print(f"Saved {len(response.content)} bytes ({response.headers.get('Content-Type')})")
else:
    print(f"Error: {response.status_code}")
    print(response.text)

API Endpoint

POST https://api.segmind.com/v1/dia

Parameters

text (required)
string

Input text for speech generation. Use [S1], [S2] for speakers and ( ) for actions like (laughs) or (whispers). Verbal tags will be recognized, but might result in unexpected output.

Default: "[S1] Segmind lets you build powerful image and video workflows — no code needed. \n [S2] Over 200 open and closed models. Just drag, drop, and deploy. \n [S1] Wait, seriously? Even custom models? \n [S2] Yup. Even fine-tuned ones. (chuckles) \n [S1] That's wild. I’ve spent weeks writing code for this. \n [S2] Now you can do it in minutes. Go try Segmind on the cloud. \n [S1] I'm sold. Let’s go. (laughs)"
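The tagged script format described above can be assembled from a list of dialogue turns rather than written by hand. The following is a minimal sketch: the helper name `build_dia_script` is hypothetical, and the " \n " separator between turns is copied from the default value, not from any stated requirement.

```python
from typing import Iterable, Tuple


def build_dia_script(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker_number, line) pairs into one tagged script string."""
    parts = []
    for speaker, line in turns:
        if speaker not in (1, 2):
            # Only [S1] and [S2] appear in the parameter description.
            raise ValueError(f"unsupported speaker: {speaker}")
        parts.append(f"[S{speaker}] {line.strip()}")
    # Assumption: turns are separated the same way as in the default value.
    return " \n ".join(parts)


script = build_dia_script([
    (1, "Wait, seriously? Even custom models?"),
    (2, "Yup. Even fine-tuned ones. (chuckles)"),
])
print(script)
```

The resulting string can be passed as the `text` field of the request body.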
cfg_filter_top_k (optional)
integer

Filters audio tokens. Higher values = more diverse sounds, lower = more consistent. Values can be 10 to 100.

Default: 35
Range: 10 - 100
cfg_scale (optional)
number

Controls how strictly audio follows text. Higher = more accurate, lower = more natural. (1 to 5)

Default: 4
Range: 1 - 5
input_audio (optional)
string (uri)

URL of an audio file (.wav, .mp3, or .flac) used for voice cloning. The model clones the voice style of this clip.

Default: null
max_new_tokens (optional)
integer

Controls audio length. Higher values = longer audio (≈86 tokens per second, so the default of 3072 is roughly 36 seconds). Values can be 500 to 4096.

Default: 3072
Range: 500 - 4096
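Given the approximate rate of 86 tokens per second, a target duration can be converted into a max_new_tokens value. This is a rough sketch: the function names are hypothetical, the rate is only the approximation quoted above, and the effect of speed_factor on the final length is not accounted for.

```python
import math

TOKENS_PER_SECOND = 86              # approximate rate from the parameter description
MIN_TOKENS, MAX_TOKENS = 500, 4096  # documented range for max_new_tokens


def tokens_for_duration(seconds: float) -> int:
    """Estimate max_new_tokens for a target duration, clamped to the valid range."""
    wanted = math.ceil(seconds * TOKENS_PER_SECOND)
    return max(MIN_TOKENS, min(MAX_TOKENS, wanted))


def approx_duration(tokens: int) -> float:
    """Estimate audio length in seconds for a given token budget."""
    return tokens / TOKENS_PER_SECOND


print(tokens_for_duration(10))            # 860
print(round(approx_duration(4096), 1))    # 47.6
```

By this estimate the maximum of 4096 tokens corresponds to a little under 48 seconds, so longer scripts would need to be split across several requests.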
seed (optional)
integer

Use a seed for reproducible results. Leave blank for random output.

Default: null
speed_factor (optional)
number

Controls playback speed. 1.0 = normal, below 1.0 = slower, above 1.0 = faster. Values can be 0.5 to 1.5.

Default: 0.94
Range: 0.5 - 1.5
temperature (optional)
number

Controls randomness. Higher (1.4–2.0) = more variety, lower (0.1–1.0) = more consistency. Values can be 0.1 to 2.

Default: 1.3
Range: 0.1 - 2
top_p (optional)
number

Controls word variety. Higher values allow rarer words. Most users can leave this as is.

Default: 0.95
Range: 0.1 - 1
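Because out-of-range values are likely to be rejected with a 400 error, it can help to check a request body against the ranges listed above before sending it. The sketch below is a client-side convenience only: `validate_payload` is a hypothetical helper, and the server's actual validation rules may differ.

```python
# Documented ranges for the numeric parameters (inclusive bounds).
RANGES = {
    "cfg_filter_top_k": (10, 100),
    "cfg_scale": (1, 5),
    "max_new_tokens": (500, 4096),
    "speed_factor": (0.5, 1.5),
    "temperature": (0.1, 2),
    "top_p": (0.1, 1),
}
# Parameters documented with type "integer".
INTEGER_PARAMS = ("cfg_filter_top_k", "max_new_tokens", "seed")


def validate_payload(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means the payload looks valid."""
    problems = []
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        problems.append("text is required and must be a non-empty string")
    for name, (low, high) in RANGES.items():
        value = payload.get(name)
        if value is None:
            continue  # optional parameter; the server default applies
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be a number")
        elif not low <= value <= high:
            problems.append(f"{name}={value} is outside {low} to {high}")
    for name in INTEGER_PARAMS:
        value = payload.get(name)
        if value is not None and (isinstance(value, bool) or not isinstance(value, int)):
            problems.append(f"{name} must be an integer")
    return problems


print(validate_payload({"text": "[S1] Hello.", "cfg_scale": 9}))
```

An empty result means every supplied value falls inside its documented range; omitted optional fields are left for the server defaults.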

Response Type

Returns: Audio

Common Error Codes

The API returns standard HTTP status codes. Detailed error messages are provided in the response body.

400 Bad Request: Invalid parameters or request format
401 Unauthorized: Missing or invalid API key
403 Forbidden: Insufficient permissions
404 Not Found: Model or endpoint not found
406 Insufficient Credits: Not enough credits to process request
429 Rate Limited: Too many requests
500 Server Error: Internal server error
502 Bad Gateway: Service temporarily unavailable
504 Timeout: Request timed out
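Some of these errors are usually transient, so a client may want to retry them with a growing delay. The sketch below treats 429, 500, 502, and 504 as retryable, which is an assumption and not something this page states. The helper name `post_with_retry` and its parameters are hypothetical, and it takes a `send` callable so that it does not depend on any particular HTTP library.

```python
import time

# Assumption: these statuses are transient and worth retrying.
RETRYABLE = {429, 500, 502, 504}


def post_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send must return an object with a status_code attribute,
    for example a requests.Response.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in RETRYABLE:
            return response
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return response
```

With the requests library this could be called as `post_with_retry(lambda: requests.post(url, headers=headers, json=data))`. Errors such as 400, 401, and 406 are returned immediately, since repeating the same request would not change the outcome.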