20 models

Speech Generation

AI text-to-speech and voice synthesis models, all available on Segmind through a single pay-per-use API. This collection includes models from ElevenLabs, Google Gemini TTS, Chatterbox, and others, covering natural conversational voices, expressive character speech, and multilingual narration. Whether you need voiceovers for video content, interactive voice agents, podcast production, e-learning narration, or audiobook creation, these models produce production-quality audio from plain text. Key models include ElevenLabs Turbo for low-latency streaming TTS, Gemini 2.5 Flash TTS and Gemini 2.5 Pro TTS for high-fidelity multilingual speech, and Chatterbox Turbo for fast, expressive voice generation. The collection also includes voice cloning, voice design, dialogue generation with timestamps, and speech-to-speech conversion models for complete voice production workflows. On Segmind, you can generate audio with a single API call and chain TTS models with video generation or lipsync tools in Workflows to automate multimedia content pipelines.
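As a rough illustration of what "a single API call" might look like, the sketch below builds and optionally sends one HTTP POST request using only the Python standard library. Several details are assumptions rather than facts taken from this page: the base URL pattern `https://api.segmind.com/v1/<model-slug>`, the `x-api-key` header, the placeholder slug `example-tts-model`, the `text` payload field, and the idea that the response body is raw audio bytes. Check each model's own API page for its real slug and parameters.

```python
import json
import urllib.request

# Assumption: model endpoints follow this base URL pattern.
SEGMIND_BASE_URL = "https://api.segmind.com/v1"


def build_tts_request(model_slug: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a hosted TTS model."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{SEGMIND_BASE_URL}/{model_slug}",
        data=body,
        # Assumption: the API key is passed in an "x-api-key" header.
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(model_slug: str, api_key: str, payload: dict,
                       out_path: str, timeout: float = 120.0) -> int:
    """Send the request and write the response body to out_path.

    Assumption: the response body is the generated audio itself.
    Returns the number of bytes written.
    """
    request = build_tts_request(model_slug, api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)


if __name__ == "__main__":
    # Hypothetical slug and payload field, for illustration only.
    example = build_tts_request("example-tts-model", "YOUR_API_KEY", {"text": "Hello"})
    print(example.get_method(), example.full_url)
```

Building the request separately from sending it keeps the network call in one place, so the URL, headers, and payload can be inspected before any paid request is made. To produce a file, call `synthesize_to_file` with a real slug, key, and the parameters that model documents.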

Text to Audio
Gemini 3.1 Flash TTS           0.0s
Sam Audio Large                12.9s
Gemini TTS 2.5 Flash           17.6s
Gemini TTS 2.5 Pro             32.6s
Chatterbox Turbo TTS           13.4s
ElevenLabs Dialogue            6.8s
VeenaMax TTS                   13.0s
Veena TTS                      45.2s
Chatterbox TTS                 18.0s
Lyria 2                        27.2s
Ace Step Music                 11.8s
Dia (Text to Speech)           89.5s
Minimax Music-01               44.3s
3B Orpheus TTS (0.1)           117.6s
Meta MusicGen Medium           22.3s
MyShell Text To Speech         7.0s
Openvoice                      10.2s
ElevenLabs Dubbing             92.7s
ElevenLabs Sound Generation    7.8s
ElevenLabs Text To Speech      12.3s