Speech Generation

A collection of AI text-to-speech and voice synthesis models, all available on Segmind through a single pay-per-use API. It includes models from ElevenLabs, Google Gemini TTS, Chatterbox, and others, covering natural conversational voices, expressive character speech, and multilingual narration. These models generate production-quality audio from plain text for video voiceovers, interactive voice agents, podcast production, e-learning narration, and audiobook creation.

Key models include ElevenLabs Turbo for ultra-low-latency streaming TTS, Gemini 2.5 Flash TTS and Gemini 2.5 Pro TTS for high-fidelity multilingual speech, and Chatterbox Turbo for rapid, expressive voice generation. The collection also includes voice cloning, voice design, dialogue generation with timestamps, and speech-to-speech conversion models for complete voice production workflows.

On Segmind, you can generate audio with a single API call and chain TTS models with video generation or lipsync tools in Workflows to automate multimedia content pipelines.
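As a rough illustration of what a single API call for speech generation might look like, here is a minimal Python sketch using only the standard library. Several details are assumptions and not taken from this page: the base URL pattern, the `x-api-key` header name, the `text` payload field, the example slug `chatterbox-tts`, and the idea that the response body is raw audio bytes. Check each model's API page for its actual endpoint and parameters.

```python
import json
import urllib.request

# Assumption: base URL pattern; confirm against the model's API documentation.
API_BASE = "https://api.segmind.com/v1"


def build_tts_request(api_key: str, model_slug: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a text-to-speech model."""
    # "text" is a hypothetical field name; real models may expect other parameters.
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/{model_slug}",
        data=payload,
        # Assumption: the API key is passed in an "x-api-key" header.
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(api_key: str, model_slug: str, text: str, out_path: str) -> int:
    """Send the request and save the response body; returns the byte count.

    Assumption: the response body is the audio file itself.
    """
    req = build_tts_request(api_key, model_slug, text)
    with urllib.request.urlopen(req) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

A call such as `synthesize("YOUR_API_KEY", "chatterbox-tts", "Hello from Segmind", "hello.wav")` would then write the returned audio to disk, provided the slug and payload match what the chosen model actually accepts.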

Seed Audio 1.0

Grok Text-to-Speech

Gemini 3.1 Flash TTS

Sam Audio Large

Gemini TTS 2.5 Flash

Gemini TTS 2.5 Pro

Chatterbox Turbo TTS

ElevenLabs Dialogue

VeenaMax TTS

Veena TTS

Chatterbox TTS

Lyria 2

Ace Step Music

Dia (Text to Speech)

3B Orpheus TTS (0.1)

Meta MusicGen Medium

MyShell Text To Speech

OpenVoice

ElevenLabs Dubbing

ElevenLabs Sound Generation

ElevenLabs Text To Speech