19 models
Speech Generation
AI text-to-speech and voice synthesis models, all available on Segmind through a single pay-per-use API. The collection includes models from ElevenLabs, Google Gemini TTS, Chatterbox, and others, covering natural conversational voices, expressive character speech, and multilingual narration. These models generate production-quality audio from plain text for video voiceovers, interactive voice agents, podcast production, e-learning narration, and audiobook creation.
Key models include ElevenLabs Turbo for ultra-low latency streaming TTS, Gemini 2.5 Flash TTS and Gemini 2.5 Pro TTS for high-fidelity multilingual speech, and Chatterbox Turbo for rapid, expressive voice generation. The collection also includes voice cloning, voice design, dialogue generation with timestamps, and speech-to-speech conversion models for complete voice production workflows.
On Segmind, you can generate professional audio with a single API call and chain TTS models with video generation or lipsync tools in Workflows to automate multimedia content pipelines.
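As a rough illustration of what "a single API call" could look like, here is a minimal Python sketch that uses only the standard library. Several details are assumptions rather than facts from this page: the base URL `https://api.segmind.com/v1`, the `x-api-key` header, the placeholder model slug, and the `text` field name are all illustrative, and each model's API page should be checked for the real slug and parameters. The helper names `build_tts_request` and `synthesize` are hypothetical.

```python
import json
import urllib.request

# Assumption: base URL and header name follow a typical REST pattern;
# verify both against the model's API page before relying on them.
API_BASE = "https://api.segmind.com/v1"


def build_tts_request(model_slug: str, text: str, api_key: str, **params) -> urllib.request.Request:
    """Build (but do not send) a POST request for a text-to-speech model."""
    # "text" is a hypothetical field name; extra keyword arguments are
    # passed through unchanged as model-specific parameters.
    payload = {"text": text, **params}
    return urllib.request.Request(
        url=f"{API_BASE}/{model_slug}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(request: urllib.request.Request, out_path: str) -> int:
    """Send the request, write the response body to out_path, return its size in bytes."""
    # Assumes the endpoint returns raw audio bytes in the response body.
    with urllib.request.urlopen(request, timeout=120) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

A call might then look like `synthesize(build_tts_request("example-tts-model", "Hello there", api_key), "hello.mp3")`, where `example-tts-model` is a placeholder slug. Keeping request construction separate from sending makes it possible to inspect or log the payload without spending credits.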
Sam Audio Large
Isolates any described sound from mixed audio for enhanced editing and analysis.
Gemini TTS 2.5 Flash
Gemini 2.5 TTS transforms text into lifelike speech with expressive tones and consistent character voices.
Gemini TTS 2.5 Pro
Gemini 2.5 TTS delivers human-like speech synthesis with expressive emotional delivery across multiple languages.
Chatterbox Turbo TTS
Chatterbox-Turbo delivers ultra-fast, high-quality speech synthesis with human-like expressiveness for real-time applications.
Elevenlabs Dialogue
Transforms text into immersive, emotionally expressive multi-speaker audio dialogues for various media applications.
VeenaMax TTS
VeenaMAX transforms text into expressive, real-time speech across multiple Indian languages for seamless communication.
Veena TTS
Veena transforms text into high-fidelity, expressive speech in Hindi and English for real-time applications.
Chatterbox TTS
Chatterbox transforms text into rich, natural speech with adjustable emotional expressiveness for diverse applications.
Lyria 2
Lyria 2 by Google DeepMind is an advanced model that generates high-fidelity 48kHz stereo instrumental music from text prompts or lyrics, offering precise control over tempo, key, mood, and structure.
Ace Step Music
ACE-Step generates high-quality music rapidly, enhancing the creative process for developers and artists worldwide.
Dia (Text to Speech)
Dia by Nari Labs is an advanced open-weights TTS model that brings scripts to life with natural speech, emotions, and nonverbal cues. Tone, voice, and delivery are easy to control, making it a strong alternative to ElevenLabs.
Minimax Music-01
Generate up to 60 seconds of music with both accompaniment and vocals in a single pass, with vocals from lyrics and a reference track.
3B Orpheus TTS (0.1)
Orpheus TTS is an open-source text-to-speech (TTS) system powered by the Llama 3B language model, designed for high-quality and customizable speech synthesis.
Meta MusicGen Medium
MusicGen: Transform text into music with AI. Create unique, high-quality audio from simple descriptions. Experience the future of music generation with this innovative AI model.
MyShell Text To Speech
MyShell's Voice Cloning and Text to Speech - Transform your audio content with realistic, personalized voices. Experience high-quality, efficient, and cost-effective audio synthesis.
Openvoice
OpenVoice is a versatile voice cloning model that supports multiple languages and offers precise tone replication, flexible style control, and zero-shot cross-lingual capabilities.
ElevenLabs Dubbing
Instantly dubs audio and video into 29 languages while preserving each speaker's original voice.
Elevenlabs Sound Generation
The ElevenLabs Sound Generation API generates audio content programmatically, letting developers and creators integrate sound generation into their applications and workflows.
Elevenlabs Text To Speech
ElevenLabs TTS transforms text into captivating, human-like speech for diverse applications.