All Audio Models [27]
Browse all audio AI models on Segmind. Compare pricing, latency, and capabilities.
Sam Audio Large
Isolate any described sound from mixed audio tracks.
Gemini TTS 2.5 Flash
Fast, lifelike text-to-speech with expressive emotional tones.
Gemini TTS 2.5 Pro
Human-like speech synthesis with rich expressive emotional depth.
Chatterbox Turbo TTS
Ultra-fast, human-quality TTS with emotional expression.
TTS Elevenlabs With Timing
Emotionally expressive TTS with word-level timestamp output.
Elevenlabs Forced Alignment
Precise audio-text synchronization with word-level timestamps.
Elevenlabs Audio Isolation
Extract clear speech from noisy audio and video.
Elevenlabs Dialogue With Timing
Multi-speaker dialogue with expressive timestamps included.
Elevenlabs Voice Design
Generate unique synthetic voices without audio samples.
Elevenlabs Voice Cloning
Hyper-realistic voice cloning from short audio samples.
Elevenlabs Dialogue
Immersive, emotionally expressive multi-speaker audio dialogue.
VeenaMax TTS
VeenaMAX transforms text into expressive, real-time speech across multiple Indian languages for seamless communication.
Veena TTS
Veena transforms text into high-fidelity, expressive speech in Hindi and English for real-time applications.
Chatterbox TTS
Chatterbox transforms text into rich, natural speech with adjustable emotional expressiveness for diverse applications.
Lyria 2
Lyria 2 by Google DeepMind is an advanced model that generates high-fidelity 48 kHz stereo instrumental music from text prompts or lyrics, offering precise control over tempo, key, mood, and structure.
Ace Step Music
ACE-Step generates high-quality music rapidly, enhancing the creative process for developers and artists worldwide.
Dia (Text to Speech)
Dia by Nari Labs is an advanced open-weights TTS model that brings scripts to life with natural speech, emotions, and nonverbal cues. It offers control over tone, voice, and delivery, and is positioned as an alternative to ElevenLabs.
Minimax Music-01
Generate up to 60 seconds of music with both accompaniment and vocals in a single pass, with vocals from lyrics and a reference track.
3B Orpheus TTS (0.1)
Orpheus TTS is an open-source text-to-speech system built on a 3B-parameter Llama language model, designed for high-quality, customizable speech synthesis.
Elevenlabs Transcript
Transcribe audio to accurate text in 99 languages with speaker diarization and word-level timestamps.
Meta MusicGen Medium
MusicGen by Meta turns text into music, generating unique, high-quality audio from simple descriptions.
MyShell Text To Speech
MyShell's voice cloning and text-to-speech model produces realistic, personalized voices with high-quality, efficient, and cost-effective audio synthesis.
Openvoice
OpenVoice is a versatile voice cloning model that supports multiple languages and offers precise tone replication, flexible style control, and zero-shot cross-lingual capabilities.
ElevenLabs Dubbing
Instantly dubs audio and video into 29 languages while preserving each speaker's original voice.
Elevenlabs Sound Generation
ElevenLabs' Sound Generation API lets developers and creators generate audio content programmatically and integrate sound generation into their applications and workflows.
Elevenlabs Speech To Speech
ElevenLabs Speech-to-Speech offers AI-powered voice conversion for content creators, media professionals, and anyone seeking to modify or translate spoken audio.
Elevenlabs Text To Speech
ElevenLabs TTS transforms text into captivating, human-like speech for diverse applications.