All Audio Models [27]
Browse all audio AI models on Segmind and compare their pricing, latency, and capabilities.
Sam Audio Large
Isolates any described sound from mixed audio for enhanced editing and analysis.
Gemini TTS 2.5 Flash
Gemini 2.5 TTS transforms text into lifelike speech with expressive tones and consistent character voices.
Gemini TTS 2.5 Pro
Gemini 2.5 TTS delivers human-like speech synthesis with expressive emotional delivery across multiple languages.
Chatterbox Turbo TTS
Chatterbox-Turbo delivers ultra-fast, high-quality speech synthesis with human-like expressiveness for real-time applications.
TTS Elevenlabs With Timing
Transforms text into emotionally expressive, realistic speech across many languages, with timing information for the generated audio.
Elevenlabs Forced Alignment
Achieves precise audio-text synchronization with word-level timestamps for enhanced media accessibility and production.
Elevenlabs Audio Isolation
Extracts clear speech from noisy audio and video, producing professional-quality audio.
Elevenlabs Dialogue With Timing
Transforms text into emotionally expressive multi-speaker dialogue for immersive audio experiences.
Elevenlabs Voice Design
Generates unique synthetic voices tailored to specified attributes, without needing voice samples.
Elevenlabs Voice Cloning
ElevenLabs Voice Cloning creates hyper-realistic voice replicas that express emotion and personality.
Elevenlabs Dialogue
Transforms text into immersive, emotionally expressive multi-speaker audio dialogues for various media applications.
VeenaMax TTS
VeenaMAX transforms text into expressive, real-time speech across multiple Indian languages for seamless communication.
Veena TTS
Veena transforms text into high-fidelity, expressive speech in Hindi and English for real-time applications.
Chatterbox TTS
Chatterbox transforms text into rich, natural speech with adjustable emotional expressiveness for diverse applications.
Lyria 2
Lyria 2 by Google DeepMind is an advanced model that generates high-fidelity 48 kHz stereo instrumental music from text prompts or lyrics, offering precise control over tempo, key, mood, and structure.
Ace Step Music
ACE-Step generates high-quality music quickly, speeding up the creative process for developers and artists.
Dia (Text to Speech)
Dia by Nari Labs is an open-weights TTS model that brings scripts to life with natural speech, emotion, and nonverbal cues. Tone, voice, and delivery are easy to control, making it a strong alternative to ElevenLabs.
Minimax Music-01
Generates up to 60 seconds of music, with both accompaniment and vocals, in a single pass, using supplied lyrics and a reference track.
3B Orpheus TTS (0.1)
Orpheus TTS is an open-source text-to-speech (TTS) system powered by the Llama 3B language model, designed for high-quality and customizable speech synthesis.
Elevenlabs Transcript
Transcribes audio to accurate text in 99 languages, with speaker diarization and word-level timestamps.
Meta MusicGen Medium
MusicGen transforms text into music, creating unique, high-quality audio from simple descriptions.
MyShell Text To Speech
MyShell's voice cloning and text-to-speech model transforms audio content with realistic, personalized voices, offering high-quality, efficient, and cost-effective audio synthesis.
Openvoice
OpenVoice is a versatile voice cloning model that supports multiple languages and offers precise tone replication, flexible style control, and zero-shot cross-lingual voice cloning.
ElevenLabs Dubbing
Dubs audio and video into 29 languages while preserving each speaker's original voice.
Elevenlabs Sound Generation
ElevenLabs' Sound Generation API generates audio content programmatically with AI, letting developers and creators integrate sound generation into their applications and workflows.
Elevenlabs Speech To Speech
ElevenLabs Speech-to-Speech offers AI-powered voice conversion for content creators, media professionals, and anyone seeking to modify or translate speech audio.
Elevenlabs Text To Speech
ElevenLabs TTS transforms text into captivating, human-like speech for diverse applications.