ElevenLabs Voice Clone + Speak API
Clone any voice in seconds and use it to speak any text. This Pixelflow workflow chains two powerful ElevenLabs capabilities together into a single, callable API: voice cloning from audio samples, followed by text-to-speech synthesis using the cloned voice. The result is a high-quality MP3 of your custom voice reading whatever text you provide.
Whether you are building a personalised content platform, a branded AI assistant, or a localisation pipeline, this workflow gives you a clean two-input API you can call from anywhere.
How It Works
The workflow has two nodes running in sequence:
1. ElevenLabs Voice Cloning takes one or more audio samples you supply and creates an instant voice clone. The remove_background_noise parameter is enabled by default, which helps noisy recordings produce cleaner clones. The node outputs a voice ID that is automatically passed downstream.
2. ElevenLabs Text To Speech receives the cloned voice ID and your input text, then synthesises speech using the ElevenLabs eleven_v3 model, the latest and highest-quality TTS model available. The output is a full-quality MP3 audio file.
The API surface is simple:
| Input | Description |
|---|---|
| voice_samples | Audio file(s) to clone the voice from (upload or URL) |
| text | The text you want spoken in the cloned voice |

| Output | Description |
|---|---|
| audio | MP3 audio of the cloned voice speaking the input text |
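As a rough illustration of calling a two-input API like this one, the sketch below builds a JSON body from voice_samples and text and prepares a POST request. Only the two input names come from the table above. The endpoint URL is a placeholder, and the x-api-key header, the JSON list shape for voice_samples, and the raw-bytes response handling are assumptions; check the workflow's API page for the actual request format.

```python
import json
import urllib.request

# Hypothetical placeholder: copy the real endpoint from the workflow's API page.
WORKFLOW_URL = "https://example.invalid/your-workflow-endpoint"


def build_payload(voice_samples, text):
    """Build a JSON-serialisable body from the two documented inputs."""
    # Accept a single URL or a list of URLs (the list shape is an assumption).
    if isinstance(voice_samples, str):
        voice_samples = [voice_samples]
    if not voice_samples:
        raise ValueError("at least one voice sample is required")
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    return {"voice_samples": list(voice_samples), "text": text}


def build_request(url, api_key, payload):
    """Prepare (but do not send) a POST request carrying the payload."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # The header name used for the API key is an assumption.
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def call_workflow(url, api_key, voice_samples, text, timeout=120):
    """Send the request and return the raw response bytes, uninterpreted."""
    req = build_request(url, api_key, build_payload(voice_samples, text))
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

A call might look like `call_workflow(WORKFLOW_URL, "YOUR_API_KEY", "https://example.com/sample.mp3", "Hello there")`. The sketch returns the response bytes as-is because the response format (direct audio, a URL, or a polling handle) is not described here.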
Customization Guide
- Voice samples: Supply a clear, 30-second or longer recording for best clone quality. Multiple samples improve accuracy further.
- Text: Any length of text works. For longer content, keep sentences natural to improve prosody.
- Model ID: The workflow defaults to eleven_v3. You can switch to eleven_multilingual_v2 or eleven_turbo_v2_5 in the TTS node for different latency and quality trade-offs.
- Language Code: Set a language code in the TTS node to lock the output to a specific language, useful for multilingual deployments.
- Speed and Stability: Fine-tune the TTS node's Speed, Stability, and Similarity Boost sliders under Advanced settings to match your desired vocal character.
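If you keep these TTS node settings alongside your own code, a small validated record can catch typos before they reach the workflow. The sketch below is hypothetical: the TTSSettings class and its field names simply mirror the options listed above, the 0 to 1 range for stability and similarity boost is an assumption based on typical ElevenLabs voice settings, and nothing here implies the workflow accepts these values as API inputs.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Model IDs named in the customization guide above.
KNOWN_MODELS = {"eleven_v3", "eleven_multilingual_v2", "eleven_turbo_v2_5"}


@dataclass
class TTSSettings:
    """Hypothetical container for the TTS node options listed above."""

    model_id: str = "eleven_v3"            # workflow default
    language_code: Optional[str] = None    # e.g. "de"; None leaves it unset
    speed: Optional[float] = None
    stability: Optional[float] = None
    similarity_boost: Optional[float] = None

    def validate(self):
        """Raise ValueError on an unknown model or out-of-range value."""
        if self.model_id not in KNOWN_MODELS:
            raise ValueError(f"unknown model_id: {self.model_id!r}")
        # Assumed range: 0.0 to 1.0 inclusive for both sliders.
        for name in ("stability", "similarity_boost"):
            value = getattr(self, name)
            if value is not None and not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1")
        # Only a sanity check; the slider's real bounds are not stated here.
        if self.speed is not None and self.speed <= 0:
            raise ValueError("speed must be positive")
        return self

    def to_dict(self):
        """Return only the settings that were explicitly set."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

For example, `TTSSettings(model_id="eleven_turbo_v2_5", language_code="de", stability=0.5).validate().to_dict()` yields a dictionary with just those three keys, which you can then mirror in the TTS node's settings.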
Who It's For
- Product and content teams building personalised video or audio narration
- Developers integrating custom brand voices into apps and platforms
- Localisation pipelines that need a consistent voice across multiple languages
- Podcast and media producers automating voice-matched content generation
- Enterprise teams exploring voice AI for customer service, training, or IVR applications
Segmind is an authorised channel partner of ElevenLabs. Connect with our sales team to integrate ElevenLabs API and models from 40 more providers including Google, Bytedance, Alibaba, OpenAI, Kling and more.