ElevenLabs Voice Clone + Speak API

Clone any voice in seconds and use it to speak any text. This Pixelflow workflow chains two powerful ElevenLabs capabilities together into a single, callable API: voice cloning from audio samples, followed by text-to-speech synthesis using the cloned voice. The result is a high-quality MP3 of your custom voice reading whatever text you provide.

Whether you are building a personalised content platform, a branded AI assistant, or a localisation pipeline, this workflow gives you a clean two-input API you can call from anywhere.

How It Works

The workflow has two nodes running in sequence:

The ElevenLabs Voice Cloning node takes one or more audio samples that you supply and creates an instant voice clone. Its `remove_background_noise` parameter is enabled by default, so even noisy recordings can produce clean clones. The node outputs a voice ID, which is passed automatically to the next node.
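Before you upload, you may want to check your samples against the 30-second guidance in the Customization Guide. The sketch below is a hypothetical pre-flight helper that uses only Python's standard-library `wave` module. It therefore assumes WAV input; other formats need a different decoder. The threshold mirrors this page's guidance, not any limit the API is known to enforce. The sketch reads the guidance as applying to the longest single sample.

```python
import io
import wave

# Mirrors this page's guidance (a clear recording of 30 s or more);
# the API itself is not assumed to enforce this threshold.
RECOMMENDED_SECONDS = 30.0


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of an in-memory WAV file in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def meets_recommendation(samples: list[bytes]) -> bool:
    """True if at least one sample is 30 s or longer (hypothetical helper)."""
    return any(wav_duration_seconds(s) >= RECOMMENDED_SECONDS for s in samples)


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV, used here only to demo the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    short_clip, long_clip = make_silent_wav(10), make_silent_wav(31)
    print(meets_recommendation([short_clip]))             # False
    print(meets_recommendation([short_clip, long_clip]))  # True
```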

The ElevenLabs Text To Speech node receives the cloned voice ID and your input text. It then synthesises speech with the ElevenLabs `eleven_v3` model, which at the time of writing is the newest and highest-quality TTS model in the ElevenLabs lineup. The output is an MP3 audio file.
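The two-node chain can be modelled as two functions, where the first one's output feeds the second. This is a hypothetical sketch of the data flow, not Pixelflow's implementation. `VoiceCloneSpeakPipeline`, `fake_clone`, and `fake_speak` are illustrative names, and the fake nodes stand in for the real ElevenLabs calls so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical stand-ins for the two workflow nodes; in Pixelflow the
# nodes run server-side and these signatures are illustrative only.
CloneFn = Callable[[Sequence[bytes]], str]   # samples -> voice ID
SpeakFn = Callable[[str, str, str], bytes]   # (voice ID, text, model) -> MP3 bytes


@dataclass
class VoiceCloneSpeakPipeline:
    clone: CloneFn
    speak: SpeakFn
    model_id: str = "eleven_v3"  # the workflow's default TTS model

    def run(self, voice_samples: Sequence[bytes], text: str) -> bytes:
        if not voice_samples:
            raise ValueError("at least one voice sample is required")
        if not text.strip():
            raise ValueError("text must not be empty")
        voice_id = self.clone(voice_samples)              # node 1: clone the voice
        return self.speak(voice_id, text, self.model_id)  # node 2: speak with it


# Fake nodes so the data flow can be checked without network access.
def fake_clone(samples: Sequence[bytes]) -> str:
    return f"voice-{len(samples)}"


def fake_speak(voice_id: str, text: str, model_id: str) -> bytes:
    return f"{model_id}|{voice_id}|{text}".encode()


if __name__ == "__main__":
    pipeline = VoiceCloneSpeakPipeline(fake_clone, fake_speak)
    print(pipeline.run([b"sample-a", b"sample-b"], "Hello there"))
    # b'eleven_v3|voice-2|Hello there'
```

Injecting the two nodes as callables keeps the ordering explicit. The voice ID exists only after cloning, so synthesis cannot start earlier.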

The API surface is simple:

Input	Description
`voice_samples`	Audio file(s) to clone the voice from (upload or URL)
`text`	The text you want spoken in the cloned voice

Output	Description
`audio`	MP3 audio of the cloned voice speaking the input text
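Here is a minimal sketch of building a call to this two-input API from Python, using only the standard library. The endpoint URL is a placeholder. The `x-api-key` header name and the JSON shape are assumptions: the sketch sends a list of sample URLs under `voice_samples` and a string under `text`. Copy the real endpoint, authentication header, and request schema from the workflow's API page.

```python
import json
import urllib.request

# Placeholder: replace with the endpoint shown on your workflow's API page.
WORKFLOW_URL = "https://api.example.com/your-workflow-endpoint"  # hypothetical


def build_request(api_key: str, voice_sample_urls: list[str], text: str,
                  url: str = WORKFLOW_URL) -> urllib.request.Request:
    """Build a POST request carrying the workflow's two inputs as JSON."""
    if not voice_sample_urls:
        raise ValueError("provide at least one voice sample URL")
    if not text.strip():
        raise ValueError("text must not be empty")
    body = json.dumps({
        "voice_samples": voice_sample_urls,  # assumed: a list of audio URLs
        "text": text,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # assumed header name
        },
    )
```

You can send the request with `urllib.request.urlopen(req)`. The sketch stops at building the request because the way the MP3 comes back depends on the endpoint. It could arrive as raw bytes, as a URL inside JSON, or through a job you poll.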

Customization Guide

• Voice samples: Supply a clear recording of 30 seconds or longer for the best clone quality. Multiple samples improve accuracy further.
• Text: Both short and long passages work, though per-request character limits may vary by model. For longer content, write in natural, complete sentences to improve prosody.
• Model ID: The workflow defaults to `eleven_v3`. In the TTS node you can switch to `eleven_multilingual_v2` or `eleven_turbo_v2_5` to trade off latency against quality; `eleven_turbo_v2_5` is the lower-latency option.
• Language Code: Set a language code in the TTS node to keep the output in a specific language, which is useful for multilingual deployments.
• Voice settings: Fine-tune the TTS node's Speed, Stability, and Similarity Boost sliders under Advanced settings to match the vocal character you want.
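The TTS node options above can be pictured as a small, validated settings object. This sketch is hypothetical. The field names mirror the options in this guide, but the numeric ranges and defaults are assumptions for illustration, not values read from the Pixelflow node. The assumed ranges are speed 0.7 to 1.2, and stability and similarity boost 0 to 1. Check the node's sliders for the real limits.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Model IDs named on this page; the workflow default is eleven_v3.
SUPPORTED_MODELS = {"eleven_v3", "eleven_multilingual_v2", "eleven_turbo_v2_5"}


@dataclass
class TTSSettings:
    model_id: str = "eleven_v3"
    language_code: Optional[str] = None  # e.g. "es"; None leaves it unset
    speed: float = 1.0                   # assumed range 0.7 to 1.2
    stability: float = 0.5               # assumed range 0.0 to 1.0 (illustrative default)
    similarity_boost: float = 0.75       # assumed range 0.0 to 1.0 (illustrative default)

    def validate(self) -> "TTSSettings":
        if self.model_id not in SUPPORTED_MODELS:
            raise ValueError(f"unknown model_id: {self.model_id}")
        checks = {
            "speed": (self.speed, 0.7, 1.2),
            "stability": (self.stability, 0.0, 1.0),
            "similarity_boost": (self.similarity_boost, 0.0, 1.0),
        }
        for name, (value, low, high) in checks.items():
            if not low <= value <= high:
                raise ValueError(f"{name}={value} outside [{low}, {high}]")
        return self

    def to_payload(self) -> dict:
        """Drop unset (None) fields so only chosen options are sent."""
        return {k: v for k, v in asdict(self).items() if v is not None}


if __name__ == "__main__":
    fast = TTSSettings(model_id="eleven_turbo_v2_5", language_code="es", speed=1.1)
    print(fast.validate().to_payload())
```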

Who It's For

• Product and content teams building personalised video or audio narration
• Developers integrating custom brand voices into apps and platforms
• Localisation pipelines that need a consistent voice across multiple languages
• Podcast and media producers automating voice-matched content generation
• Enterprise teams exploring voice AI for customer service, training, or IVR applications

Segmind is an authorised channel partner of ElevenLabs. Connect with our sales team to integrate the ElevenLabs API, along with models from 40 more providers including Google, ByteDance, Alibaba, OpenAI, and Kling.
