ElevenLabs Transcript

Experience unmatched accuracy with ElevenLabs Transcript, the leading model for AI speech-to-text.


API

If you're looking for an API, you can choose a code sample in your desired programming language.

POST
import requests

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/eleven-labs-transcript"

# Request payload
data = {
    "audio_url": "https://segmind-sd-models.s3.amazonaws.com/display_images/sad_talker/sad_talker_audio_input.mp3",
    "language_code": "en",
    "tag_audio_events": False,
    "timestamp_granularity": "none",
    "diarize": False
}

headers = {'x-api-key': api_key}

response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response contains the transcription
RESPONSE
application/json
HTTP Response Codes
200 - OK : Transcript generated
401 - Unauthorized : User authentication failed
404 - Not Found : The requested URL does not exist
405 - Method Not Allowed : The requested HTTP method is not allowed
406 - Not Acceptable : Not enough credits
500 - Server Error : Server had some issue with processing
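As a rough guide, the sketch below shows one way to branch on these status codes when calling the endpoint. Parsing a successful response as JSON is an assumption here, since the exact response schema is not documented on this page.

import requests

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/eleven-labs-transcript"

data = {
    "audio_url": "https://segmind-sd-models.s3.amazonaws.com/display_images/sad_talker/sad_talker_audio_input.mp3"
}

response = requests.post(url, json=data, headers={"x-api-key": api_key})

if response.status_code == 200:
    # 200 - OK: the transcript was generated; treating the body as JSON is an
    # assumption -- check the content-type header to confirm the format.
    print(response.json())
elif response.status_code == 401:
    print("Unauthorized: check your API key.")
elif response.status_code == 406:
    print("Not enough credits on this account.")
else:
    print(f"Request failed ({response.status_code}): {response.text}")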

Attributes


audio_url : str *

Input Audio URL


language_code : enum:str ( default: en )

An ISO-639-1 or ISO-639-3 language_code corresponding to the language of the audio file. Can sometimes improve transcription performance if known beforehand. Defaults to null, in which case the language is predicted automatically.



model_id : enum:str *

Model identifier



tag_audio_events : boolean ( default: true )

Whether to tag audio events like (laughter), (footsteps), etc. in the transcription.


num_speakers : int ( default: 1 )

Number of speakers in the audio (for diarization)

min : 1, max : 32


timestamp_granularity : enum:str ( default: none )

Timestamp level for transcription



diarize : boolean ( default: true )

Whether to annotate which speaker is currently talking in the uploaded file.


diarization_threshold : float ( default: 0.1 )

Diarization threshold to apply during speaker diarization. A higher value lowers the chance of one speaker being split into two different speakers, but raises the chance of two different speakers being merged into one (fewer total speakers predicted). A lower value does the opposite (more total speakers predicted). Can only be set when diarize=True and num_speakers=None. Defaults to None, in which case a threshold is chosen based on the model_id (usually 0.22).

min : 0.1, max : 0.4
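As a minimal sketch of how the attributes above combine, the payload below enables audio-event tagging and speaker diarization for a two-speaker recording. The value "word" for timestamp_granularity is an assumption, since the allowed values are not listed on this page.

import requests

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/eleven-labs-transcript"

# Payload enabling diarization for a two-speaker recording.
data = {
    "audio_url": "https://segmind-sd-models.s3.amazonaws.com/display_images/sad_talker/sad_talker_audio_input.mp3",
    "language_code": "en",
    "tag_audio_events": True,
    "diarize": True,
    "num_speakers": 2,                # must be between 1 and 32
    "timestamp_granularity": "word",  # assumed value; the default is "none"
}
# Note: diarization_threshold is omitted because it can only be set when
# num_speakers is left unset (None).

response = requests.post(url, json=data, headers={"x-api-key": api_key})
print(response.content)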

To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
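For example, a minimal sketch of reading that header after a request:

import requests

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/eleven-labs-transcript"

data = {
    "audio_url": "https://segmind-sd-models.s3.amazonaws.com/display_images/sad_talker/sad_talker_audio_input.mp3"
}

response = requests.post(url, json=data, headers={"x-api-key": api_key})

# The remaining credit balance is reported in the response headers.
remaining = response.headers.get("x-remaining-credits")
print(f"Remaining credits: {remaining}")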

Resources to get you started

Everything you need to know to get the most out of ElevenLabs Transcript

ElevenLabs Transcript

ElevenLabs Transcript is the premier AI transcription model for professionals needing flawless audio-to-text conversion. With industry-leading accuracy, ElevenLabs Transcript is perfect for films, podcasts, meetings, and medical dictations. Experience unmatched precision and seamless integration with this advanced ASR (automatic speech recognition) technology.

Key Features

  • Industry-Leading Accuracy - Achieve the lowest word error rate for perfectly accurate English transcription, outperforming Google Gemini and OpenAI Whisper in testing.

  • Smart Speaker Diarization - Intuitively distinguishes and labels every speaker in any conversation for clear, organized transcripts.

  • Precise Word-Level Timestamps - Capture the exact moment each word is spoken, enabling seamless subtitle syncing and interactive audio experiences.

  • Dynamic Audio Tagging - Enriches your English transcripts with the full context of your audio by tagging every sound event, from laughter to footsteps.

  • Global Language Support - Break language barriers with support for English and 98 other languages.

Use Cases

  • Media & Entertainment - Generate accurate subtitles and closed captions for films and videos with precise timestamps (see the subtitle sketch after this list).

  • Business Meetings - Get clear, organized transcripts of meetings with speaker diarization, perfect for record-keeping and follow-up actions.

  • Medical Dictations - Transcribe medical dictations with industry-leading accuracy, ensuring precision in healthcare documentation.

  • Podcast Production - Transform audio content into text for show notes, scripts, and enhanced accessibility.
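As referenced in the subtitles use case above, the sketch below turns word-level timestamps into SRT cues. It assumes the transcript is available as a list of entries with "text", "start", and "end" fields in seconds; the actual response schema is not documented on this page, so adapt the field names accordingly.

def seconds_to_srt_time(t: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(t * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    seconds, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


def words_to_srt(words, max_words_per_cue=8):
    """Group assumed word-level entries into numbered SRT cues."""
    cues = []
    for i in range(0, len(words), max_words_per_cue):
        chunk = words[i:i + max_words_per_cue]
        start = seconds_to_srt_time(chunk[0]["start"])
        end = seconds_to_srt_time(chunk[-1]["end"])
        text = " ".join(w["text"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Mock word timings; replace with the word-level output from the API.
sample = [
    {"text": "Hello", "start": 0.00, "end": 0.42},
    {"text": "and", "start": 0.45, "end": 0.58},
    {"text": "welcome", "start": 0.60, "end": 1.10},
]
print(words_to_srt(sample))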

