ElevenLabs Transcript

ElevenLabs Transcript is an AI speech-to-text model. It transcribes audio from a URL, with optional speaker diarization, audio-event tagging, and word- or character-level timestamps.

~8.15s
~$0.002
import json

import requests

url = "https://api.segmind.com/v1/eleven-labs-transcript"
headers = {
    "x-api-key": "YOUR_API_KEY",  # replace with your own API key
    "Content-Type": "application/json",
}

# Request parameters; see the Parameters section for allowed values.
data = {
    "audio_url": "https://segmind-sd-models.s3.amazonaws.com/display_images/sad_talker/sad_talker_audio_input.mp3",
    "model_id": "scribe_v1",  # listed as required in the Parameters section
    "language_code": "en",
    "tag_audio_events": False,  # Python booleans are capitalized
    "timestamp_granularity": "none",
    "diarize": False,
}

# json=data serializes the dictionary as the JSON request body.
response = requests.post(url, headers=headers, json=data)

if response.status_code == 200:
    result = response.json()
    print(json.dumps(result, indent=2))
else:
    # The response body carries the detailed error message.
    print(f"Error: {response.status_code}")
    print(response.text)

API Endpoint

POST https://api.segmind.com/v1/eleven-labs-transcript

Parameters

audio_url (required)
string (uri)

Input Audio URL

Default: "https://segmind-sd-models.s3.amazonaws.com/display_images/sad_talker/sad_talker_audio_input.mp3"
model_id (required)
string

Model identifier

Default: "scribe_v1"
Allowed values: "scribe_v1", "scribe_v1_experimental"
diarization_threshold (optional)
number

Threshold applied during speaker diarization. A higher value lowers the chance that one speaker is split into two, but raises the chance that two different speakers are merged into one (fewer speakers predicted overall). A lower value does the opposite: one speaker is more likely to be split into two, and two different speakers are less likely to be merged (more speakers predicted overall). Can only be set when diarize=True and num_speakers=None. Defaults to None, in which case a threshold is chosen based on the model_id (usually 0.22).

Default: 0.1
Range: 0.1 - 0.4
diarize (optional)
boolean

Whether to annotate which speaker is currently talking in the uploaded file.

Default: false
language_code (optional)
string

An ISO 639-1 or ISO 639-3 language code for the language of the audio file. Supplying it can sometimes improve transcription quality when the language is known beforehand. Defaults to null, in which case the language is detected automatically.

Default: "en"
Allowed values (194 total):
Abkhazian: "ab"
Achinese: "ace"
Acoli: "ach"
Afrikaans: "af"
Akan: "ak"
Albanian: "sq"
Alur: "alz"
Amharic: "am"
Arabic: "ar"
Armenian: "hy"
(184 more values not listed here)
num_speakers (optional)
integer

Number of speakers in audio (for diarization)

Default: 1
Range: 1 - 32
tag_audio_events (optional)
boolean

Whether to tag audio events like (laughter), (footsteps), etc. in the transcription.

Default: false
timestamp_granularity (optional)
string

Timestamp level for transcription

Default: "none"
Allowed values: "none", "word", "character"
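
The limits listed for these parameters can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Segmind or ElevenLabs API: build_payload is a hypothetical helper, and it assumes that optional fields may simply be left out of the request body when unset.

```python
from typing import Any, Dict, Optional

# Allowed values copied from the parameter list above.
ALLOWED_MODELS = {"scribe_v1", "scribe_v1_experimental"}
ALLOWED_GRANULARITY = {"none", "word", "character"}


def build_payload(
    audio_url: str,
    model_id: str = "scribe_v1",
    diarize: bool = False,
    num_speakers: Optional[int] = None,
    diarization_threshold: Optional[float] = None,
    timestamp_granularity: str = "none",
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments against the documented limits."""
    if model_id not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model_id: {model_id!r}")
    if timestamp_granularity not in ALLOWED_GRANULARITY:
        raise ValueError(f"unsupported timestamp_granularity: {timestamp_granularity!r}")
    if num_speakers is not None and not 1 <= num_speakers <= 32:
        raise ValueError("num_speakers must be between 1 and 32")
    if diarization_threshold is not None:
        # Documented rule: only valid with diarize=True and num_speakers unset.
        if not diarize or num_speakers is not None:
            raise ValueError(
                "diarization_threshold requires diarize=True and no num_speakers"
            )
        if not 0.1 <= diarization_threshold <= 0.4:
            raise ValueError("diarization_threshold must be between 0.1 and 0.4")

    payload: Dict[str, Any] = {
        "audio_url": audio_url,
        "model_id": model_id,
        "diarize": diarize,
        "timestamp_granularity": timestamp_granularity,
    }
    # Assumption: unset optional fields are omitted rather than sent as null.
    if num_speakers is not None:
        payload["num_speakers"] = num_speakers
    if diarization_threshold is not None:
        payload["diarization_threshold"] = diarization_threshold
    return payload
```

The returned dictionary can be passed as the json argument of requests.post. Combining num_speakers with diarization_threshold raises ValueError before any network call is made.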

Response Type

Returns: Audio

Common Error Codes

The API returns standard HTTP status codes. Detailed error messages are provided in the response body.

400 Bad Request: Invalid parameters or request format

401 Unauthorized: Missing or invalid API key

403 Forbidden: Insufficient permissions

404 Not Found: Model or endpoint not found

406 Insufficient Credits: Not enough credits to process request

429 Rate Limited: Too many requests

500 Server Error: Internal server error

502 Bad Gateway: Service temporarily unavailable

504 Timeout: Request timed out
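
One way to act on these status codes is to retry only the failures that are likely to be transient. The sketch below rests on an assumption: treating 429, 500, 502, and 504 as retryable is a common convention, not something this page specifies. The names call_with_retry, is_retryable, and send are hypothetical.

```python
import time
from typing import Callable, Tuple

# Assumption: these codes indicate transient failures worth retrying.
RETRYABLE = {429, 500, 502, 504}


def is_retryable(status_code: int) -> bool:
    """Return True for status codes this sketch treats as transient."""
    return status_code in RETRYABLE


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it succeeds, fails permanently, or attempts run out.

    send must return a (status_code, body) pair. Waits between attempts
    grow exponentially: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = 0, ""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200 or not is_retryable(status):
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    # Attempts exhausted; return the last response for the caller to inspect.
    return status, body
```

In practice, send would wrap the requests.post call and return (response.status_code, response.text). Codes such as 400, 401, 403, 404, and 406 are returned immediately, since repeating the same request is unlikely to change the outcome.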