ElevenLabs Transcript
ElevenLabs Transcript is a speech-to-text model that transcribes audio from a URL, with optional speaker diarization, audio-event tagging, and word- or character-level timestamps.
import json

import requests

url = "https://api.segmind.com/v1/eleven-labs-transcript"
headers = {
    "x-api-key": "YOUR_API_KEY",  # replace with your own API key
    "Content-Type": "application/json",
}

# Request body; see Parameters for the full list of options.
data = {
    "audio_url": "https://segmind-sd-models.s3.amazonaws.com/display_images/sad_talker/sad_talker_audio_input.mp3",
    "model_id": "scribe_v1",  # listed as required under Parameters
    "language_code": "en",
    "tag_audio_events": False,
    "timestamp_granularity": "none",
    "diarize": False,
}

response = requests.post(url, headers=headers, json=data)

if response.status_code == 200:
    result = response.json()
    print(json.dumps(result, indent=2))
else:
    # Non-200 responses carry the error details in the body.
    print(f"Error: {response.status_code}")
    print(response.text)
API Endpoint
https://api.segmind.com/v1/eleven-labs-transcript
Parameters
audio_url
required, string (uri)
Input Audio URL
"https://segmind-sd-models.s3.amazonaws.com/display_images/sad_talker/sad_talker_audio_input.mp3"
model_id
required, string
Model identifier
"scribe_v1"
"scribe_v1"
"scribe_v1_experimental"
diarization_threshold
optional, number
Threshold applied during speaker diarization. A higher value lowers the chance that one speaker is split into two, but raises the chance that two different speakers are merged into one (fewer speakers predicted overall). A lower value does the opposite: one speaker is more likely to be split into two, and two different speakers are less likely to be merged (more speakers predicted overall). Can only be set when diarize=True and num_speakers=None. Defaults to None, in which case a threshold is chosen based on the model_id (usually 0.22).
0.1
Range: 0.1 - 0.4
diarize
optional, boolean
Whether to annotate which speaker is currently talking in the uploaded file.
false
language_code
optional, string
An ISO 639-1 or ISO 639-3 language code for the language of the audio file. Supplying it can sometimes improve transcription performance when the language is known beforehand. Defaults to null, in which case the language is detected automatically.
"en"
"ab"
"ace"
"ach"
"af"
"ak"
"sq"
"alz"
"am"
"ar"
"hy"
num_speakers
optional, integer
Number of speakers in audio (for diarization)
1
Range: 1 - 32
tag_audio_events
optional, boolean
Whether to tag audio events like (laughter), (footsteps), etc. in the transcription.
false
timestamp_granularity
optional, string
Timestamp level for transcription
"none"
"none"
"word"
"character"
Response Type
Returns: JSON (the transcription result, as parsed by response.json() in the example above)
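The constraints listed under Parameters (allowed values, numeric ranges, and the rule that diarization_threshold needs diarize=True with num_speakers unset) can be checked client-side before a request is sent. The following is a minimal sketch, not part of the API: build_transcript_payload is a hypothetical helper name, and leaving unset optional fields out of the body assumes the service treats a missing field the same as null.

```python
from typing import Any, Dict, Optional

# Allowed values and ranges copied from the Parameters section.
MODEL_IDS = {"scribe_v1", "scribe_v1_experimental"}
TIMESTAMP_GRANULARITIES = {"none", "word", "character"}


def build_transcript_payload(
    audio_url: str,
    model_id: str = "scribe_v1",
    language_code: Optional[str] = None,
    diarize: bool = False,
    num_speakers: Optional[int] = None,
    diarization_threshold: Optional[float] = None,
    tag_audio_events: bool = False,
    timestamp_granularity: str = "none",
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments and return a JSON-ready dict."""
    if model_id not in MODEL_IDS:
        raise ValueError(f"model_id must be one of {sorted(MODEL_IDS)}")
    if timestamp_granularity not in TIMESTAMP_GRANULARITIES:
        raise ValueError(
            f"timestamp_granularity must be one of {sorted(TIMESTAMP_GRANULARITIES)}"
        )
    if num_speakers is not None and not 1 <= num_speakers <= 32:
        raise ValueError("num_speakers must be between 1 and 32")
    if diarization_threshold is not None:
        # Documented rule: only valid with diarize=True and num_speakers unset.
        if not diarize or num_speakers is not None:
            raise ValueError(
                "diarization_threshold requires diarize=True and num_speakers=None"
            )
        if not 0.1 <= diarization_threshold <= 0.4:
            raise ValueError("diarization_threshold must be between 0.1 and 0.4")

    payload: Dict[str, Any] = {
        "audio_url": audio_url,
        "model_id": model_id,
        "diarize": diarize,
        "tag_audio_events": tag_audio_events,
        "timestamp_granularity": timestamp_granularity,
    }
    # Assumption: omitted optional fields fall back to the service defaults.
    optional_fields = (
        ("language_code", language_code),
        ("num_speakers", num_speakers),
        ("diarization_threshold", diarization_threshold),
    )
    for key, value in optional_fields:
        if value is not None:
            payload[key] = value
    return payload


# Example: diarized transcript with word timestamps and automatic language detection.
payload = build_transcript_payload(
    "https://segmind-sd-models.s3.amazonaws.com/display_images/sad_talker/sad_talker_audio_input.mp3",
    diarize=True,
    diarization_threshold=0.22,
    timestamp_granularity="word",
)
print(payload)
```

The returned dict can be passed as the json argument of requests.post. Validating locally avoids spending a request on input the service would presumably reject as a Bad Request.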
Common Error Codes
The API returns standard HTTP status codes. Detailed error messages are provided in the response body.
Bad Request
Invalid parameters or request format
Unauthorized
Missing or invalid API key
Forbidden
Insufficient permissions
Not Found
Model or endpoint not found
Insufficient Credits
Not enough credits to process request
Rate Limited
Too many requests
Server Error
Internal server error
Bad Gateway
Service temporarily unavailable
Timeout
Request timed out
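Some of the errors above (Rate Limited, Server Error, Bad Gateway, Timeout) are usually transient, so a client may want to retry them with a growing delay. The sketch below is one way to do that and rests on two assumptions: this page does not list the numeric codes, so the conventional HTTP values 429, 500, 502 and 504 are used, and post_with_retry is a hypothetical helper, not part of any SDK. The request itself is passed in as a callable so the retry logic stays independent of the HTTP library.

```python
import time
from typing import Callable, Tuple

# Assumption: conventional HTTP codes for Rate Limited, Server Error,
# Bad Gateway and Timeout; the error list above gives names only.
RETRYABLE_STATUS = {429, 500, 502, 504}


def post_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out.

    send must perform one request and return (status_code, response_text).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUS:
            # Success, or an error that retrying will not fix (e.g. a bad API key).
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    # Every attempt hit a retryable error; hand back the last response.
    return status, body
```

With the requests library, send could be a small function that calls requests.post(url, headers=headers, json=data) and returns (response.status_code, response.text). Errors such as Bad Request, Unauthorized, or Insufficient Credits are returned immediately, since repeating the same request would not change the outcome.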