Higgsfield Speech 2 Video Serverless API
Transform images and audio into dynamic, lip-synced videos for engaging digital content.
POST /v2/higgsfield-speech2video · submit + poll

# pip install "segmind>=1.1.0"
# export SEGMIND_API_KEY="YOUR_API_KEY"
from segmind import SegmindClient, InferenceFailed, InferenceTimeout

# Async (v2): recommended for long-running / video models.
# run() blocks up to 600s; submit_async + job.wait(timeout=...) sets a longer
# deadline and keeps the request_id so you can re-poll later.
client = SegmindClient()  # reads SEGMIND_API_KEY
payload = {
    "input_image": "https://segmind-resources.s3.amazonaws.com/input/03cea2dd-87e9-41d7-9932-fbe45d4b2dd5-434b7481-1ddb-43da-a2df-10928effc900.png",
    "input_audio": "https://segmind-resources.s3.amazonaws.com/input/a846542c-c555-43ae-bdb0-8795ef78e0bb-8fe7c335-9e7f-4729-8230-b3eabc2af49c.wav",
    "prompt": "Generate an educational video with clear articulation, gentle hand gestures, and warm facial expressions appropriate for teaching content. All transitions needs to be super realistic and smooth.",
    "quality": "high",
    "enhance_prompt": False,
    "seed": 42,
    "duration": 10,
}
job = client.submit_async("higgsfield-speech2video", **payload)
print(job.request_id)  # available immediately
try:
    result = job.wait(timeout=900, interval=2.0)
    print(result["status"])  # COMPLETED
    print(result.get("output"))  # model output (e.g. video URL)
except InferenceTimeout as e:
    print("still running:", e.request_id)  # re-poll later with this id
except InferenceFailed as e:
    print("failed:", e.detail)

# Fast models (<=600s) can use the one-liner instead:
# result = segmind.run("higgsfield-speech2video", **payload)
API Endpoint
https://api.segmind.com/v1/higgsfield-speech2video
Parameters
input_audio (required, string (uri))
URL of the audio that drives the avatar's speech. Use clearly articulated speech for accurate lip-sync.

input_image (required, string (uri))
URL of the image to animate. Use a clear, high-quality image for best results.
Example: "https://segmind-resources.s3.amazonaws.com/input/03cea2dd-87e9-41d7-9932-fbe45d4b2dd5-434b7481-1ddb-43da-a2df-10928effc900.png"

prompt (required, string)
Description of the desired video. An engaging, emotional prompt produces more vivid expressions.
Example: "Generate a captivating avatar video with fluent dialogue and lively facial gestures."

duration (optional, integer)
Video length in seconds. Choose longer durations for in-depth content.
Default: 10. Options: 5, 10, 15.

enhance_prompt (optional, boolean)
Automatically refines your prompt. Enable it for balanced expression across the video.
Default: false

quality (optional, string)
Video quality preference. "high" is best for detailed videos, while "mid" is faster.
Default: "high". Options: "high", "mid".

seed (optional, integer)
Seed for reproducible outputs. Use different seeds for variation; 42 is a common choice.
Default: 42. Range: 1 - 1000000.

Response Type
Returns: Video
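The parameter constraints listed above can be checked client-side before a request is submitted, which avoids spending a round trip on a payload the API would reject. The following is a minimal sketch, not part of the Segmind SDK: the function name `validate_payload` is hypothetical, and the allowed `duration` values (5, 10, 15) are an assumption based on the options shown for that parameter.

```python
from urllib.parse import urlparse

REQUIRED = ("input_audio", "input_image", "prompt")
ALLOWED_DURATIONS = {5, 10, 15}   # assumption: taken from the listed options
ALLOWED_QUALITY = {"high", "mid"}
SEED_MIN, SEED_MAX = 1, 1_000_000


def validate_payload(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    for key in REQUIRED:
        if not payload.get(key):
            problems.append(f"missing required field: {key}")
    for key in ("input_audio", "input_image"):
        value = payload.get(key)
        is_url = isinstance(value, str) and urlparse(value).scheme in ("http", "https")
        if value and not is_url:
            problems.append(f"{key} must be an http(s) URL")
    if "duration" in payload and payload["duration"] not in ALLOWED_DURATIONS:
        problems.append("duration must be one of 5, 10, 15")
    if "quality" in payload and payload["quality"] not in ALLOWED_QUALITY:
        problems.append('quality must be "high" or "mid"')
    if "enhance_prompt" in payload and not isinstance(payload["enhance_prompt"], bool):
        problems.append("enhance_prompt must be a boolean")
    if "seed" in payload:
        seed = payload["seed"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(seed, bool) or not isinstance(seed, int) or not SEED_MIN <= seed <= SEED_MAX:
            problems.append(f"seed must be an integer in {SEED_MIN}..{SEED_MAX}")
    return problems
```

Calling `validate_payload(payload)` before submitting and stopping if the list is non-empty keeps validation errors local and readable.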
Asynchronous requests (v2)
Use Async for video, long-running (>~60s), or high-concurrency workloads; Sync is simplest for fast image & LLM calls. Async submits a request and you poll it to completion.
1. Submit: POST /v2/higgsfield-speech2video returns request_id, status_url, response_url
2. Poll: GET /v2/requests/{id}/status until COMPLETED or FAILED
3. Result: GET /v2/requests/{id} returns the final response body
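For callers who are not using the Python SDK, the three steps above can be sketched with the standard library alone. This is an illustration under stated assumptions rather than official client code: the `x-api-key` header name and the `request_id` and `status` JSON field names are assumed, and `http_json` and `run_async` are hypothetical helper names. The `send` and `sleep` parameters exist only so the flow can be exercised without network access.

```python
import json
import os
import time
import urllib.error
import urllib.request

BASE_URL = "https://api.segmind.com"


def http_json(method, url, body=None):
    """Send one JSON request and return (http_status, parsed_body)."""
    data = json.dumps(body).encode() if body is not None else None
    request = urllib.request.Request(url, data=data, method=method, headers={
        "x-api-key": os.environ.get("SEGMIND_API_KEY", ""),  # assumed header name
        "Content-Type": "application/json",
    })
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as err:
        # Error responses (e.g. 422 for FAILED) still carry a body.
        return err.code, json.loads(err.read() or b"{}")


def run_async(payload, send=http_json, interval=2.0, timeout=900.0, sleep=time.sleep):
    """Submit, poll until COMPLETED or FAILED, then fetch the final body."""
    # Step 1: submit and keep the request id.
    _, submitted = send("POST", f"{BASE_URL}/v2/higgsfield-speech2video", payload)
    request_id = submitted["request_id"]  # assumed field name
    deadline = time.monotonic() + timeout
    # Step 2: poll the status endpoint until a terminal state.
    while True:
        _, status = send("GET", f"{BASE_URL}/v2/requests/{request_id}/status")
        if status.get("status") in ("COMPLETED", "FAILED"):
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"request {request_id} is still running")
        sleep(interval)
    # Step 3: fetch the final response body.
    _, result = send("GET", f"{BASE_URL}/v2/requests/{request_id}")
    return result
```

On `TimeoutError` the request keeps running server-side, so the same request_id can be polled again later.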
Status states
- A FAILED request is served as HTTP 422 — the body still carries the error detail.
- An unknown or expired request_id returns HTTP 404.
- Results are retained for 1 hour, then expire.
- Content / RAI blocks surface as FAILED, not a separate state.
- Track completion by polling the status endpoint.
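The notes above imply that a polling client should look at the HTTP status code as well as the body. A minimal sketch of that decision logic follows. The function name `interpret_response`, the `detail` and `error` body keys, and the local labels "NOT_FOUND" and "PENDING" are assumptions for illustration; only COMPLETED and FAILED are states named by the API.

```python
from typing import Optional, Tuple


def interpret_response(http_status: int, body: dict) -> Tuple[str, Optional[str]]:
    """Map a status or result response to (outcome, detail)."""
    if http_status == 404:
        # Unknown id, or the result passed its 1-hour retention window.
        return "NOT_FOUND", "unknown or expired request_id"
    if http_status == 422 or body.get("status") == "FAILED":
        # Failures, including content / RAI blocks, keep their detail in the body.
        detail = body.get("detail") or body.get("error") or body  # assumed keys
        return "FAILED", str(detail)
    if body.get("status") == "COMPLETED":
        return "COMPLETED", None
    # Anything else is treated as still in progress.
    return "PENDING", None
```

Because results expire after one hour, a "NOT_FOUND" outcome for a previously valid request_id usually means the job must be resubmitted.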
Common Error Codes
The API returns standard HTTP status codes. Detailed error messages are provided in the response body.
- Bad Request: Invalid parameters or request format
- Unauthorized: Missing or invalid API key
- Forbidden: Insufficient permissions
- Not Found: Model or endpoint not found
- Insufficient Credits: Not enough credits to process request
- Rate Limited: Too many requests
- Server Error: Internal server error
- Bad Gateway: Service temporarily unavailable
- Timeout: Request timed out
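Some of the errors above are transient (rate limiting, server errors, gateway problems, timeouts), while the rest indicate a request that will fail again unchanged. One common way to handle the transient group is retrying with exponential backoff, sketched below. The numeric codes are an assumption, since the list above gives names only: 429, 500, 502, and 504 are the standard HTTP codes for those names. `call_with_retries` is a hypothetical helper, not an SDK function.

```python
import random
import time

# Assumed numeric codes for Rate Limited, Server Error, Bad Gateway, Timeout.
RETRYABLE = {429, 500, 502, 504}


def call_with_retries(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call() -> (http_status, body), retrying transient failures with backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        http_status, body = call()
        last_attempt = attempt == max_attempts - 1
        if http_status not in RETRYABLE or last_attempt:
            # Success, a non-retryable error, or retries exhausted: hand it back.
            return http_status, body
        # Exponential backoff (base, 2x, 4x, ...) plus up to 25% random jitter.
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, delay * 0.25))
```

Errors such as Bad Request, Unauthorized, or Insufficient Credits are returned immediately, since repeating the same request would not change the outcome.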