Higgsfield Speech 2 Video

Transform images and audio into dynamic, lip-synced videos for engaging digital content.


API

The endpoint can be called from any programming language; the Python example below shows a complete request.

POST

```python
import requests
import base64

# Use this function to convert an image file from the filesystem to base64
def image_file_to_base64(image_path):
    with open(image_path, 'rb') as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to fetch an image from a URL and convert it to base64
def image_url_to_base64(image_url):
    response = requests.get(image_url)
    image_data = response.content
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to convert a list of image URLs to base64
def image_urls_to_base64(image_urls):
    return [image_url_to_base64(url) for url in image_urls]

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/higgsfield-speech2video"

# Request payload
data = {
    "input_image": "https://segmind-resources.s3.amazonaws.com/input/03cea2dd-87e9-41d7-9932-fbe45d4b2dd5-434b7481-1ddb-43da-a2df-10928effc900.png",
    "input_audio": "https://segmind-resources.s3.amazonaws.com/input/a846542c-c555-43ae-bdb0-8795ef78e0bb-8fe7c335-9e7f-4729-8230-b3eabc2af49c.wav",
    "prompt": "Generate an educational video with clear articulation, gentle hand gestures, and warm facial expressions appropriate for teaching content. All transitions need to be super realistic and smooth.",
    "quality": "high",
    "enhance_prompt": False,
    "seed": 42,
    "duration": 10
}

headers = {'x-api-key': api_key}

response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response body is the generated video
```
RESPONSE
video/mp4
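The response body is the raw MP4 itself rather than JSON, so it can be written straight to disk. A minimal sketch, reusing the `response` object from the example above (the output filename is illustrative):

```python
# Write the raw video/mp4 response body to a local file.
# "output.mp4" is an arbitrary name chosen for this example.
if response.status_code == 200:
    with open("output.mp4", "wb") as f:
        f.write(response.content)
```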
HTTP Response Codes

200 - OK: Video generated
401 - Unauthorized: User authentication failed
404 - Not Found: The requested URL does not exist
405 - Method Not Allowed: The requested HTTP method is not allowed
406 - Not Acceptable: Not enough credits
500 - Server Error: Server had some issue with processing
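A sketch of turning the non-200 codes into explicit errors, again reusing the `response` object from the example (the messages simply mirror the table above):

```python
# Map the documented error codes to readable messages.
ERRORS = {
    401: "Unauthorized: user authentication failed",
    404: "Not Found: the requested URL does not exist",
    405: "Method Not Allowed: use POST for this endpoint",
    406: "Not Acceptable: not enough credits",
    500: "Server Error: the server had an issue with processing",
}

if response.status_code != 200:
    message = ERRORS.get(response.status_code, "Unexpected status code")
    raise RuntimeError(f"{response.status_code} - {message}")
```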

Attributes


input_image (str, required)

Provide a URL of the image to drive animation. Use a clear, high-quality image for best results.


input_audio (str, required)

URL for the audio guiding avatar speech. Use articulate speech for clear lip-sync results.


prompt (str, required)

Describe the video output scenario. Create an engaging, emotional prompt for vibrant expressions.


quality (enum:str, default: high). Affects pricing.

Choose a video quality preference. 'high' is best for detailed videos, while 'mid' helps with speed.

Allowed values: high, mid


enhance_prompt (boolean, default: true)

Automatically refine your prompt. Enable to achieve a balanced expression across the video.


seed (int, default: 42)

Set a seed number for consistent outputs. Use different seeds for variation; 42 is common.

min: 1, max: 1000000


duration (enum:int, default: 10). Affects pricing.

Decide video length in seconds. Choose longer durations for in-depth content.

Allowed values: 5, 10, 15

To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
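A quick sketch of reading that header after a call (reusing the `response` object from the example above):

```python
# x-remaining-credits is returned in the response headers of each call.
remaining = response.headers.get("x-remaining-credits")
if remaining is not None:
    print(f"Remaining credits: {remaining}")
```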

Resources to get you started

Everything you need to know to get the most out of Higgsfield Speech 2 Video

Speak v2: How to Use Effectively

Speak v2 is a state-of-the-art speech-to-video generation model that turns your static image and audio into a lifelike, lip-synced avatar video. Follow this guide to master the key parameters, tailor the settings to different scenarios, and produce polished results every time.

1. Preparing Your Inputs

- input_image (URL): Use a clear, high-resolution, front-facing portrait. Well-lit and centered faces yield the most natural animations.
- input_audio (URL, e.g. MP3 or WAV): Choose clean, well-articulated recordings. Avoid background noise or abrupt volume changes to maintain precise lip-sync. A quick reachability check for both inputs is sketched below.
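Before submitting a job, it can help to confirm that both URLs are publicly reachable and serve plausible content types. A minimal sketch (the URLs are the sample assets from the request payload above; note that some hosts do not set an accurate Content-Type):

```python
import requests

def check_input(url, expected_prefix):
    """Fetch only the headers and confirm the Content-Type looks right."""
    resp = requests.head(url, allow_redirects=True, timeout=10)
    resp.raise_for_status()
    content_type = resp.headers.get("Content-Type", "")
    if not content_type.startswith(expected_prefix):
        raise ValueError(f"{url} returned {content_type!r}, expected {expected_prefix}*")

check_input("https://segmind-resources.s3.amazonaws.com/input/03cea2dd-87e9-41d7-9932-fbe45d4b2dd5-434b7481-1ddb-43da-a2df-10928effc900.png", "image/")
check_input("https://segmind-resources.s3.amazonaws.com/input/a846542c-c555-43ae-bdb0-8795ef78e0bb-8fe7c335-9e7f-4729-8230-b3eabc2af49c.wav", "audio/")
```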

2. Core Parameters

| Parameter | Description | Recommendations |
| --- | --- | --- |
| prompt | Describe your scene and desired emotional tone. | "Deliver a warm corporate greeting" |
| quality | Video resolution/detail. | high (client-facing); mid (rapid tests) |
| duration | Video length in seconds: 5, 10, or 15. | 5s (social snippets); 15s (training) |
| enhance_prompt | Auto-refine text prompt for balanced expressions (true/false). | true (complex expressions) |
| seed | Numerical seed for reproducibility (1–1,000,000). | Use 42 for baseline; vary for a new look |
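Translated into a request payload, the table's recommendations look like this (a sketch; the two input URLs are placeholders, not real assets):

```python
# Payload assembled from the recommendations above.
# The two input URLs are placeholders - substitute your own assets.
payload = {
    "input_image": "https://example.com/portrait.png",   # placeholder
    "input_audio": "https://example.com/narration.wav",  # placeholder
    "prompt": "Deliver a warm corporate greeting",
    "quality": "high",       # or "mid" for rapid tests
    "duration": 10,          # allowed: 5, 10, or 15 seconds
    "enhance_prompt": True,  # auto-refine for balanced expressions
    "seed": 42,              # fixed for reproducibility
}
```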

3. Use Case Presets

1. Corporate Spokesperson
   - prompt: "Introduce our new product with confident tone."
   - quality: high, duration: 10s, enhance_prompt: true, seed: 1234
2. E-Learning Instructor
   - prompt: "Explain the concept of photosynthesis with enthusiasm."
   - quality: high, duration: 15s, enhance_prompt: true, seed: 5678
3. Social Media Influencer
   - prompt: "Share a quick style tip in a friendly voice."
   - quality: mid, duration: 5s, enhance_prompt: false, seed: 42
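These presets map directly onto payload fragments. A sketch, assuming you merge a preset with your own input_image and input_audio URLs before posting (the dictionary key names are illustrative):

```python
# Use-case presets from the list above, as payload fragments.
PRESETS = {
    "corporate_spokesperson": {
        "prompt": "Introduce our new product with confident tone.",
        "quality": "high", "duration": 10, "enhance_prompt": True, "seed": 1234,
    },
    "elearning_instructor": {
        "prompt": "Explain the concept of photosynthesis with enthusiasm.",
        "quality": "high", "duration": 15, "enhance_prompt": True, "seed": 5678,
    },
    "social_media_influencer": {
        "prompt": "Share a quick style tip in a friendly voice.",
        "quality": "mid", "duration": 5, "enhance_prompt": False, "seed": 42,
    },
}

def build_payload(preset_name, input_image, input_audio):
    """Merge a named preset with the two required input URLs."""
    return {"input_image": input_image, "input_audio": input_audio, **PRESETS[preset_name]}
```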

4. Optimization Tips

- Prompt Detail: Include emotion and pacing cues (e.g., "calmly," "energetic").
- Image Selection: Avoid glasses reflections or extreme head tilts.
- Audio Quality: Record in a quiet room with a pop filter to ensure clear consonants.
- Batch Testing: Run brief 5s clips (mid quality) to preview different seeds before finalizing; see the sketch after this list.
- Consistency: Lock the seed parameter when creating multi-segment videos to keep style uniform.
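A minimal batch-testing sketch, assuming the api_key, url, and data variables from the API example above:

```python
# Preview several seeds with cheap 5-second, mid-quality renders.
# Assumes api_key, url, and data are defined as in the API example.
for seed in (42, 1234, 5678):
    preview = {**data, "quality": "mid", "duration": 5, "seed": seed}
    resp = requests.post(url, json=preview, headers={"x-api-key": api_key})
    if resp.status_code == 200:
        with open(f"preview_seed_{seed}.mp4", "wb") as f:
            f.write(resp.content)
    else:
        print(f"seed {seed} failed with status {resp.status_code}")
```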

5. Best Practices

- Start with the default settings (quality=high, duration=10, seed=42).
- Gradually tweak one parameter at a time to understand its impact.
- Review outputs frame-by-frame to catch subtle lip-sync mismatches.
- Store successful parameter sets as templates for rapid reuse; a JSON-based sketch follows this list.
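One lightweight way to keep such templates is a JSON file of named parameter sets. A sketch (the filename and helper names are illustrative, not part of the API):

```python
import json

TEMPLATE_FILE = "speak_v2_templates.json"  # illustrative filename

def save_template(name, params, path=TEMPLATE_FILE):
    """Persist a successful parameter set under a reusable name."""
    try:
        with open(path) as f:
            templates = json.load(f)
    except FileNotFoundError:
        templates = {}
    templates[name] = params
    with open(path, "w") as f:
        json.dump(templates, f, indent=2)

def load_template(name, path=TEMPLATE_FILE):
    """Load a previously saved parameter set by name."""
    with open(path) as f:
        return json.load(f)[name]
```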

By following this guide, you’ll harness Speak v2’s full potential—creating professional, expressive avatar videos that captivate your audience.
