Hallo

Hallo lets you create portrait videos from single images.

API

To use Hallo programmatically, send a POST request to the API endpoint. The example below uses Python.

POST
```python
import requests
import base64

# Use this function to convert an image file from the filesystem to base64
def image_file_to_base64(image_path):
    with open(image_path, 'rb') as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to fetch an image from a URL and convert it to base64
def image_url_to_base64(image_url):
    response = requests.get(image_url)
    image_data = response.content
    return base64.b64encode(image_data).decode('utf-8')

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/hallo"

# Request payload
data = {
    "input_image": image_url_to_base64("https://segmind-sd-models.s3.amazonaws.com/display_images/hallo/hallo-input.png"),  # Or use image_file_to_base64("IMAGE_PATH")
    "input_audio": "https://segmind-sd-models.s3.amazonaws.com/display_images/hallo/hallo-audio-2.mp3",
    "pose_weight": 1.4,
    "face_weight": 1,
    "lip_weight": 1.2,
    "face_expand_ratio": 1.2,
    "base64": False
}

headers = {'x-api-key': api_key}

response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response body is the generated output
```
RESPONSE
image/jpeg
HTTP Response Codes
200 - OK: Image Generated
401 - Unauthorized: User authentication failed
404 - Not Found: The requested URL does not exist
405 - Method Not Allowed: The requested HTTP method is not allowed
406 - Not Acceptable: Not enough credits
500 - Server Error: Server had some issue with processing
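
The response codes listed above can be turned into a small lookup so that client code reports a readable message instead of a bare number. This is only a sketch: the names `STATUS_MESSAGES`, `describe_status`, and `is_success` are hypothetical helpers, and the messages are copied from the list above rather than from any official client library.

```python
# Messages copied from the HTTP Response Codes list; all names here are hypothetical.
STATUS_MESSAGES = {
    200: "OK: Image Generated",
    401: "Unauthorized: User authentication failed",
    404: "Not Found: The requested URL does not exist",
    405: "Method Not Allowed: The requested HTTP method is not allowed",
    406: "Not Acceptable: Not enough credits",
    500: "Server Error: Server had some issue with processing",
}


def describe_status(status_code):
    """Return a readable message for a documented status code."""
    return STATUS_MESSAGES.get(
        status_code, f"Unexpected status code: {status_code}"
    )


def is_success(status_code):
    """Only 200 is documented as a successful response."""
    return status_code == 200
```

With `requests`, a caller might check `is_success(response.status_code)` before reading `response.content`, and log `describe_status(response.status_code)` otherwise.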

Attributes


input_image (image) *

Input image of a talking head. The playground uploader accepts PNG, JPG, or GIF files up to 2048 x 2048 px.


input_audio (str) *

Input audio file. Avoid special symbols in the filename, as they may cause ffmpeg errors.


pose_weight (float, default: 1.4) Affects Pricing

Weight of the pose in the output.

min: 0, max: 10


face_weight (float, default: 1) Affects Pricing

Weight of the face in the output.

min: 0, max: 10


lip_weight (float, default: 1.2) Affects Pricing

Weight of the lip in the output.

min: 0, max: 10


face_expand_ratio (float, default: 1.2) Affects Pricing

Expansion ratio of the face region.

min: 0, max: 10


base64 (boolean, default: 1)

Whether to return the output Base64-encoded.
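
Because the four weight-like attributes share the same documented range (min 0, max 10), it can help to validate them locally before sending a request. The following is a minimal sketch under that assumption; `build_hallo_payload` is a hypothetical helper, not part of any Segmind SDK, and its `base64_output` default of `False` mirrors the request example rather than the documented default.

```python
# Documented range for pose_weight, face_weight, lip_weight, face_expand_ratio.
WEIGHT_MIN, WEIGHT_MAX = 0.0, 10.0


def build_hallo_payload(input_image, input_audio, pose_weight=1.4,
                        face_weight=1.0, lip_weight=1.2,
                        face_expand_ratio=1.2, base64_output=False):
    """Hypothetical helper: check ranges, then return the JSON payload dict."""
    weights = {
        "pose_weight": pose_weight,
        "face_weight": face_weight,
        "lip_weight": lip_weight,
        "face_expand_ratio": face_expand_ratio,
    }
    for name, value in weights.items():
        if not WEIGHT_MIN <= value <= WEIGHT_MAX:
            raise ValueError(
                f"{name}={value} is outside [{WEIGHT_MIN}, {WEIGHT_MAX}]"
            )
    # Both required attributes must be non-empty.
    if not input_image or not input_audio:
        raise ValueError("input_image and input_audio are required")
    return {
        "input_image": input_image,
        "input_audio": input_audio,
        **weights,
        "base64": base64_output,
    }
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, in the same way as the `data` dictionary in the API example.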

To track your credit usage, inspect the response headers of each API call. The x-remaining-credits header reports the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
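
Reading the x-remaining-credits value can be sketched as a small helper that works on any mapping of response headers. The function name `remaining_credits` is hypothetical, and the sketch assumes the header value is numeric, which the text above implies but does not state.

```python
def remaining_credits(headers):
    """Return x-remaining-credits as a float, or None if absent or non-numeric.

    Header names are compared case-insensitively so that a plain dict
    works as well as the headers object of an HTTP library.
    """
    for name, value in headers.items():
        if name.lower() == "x-remaining-credits":
            try:
                return float(value)
            except (TypeError, ValueError):
                return None
    return None
```

With `requests`, this could be called as `remaining_credits(response.headers)` after each API call, for example to log a warning once the value drops below a threshold of your choosing.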


Pricing

Serverless Pricing

Buy credits that can be used anywhere on Segmind

$0.0015 per second

Dedicated Cloud Pricing

For enterprise costs and dedicated endpoints

$0.0007 - $0.0031 per second
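
As a rough illustration of per-second pricing, the cost of a request can be estimated by multiplying the rate by the number of billed seconds. This sketch assumes that "per second" refers to billed processing time, which the pricing text does not spell out; `estimate_cost` is a hypothetical helper, and `Decimal` is used to avoid floating-point rounding in currency arithmetic.

```python
from decimal import Decimal

# Serverless rate in USD per second, taken from the pricing figure above.
SERVERLESS_RATE = Decimal("0.0015")


def estimate_cost(seconds, rate=SERVERLESS_RATE):
    """Estimate cost in USD for a number of billed seconds (assumed unit)."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    # Convert via str so that float inputs such as 12.5 stay exact.
    return Decimal(str(seconds)) * rate
```

For example, `estimate_cost(60)` multiplies 60 seconds by the serverless rate, and passing `rate=Decimal("0.0031")` gives the upper end of the dedicated range.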

Hallo

Hallo is a novel technique for generating animated portraits that seamlessly blend audio with facial movements. Creating lifelike portrait animations presents a unique challenge. It's not just about lip syncing: the animation needs to capture the full spectrum of human expression, from subtle eyebrow raises to head tilts, while maintaining visual consistency and realism. Existing methods often struggle to achieve this, resulting in animations that appear uncanny or unnatural. Hallo tackles this challenge with a hierarchical audio-driven visual synthesis module. This module acts like a translator, interpreting audio features (speech) and translating them into corresponding visual cues for the lips, facial expressions, and head pose.

Under the hood of Hallo

Imagine two spotlights focusing on different aspects – the audio and the visuals. The cross-attention mechanism ensures these spotlights work together, pinpointing how specific audio elements correspond to specific facial movements. The animation process leverages the power of diffusion models, which excel at generating high-quality, realistic images and videos. Maintaining temporal coherence across the animation sequence is crucial. The method incorporates this by ensuring smooth transitions between frames. A "ReferenceNet" component acts as a guide, ensuring the generated animations align with the original portrait's unique features. The method offers control over expression and pose diversity, allowing creators to tailor the animations to their specific vision.
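
The cross-attention idea described above can be illustrated with a toy scaled dot-product attention in plain Python, where visual tokens act as queries and audio tokens act as keys and values. This is a generic textbook sketch, not Hallo's implementation: it omits the learned projection matrices, multiple heads, and the diffusion backbone, and all names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(visual_queries, audio_keys, audio_values):
    """Each visual query attends over all audio tokens.

    Returns (outputs, weights): one mixed audio value vector and one
    attention distribution per visual query.
    """
    dim = len(audio_keys[0])
    value_dim = len(audio_values[0])
    outputs, all_weights = [], []
    for query in visual_queries:
        # Similarity between this visual token and every audio token.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in audio_keys
        ]
        weights = softmax(scores)
        # Weighted average of audio values, one entry per value dimension.
        mixed = [
            sum(w * value[j] for w, value in zip(weights, audio_values))
            for j in range(value_dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights
```

Each row of `weights` sums to 1 and shows how strongly a given visual token draws on each audio token, which is the "two spotlights working together" picture in miniature.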

Use cases

Compared with earlier approaches, Hallo produces more natural and realistic talking portraits, with better lip synchronization and more diverse motion. This opens the door to new forms of storytelling and content creation. Because it can animate a portrait and give it speech, applications range from personalized avatars to interactive learning experiences.
