V-Express
V-Express lets you create portrait videos from single images.
API
To call V-Express through the API, send a POST request as in the Python example below.
import base64

import requests

# Use this function to convert an image file from the filesystem to base64
def image_file_to_base64(image_path):
    with open(image_path, 'rb') as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to fetch an image from a URL and convert it to base64
def image_url_to_base64(image_url):
    response = requests.get(image_url)
    response.raise_for_status()  # Fail early if the download did not succeed
    image_data = response.content
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to convert a list of image URLs to base64
def image_urls_to_base64(image_urls):
    return [image_url_to_base64(url) for url in image_urls]

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/v-express"

# Request payload
data = {
    "input_image": image_url_to_base64("https://segmind-sd-models.s3.amazonaws.com/display_images/v_express/v-express-ip.jpg"),  # Or use image_file_to_base64("IMAGE_PATH")
    "input_audio": "https://segmind-sd-models.s3.amazonaws.com/display_images/v_express/v_express_audio.mp3",
    "fps": 30,
    "num_inference_steps": 20,
    "guidance_scale": 2,
    "retarget_strategy": "fix_face",
    "base64": False
}

headers = {'x-api-key': api_key}

response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response body is the generated video
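Printing the raw response body is rarely useful for binary output, so the sketch below writes it to disk instead. It is a minimal illustration, not part of the official example: the helper name save_output is hypothetical, and it assumes that a successful call returns status 200 with the generated output as raw bytes (that is, with "base64" set to False), and that a .mp4 extension is appropriate for the file.

```python
from pathlib import Path


def save_output(status_code, body, out_path):
    """Write the raw response body to out_path and return the path.

    Assumption: status 200 means success and the body holds the raw
    output bytes. Any other status is treated as an error.
    """
    if status_code != 200:
        # Show only the start of the body to keep the message short.
        raise RuntimeError(
            f"Request failed with status {status_code}: {body[:200]!r}"
        )
    path = Path(out_path)
    path.write_bytes(body)
    return path
```

With the request from the example, this could be called as save_output(response.status_code, response.content, "v_express_output.mp4").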
Attributes
input_image: Input image of a talking head.
input_audio: Input audio file. Avoid special symbols in the filename, as they may cause ffmpeg errors.
fps: Output frames per second. min: 10, max: 60
num_inference_steps: Number of inference steps used for generation. min: 5, max: 50
guidance_scale: Scale for classifier-free guidance. min: 1, max: 15
retarget_strategy: Retarget strategy.
base64: Base64 encoding of the output.
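The documented minimum and maximum values can be checked on the client before a request is sent. The following is a minimal sketch: the helper validate_payload and the PARAM_RANGES table are hypothetical names, the parameter names are taken from the request payload in the example, and the bounds are assumed to be inclusive.

```python
# Ranges copied from the attribute list; treating them as inclusive
# is an assumption.
PARAM_RANGES = {
    "fps": (10, 60),
    "num_inference_steps": (5, 50),
    "guidance_scale": (1, 15),
}


def validate_payload(payload):
    """Return a list of range problems; an empty list means all checked values fit."""
    problems = []
    for name, (low, high) in PARAM_RANGES.items():
        if name not in payload:
            continue  # Parameters that are absent are not checked here.
        value = payload[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    return problems
```

Calling validate_payload(data) before requests.post lets out-of-range values surface locally rather than as an API error.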
To keep track of your credit usage, inspect the response headers of each API call. The x-remaining-credits header indicates the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
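As an illustration of reading that header, the sketch below extracts x-remaining-credits from a headers mapping such as response.headers in the requests library. The helper name remaining_credits is hypothetical, and it assumes the header value is a number sent as text. If the header is missing or cannot be parsed, it returns None rather than guessing.

```python
def remaining_credits(headers):
    """Return the x-remaining-credits value as a float, or None if unavailable."""
    # Header names are case-insensitive, so normalise the keys first.
    lowered = {str(key).lower(): value for key, value in headers.items()}
    raw = lowered.get("x-remaining-credits")
    if raw is None:
        return None
    try:
        # Assumption: the value is numeric text.
        return float(raw)
    except (TypeError, ValueError):
        return None
```

After a call, remaining_credits(response.headers) can be logged or compared against a threshold of your choosing.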
Resources to get you started
Everything you need to know to get the most out of V-Express
V-Express
V-Express is a generative model that creates portrait videos from a single image. It combines deep learning techniques with progressive training and conditional dropout operations, and it conditions generation on pose, the input image, and audio to produce expressive videos. The main challenge it addresses is balancing control signals of different strength: whether the signal is text, audio, pose, or an image reference, V-Express is designed so that weaker conditions still contribute effectively to the final output.
Applications of V-Express
- Content Creation: Writers, filmmakers, and artists can harness V-Express to craft moving narratives. Imagine generating heartfelt monologues or poignant dialogues effortlessly.
- Chatbots with Empathy: Mental health chatbots powered by V-Express can empathize with users. When words alone aren’t enough, V-Express bridges the gap.
- Character Animation: Game designers and animators can breathe life into characters. V-Express infuses emotions into their expressions, making them relatable.
- Music Videos: V-Express isn’t limited to faces. It can create soul-stirring music videos, syncing lyrics with visuals.