API
To call the API, use the sample for your preferred programming language. The Python sample is shown below.
import base64

import requests


# Use this function to convert an image file from the filesystem to base64
def image_file_to_base64(image_path):
    with open(image_path, 'rb') as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode('utf-8')


# Use this function to fetch an image from a URL and convert it to base64
def image_url_to_base64(image_url):
    response = requests.get(image_url)
    response.raise_for_status()  # Fail early instead of encoding an error page
    image_data = response.content
    return base64.b64encode(image_data).decode('utf-8')


# Use this function to convert a list of image URLs to base64
def image_urls_to_base64(image_urls):
    return [image_url_to_base64(url) for url in image_urls]


api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/v-express"

# Request payload
data = {
    # Or use image_file_to_base64("IMAGE_PATH")
    "input_image": image_url_to_base64("https://segmind-sd-models.s3.amazonaws.com/display_images/v_express/v-express-ip.jpg"),
    "input_audio": "https://segmind-sd-models.s3.amazonaws.com/display_images/v_express/v_express_audio.mp3",
    "fps": 30,
    "num_inference_steps": 20,
    "guidance_scale": 2,
    "retarget_strategy": "fix_face",
    "base64": False
}

headers = {'x-api-key': api_key}
response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response body is the generated output
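Printing the raw bytes is rarely useful, so the response is usually written to a file instead. The following is a minimal sketch, not part of the official sample. `save_output` is a hypothetical helper. It assumes that a successful call returns status 200 with the binary output as the body (when "base64" is False) and that error bodies are short text.

```python
from pathlib import Path


def save_output(status_code: int, content: bytes, out_path: str) -> Path:
    """Write the raw response body to disk, or raise if the call failed.

    Hypothetical helper: pass response.status_code and response.content
    from the requests.post(...) call in the sample.
    """
    if status_code != 200:
        # Assumption: error bodies are short text, so decode them defensively.
        detail = content[:200].decode("utf-8", errors="replace")
        raise RuntimeError(f"Request failed with status {status_code}: {detail}")
    path = Path(out_path)
    path.write_bytes(content)
    return path
```

With the sample request, this could be called as save_output(response.status_code, response.content, "output.mp4"). The .mp4 extension is an assumption about the output container, so check the Content-Type response header if in doubt.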
Attributes
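input_image
Input image of a talking head.
input_audio
Input audio file. Avoid special symbols in the filename, as they may cause ffmpeg errors.
fps
Output frames per second.
min : 10,
max : 60
num_inference_steps
Number of inference steps used for generation.
min : 5,
max : 50
guidance_scale
Scale for classifier-free guidance.
min : 1,
max : 15
retarget_strategy
Retarget strategy. The sample request uses "fix_face".
base64
Whether to base64-encode the output. The sample request sets this to False.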
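The numeric ranges listed above can be checked on the client before a request is sent. The sketch below is an illustration only: `PARAM_RANGES` and `validate_payload` are hypothetical names, the keys are the payload names used in the sample request, and the API itself may enforce these limits differently.

```python
from typing import Dict, List, Tuple

# Ranges copied from the attribute list; keys match the sample payload.
PARAM_RANGES: Dict[str, Tuple[float, float]] = {
    "fps": (10, 60),
    "num_inference_steps": (5, 50),
    "guidance_scale": (1, 15),
}


def validate_payload(data: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues."""
    problems = []
    for name, (low, high) in PARAM_RANGES.items():
        # Parameters that are absent are skipped rather than flagged.
        if name in data and not low <= data[name] <= high:
            problems.append(f"{name}={data[name]} is outside [{low}, {high}]")
    return problems
```

Calling validate_payload(data) on the sample payload returns an empty list, since 30, 20, and 2 all fall inside their ranges.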
To keep track of your credit usage, inspect the response headers of each API call. The x-remaining-credits header indicates the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
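Reading that header could look like the sketch below. `remaining_credits` is a hypothetical helper, and treating the header value as a number is an assumption, so the function returns None when the header is missing or cannot be parsed.

```python
from typing import Mapping, Optional


def remaining_credits(headers: Mapping[str, str]) -> Optional[float]:
    """Return the x-remaining-credits value as a float, or None if unavailable."""
    for key, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if key.lower() == "x-remaining-credits":
            try:
                return float(value)
            except (TypeError, ValueError):
                return None
    return None
```

With the requests library, the mapping to pass in is response.headers, for example remaining_credits(response.headers).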
V-Express
V-Express is a portrait video generation model. It combines deep learning techniques with progressive training and conditional dropout operations, and uses generative models to create portrait videos from a single image. It takes into account pose, input image, and audio, resulting in emotionally resonant videos. V-Express addresses the challenge of balancing control signals of different strengths: whether the condition is text, audio, pose, or an image reference, V-Express ensures that weaker conditions still contribute effectively to the final output.
Applications of V-Express
- Content Creation: Writers, filmmakers, and artists can harness V-Express to craft moving narratives. Imagine generating heartfelt monologues or poignant dialogues effortlessly.
- Chatbots with Empathy: Mental health chatbots powered by V-Express can empathize with users. When words alone aren’t enough, V-Express bridges the gap.
- Character Animation: Game designers and animators can breathe life into characters. V-Express infuses emotions into their expressions, making them relatable.
- Music Videos: V-Express isn’t limited to faces. It can create soul-stirring music videos, syncing lyrics with visuals.
Other Popular Models
sdxl-controlnet
SDXL ControlNet gives unprecedented control over text-to-image generation. SDXL ControlNet models introduce the concept of conditioning inputs, which provide additional information to guide the image generation process.

faceswap-v2
Take a picture or GIF and replace the face in it with a face of your choice. You only need one image of the desired face. No dataset and no training are required.

sdxl-inpaint
This model can generate photorealistic images from any text input, and can also inpaint pictures by using a mask.

codeformer
CodeFormer is a robust face restoration algorithm for old photos or AI-generated faces.
