Luma Ray Image to Video
Transform static images into dynamic, cinematic videos with Luma's Ray2 image-to-video model.
API
If you're looking for an API, you can call this model from your preferred programming language; the example below uses Python.
import requests
import base64

# Use this function to convert an image file from the filesystem to base64
def image_file_to_base64(image_path):
    with open(image_path, 'rb') as f:
        image_data = f.read()
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to fetch an image from a URL and convert it to base64
def image_url_to_base64(image_url):
    response = requests.get(image_url)
    image_data = response.content
    return base64.b64encode(image_data).decode('utf-8')

# Use this function to convert a list of image URLs to base64
def image_urls_to_base64(image_urls):
    return [image_url_to_base64(url) for url in image_urls]

api_key = "YOUR_API_KEY"
url = "https://api.segmind.com/v1/luma-ray-img-2-video"

# Request payload
data = {
    "prompt": "A couple locks eyes, gradually moving closer. Their expressions soften with affection. They lean in, sharing a gentle kiss, capturing a moment of genuine connection.",
    "start_frame": "https://segmind-resources.s3.amazonaws.com/input/0bf45723-c1e2-4349-adcd-9dd48509622a-i2v_01_first_frame.jpg",
    "loop": False,
    "resolution": "720p",
    "aspect_ratio": "1:1",
    "concepts": [
        "aerial",
        "aerial_drone"
    ]
}

headers = {'x-api-key': api_key}

response = requests.post(url, json=data, headers=headers)
print(response.content)  # The response is the generated video
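Assuming the endpoint returns the raw video bytes in the response body (as the final comment suggests), a minimal sketch for saving the result to disk follows; the filename and MP4 container are assumptions, not documented behavior.

# Continuing from the example above: persist the generated video.
if response.ok:
    with open("luma_ray_output.mp4", "wb") as f:  # filename/container assumed
        f.write(response.content)
else:
    print("Request failed:", response.status_code, response.text)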
Attributes
prompt: The prompt to render.
start_frame: The frame 0 of the generation (an image URL in the example above).
loop: Whether to loop the video.
resolution: An enumeration; the example above uses "720p".
aspect_ratio: An enumeration; the example above uses "1:1".
concepts: List of camera concepts to apply; the example above uses "aerial" and "aerial_drone".
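The base64 helpers in the example above are defined but never called. If start_frame also accepts a base64-encoded image rather than a URL (an assumption; the example only demonstrates a URL), a local file could be supplied like this:

# Hypothetical: use a local image as frame 0, base64-encoded.
# Whether the endpoint accepts raw base64 here is an assumption.
data["start_frame"] = image_file_to_base64("first_frame.jpg")
response = requests.post(url, json=data, headers=headers)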
To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
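As a quick sketch, the header can be read from the same response object used in the example above; the header name comes from the note itself.

# Check remaining credits after a call.
remaining = response.headers.get("x-remaining-credits")
if remaining is not None:
    print("Remaining credits:", remaining)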
Resources to get you started
Everything you need to know to get the most out of Luma Ray Image to Video
Luma Ray2 Image-to-Video
Luma Ray2 Image-to-Video is a large-scale video generative model that produces realistic visuals with natural, coherent motion from image inputs. Ray2 is trained on Luma's new multi-modal architecture, scaled to 10x the compute of Ray1, and is capable of fast coherent motion, ultra-realistic details, and logical event sequences. This increases the success rate of usable generations and makes videos generated by Ray2 substantially more production-ready.
Key Features of Luma Ray2 Image-to-Video
- Realistic Visuals: Creates videos with high-quality, believable imagery.
- Coherent Motion: Generates natural and consistent movement within the video.
- Advanced Capabilities: Trained on Luma's new multi-modal architecture at 10x the compute of Ray1.
- Production-Ready: Produces videos suitable for professional use due to increased success rates.
Functionality of Luma Ray2 Image-to-Video
- Text Instruction Understanding: Accurately interprets text instructions to generate relevant video content.
- Fast Coherent Motion: Produces videos with fast and coherent motion.
- Ultra-Realistic Details: Generates videos with ultra-realistic details.
- Logical Event Sequences: Creates videos with logical event sequences.
Other Popular Models
Discover other models you might be interested in.
sdxl-img2img
SDXL Img2Img is used for text-guided image-to-image translation. This model uses the weights from Stable Diffusion to generate new images from an input image via the StableDiffusionImg2ImgPipeline from diffusers.

fooocus
Fooocus enables high-quality image generation effortlessly, combining the best of Stable Diffusion and Midjourney.

face-to-many
Turn a face into 3D, emoji, pixel art, video game, claymation or toy

sd2.1-faceswapper
Take a picture/gif and replace the face in it with a face of your choice. You only need one image of the desired face. No dataset, no training
