Seedance 2.0

Cinematic AI videos with native audio and multi-shot narratives.

~163.10s
~$0.261

Inputs

Text describing the video. Use "Shot 1:", "Shot 2:" labels for multi-shot sequences. Reference uploaded assets as "image 1", "video 1", or "audio 1" in your prompt.

Starting frame image URL for image-to-video; the video animates forward from this reference frame. Note: images with real human faces are blocked by ByteDance content policy. Cannot be used together with reference_images.


Ending frame image URL (last_frame_url); requires first_frame_url. Guides the transition between the two frames.


Up to 9 reference images for character/style consistency. Cite as 'image 1', 'image 2' in your prompt. Cannot be used together with first_frame_url.


Up to 3 reference videos for motion transfer. Cite as 'video 1' in your prompt.


Up to 3 reference audio files. Cite as 'audio 1' in your prompt.


Video length in seconds. Supported values: 4, 5, 6, 8, 10, 12, 15. Use 5s for quick clips, 15s for cinematic narratives.

Output aspect ratio. Use 16:9 for landscape, 9:16 for vertical social video, 21:9 for ultrawide cinematic.

--

Seedance 2.0: AI Video Generation API

What is Seedance 2.0?

Seedance 2.0 is ByteDance's multimodal video generation model, launched in February 2026. Built on a 4.5B-parameter Dual-Branch Diffusion Transformer architecture, it generates cinematic-quality AI videos from text, image, audio, and video inputs simultaneously. It is the first model of its class to co-generate video and synchronized audio in the same latent space, producing dialogue, sound effects, ambient audio, and music without any post-processing. Seedance 2.0 currently leads the Artificial Analysis Elo leaderboard at 1,269, ahead of Google Veo 3, OpenAI Sora 2, and Runway Gen-4.5.

Key Features

Seedance 2.0 introduces several industry firsts. Native audio-video joint generation delivers perfectly synchronized audio from a single prompt — no separate audio pipeline required. Multi-shot storytelling lets you define Shot 1, Shot 2, etc. for cinematic sequences with natural cuts. The omni-reference system accepts up to 9 images, 3 videos, and 3 audio files per generation for precise character, style, and motion consistency. Phoneme-level lip sync operates across 8+ languages. Physics simulation renders realistic gravity, inertia, and fluid dynamics. Videos can be up to 15 seconds at 720p across 7 aspect ratios including 16:9, 9:16, and 21:9.
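The omni-reference limits above (9 images, 3 videos, 3 audio files per generation) can be enforced before submitting a job. A small guard like the following is an illustrative sketch; the array field names are assumptions modeled on the parameter names used elsewhere on this page.

```python
# Check an omni-reference bundle against the documented per-generation
# limits: up to 9 images, 3 videos, and 3 audio files.

LIMITS = {"reference_images": 9, "reference_videos": 3, "reference_audio": 3}

def check_references(refs: dict[str, list[str]]) -> None:
    """Raise ValueError if any reference array exceeds its documented cap."""
    for kind, urls in refs.items():
        cap = LIMITS.get(kind)
        if cap is None:
            raise ValueError(f"unknown reference type: {kind}")
        if len(urls) > cap:
            raise ValueError(f"{kind} allows at most {cap} items, got {len(urls)}")

# A full image bundle plus one motion-reference video is within limits:
check_references({"reference_images": ["img.png"] * 9,
                  "reference_videos": ["motion.mp4"]})
```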

Best Use Cases

Marketing agencies use Seedance 2.0 to produce product showcase videos and social ads in any format. Film studios and VFX teams prototype pre-viz shots with director-level camera control — specifying movement, lighting, and physics behavior. Short-form content creators and MCNs generate TikTok, Reels, and YouTube Shorts at scale with audio already embedded. The omni-reference system is particularly powerful for character-consistent multi-scene storytelling without costly retakes.

Prompt Tips and Output Quality

Write detailed cinematic prompts specifying camera movement, lighting conditions, and subject behavior. Use the "Shot 1:", "Shot 2:" syntax for multi-shot sequences and explicitly reference uploaded assets in your prompt (e.g., "image 1 shows the protagonist", "video 1 provides the motion style"). Enable generate_audio for scenes with dialogue, music, or environmental soundscapes. Draft at 480p for rapid iteration; render finals at 720p.
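The shot syntax above lends itself to programmatic assembly. A helper like this (a hypothetical convenience function, not part of any official SDK) keeps shot numbering consistent when building multi-shot prompts:

```python
# Assemble a multi-shot prompt in the "Shot 1:", "Shot 2:" syntax described
# above, with explicit references to uploaded assets ("image 1", "video 1").

def build_multishot_prompt(shots: list[str]) -> str:
    """Number each shot description and join them into one prompt string."""
    if not shots:
        raise ValueError("at least one shot is required")
    return " ".join(f"Shot {i}: {text.strip()}"
                    for i, text in enumerate(shots, start=1))

prompt = build_multishot_prompt([
    "image 1 shows the protagonist walking through rain, low-angle tracking shot.",
    "Close-up on her face, neon reflections, motion style from video 1.",
])
```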

FAQs

What inputs does Seedance 2.0 support? Text prompts, start/end frame images (first_frame_url / last_frame_url), up to 3 video clips, 3 audio files, and up to 9 reference images. Important: first_frame_url / last_frame_url and reference_images are mutually exclusive — use one mode per generation.
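Since frame-guided and reference-guided modes are mutually exclusive, a client-side guard can catch the conflict before a generation is spent. The sketch below uses the parameter names stated in this FAQ; the function itself is illustrative, not part of the API.

```python
# Enforce the documented constraints: first_frame_url/last_frame_url cannot
# be combined with reference_images, and last_frame_url needs first_frame_url.

def check_mode(payload: dict) -> str:
    """Return the generation mode, or raise ValueError on a conflict."""
    uses_frames = "first_frame_url" in payload or "last_frame_url" in payload
    uses_refs = bool(payload.get("reference_images"))
    if uses_frames and uses_refs:
        raise ValueError("first_frame_url/last_frame_url cannot be "
                         "combined with reference_images")
    if "last_frame_url" in payload and "first_frame_url" not in payload:
        raise ValueError("last_frame_url requires first_frame_url")
    if uses_frames:
        return "frame-guided"
    return "reference-guided" if uses_refs else "text-only"

mode = check_mode({"first_frame_url": "start.png"})
```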

How long can generated videos be? 4 to 15 seconds. Supported durations: 4, 5, 6, 8, 10, 12, and 15 seconds.

Does it generate audio natively? Yes — audio-video co-generation produces dialogue, SFX, ambient sounds, and music synchronized with visual content. Enable by setting generate_audio to true.

What aspect ratios are available? 16:9, 9:16, 1:1, 4:3, 3:4, 21:9, and adaptive (matches the input media's dimensions automatically — useful with first_frame_url to preserve your source image's exact proportions).

How does it compare to Veo 3 and Sora 2? Seedance 2.0 holds the top Elo rating (1,269) on Artificial Analysis, ahead of Veo 3, Sora 2, and Runway Gen-4.5.

Can I chain multiple clips together? Yes. Use return_last_frame to capture the final frame, then pass it as first_frame_url in the next generation for seamless sequences beyond 15 seconds.

Are there any content restrictions on first_frame_url? Yes — ByteDance's content policy blocks images containing real human faces from being used as first_frame_url. Use illustrations, landscapes, product shots, or AI-generated images without identifiable people. Reference images (reference_images array) have the same restriction.