Seedance 2.0

Cinematic AI videos with native audio and multi-shot narratives.

~163.10s
~$0.261

Inputs

Text describing the video. Use "Shot 1:", "Shot 2:" labels for multi-shot sequences. Reference uploaded assets as "image 1", "video 1", or "audio 1" in your prompt.

Starting frame image URL for image-to-video; the video animates forward from this reference frame. Note: images with real human faces are blocked by ByteDance content policy. Cannot be used together with reference_images.


Ending frame image URL (last_frame_url); requires first_frame_url. Guides the transition between the two frames.


Up to 9 reference images for character/style consistency. Cite as 'image 1', 'image 2' in your prompt. Cannot be used together with first_frame_url.


Up to 3 reference videos for motion transfer. Cite as 'video 1' in your prompt.


Up to 3 reference audio files. Cite as 'audio 1' in your prompt.


Video length in seconds. Supported values: 4, 5, 6, 8, 10, 12, 15. Use 5s for quick clips, 15s for cinematic narratives.

Output aspect ratio. Use 16:9 for landscape, 9:16 for vertical social video, 21:9 for ultrawide cinematic.

--

Seedance 2.0: AI Video Generation API

What is Seedance 2.0?

Seedance 2.0 is ByteDance's multimodal video generation model, launched in February 2026. Built on a 4.5B-parameter Dual-Branch Diffusion Transformer architecture, it generates cinematic-quality AI videos from text, image, audio, and video inputs simultaneously. It is the first model of its class to co-generate video and synchronized audio in the same latent space, producing dialogue, sound effects, ambient audio, and music without any post-processing. Seedance 2.0 currently leads the Artificial Analysis Elo leaderboard at 1,269, ahead of Google Veo 3, OpenAI Sora 2, and Runway Gen-4.5.

Key Features

Seedance 2.0 introduces several industry firsts. Native audio-video joint generation delivers perfectly synchronized audio from a single prompt — no separate audio pipeline required. Multi-shot storytelling lets you define Shot 1, Shot 2, etc. for cinematic sequences with natural cuts. The omni-reference system accepts up to 9 images, 3 videos, and 3 audio files per generation for precise character, style, and motion consistency. Phoneme-level lip sync operates across 8+ languages. Physics simulation renders realistic gravity, inertia, and fluid dynamics. Videos can be up to 15 seconds at 720p across 7 aspect ratios including 16:9, 9:16, and 21:9.
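The omni-reference limits above (9 images, 3 videos, 3 audio files per generation) can be enforced before submitting a job. A small guard like the following is an illustrative sketch; the array field names are assumptions modeled on the parameter names used elsewhere on this page.

```python
# Check an omni-reference bundle against the documented per-generation
# limits: up to 9 images, 3 videos, and 3 audio files.

LIMITS = {"reference_images": 9, "reference_videos": 3, "reference_audio": 3}

def check_references(refs: dict[str, list[str]]) -> None:
    """Raise ValueError if any reference array exceeds its documented cap."""
    for kind, urls in refs.items():
        cap = LIMITS.get(kind)
        if cap is None:
            raise ValueError(f"unknown reference type: {kind}")
        if len(urls) > cap:
            raise ValueError(f"{kind} allows at most {cap} items, got {len(urls)}")

# A full image bundle plus one motion-reference video is within limits:
check_references({"reference_images": ["img.png"] * 9,
                  "reference_videos": ["motion.mp4"]})
```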

Best Use Cases

Marketing agencies use Seedance 2.0 to produce product showcase videos and social ads in any format. Film studios and VFX teams prototype pre-viz shots with director-level camera control — specifying movement, lighting, and physics behavior. Short-form content creators and MCNs generate TikTok, Reels, and YouTube Shorts at scale with audio already embedded. The omni-reference system is particularly powerful for character-consistent multi-scene storytelling without costly retakes.

Prompt Tips and Output Quality

Write detailed cinematic prompts specifying camera movement, lighting conditions, and subject behavior. Use the "Shot 1:", "Shot 2:" syntax for multi-shot sequences and explicitly reference uploaded assets in your prompt (e.g., "image 1 shows the protagonist", "video 1 provides the motion style"). Enable generate_audio for scenes with dialogue, music, or environmental soundscapes. Draft at 480p for rapid iteration; render finals at 720p.
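The shot syntax above lends itself to programmatic assembly. A helper like this (a hypothetical convenience function, not part of any official SDK) keeps shot numbering consistent when building multi-shot prompts:

```python
# Assemble a multi-shot prompt in the "Shot 1:", "Shot 2:" syntax described
# above, with explicit references to uploaded assets ("image 1", "video 1").

def build_multishot_prompt(shots: list[str]) -> str:
    """Number each shot description and join them into one prompt string."""
    if not shots:
        raise ValueError("at least one shot is required")
    return " ".join(f"Shot {i}: {text.strip()}"
                    for i, text in enumerate(shots, start=1))

prompt = build_multishot_prompt([
    "image 1 shows the protagonist walking through rain, low-angle tracking shot.",
    "Close-up on her face, neon reflections, motion style from video 1.",
])
```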

FAQs

What inputs does Seedance 2.0 support? Text prompts, start/end frame images (first_frame_url / last_frame_url), up to 3 video clips, 3 audio files, and up to 9 reference images. Important: first_frame_url / last_frame_url and reference_images are mutually exclusive — use one mode per generation.
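Since frame-guided and reference-guided modes are mutually exclusive, a client-side guard can catch the conflict before a generation is spent. The sketch below uses the parameter names stated in this FAQ; the function itself is illustrative, not part of the API.

```python
# Enforce the documented constraints: first_frame_url/last_frame_url cannot
# be combined with reference_images, and last_frame_url needs first_frame_url.

def check_mode(payload: dict) -> str:
    """Return the generation mode, or raise ValueError on a conflict."""
    uses_frames = "first_frame_url" in payload or "last_frame_url" in payload
    uses_refs = bool(payload.get("reference_images"))
    if uses_frames and uses_refs:
        raise ValueError("first_frame_url/last_frame_url cannot be "
                         "combined with reference_images")
    if "last_frame_url" in payload and "first_frame_url" not in payload:
        raise ValueError("last_frame_url requires first_frame_url")
    if uses_frames:
        return "frame-guided"
    return "reference-guided" if uses_refs else "text-only"

mode = check_mode({"first_frame_url": "start.png"})
```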

How long can generated videos be? 4 to 15 seconds. Supported durations: 4, 5, 6, 8, 10, 12, and 15 seconds.

Does it generate audio natively? Yes — audio-video co-generation produces dialogue, SFX, ambient sounds, and music synchronized with visual content. Enable by setting generate_audio to true.

What aspect ratios are available? 16:9, 9:16, 1:1, 4:3, 3:4, 21:9, and adaptive (matches the input media's dimensions automatically — useful with first_frame_url to preserve your source image's exact proportions).

How does it compare to Veo 3 and Sora 2? Seedance 2.0 holds the top Elo rating (1,269) on Artificial Analysis, ahead of Veo 3, Sora 2, and Runway Gen-4.5.

Can I chain multiple clips together? Yes. Use return_last_frame to capture the final frame, then pass it as first_frame_url in the next generation for seamless sequences beyond 15 seconds.

Are there any content restrictions on first_frame_url? Yes — ByteDance's content policy blocks images containing real human faces from being used as first_frame_url. Use illustrations, landscapes, product shots, or AI-generated images without identifiable people. Reference images (reference_images array) have the same restriction.