Seedance 2.0 Mini

Fast text-to-video and image-to-video with synchronized audio.

Inputs

Prompt: Text describing the video. Refer to uploaded assets as image 1, video 1, and audio 1, and number shots (Shot 1, Shot 2) for multi-shot sequences.

first_frame_url: Image URL used as the starting frame for image-to-video; use it to animate a still. It cannot be combined with reference_images, and images containing real human faces are rejected.

duration: Clip length in seconds; allowed values are 4, 5, 6, 8, 10, 12, and 15. Use 5 for drafts and 15 for narratives.

Aspect ratio: Frame shape; options are 16:9, 9:16, 1:1, 4:3, 3:4, 21:9, and adaptive. Use 9:16 for social video and 16:9 for landscape.
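The constraints in these input descriptions can be checked client-side before a request is sent. Below is a minimal Python sketch that does exactly that and calls no real API. It assumes the parameter names duration, first_frame_url, and reference_images used on this page; aspect_ratio is a hypothetical field name, because the page does not name that input. The real-face policy is enforced by the service and cannot be checked this way.

```python
from typing import List, Optional, Sequence

# Allowed values as listed in the input descriptions on this page.
ALLOWED_DURATIONS = {4, 5, 6, 8, 10, 12, 15}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9", "adaptive"}


def validate_inputs(
    duration: int,
    aspect_ratio: str,  # hypothetical field name; the page does not name it
    first_frame_url: Optional[str] = None,
    reference_images: Sequence[str] = (),
) -> List[str]:
    """Return human-readable problems; an empty list means no listed rule was broken."""
    problems = []
    if duration not in ALLOWED_DURATIONS:
        problems.append(f"duration {duration} not in {sorted(ALLOWED_DURATIONS)}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio {aspect_ratio!r}")
    # A starting frame and reference images are mutually exclusive.
    if first_frame_url and reference_images:
        problems.append("first_frame_url cannot be combined with reference_images")
    return problems


# Usage: a vertical 5-second draft passes; a 7-second clip is flagged.
print(validate_inputs(5, "9:16"))  # []
print(validate_inputs(7, "9:16"))
```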


Seedance 2.0 Mini — Multimodal AI Video Generation (Text-to-Video & Image-to-Video)

What is Seedance 2.0 Mini?

Seedance 2.0 Mini is ByteDance's lightweight, efficiency-optimized tier of the Seedance 2.0 video family. It is a distilled version of the flagship model: it keeps the same unified multimodal architecture but runs roughly twice as fast as Seedance 2.0 Fast, which suits it to high-volume, iteration-heavy video generation. With one API call you can run text-to-video, image-to-video, or reference-to-video, turning a written prompt, a first-frame image, or attached reference media into a short clip with motion, camera movement, lighting, and optional synchronized audio.
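Which of the three modes a call runs depends on which inputs it carries. The Python sketch below classifies a request that way; it is illustrative only. first_frame_url and reference_images come from this page, while reference_videos and reference_audios are hypothetical field names. The sketch also assumes that any request with a first frame counts as image-to-video; how the service treats a first frame combined with video or audio references is not stated here.

```python
from typing import Optional, Sequence


def infer_mode(
    first_frame_url: Optional[str] = None,
    reference_images: Sequence[str] = (),
    reference_videos: Sequence[str] = (),  # hypothetical field name
    reference_audios: Sequence[str] = (),  # hypothetical field name
) -> str:
    """Classify a request as text-, image-, or reference-to-video (illustrative only)."""
    if first_frame_url and reference_images:
        raise ValueError("first_frame_url cannot be combined with reference_images")
    if first_frame_url:
        return "image-to-video"
    if reference_images or reference_videos or reference_audios:
        return "reference-to-video"
    # No image or media inputs: the prompt alone drives generation.
    return "text-to-video"


print(infer_mode())  # text-to-video
print(infer_mode(first_frame_url="https://example.com/still.png"))  # image-to-video
```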

Key Features

  • Multimodal inputs: Generate from text alone, animate a still via first_frame_url, or steer output with up to 9 reference images, 3 reference videos, and 3 reference audio clips.
  • @-style reference tags: Cite uploaded assets as image 1, video 1, audio 1 in your prompt to lock character identity, transfer motion, and align sound.
  • Native audio co-generation: Enable generate_audio to produce dialogue, sound effects, ambient sound, and music in the same pass as the visuals.
  • Multi-shot direction: Number shots (Shot 1, Shot 2) and describe camera moves in plain language (dolly, pan, orbit, tracking) within a single generation.
  • Flexible output: Durations of 4 to 15 seconds (4, 5, 6, 8, 10, 12, or 15); aspect ratios 16:9, 9:16, 1:1, 4:3, 3:4, 21:9, and adaptive; resolutions from 480p to 720p for fast iteration.
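The reference limits and citation scheme in this list lend themselves to a pre-flight check. The Python sketch below assumes two things this page does not confirm: that prompts cite assets with the plain "image N" / "video N" / "audio N" wording shown here, and that N counts from 1 in upload order. check_references is a hypothetical helper. Its regex is a heuristic and will also match incidental phrases such as "video 2" in ordinary prose.

```python
import re
from typing import List, Sequence

# Per-request limits from the feature list above.
MAX_REFS = {"image": 9, "video": 3, "audio": 3}

# Heuristic: matches citations such as "image 1" or "Video 2" anywhere in the prompt.
TAG_PATTERN = re.compile(r"\b(image|video|audio)\s+(\d+)\b", re.IGNORECASE)


def check_references(
    prompt: str,
    images: Sequence[str] = (),
    videos: Sequence[str] = (),
    audios: Sequence[str] = (),
) -> List[str]:
    """Flag over-limit reference lists and cited tags with no matching upload."""
    counts = {"image": len(images), "video": len(videos), "audio": len(audios)}
    problems = []
    for kind, count in counts.items():
        if count > MAX_REFS[kind]:
            problems.append(f"too many {kind} references: {count} > {MAX_REFS[kind]}")
    # dict.fromkeys drops repeated citations while keeping first-seen order.
    cited = dict.fromkeys((k.lower(), int(n)) for k, n in TAG_PATTERN.findall(prompt))
    for kind, n in cited:
        if not 1 <= n <= counts[kind]:
            problems.append(f"prompt cites {kind} {n} but only {counts[kind]} uploaded")
    return problems


print(check_references("The fox from image 1 dances to audio 1",
                       images=["fox.png"], audios=["beat.mp3"]))  # []
```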

Best Use Cases

Mini is the default choice for short-form vertical video (TikTok, Reels, Shorts), e-commerce and product demos, marketing ad variations, anime and illustration work, and UGC. Its speed makes it ideal for testing many prompt, framing, and motion variants quickly. The recommended professional workflow is to iterate on Mini, then re-render only your final approved shots on standard Seedance 2.0 when peak fidelity matters.

Prompt Tips and Output Quality

Write a structured shot list rather than prose: subject + action + environment + camera + style, plus an optional sound line. Seedance does not use negative prompts; use positive constraints such as "stable face, consistent outfit, natural anatomy" instead. Keep one main action verb per shot, limit scenes to one or two characters, and prefer gentle motion words (slow, smooth, continuous). Always close with a global style line for a consistent look. Note that real human faces are blocked across all Seedance tiers.
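These tips translate naturally into a small prompt builder. The Python sketch below assembles numbered shots, each with a subject, one action, an environment, and a camera move, adds optional sound lines, and ends with a single global style line. Shot and build_prompt are hypothetical names, and the "Shot N:", "Camera:", "Sound:", and "Style:" labels are one possible layout rather than a documented prompt syntax.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Shot:
    subject: str
    action: str        # keep to one main action verb per shot
    environment: str
    camera: str        # plain-language move, e.g. "slow dolly in"
    sound: Optional[str] = None  # optional sound line


def build_prompt(shots: Sequence[Shot], style: str) -> str:
    """Assemble numbered shots, then close with one global style line."""
    lines = []
    for number, shot in enumerate(shots, start=1):
        lines.append(
            f"Shot {number}: {shot.subject} {shot.action} {shot.environment}. "
            f"Camera: {shot.camera}."
        )
        if shot.sound:
            lines.append(f"Sound: {shot.sound}.")
    # Positive constraints go in the style line, since negative prompts are not used.
    lines.append(f"Style: {style}.")
    return "\n".join(lines)


print(build_prompt(
    [
        Shot("a paper fox", "leaps", "across a moonlit rooftop", "slow tracking shot", "soft wind"),
        Shot("the fox", "lands", "beside a chimney", "smooth orbit"),
    ],
    style="anime illustration, stable face, consistent outfit, natural anatomy",
))
```

Keeping the style line global, rather than repeating it per shot, follows the tip above about closing every prompt with one consistent look.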

FAQs

Is Seedance 2.0 Mini faster than Seedance 2.0 Fast? Yes — it is reported to run roughly twice as fast at comparable or better quality, which is why it largely supersedes the older Fast tier for drafts and iteration.

What is the difference between Mini and standard Seedance 2.0? Mini is the lightweight iteration tier optimized for speed and volume; standard Seedance 2.0 is the highest-fidelity tier for final, polished, higher-resolution deliverables.

Does Seedance 2.0 Mini support text-to-video and image-to-video? Yes. It supports text-to-video, image-to-video (via first_frame_url), and reference-to-video with image, video, and audio references.

Can it generate real human faces? No. A face-blocking content policy applies across all Seedance tiers, so images containing real human faces are rejected.

Does Mini generate audio? Yes. Turn on generate_audio to co-generate synchronized dialogue, sound effects, ambient sound, and music alongside the video.

How long can the videos be? Clips run from 4 to 15 seconds per generation; set the duration parameter to 4, 5, 6, 8, 10, 12, or 15.