Wan 2.7 Text to Video

Generate cinematic videos from text at up to 1080P with Wan 2.7, with audio sync, multi-shot control, and durations of up to 15 seconds.

~351.22s
$0.625 - $0.938 per generation

Inputs

Prompt: Describe the scene, subjects, motion, and camera style (e.g., cinematic drone shot, slow pan). Include lighting and mood for richer outputs. Max 5,000 characters.

Audio URL (audio_url): URL to an audio file (MP3/WAV) to synchronize with the video.

Duration (duration): Length of the generated video in seconds (2-15).
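The input limits above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation: the `duration` and `audio_url` names come from the FAQ on this page, while the `prompt` key and the `build_payload` helper are assumptions made for illustration.

```python
from typing import Optional

MAX_PROMPT_CHARS = 5000             # "Max 5,000 characters"
MIN_DURATION, MAX_DURATION = 2, 15  # duration range in seconds


def build_payload(prompt: str, duration: int, audio_url: Optional[str] = None) -> dict:
    """Check the documented limits and return a request body as a dict."""
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt is {len(prompt)} characters; the limit is {MAX_PROMPT_CHARS}"
        )
    if not MIN_DURATION <= duration <= MAX_DURATION:
        raise ValueError(
            f"duration must be between {MIN_DURATION} and {MAX_DURATION} seconds"
        )

    # The "prompt" key is an assumed field name; the page does not spell it out.
    payload = {"prompt": prompt, "duration": duration}
    if audio_url is not None:
        payload["audio_url"] = audio_url
    return payload
```

For example, `build_payload("slow pan across a misty harbor at dawn", 5)` returns a two-key dict, and a duration of 16 raises `ValueError` before any request is made.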

Wan 2.7 Text to Video — AI Video Generation API

What is Wan 2.7?

Wan 2.7 is Alibaba's most capable text-to-video model, designed for developers, filmmakers, and content creators who need precise control over AI-generated video. Building on the highly regarded Wan 2.x lineage, version 2.7 delivers a substantial leap in visual quality, motion coherence, and audio synchronization. It generates videos up to 15 seconds long at up to 1080P resolution, supporting five aspect ratios and native audio integration — all through a simple API call.

Unlike earlier versions that treated audio as an afterthought, Wan 2.7 supports audio-driven generation from the start: supply an audio URL and the model synchronizes character motion and lip movements with the provided track. This makes it particularly powerful for branded spokesperson content, dubbed video workflows, and music-timed visuals.

Key Features

  • Up to 1080P resolution at 15 seconds duration, with 720P available for faster iteration
  • Native audio synchronization — provide an audio URL to drive lip-sync and motion timing
  • Five aspect ratios — 16:9, 9:16, 1:1, 4:3, and 3:4 for cross-platform publishing
  • Improved motion coherence — characters and objects move with greater physical plausibility and fewer flickering artifacts
  • Cinematic visual quality — skin textures, fabric movement, and lighting gradients reach commercial-grade 1080P standards
  • Reproducible outputs via seed — lock in a seed to regenerate identical results across iterations

Best Use Cases

Brand and marketing video: Generate consistent spokesperson or product demo clips with audio sync, ideal for agencies producing high volumes of branded content.

Social media content: Use 9:16 ratio at 5-10 seconds for TikTok, Instagram Reels, and YouTube Shorts. 16:9 works for YouTube intros, explainers, and pre-roll ads.

Film and narrative production: Describe cinematic camera moves in the prompt (slow dolly, aerial drone, handheld chase) to build multi-shot sequences of up to 15 seconds with a near-broadcast look.

Prototyping and storyboarding: 720P at 5 seconds for fast iteration on creative concepts before committing to 1080P renders.
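The use cases above map naturally onto reusable parameter presets. This is only a sketch: the `aspect_ratio` and `resolution` key names and the preset names are hypothetical, and each preset contains only the values this page states for that use case.

```python
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
RESOLUTIONS = {"720P", "1080P"}

# Key names "aspect_ratio" and "resolution" are assumptions, not confirmed fields.
PRESETS = {
    # Social: 9:16 at 5-10 seconds; 10 is the upper end of that range.
    "social_short": {"aspect_ratio": "9:16", "duration": 10},
    # Narrative: multi-shot storytelling at the full 15 seconds.
    "narrative": {"duration": 15},
    # Storyboarding: 720P at 5 seconds for fast iteration.
    "storyboard": {"resolution": "720P", "duration": 5},
}


def apply_preset(name: str, **overrides) -> dict:
    """Return a copy of the preset with overrides applied and values checked."""
    params = {**PRESETS[name], **overrides}
    ratio = params.get("aspect_ratio")
    if ratio is not None and ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {ratio}")
    resolution = params.get("resolution")
    if resolution is not None and resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    return params
```

A storyboard pass can later be promoted to a final render with `apply_preset("storyboard", resolution="1080P")`, which keeps the rest of the preset unchanged.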

Prompt Tips and Output Quality

Wan 2.7 rewards structured, descriptive prompts. Include: (1) the subject and scene, (2) camera movement (e.g., slow pan left, aerial drone descending), (3) lighting and mood (golden hour, soft overcast), and (4) motion details (waves crashing, hair blowing in wind). The model handles cinematic, anime, illustrated, and photorealistic styles — specify your intended aesthetic explicitly.
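One way to keep prompts consistently structured is to assemble them from the four elements listed above plus an optional style. The `compose_prompt` helper below is a hypothetical convenience, and the comma-separated layout is an assumption rather than a format the model requires.

```python
from typing import Optional


def compose_prompt(
    subject: str,
    camera: str,
    lighting: str,
    motion: str,
    style: Optional[str] = None,
) -> str:
    """Join subject, camera, lighting, motion, and optional style into one prompt."""
    parts = [subject, camera, lighting, motion]
    if style:
        parts.append(f"{style} style")
    # Drop empty pieces and trailing punctuation so the joins stay clean.
    cleaned = [p.strip().rstrip(".,") for p in parts if p and p.strip()]
    return ", ".join(cleaned)


prompt = compose_prompt(
    subject="A surfer on a beach",
    camera="aerial drone descending",
    lighting="golden hour",
    motion="waves crashing",
    style="photorealistic",
)
```

Keeping the elements as separate arguments makes it easy to vary one of them, such as the camera movement, while holding the rest of the scene fixed.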

Use the negative_prompt field to suppress common artifacts, for example "blurry", "distorted face", "watermark", and "text overlay". For audio-synced content, make sure the audio URL is publicly accessible and the audio file itself is under 15 seconds long.
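Both points can be handled before submitting a job. The sketch below joins artifact terms into a single `negative_prompt` string (the comma-separated format is an assumption) and measures the length of a WAV file with Python's standard `wave` module. It covers WAV only; checking MP3 length would need a third-party library and is left out.

```python
import io
import wave
from typing import Iterable

MAX_AUDIO_SECONDS = 15  # audio should be under 15 seconds


def negative_prompt(terms: Iterable[str]) -> str:
    """Join artifact terms into one comma-separated string (format assumed)."""
    return ", ".join(t.strip() for t in terms if t.strip())


def wav_duration_seconds(data: bytes) -> float:
    """Return the playback length of an in-memory WAV file."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def audio_fits(data: bytes) -> bool:
    """True when the WAV audio is strictly shorter than the 15-second limit."""
    return wav_duration_seconds(data) < MAX_AUDIO_SECONDS
```

Running `audio_fits` on the file's bytes before uploading it to a public URL avoids spending a generation on a track that is too long.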

FAQs

What resolutions does Wan 2.7 support? 720P and 1080P. 720P is faster and costs less; 1080P is suited for final deliverables and high-quality publishing.

Can I generate longer videos? Yes, up to 15 seconds. Set the duration parameter anywhere from 2 to 15 seconds.

How does audio-driven generation work? Pass a publicly accessible audio file URL in the audio_url parameter. The model synchronizes character movements and lip motion with the audio track during generation.

What aspect ratios are available? 16:9 (landscape), 9:16 (portrait), 1:1 (square), 4:3, and 3:4. Choose based on your target platform.

Does the API return a video file directly? Yes. The response is binary video/mp4 data on HTTP 200. No polling required — the call is synchronous.
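Because the call is synchronous and returns raw MP4 bytes, a client only needs to check the status and content type and then write the body to disk. The sketch below uses Python's standard library; the endpoint URL, the JSON request body, and the bearer-token header are placeholders and assumptions, since this page does not document them. The long timeout reflects the multi-minute run time quoted at the top of the page.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the real URL from your provider's docs.
API_URL = "https://api.example.com/wan-2.7/text-to-video"


def extract_video(status: int, content_type: str, body: bytes) -> bytes:
    """Return the MP4 bytes, or raise if the response is not HTTP 200 video/mp4."""
    if status != 200:
        raise RuntimeError(f"generation failed with HTTP {status}")
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "video/mp4":
        raise RuntimeError(f"expected video/mp4, got {content_type!r}")
    return body


def generate_video(payload: dict, api_key: str, out_path: str, timeout: float = 900.0) -> str:
    """POST the payload, wait for the synchronous response, and save the video."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # auth scheme is assumed
        },
        method="POST",
    )
    # urlopen raises HTTPError for 4xx/5xx, so extract_video mainly guards
    # against unexpected success codes and non-video bodies.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        video = extract_video(
            response.status,
            response.headers.get("Content-Type", ""),
            response.read(),
        )
    with open(out_path, "wb") as handle:
        handle.write(video)
    return out_path
```

Keeping `extract_video` separate from the network call lets the response checks be tested without contacting any server.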

How do I get reproducible outputs? Set a fixed integer seed value. The same seed with identical parameters will reproduce the same video.
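A small sketch of how seed pinning might look in client code. The `seed` key follows the name used on this page; the helpers are hypothetical. Comparing SHA-256 digests is a strict byte-for-byte check, and whether two runs match at that level is an assumption this page does not confirm.

```python
import hashlib


def with_seed(payload: dict, seed: int) -> dict:
    """Return a copy of the payload with a fixed integer seed attached."""
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise TypeError("seed must be an integer")
    return {**payload, "seed": seed}


def video_digest(video: bytes) -> str:
    """Hash the returned MP4 bytes so two runs can be compared cheaply."""
    return hashlib.sha256(video).hexdigest()
```

Storing the seed and the digest alongside each render makes it possible to confirm later whether a regeneration with identical parameters produced the same file.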