Wan 2.2 Image to Video Fast
Transforms a source image and a simple text prompt into breathtaking cinematic-quality video in minutes.
Pricing
| Resolution | Cost |
|---|---|
| 480p | 0.0625 |
| 720p | 0.138 |
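The table does not state a billing unit. If, as is common for hosted video models, the price is USD per second of generated output (an assumption worth verifying against the provider's billing docs), estimating a clip's cost is a single multiplication:

```python
# Assumes the table's prices are USD per second of generated video;
# this is unverified, so check the provider's billing docs.
PRICE_PER_SECOND = {"480p": 0.0625, "720p": 0.138}

def estimate_cost(resolution: str, duration_s: float) -> float:
    """Estimated USD cost for a clip at the given resolution and length."""
    return PRICE_PER_SECOND[resolution] * duration_s

print(f"${estimate_cost('720p', 5):.2f}")  # 5-second 720p clip -> $0.69
```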
Resources to get you started
Everything you need to know to get the most out of Wan 2.2 Image to Video Fast
Wan 2.2: A Mixture-of-Experts Model for Video Generation
Wan 2.2 is the latest and greatest in open-source AI video generation, developed by Alibaba's Tongyi Lab. The model introduces architectural innovations that significantly advance text-to-video and image-to-video generation while maintaining computational efficiency, keeping it affordable for many use cases. This listing is the A14B model, which can output 480p and 720p video; there is also a smaller 5B model that is friendly to consumer GPUs.
About the tech: Mixture-of-Experts (MoE) Architecture
The model leverages a Mixture-of-Experts (MoE) architecture, using two expert models under the hood for the diffusion denoising process:
- High-noise expert: Processes the early denoising stages, focusing on overall layout and structure
- Low-noise expert: Manages the final stages, refining video details and reducing artifacts
This two-expert approach results in 14 billion active parameters per inference step, with 27 billion total parameters across both models. The transition between the experts is decided by the signal-to-noise ratio (SNR), letting the pipeline hand off between the two intelligently without sacrificing output quality.
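Wan 2.2's actual routing logic is internal to the model, so the following is only an illustrative sketch of the idea: a toy denoising loop that hands off from one expert to the other once a placeholder SNR estimate crosses a threshold. The schedule, threshold, and "experts" here are all stand-ins, not the real implementation.

```python
import torch

def snr(timestep: int, num_steps: int) -> float:
    """Toy SNR schedule: denoising starts from pure noise (low SNR)
    and ends near a clean signal (high SNR)."""
    return timestep / num_steps  # placeholder, not the real schedule

def denoise(latents, high_noise_expert, low_noise_expert,
            num_steps=50, snr_threshold=0.5):
    for t in range(num_steps):
        # Route early, low-SNR steps to the high-noise expert
        # (layout/structure), then hand off to the low-noise expert
        # (detail refinement) once SNR crosses the threshold.
        expert = high_noise_expert if snr(t, num_steps) < snr_threshold else low_noise_expert
        latents = latents - expert(latents, t)  # stand-in for a scheduler step
    return latents

# Stand-in "experts": in the real model each is a full 14B diffusion transformer.
high = lambda x, t: 0.01 * x
low = lambda x, t: 0.001 * x
video_latents = denoise(torch.randn(1, 16, 64, 64), high, low)
```

The key design point is that only one expert is active at each step, so per-step inference cost stays at 14B parameters even though the combined model holds 27B.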
Wan 2.2 also brings substantial improvements over its predecessor through expanded training data, featuring 65% more images and 80% more videos. This enables more advanced motion generation, helping the model produce complex body movements, dynamic scene transitions, and fluid camera moves. It can also simulate realistic physics and object interactions, making it effective for character animation, sports scenes, and other cinematic sequences.
Another big leap from the added training data is tight control over lighting, composition, contrast, and color tone. Wan 2.2 offers over 60 controllable parameters that enable camera-aware prompting such as "aerial orbit," "handheld tracking shot," or specific lighting requirements.
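As a concrete illustration of camera-aware prompting, here is a minimal sketch using Hugging Face diffusers. The pipeline class, checkpoint id, and generation settings below are assumptions based on the public Wan integration, and the input image URL is a placeholder; check the model card for current names and recommended values.

```python
import torch
from diffusers import WanImageToVideoPipeline  # assumed pipeline class
from diffusers.utils import export_to_video, load_image

# Assumed checkpoint id for the Wan 2.2 A14B image-to-video model.
pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.2-I2V-A14B-Diffusers", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

image = load_image("https://example.com/input.jpg")  # placeholder input image
# Camera-aware prompt: names a camera move, lighting, and tone explicitly.
prompt = "aerial orbit around the subject, golden-hour lighting, cinematic contrast"

frames = pipe(
    image=image,
    prompt=prompt,
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "output.mp4", fps=16)
```

Prompts that explicitly name a camera move and a lighting setup, as above, exercise exactly the controllability this section describes.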
Use cases
Just like other text-to-video and image-to-video models, this model suits a range of use cases: generate cinematic visuals for a project you are working on, create short social media ads with a product in focus, or produce simple animations for a website background or a slide deck. Its lower cost compared to many other open models makes it a sensible first choice before trying other video generators.
License details
Wan 2.2 is open source under the Apache 2.0 license, making it well suited to a range of use cases, including commercial and educational purposes.
Other Popular Models
Discover other models you might be interested in.
sdxl-img2img
SDXL Img2Img is used for text-guided image-to-image translation. It uses Stable Diffusion weights to generate new images from an input image via the StableDiffusionImg2ImgPipeline from diffusers.

faceswap-v2
Take a picture/GIF and replace the face in it with a face of your choice. You only need one image of the desired face: no dataset, no training.

sdxl-inpaint
This model generates photo-realistic images from any text input, with the added capability of inpainting pictures using a mask.

sd2.1-faceswapper
Take a picture/GIF and replace the face in it with a face of your choice. You only need one image of the desired face: no dataset, no training.
