Indie Anime Maker - Wan 2.2 + Qwen Image
Create smooth animated sequences with a consistent character using the Wan 2.2 and Qwen Image models.
If you're looking for an API, here is sample code in NodeJS to help you out.
const axios = require('axios');

const api_key = "YOUR API KEY";
const url = "https://api.segmind.com/workflows/689b0f4dc49c91c2edbbab44-v3";

// The workflow takes a single input: a short description of the scene.
const data = {
  Scene_details: "the user input string"
};

axios.post(url, data, {
  headers: {
    'x-api-key': api_key,
    'Content-Type': 'application/json'
  }
}).then((response) => {
  console.log(response.data);
}).catch((error) => {
  console.error(error.response ? error.response.data : error.message);
});
The request is queued, and the API responds immediately with a poll URL:

{
  "poll_url": "<base_url>/requests/<some_request_id>",
  "request_id": "some_request_id",
  "status": "QUEUED"
}
You can poll the above link to get the status and output of your request.
Once the request completes, the poll response will include the workflow output:

{
  "video_orabq": "any user input string"
}
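If you want to automate the polling, here is a minimal NodeJS sketch. It assumes the poll_url accepts a GET request with the same x-api-key header and that QUEUED and PROCESSING are the only non-terminal statuses; treat those status names as assumptions rather than the confirmed API contract.

const axios = require('axios');

// Minimal polling sketch: repeatedly GET the poll_url until the request
// leaves a non-terminal state. The 'PROCESSING' value is an assumption.
async function pollUntilDone(pollUrl, apiKey, intervalMs = 5000) {
  while (true) {
    const { data } = await axios.get(pollUrl, {
      headers: { 'x-api-key': apiKey }
    });
    if (data.status !== 'QUEUED' && data.status !== 'PROCESSING') {
      return data; // terminal state: contains the workflow output
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}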
Attributes
To keep track of your credit usage, you can inspect the response headers of each API call. The x-remaining-credits property will indicate the number of remaining credits in your account. Ensure you monitor this value to avoid any disruptions in your API usage.
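For example, with axios you can read that header directly from the response (axios lower-cases header names):

axios.post(url, data, {
  headers: {
    'x-api-key': api_key,
    'Content-Type': 'application/json'
  }
}).then((response) => {
  // Log remaining credits after each call to track usage.
  console.log('Remaining credits:', response.headers['x-remaining-credits']);
});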
Indie Animation powered by Wan 2.2
Last Updated: 12 Aug, 2025
Two state-of-the-art open-source models from Alibaba, Wan 2.2 and Qwen Image, have changed the image and video generation game altogether. Wan's potential for helping independent artists and studios in the indie animation space got us excited, so we created a Pixelflow that goes from a simple prompt to an animated scene with background music, helping animators and content creators produce high-quality videos with minimal effort. Here is how it works.
How it works
- First, we ask the creator to share a few sentences about the scene, e.g., "A 15-year-old boy standing in the rain, looking at Mount Fuji."
- We then ask Claude Sonnet to create a set of prompts: a prompt to generate the character in full detail, including their attire; prompts to generate the first frame for each of the two cuts; and prompts to animate those first frames.
- Once we have all the prompts, we pass the character generation prompt to Qwen Image to generate the character's entire body, face, and attire.
- We then pass this character to Flux Kontext to generate the first frames for Cut 1 and Cut 2. These act as the first frames for the video animation model.
- We pass the first frame of each cut to Wan 2.2 to animate, using the prompts generated by Claude to instruct Wan on the particular style and motion details.
- We also create background audio using Google's Lyria 2 to add an audio component that makes the scene richer.
- Finally, we stitch the two videos together and merge in the audio to create the final video (see the sketch after this list).
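To make the orchestration concrete, here is a minimal NodeJS sketch of the same pipeline. The actual Pixelflow wires these steps up visually; the per-model endpoint pattern (https://api.segmind.com/v1/<model>), the payload fields, and the response fields below are all assumptions for illustration, not the Pixelflow's internals.

const axios = require('axios');

const headers = { 'x-api-key': 'YOUR API KEY' };

// Calls a single Segmind model endpoint. The per-model URL pattern
// used here is an assumption.
async function callModel(model, payload) {
  const { data } = await axios.post(
    `https://api.segmind.com/v1/${model}`, payload, { headers }
  );
  return data;
}

async function makeScene(sceneDetails) {
  // 1. Claude writes the character, first-frame, and animation prompts.
  //    The `prompts` response shape below is hypothetical.
  const prompts = await callModel('claude-3.7-sonnet', {
    instruction: `Write anime scene prompts for: ${sceneDetails}`
  });

  // 2. Qwen Image generates the full character: body, face, and attire.
  const character = await callModel('qwen-image', { prompt: prompts.character });

  // 3. Flux Kontext turns the character into the first frame of each cut.
  const frame1 = await callModel('flux-kontext-pro', {
    prompt: prompts.cut1, input_image: character.image
  });
  const frame2 = await callModel('flux-kontext-pro', {
    prompt: prompts.cut2, input_image: character.image
  });

  // 4. Wan 2.2 animates each first frame into a clip.
  const clip1 = await callModel('wan-2.2-i2v-fast', {
    prompt: prompts.motion1, image: frame1.image
  });
  const clip2 = await callModel('wan-2.2-i2v-fast', {
    prompt: prompts.motion2, image: frame2.image
  });

  // 5. Lyria 2 composes the background track.
  const music = await callModel('lyria-2', { prompt: prompts.music });

  // 6. Stitch the clips, then merge the audio onto the result.
  const stitched = await callModel('video-stitch', {
    videos: [clip1.video, clip2.video]
  });
  return callModel('video-audio-merge', {
    video: stitched.video, audio: music.audio
  });
}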
Features
- Flux Kontext helps maintain character consistency across scenes. The initial face is generated by Qwen Image, which can be substituted with a face you have already generated.
- Smooth motion generation for animated sequences. We run Wan 2.2 at a 16:9 aspect ratio, 24 FPS, and 100 frames to generate high-quality video (a request-body sketch follows this list).
- This workflow enables cost-effective animation production without large studio resources.
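For reference, a Wan 2.2 request body reflecting those settings might look like the following; the parameter names are assumptions, not the model's documented schema.

// Hypothetical Wan 2.2 i2v payload; parameter names are assumptions.
const wanPayload = {
  image: firstFrameUrl,   // first frame produced by Flux Kontext
  prompt: motionPrompt,   // animation prompt written by Claude
  aspect_ratio: '16:9',
  fps: 24,
  num_frames: 100         // roughly 4 seconds of footage at 24 FPS
};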
Fork this Pixelflow
This is great for users who want to create just two scenes with background music. But if you are a professional, you can fork this Pixelflow to control the exact character (instead of leaving it to AI to generate the face), add dialogues using Eleven Labs or other TTS models, post-process the video using Topaz Video Upscaler for higher resolution and studio-quality color grading, and so on.
Models Used in the Pixelflow
qwen-image
Qwen-Image revolutionizes image generation and editing with seamless multilingual text integration and photorealistic detail.

wan-2.2-i2v-fast
Transforms simple text prompts into breathtaking cinematic-quality videos in minutes.

flux-kontext-pro
FLUX.1 Kontext Pro transforms text prompts into high-quality, customized images with remarkable efficiency and precision.

lyria-2
Lyria 2 by Google DeepMind is an advanced model that generates high-fidelity 48kHz stereo instrumental music from text prompts or lyrics, offering precise control over tempo, key, mood, and structure.

llama4-scout-instruct-basic
Unlock powerful multimodal AI with Llama 4 Scout Basic, a 17-billion-active-parameter model offering leading text and image understanding.

claude-3.7-sonnet
Claude 3.7 Sonnet is a large language model (LLM) launched by Anthropic. It is considered state-of-the-art, outperforming previous versions of Claude and competing models across a variety of tasks.

video-audio-merge
Effortlessly merge audio and video with our intuitive Video Audio Merge model. Create stunning multimedia content with precise timing, fade effects, and customizable audio options. Perfect for content creators, filmmakers, and marketers.

video-stitch
Revolutionize your video editing with the Video Stitch Model. Seamlessly stitch clips, add captivating audio, and create professional-looking videos in minutes.
