Fashion Video Generator - Wan 2.2 + SegFit 1.3

Transform static fashion images into captivating, cinematic videos with AI-driven automation.

If you're looking for an API, here is sample Node.js code to help you get started.

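const axios = require('axios');

// Replace with your Segmind API key.
const api_key = "YOUR API KEY";
const url = "https://api.segmind.com/workflows/68999e68c49c91c2edbbaa3e-v5";
const data = {
  // Must be a publicly accessible image URL.
  Attire_image: "publicly accessible image link"
};

// Submit the request; the JSON response includes a poll_url for the queued job.
axios.post(url, data, {
  headers: {
    'x-api-key': api_key,
    'Content-Type': 'application/json'
  }
}).then((response) => {
  console.log(response.data);
}).catch((error) => {
  // Log the API's error body when available, otherwise the error message.
  console.error(error.response ? error.response.data : error.message);
});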

Response

application/json

{
  "poll_url": "<base_url>/requests/<some_request_id>",
  "request_id": "some_request_id",
  "status": "QUEUED"
}

You can poll the above link to get the status and output of your request.

Response

application/json

{
  "Wan2.2_Output": "any user input string"
}
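The polling step can be sketched as below. This is a minimal sketch, not the documented client: it assumes Node.js 18+ (for the built-in fetch), assumes the poll URL accepts the same x-api-key header as the submit call, and assumes the terminal status names COMPLETED and FAILED, none of which are confirmed on this page. Only the QUEUED status and the Wan2.2_Output key come from the responses shown above. The names classify and pollForOutput are hypothetical.

```javascript
// ASSUMPTION: terminal status names are not documented on this page.
const ASSUMED_DONE = ['COMPLETED'];
const ASSUMED_FAILED = ['FAILED'];

// Classify a poll response body as 'pending', 'done', or 'failed'.
function classify(body) {
  if (!body || typeof body !== 'object') return 'pending';
  const status = String(body.status || '').toUpperCase();
  if (ASSUMED_FAILED.includes(status)) return 'failed';
  // The output key shown in the sample response also counts as completion.
  if (ASSUMED_DONE.includes(status) || 'Wan2.2_Output' in body) return 'done';
  return 'pending';
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll pollUrl until the request finishes, fails, or maxAttempts is reached.
// ASSUMPTION: the poll endpoint accepts the x-api-key header.
async function pollForOutput(pollUrl, apiKey, { intervalMs = 5000, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(pollUrl, { headers: { 'x-api-key': apiKey } });
    if (!res.ok) throw new Error(`Poll failed with HTTP ${res.status}`);
    const body = await res.json();
    const state = classify(body);
    if (state === 'done') return body;
    if (state === 'failed') throw new Error(`Request failed: ${JSON.stringify(body)}`);
    await sleep(intervalMs);
  }
  throw new Error('Timed out waiting for the workflow to finish');
}
```

To use it, pass the poll_url from the submit response, for example pollForOutput(response.data.poll_url, api_key), and read the Wan2.2_Output field from the resolved object. A fixed interval keeps the sketch short; a longer interval or a backoff may suit video jobs better.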

Attributes

Attire_image (type: image, required)

To track your credit usage, inspect the response headers of each API call. The x-remaining-credits header reports the number of credits remaining in your account. Monitor this value to avoid disruptions in your API usage.
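As an illustration, the header can be read with a small helper like the one below. This is a sketch under assumptions: the page does not show the header's value format, so the code assumes it is numeric, and the names remainingCredits and isLowOnCredits and the default threshold are hypothetical.

```javascript
// Read x-remaining-credits from a headers container. Works with objects that
// expose get(name) (fetch Headers, AxiosHeaders) or with a plain object
// keyed by lowercased header names.
function remainingCredits(headers) {
  const raw = typeof headers.get === 'function'
    ? headers.get('x-remaining-credits')
    : headers['x-remaining-credits'];
  const value = Number(raw);
  // ASSUMPTION: the value is numeric; return null if missing or unparsable.
  return raw == null || raw === '' || Number.isNaN(value) ? null : value;
}

// Hypothetical threshold check; pick a threshold that fits your usage.
function isLowOnCredits(headers, threshold = 10) {
  const credits = remainingCredits(headers);
  return credits !== null && credits < threshold;
}
```

With the axios sample above, this could be called as remainingCredits(response.headers) inside the then handler.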

Fashion video creator - Powered by Wan 2.2 and SegFit 1.3

Last updated: 11 Aug 2025

This workflow generates 720p fashion videos from a single image, with no prompts needed. It is well suited to fashion brands and marketers who want to present their designs in an engaging way. The workflow uses Flux Kontext Max to create an image that serves as the starting point for SegFit 1.3, which then replaces the attire with the exact attire shared by the user. SegFit is used to keep the attire consistent with the one provided as input. The SegFit output is then passed as the starting frame to the Wan 2.2 image-to-video model, which generates high-quality video at 720p.

Key Models

Image Transformation: This flow utilizes Flux Kontext Max to convert flat lays and mannequin images into high-quality photo-realistic visuals.
Virtual Try-On: Uses Segmind SegFit v1.3 for realistic outfit visualization on different models.
Prompt Generation: Uses Claude 3.7 Sonnet to create scene-appropriate text prompts that guide video creation, so the output aligns with the desired visual storytelling. It analyzes the image to choose camera movements such as push-ins, pans, and tracking shots that suit the fashion presentation.
Video Creation: Converts processed images into engaging fashion videos using Wan 2.2, generating cinematic content that captivates audiences.

Use Cases

This workflow helps fashion brands generate captivating videos for their websites and social media, increasing engagement and reach. Realistic videos with models help attract attention to the product and make ads and social media posts more effective. Brands and agencies can also experiment with different model ethnicities and backgrounds, and add audio tracks to improve the content's effectiveness. Influencers can use this workflow to generate content without having to physically shoot each outfit.

Models Used in the Pixelflow

wan-2.2-i2v-fast

Transforms simple text prompts into breathtaking cinematic-quality videos in minutes.

segfit-v1.3

SegFit v1.3 enables hyper-realistic virtual try-ons, enhancing online fashion retail experiences without physical photoshoots.

flux-kontext-max

FLUX.1 Kontext [max] transforms textual descriptions into stunning, high-fidelity images with seamless typography integration.

claude-3.7-sonnet

Claude 3.7 Sonnet is a large language model (LLM) released by Anthropic. It is considered state-of-the-art, outperforming previous versions of Claude and competing models on a variety of tasks.