167 models
Vidu Models
Vidu is an AI video generation model from Shengshu Technology, distinguished by its subject consistency and creative control. The Vidu collection includes Vidu Q1 Reference to Video and Vidu Template. Vidu Q1 Reference to Video generates high-quality video from a reference image while keeping the subject's identity and appearance stable throughout the sequence, which suits character-driven content, product showcases, and any workflow where visual consistency matters. Vidu Template offers standardized, template-driven video creation for repeatable, on-brand content at scale.
Vidu models are particularly strong at human motion quality, facial expression rendering, and natural body dynamics, so subjects look and move naturally. The reference-to-video capability is especially valuable for e-commerce brands (animating product photos consistently), content creators (maintaining character identity across multiple videos), and marketing teams (producing personalized video content at scale from a single reference image).
On Segmind, all Vidu models are available as pay-per-use APIs: generate reference-consistent video with a single endpoint call, with no subscriptions or model hosting required. Integrate them into Segmind Workflows to automate complete video production pipelines from image to finished video.
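As a rough illustration of the "single endpoint call" described above, the sketch below builds one HTTP POST request for a model endpoint without sending it. The base URL, the `x-api-key` header, the model slug, and the payload field names are all assumptions made for illustration; the model's API page on Segmind is the source of truth for the actual endpoint and parameters.

```python
import json
import urllib.request

# Assumption: base URL and header name; confirm both on the model's API page.
API_BASE = "https://api.segmind.com/v1"


def build_request(model_slug, api_key, payload):
    """Build (but do not send) a JSON POST request for one model endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/{model_slug}",
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical slug and payload fields, shown only to make the shape concrete.
request = build_request(
    "vidu-q1-reference-to-video",
    "YOUR_API_KEY",
    {
        "prompt": "The product rotates slowly on a white table",
        "reference_image": "https://example.com/product.png",
    },
)
```

Passing `request` to `urllib.request.urlopen` would send the call. The response format is not described here, so any handling of the returned data would depend on the model's documentation.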
Luma Ray 3.2
Cinematic text-to-video and image-to-video clips up to 1080p.
Grok Imagine Video 1.5 (Preview)
Image-to-video with native synchronized audio, up to 720p.
Grok Imagine Video
Text-to-video and image-to-video with native synchronized audio.
HeyGen Avatar V — Create Avatar
Train a Digital Twin avatar from reference video.
HeyGen Avatar V
Studio-quality talking-avatar videos from text or audio.
Pixverse Mimic
Transfer motion from reference videos onto still images.
Gemini Embedding 2
Natively multimodal embeddings — text, image, audio, video and PDF mapped into one vector space, with 8 task-specific modes.
Gemini 3.1 Pro
Frontier reasoning across text, images, video, and code.
HappyHorse 1.0
Cinematic 1080p text-to-video with native audio and lip-sync.
Seedance 2.0 Fast
Professional-grade video creation model with native audio, similar to Seedance 2.0 but faster and cheaper.
Seedance 2.0
Cinematic AI videos with native audio and multi-shot narratives.
Wan 2.7 Video Editing
Edit existing videos precisely using natural language text instructions.
Wan 2.7 Reference to Video
Character-consistent multi-subject videos from reference images.
Wan 2.7 Image to Video
Animate any image into cinematic 1080p video with audio.
Wan 2.7 Text to Video
1080p cinematic videos with audio sync and multi-shot control.
Pixverse V6
15-second AI videos with native audio and cinematic controls.
Veo 3.1 Lite
Affordable text-to-video with audio, powered by Google.
QVQ Max
Chain-of-thought visual reasoning for math, charts, and diagrams.
Qwen 3 VL Plus
Powerful visual QA and document analysis from images.
Qwen 3.5 Plus
Multimodal 1M context AI for image, video, and text.
Qwen 3.5 Flash
Fast multimodal AI processing text, images, and video affordably.
Kling O3 Image To Video
Images to cinematic videos with precise motion control.
Kling O3 Video To Video Edit
Text-based video editor — swap backgrounds, characters, restyle scenes.
Kling V3 Image 2 Image
Transform images into photorealistic, production-ready visuals.
Kling O3 Video To Video Reference
Swap characters and restyle videos using reference images.
Kling O3 Text-to-Video
15-second cinematic AI videos with native audio.
HyperSwap: Video Faceswap by FaceFusion Labs
Realistic face swapping in videos from a single image.
Wan 2.2 Image to Video Flash
Convert a single image into a coherent dynamic video.
Kling 3.0 Pro Image-to-Video
Animated 1080p videos from images with dynamic motion.
Kling 3.0 Standard Image-to-Video
Controlled cinematic 1080p videos from starting images.
Kling 3.0 Pro Text-to-Video
Cinematic 1080p videos with realistic audio from text.
Kling 3.0 Standard Text-to-Video
Stunning 1080p cinematic videos from simple text prompts.
LTX-2-19B I2V
Synchronized 4K audio-video generation from images, fast.
LTX-2-19B T2V
Synchronized video and audio from text, multiple input types.
Kling O1 Reference Image 2 Video
Identity-preserving videos from static images with character reference.
Kling O1 Video 2 Video Reference
Video style transfer using reference character images.
Kling O1 Image 2 Video
Physics-driven animations from images for creative storytelling.
Kling O1 Video 2 Video Edit
Edit any video with precise natural language commands.
Kling O1
Text-to-video creation with precise AI-driven motion control.
Kling V2 Pro Avatar
Talking avatar videos from image and audio, high quality.
Kling Avatar V2 Standard
Lifelike video avatars with precise lip synchronization.
Kling 2.6 Pro Motion Control
Transfer motion from videos to animate custom characters.
Kling 2.6 Standard Motion Control
Precise motion transfer from reference videos to characters.
Kling 2.6
Still images into immersive cinematic videos with synchronized audio.
HeyGen Avatar IV
Single photo into a lifelike talking avatar video.
Seedance 1.5 Pro
Synchronized video and audio generation for dynamic storytelling.
LTX Retake Video
Precise segment-level video edits maintaining full scene continuity.
Bria Video Eraser
Remove unwanted objects from videos while preserving audio.
Wan 2.6 Image To Video
Transform images into high-quality videos with audio sync.
Sync.so React 1
Edit video actors' emotions with realistic re-expression.