Pruna P Video Avatar

Animate any portrait into a lip-synced talking avatar.


Inputs

  • Portrait image: first-frame portrait in jpg, jpeg, png, or webp. Use clean, front-facing headshots.
  • Voice script (voice_script): text the avatar speaks when no audio is supplied. Keep sentences short.
  • Audio: audio URL to lip-sync; overrides voice_script. Upload clean, music-free voice recordings.
  • Resolution: output resolution, 720p or 1080p. Use 720p to iterate, 1080p for finals.


Pruna P Video Avatar: Talking Avatar Video Generation

What is Pruna P Video Avatar?

Pruna P Video Avatar is an image-to-video model that turns a single portrait into a lip-synced talking avatar. Give it one photo plus either a text script or an audio clip, and it returns an MP4 of that person speaking, with mouth movements, natural head motion, and expressions matched to the speech. Speech comes from built-in text-to-speech (30 voices, 10 languages) or your own uploaded audio, which takes priority when both are supplied. It works with real photos, illustrated characters, and stylized avatars, and outputs 720p or 1080p video.

Key Features

  • Single portrait to talking head: one image in, a speaking video out.
  • Two speech paths: text-to-speech from a voice script, or lip-sync to uploaded audio.
  • 30 built-in voices and 10 languages (English US/UK, Spanish, French, German, Italian, Portuguese BR, Japanese, Korean, Hindi).
  • Performance controls: video_prompt for framing and motion, voice_prompt for tone and pace.
  • 720p and 1080p output, seeded generation for reproducibility, and dynamic backgrounds instead of a floating head.
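
The controls listed above could be gathered into a single request payload. The sketch below is only an illustration, not the documented API: voice_script, video_prompt, and voice_prompt are the names used on this page, while the keys image, audio, resolution, and seed, and the helper build_avatar_inputs itself, are assumptions made for the example.

```python
from typing import Optional

ALLOWED_RESOLUTIONS = {"720p", "1080p"}  # the two output sizes named on this page


def build_avatar_inputs(
    image: str,
    voice_script: Optional[str] = None,
    audio: Optional[str] = None,
    video_prompt: Optional[str] = None,
    voice_prompt: Optional[str] = None,
    resolution: str = "720p",
    seed: Optional[int] = None,
) -> dict:
    """Assemble a hypothetical input payload; the key names are assumptions."""
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if not voice_script and not audio:
        raise ValueError("provide a voice_script or an audio URL")

    candidate = {
        "image": image,
        "voice_script": voice_script,
        "audio": audio,
        "video_prompt": video_prompt,
        "voice_prompt": voice_prompt,
        "resolution": resolution,
        "seed": seed,
    }
    # Drop options that were not set so the payload stays minimal.
    return {key: value for key, value in candidate.items() if value is not None}
```

Defaulting resolution to 720p mirrors the advice to iterate at 720p and switch to 1080p for finals.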

Best Use Cases

P Video Avatar fits spokesperson-style content where you need a face to deliver a message without a studio or voice actor. Common uses include marketing and UGC-style ad variations, multilingual localization (one portrait, the same script in many languages), product walkthroughs and explainers, education, customer support avatars, and game character or NPC dialogue. In testing, a clean front-facing headshot with a plain background produced accurate lip-sync and preserved identity, which makes the model well suited to virtual presenters and social-media avatars generated at scale.
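
The localization case (one portrait, the same script in many languages) can be sketched as a small batch builder. This is a hedged illustration: localized_jobs is a hypothetical helper, the image key is an assumption, and the language labels are placeholders, since this page does not say which values voice_language accepts. The translated scripts are supplied by the caller.

```python
from typing import Dict, List


def localized_jobs(image: str, scripts_by_language: Dict[str, str]) -> List[dict]:
    """Build one hypothetical job per language, all sharing the same portrait."""
    jobs = []
    for language, script in scripts_by_language.items():
        if not script.strip():
            raise ValueError(f"empty script for {language}")
        jobs.append(
            {
                "image": image,              # same portrait for every job (assumed key)
                "voice_script": script,      # already translated by the caller
                "voice_language": language,  # placeholder label, set to match the script
            }
        )
    return jobs
```

Each job pairs a script with its own voice_language, which keeps the two from drifting apart when the batch grows.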

Prompt Tips and Output Quality

Use a clear, well-lit, front-facing headshot; extreme angles or heavy shadows reduce lip-sync quality. Write scripts to be spoken, not read: short sentences and deliberate punctuation control pacing. Always set voice_language to match your script to avoid broken pronunciation. Keep video_prompt simple and the camera fixed for the tightest sync, and use voice_prompt for delivery. For custom audio, upload a clean, music-free recording. Test a short 5-to-10-second clip before scaling up.
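
The "short sentences" advice can be checked before submitting a script. The sketch below flags sentences above a word limit; the 20-word threshold and the long_sentences helper are assumptions chosen for illustration, and the splitting relies on ".", "!", and "?" followed by whitespace, so it will not segment scripts in languages that use other sentence marks or no spaces between words.

```python
import re
from typing import List

MAX_WORDS_PER_SENTENCE = 20  # arbitrary threshold picked for this sketch


def long_sentences(script: str, max_words: int = MAX_WORDS_PER_SENTENCE) -> List[str]:
    """Return the sentences in script that exceed max_words words."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", script)
    sentences = [part.strip() for part in parts if part.strip()]
    return [s for s in sentences if len(s.split()) > max_words]
```

An empty result means every sentence is within the limit; anything returned is a candidate for splitting or tighter punctuation.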

FAQs

What input do I need? A portrait image (jpg, jpeg, png, or webp) plus either a voice script or an audio file.

What if I provide both audio and a script? The uploaded audio takes priority and drives the lip-sync.
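
That precedence rule can be mirrored client-side to predict which input will drive the speech. This is a minimal sketch; speech_source is a hypothetical helper, not part of the model's interface.

```python
from typing import Optional


def speech_source(voice_script: Optional[str] = None, audio: Optional[str] = None) -> str:
    """Return which input drives the lip-sync under the audio-first rule."""
    if audio:
        return "audio"  # uploaded audio takes priority over any script
    if voice_script and voice_script.strip():
        return "voice_script"  # text-to-speech path
    raise ValueError("provide a voice_script or an audio URL")
```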

Does it work with illustrated or stylized characters? Yes. Photorealistic photos, illustrated game characters, and stylized avatars all work.

How long can the video be? Pruna recommends keeping clips under three minutes; longer clips can show consistency drift.
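
A rough length check can be run on a script before generating. The sketch below assumes a speaking pace of 150 words per minute, which is a generic approximation rather than a figure from Pruna; the real pace depends on the voice, the language, and voice_prompt, and counting words by whitespace does not work for languages written without spaces.

```python
RECOMMENDED_MAX_SECONDS = 180   # "under three minutes", per the answer above
ASSUMED_WORDS_PER_MINUTE = 150  # assumption for this sketch, not a model spec


def estimate_seconds(script: str, words_per_minute: int = ASSUMED_WORDS_PER_MINUTE) -> float:
    """Estimate spoken duration in seconds from a whitespace word count."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return len(script.split()) * 60 / words_per_minute


def exceeds_recommended_length(script: str, words_per_minute: int = ASSUMED_WORDS_PER_MINUTE) -> bool:
    """True when the estimate reaches or passes the three-minute recommendation."""
    return estimate_seconds(script, words_per_minute) >= RECOMMENDED_MAX_SECONDS
```

Under this assumption, a 25-word script lands at about 10 seconds, the upper end of a short test clip.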

How do I control the aspect ratio? Output matches the input image, so use a vertical (portrait-orientation) image for vertical clips and a 16:9 landscape image for 16:9 output.
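
Because the output follows the input image, it can help to check the image's shape up front. The sketch below reduces pixel dimensions to a ratio and an orientation; describe_aspect is a hypothetical helper, and the width and height are assumed to come from whatever image tool you already use.

```python
from math import gcd
from typing import Tuple


def describe_aspect(width: int, height: int) -> Tuple[str, str]:
    """Return (reduced ratio, orientation) for an image's pixel dimensions."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    divisor = gcd(width, height)
    ratio = f"{width // divisor}:{height // divisor}"
    if width > height:
        orientation = "landscape"
    elif height > width:
        orientation = "vertical"
    else:
        orientation = "square"
    return ratio, orientation
```

For example, a 1920x1080 image reduces to 16:9 landscape, and a 1080x1920 image to 9:16 vertical.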

Can I reproduce a result? Yes. Set the same seed with the same inputs for repeatable output.
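
When comparing two runs, it is easy to miss a changed input. One way to confirm that a request really used the same seed and the same inputs is to fingerprint the payload. This is a sketch under assumptions: request_fingerprint and the payload keys are hypothetical, and the hash covers only the values you pass in (for instance an image URL string), not the contents of the files behind them.

```python
import hashlib
import json


def request_fingerprint(inputs: dict) -> str:
    """Return a stable SHA-256 hex digest of an inputs dict, seed included."""
    # sort_keys makes the serialization independent of key insertion order.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two requests with matching fingerprints were configured identically; a differing fingerprint points to a changed seed or input.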