Reference mode
Combine up to 9 images, 3 videos, and 3 audio clips in a single generation. Use @ tags to tell the model how to use each source.
15s native
Single-shot 15-second generation with coherent motion and physical realism.
Face consistency
Built-in face-preservation algorithm keeps characters consistent across frames.
Native audio
Generates matching sound effects and lip sync in the same pass.
Reference @Image1 character features, replicate the spinning kick from @Video1, set in the rainy cyberpunk street from @Image2.
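Before submitting a prompt like the one above, it can help to check that every @ tag points at a source that was actually uploaded and that the upload counts stay within the stated limits (9 images, 3 videos, 3 audio clips). The following is a minimal sketch of such a check, not part of the product's API: the function name `check_reference_tags` is hypothetical, and the `@Audio1` naming is an assumption extrapolated from the `@Image1` and `@Video1` tags shown in the example.

```python
import re

# Limits taken from the Reference mode description.
LIMITS = {"Image": 9, "Video": 3, "Audio": 3}

# Assumed tag shape: @Image1, @Video2, @Audio1 (the Audio form is a guess).
TAG_RE = re.compile(r"@(Image|Video|Audio)(\d+)")


def check_reference_tags(prompt, counts):
    """Return a list of problems found in the prompt's @ tags.

    counts maps a source kind ("Image", "Video", "Audio") to the number
    of files of that kind uploaded alongside the prompt.
    """
    problems = []

    # Check upload counts against the per-kind limits.
    for kind, limit in LIMITS.items():
        if counts.get(kind, 0) > limit:
            problems.append(f"too many {kind} sources: {counts[kind]} > {limit}")

    # Check that every tag refers to an uploaded source (1-based index).
    for match in TAG_RE.finditer(prompt):
        kind, index = match.group(1), int(match.group(2))
        if index < 1 or index > counts.get(kind, 0):
            problems.append(f"@{kind}{index} has no matching upload")

    return problems


prompt = (
    "Reference @Image1 character features, replicate the spinning kick "
    "from @Video1, set in the rainy cyberpunk street from @Image2."
)

# Two images and one video uploaded: every tag resolves.
print(check_reference_tags(prompt, {"Image": 2, "Video": 1}))  # → []
```

With only one image uploaded, the same call would report that `@Image2` has no matching upload, which is the kind of mistake that is easy to make when reordering sources.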