Generate natural, lifelike performance videos from multimodal input (text, image, video) using the Wan 2.7 model (`wan2.7-r2v`).
- Character portrayal: Replicate a character's appearance from a reference image or video. Reference videos also replicate voice timbre. Supports single or multi-character performances with up to 5 reference assets.
- Media array input: Provide reference images, videos, or a first frame via the `media` array. Use `Video 1`/`Image 1` in prompts to reference characters by their order. Images and videos are counted separately.
- Multi-panel storyboard: Describe multi-shot narratives with time segments (e.g., `Scene 1 [0-3s]: ...`). Provide key shots and the model automatically recognizes the panel logic.
- Voice cloning: Provide a `reference_voice` audio file to set the voice timbre. If not specified, audio from the reference video is used by default.
- Resolution and ratio: Set output quality with `resolution` (720P/1080P) and aspect ratio with `ratio` (16:9, 9:16, 1:1, 4:3, 3:4). When a `first_frame` image is provided, `ratio` is inferred from the image.
- Prompt enhancement: Enable `prompt_extend` to rewrite the prompt with an LLM. Improves results for shorter prompts but increases processing time.
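To make the parameters above concrete, here is a minimal sketch of a request payload. The field layout (`model`, `input`, `parameters` and their nesting) is an assumption based on the parameter names described in this section, not a verified schema; URLs are placeholders.

```python
# Hypothetical wan2.7-r2v task payload. The structure below is an assumption
# inferred from the parameters documented above, not a confirmed API schema.
payload = {
    "model": "wan2.7-r2v",
    "input": {
        # Multi-panel storyboard: time-segmented scenes in a single prompt,
        # referencing media assets by their order (Image 1, Video 1, ...).
        "prompt": (
            "Scene 1 [0-3s]: Image 1 walks onto the stage. "
            "Scene 2 [3-7s]: Video 1 greets the audience and bows."
        ),
        # Media array: up to 5 reference assets; images and videos
        # are counted separately.
        "media": [
            {"type": "image", "url": "https://example.com/character.png"},
            {"type": "video", "url": "https://example.com/reference.mp4"},
        ],
        # Optional voice cloning: audio file that sets the voice timbre.
        # If omitted, audio from the reference video is used by default.
        "reference_voice": "https://example.com/voice.wav",
    },
    "parameters": {
        "resolution": "1080P",   # 720P or 1080P
        "ratio": "16:9",         # inferred from first_frame when one is given
        "prompt_extend": True,   # LLM prompt rewriting; helps short prompts
    },
}

# The character-portrayal limit: at most 5 reference assets.
assert len(payload["input"]["media"]) <= 5
```

The payload would then be submitted as the JSON body of an asynchronous task request.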
Authorizations

(string, header, required) — DashScope API Key. Create one in the Qwen Cloud console.
Header Parameters

(enum<string>, required) — Must be `enable` to create an asynchronous task. Available option: `enable`.
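As a sketch, the two required headers above might be assembled as follows. Note the exact header names are assumptions: this excerpt specifies an API-key header and a required enum header whose only value is `enable`, but does not show their names, so `Authorization` with a Bearer token and `X-DashScope-Async` are used here illustratively.

```python
import os

# Hypothetical header construction for an asynchronous task request.
# Header names are assumptions (the excerpt omits them); values follow
# the requirements stated above.
api_key = os.environ.get("DASHSCOPE_API_KEY", "sk-example")

headers = {
    "Authorization": f"Bearer {api_key}",  # DashScope API Key
    "X-DashScope-Async": "enable",         # required: must be "enable"
    "Content-Type": "application/json",
}
```

With these headers and the JSON payload, the task is created asynchronously and its result is fetched later by task ID.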