Replicate motion and look
- Character portrayal: Replicate a character's appearance from a reference image or video. If the reference is a video, the model can also replicate voice timbre.
- Multi-character interaction: Compose scenes with up to five characters for natural dialogue and interaction.
- Multi-shot narrative: Maintain character consistency across shots with intelligent multi-shot scheduling.
- Voice cloning (wan2.7): Provide a reference_voice audio file to set the voice timbre independently from reference videos.
Supported models
| Model | Features | Input | Output |
|---|---|---|---|
| wan2.7-r2v (Recommended) | Audio sync, multi-character, voice cloning, first-frame control, media array input | Text, image, video, audio | 720P or 1080P, 2-15s, 30 fps, MP4 (H.264) |
| wan2.6-r2v-flash | Video with or without audio, single or multi-character, multi-shot narrative, audio-video sync. Fast and cost-effective. | Text, image, video | 720P or 1080P, 2-10s, 30 fps, MP4 (H.264) |
| wan2.6-r2v | Video with audio, multi-role reference-to-video, multi-shot narrative, audio-video sync | Text, image, video | 720P or 1080P, 2-10s, 30 fps, MP4 (H.264) |
Prerequisites
Get a DashScope API key from Qwen Cloud. Set it as an environment variable:
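The original shell snippet is not reproduced here. As a minimal sketch, the samples below read the key from the DASHSCOPE_API_KEY environment variable (the name used by DashScope SDKs):

```python
import os

def get_api_key() -> str:
    """Read the DashScope API key from the environment."""
    key = os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError(
            "DASHSCOPE_API_KEY is not set; export it before running the samples."
        )
    return key
```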
How it works
Reference-to-video tasks run asynchronously:
1. Submit a task: POST your model, prompt, reference URLs, and parameters. The API returns a task_id.
2. Poll for results: GET the task status using the task_id. When the status is SUCCEEDED, the response includes a URL to the generated video.

| Task status | Description |
|---|---|
| PENDING | Task is queued |
| RUNNING | Video is being generated |
| SUCCEEDED | Generation complete; video URL available in output.video_url |
| FAILED | Generation failed; check message for details |
Getting started
Submit a reference-to-video task and poll for the result.
- curl (wan2.7)
- curl (wan2.6)
Wan 2.7 uses the media array to provide references. Use Video N or Image N in the prompt to reference characters (numbered separately by type).

Step 1: Submit the task

Step 2: Get the result using the task ID

Replace {task_id} with the task_id value returned by the previous API call.

Key differences from wan2.6: wan2.7 uses a media array instead of reference_urls, characters are referenced as Video 1/Image 1 (not character1), resolution is set with resolution + ratio instead of size, reference_voice is supported for voice cloning, and watermark defaults to false.
Parameters
wan2.7-r2v
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | "wan2.7-r2v" |
| input.prompt | string | Yes | Up to 5,000 characters. Use Video 1/Image 1 to reference characters by their order in the media array (images and videos are counted separately). |
| input.media | array | Yes | 1-5 items. Each has type (reference_image, reference_video, or first_frame) and url. |
| input.negative_prompt | string | No | Up to 500 characters. Content to exclude. |
| input.reference_voice | string | No | URL to an audio file (WAV/MP3, 1-10s, max 15 MB) for voice cloning. Overrides audio from reference videos. |
| parameters.resolution | string | No | "720P" or "1080P" (default). |
| parameters.ratio | string | No | "16:9" (default), "9:16", "1:1", "4:3", "3:4". Ignored when first_frame is provided. |
| parameters.duration | integer | No | 2-15 (images only) or 2-10 (with video). Default: 5. |
| parameters.prompt_extend | boolean | No | Rewrite the prompt via an LLM. Default: true. |
| parameters.watermark | boolean | No | Default: false. |
| parameters.seed | integer | No | 0 to 2,147,483,647. |
Character referencing (wan2.7)
Each media item maps to a character identifier based on its position. Images and videos are counted separately.
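The numbering rule can be sketched as follows. This is a minimal illustration, not an official helper; in particular, how a first_frame item interacts with the numbering is an assumption here:

```python
def media_identifiers(media: list[dict]) -> list[str]:
    """Map each media item to the identifier used in the prompt.

    Images and videos keep separate counters, in array order.
    Treating first_frame items as non-characters is an assumption.
    """
    ids, image_n, video_n = [], 0, 0
    for item in media:
        if item["type"] == "reference_image":
            image_n += 1
            ids.append(f"Image {image_n}")
        elif item["type"] == "reference_video":
            video_n += 1
            ids.append(f"Video {video_n}")
        else:  # first_frame
            ids.append("(first frame, not a character)")
    return ids

media = [
    {"type": "reference_video", "url": "https://example.com/a.mp4"},
    {"type": "reference_image", "url": "https://example.com/b.png"},
    {"type": "reference_video", "url": "https://example.com/c.mp4"},
]
# media_identifiers(media) -> ["Video 1", "Image 1", "Video 2"]
```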
wan2.6 models
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | "wan2.6-r2v-flash" or "wan2.6-r2v" |
| input.prompt | string | Yes | Text prompt describing the scene. Reference characters as character1, character2, etc., matching the order of reference_urls. |
| input.reference_urls | array | Yes | Up to 5 URLs pointing to reference images or videos. Each reference must contain a single character. |
| parameters.size | string | No | Output resolution. Example: "1280*720" (16:9). See the API reference for all options. |
| parameters.duration | integer | No | Output video length in seconds. |
| parameters.audio | boolean | No | true for audio, false for silent video. Default: true. Silent video is supported only by wan2.6-r2v-flash. |
| parameters.shot_type | string | No | "multi" for multi-shot switching or "single" for a fixed perspective. |
| parameters.watermark | boolean | No | true to add a watermark. |
Reference input limits
| Type | Maximum count | Notes |
|---|---|---|
| Images | 5 | People, objects, or backgrounds |
| Videos | 3 | Best for character or object references. Avoid videos of backgrounds or empty scenes. |
| Total (images + videos) | 5 | Combined limit across all references |
Character referencing (wan2.6)
Each reference URL maps to a character identifier based on its position in the reference_urls array.
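The positional mapping is simple enough to sketch in a few lines (an illustration only, not part of the API):

```python
def character_identifiers(reference_urls: list[str]) -> dict[str, str]:
    """Map each reference URL to its prompt identifier: character1, character2, ..."""
    return {f"character{i}": url for i, url in enumerate(reference_urls, start=1)}

refs = ["https://example.com/alice.mp4", "https://example.com/bob.png"]
# character_identifiers(refs)
# -> {"character1": ".../alice.mp4", "character2": ".../bob.png"}
```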
Multi-character interaction
Compose scenes with up to five characters for natural dialogue and interactions -- interviews, conversations, tutorials, and more.
Supported models: All models.
Set shot_type to multi for dynamic multi-shot switching, or single for a fixed perspective.
Example: Four-reference scene
Prompt: "character2 sits on a chair by the window, holding character3, and plays a soothing American country folk song next to character4. character1 says to character2: 'that sounds great'"
| Input | Type | Reference |
|---|---|---|
| wan-r2v-role1.mp4 | Video | character1 (person) |
| wan-r2v-role2.mp4 | Video | character2 (person) |
| (image) | Image | character3 (object) |
| (image) | Image | character4 (background) |
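As a concrete sketch, the request body for this four-reference scene might look like the following. The image URLs are placeholders (the original image assets are not reproduced here), and the field layout follows the wan2.6 parameter table above:

```python
# Hypothetical request body for the four-reference scene; URLs are placeholders.
payload = {
    "model": "wan2.6-r2v",
    "input": {
        "prompt": (
            "character2 sits on a chair by the window, holding character3, "
            "and plays a soothing American country folk song next to character4. "
            "character1 says to character2: 'that sounds great'"
        ),
        "reference_urls": [
            "https://example.com/wan-r2v-role1.mp4",  # character1 (person)
            "https://example.com/wan-r2v-role2.mp4",  # character2 (person)
            "https://example.com/object.png",         # character3 (object, placeholder)
            "https://example.com/background.png",     # character4 (background, placeholder)
        ],
    },
    "parameters": {"size": "1280*720", "shot_type": "multi"},
}
```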
Example: Two-character dialogue
Prompt: "character1 says to character2: 'I'll rely on you tomorrow morning!' character2 replies: 'You can count on me!'"
| Input | Type | Reference |
|---|---|---|
| compressed-video (1).mp4 | Video | character1 |
| compressed-video.mp4 | Video | character2 |
Sample code (curl)
All examples use the X-DashScope-Async: enable header to submit asynchronous tasks.
Step 1: Submit the task

Step 2: Get the result using the task ID

Replace <task-id> with the task_id from the Step 1 response.
Sample code (Python)
This example submits a multi-character task and polls until the result is ready.
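The original sample is not reproduced here; below is a minimal standard-library sketch of the submit-then-poll flow. The endpoint paths are assumptions based on DashScope's async task API, so verify them against the API reference before use:

```python
import json
import os
import time
import urllib.request

# Assumed DashScope endpoints; confirm against the API reference.
SUBMIT_URL = "https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis"
TASK_URL = "https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}"

def build_payload(prompt: str, reference_urls: list[str]) -> dict:
    """Request body for a wan2.6 multi-character task (fields from the parameter table)."""
    return {
        "model": "wan2.6-r2v",
        "input": {"prompt": prompt, "reference_urls": reference_urls},
        "parameters": {"size": "1280*720", "shot_type": "multi"},
    }

def _call(url: str, body: dict = None) -> dict:
    headers = {
        "Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
        "Content-Type": "application/json",
    }
    if body is not None:
        headers["X-DashScope-Async"] = "enable"  # submit as an async task
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def generate(prompt: str, reference_urls: list[str]) -> str:
    """Submit a task, poll until it finishes, and return the video URL."""
    task_id = _call(SUBMIT_URL, build_payload(prompt, reference_urls))["output"]["task_id"]
    while True:
        out = _call(TASK_URL.format(task_id=task_id))["output"]
        if out["task_status"] == "SUCCEEDED":
            return out["video_url"]
        if out["task_status"] == "FAILED":
            raise RuntimeError(out.get("message", "generation failed"))
        time.sleep(5)  # PENDING / RUNNING: wait, then poll again

if __name__ == "__main__":
    video_url = generate(
        "character1 says to character2: 'I'll rely on you tomorrow morning!' "
        "character2 replies: 'You can count on me!'",
        ["https://example.com/a.mp4", "https://example.com/b.mp4"],  # placeholders
    )
    print(video_url)
```

Remember that the returned video URL expires after 24 hours, so download the file promptly.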
Single-character performance
Create a complete character performance across different scenes from a single reference video or image -- personal branding, product endorsements, educational training, and more.
Supported models: All models.
Pass a single URL in reference_urls and use character1 in the prompt. Setting shot_type to multi is recommended.
Example: Holiday unboxing video
Prompt: "Create a festive holiday unboxing experience. Shot 1 [0-2s]: Character1 sits by a beautifully decorated Christmas tree with twinkling lights, holding a wrapped gift box with elegant red and gold wrapping. Shot 2 [2-4s]: Close-up as Character1 carefully unwraps the gift, revealing premium skincare products inside. Shot 3 [4-6s]: Character1 applies the product with delight, saying: 'This holiday glow is exactly what I wanted!' Shot 4 [6-10s]: Character1 admires their radiant skin in a handheld mirror, surrounded by festive decorations, ending with a warm smile to camera."
| Input | Output |
|---|---|
| wan-r2v-role-4.mp4 (reference video) | Multi-shot video with audio |
Sample code (curl)
Step 1: Submit the task

Step 2: Get the result using the task ID

Replace <task-id> with the task_id from the Step 1 response.
Silent video generation
Create visual-only videos without audio -- animated posters, silent short videos, and similar use cases.
Supported model: wan2.6-r2v-flash only.
Set audio to false.
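A minimal request body for a silent video might look like this (the reference URL is a placeholder; field names follow the wan2.6 parameter table above):

```python
# Silent video: disable audio. Only wan2.6-r2v-flash supports audio=false.
payload = {
    "model": "wan2.6-r2v-flash",
    "input": {
        "prompt": "character1 drinks bubble tea while dancing spontaneously to the music.",
        "reference_urls": ["https://example.com/wan-r2v-role-1.mp4"],  # placeholder URL
    },
    "parameters": {"audio": False, "size": "1280*720"},
}
```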
Example: Silent dance video
Prompt: "character1 drinks bubble tea while dancing spontaneously to the music."
| Input | Output |
|---|---|
| wan-r2v-role-1.mp4 (reference video) | Silent video |
Sample code (curl)
Step 1: Submit the task

Step 2: Get the result using the task ID

Replace <task-id> with the task_id from the Step 1 response.
Output specifications
| Property | Details |
|---|---|
| Number of videos | 1 per task |
| Format | MP4 (H.264, 30 fps) |
| Resolution | wan2.7: Set by resolution + ratio. wan2.6: Set by size parameter (e.g. 1280*720). |
| URL expiration | 24 hours |
Response fields
| Field | Description |
|---|---|
| output.task_id | Task identifier for polling status. |
| output.task_status | PENDING, RUNNING, SUCCEEDED, or FAILED. |
| output.video_url | URL to the generated video. Available when task_status is SUCCEEDED. |
| output.orig_prompt | The original input prompt. |
| usage.duration | Output video duration in seconds. |
| usage.input_video_duration | Total duration of input videos in seconds. |
| usage.output_video_duration | Output video duration in seconds. |
| usage.video_count | Number of videos generated (always 1). |
| usage.audio | Whether the output video has audio. |
| usage.SR | Resolution tier (720 or 1080). |
Billing and rate limits
- For free quota and pricing, see Model invocation pricing.
- For rate limits, see Rate limits.
Billing rules
| Item | Billed? | Unit |
|---|---|---|
| Input images | No | -- |
| Input videos | Yes | Per second |
| Output videos | Yes | Per second |
| Failed tasks | No | Failed tasks do not consume the new-user free quota |
wan2.6-r2v-flash has separate rates for input and output video.
Total billed duration = Input video duration (capped at 5s) + Output video duration
How input video duration is billed
The 5-second cap is distributed evenly across all references (images and videos combined). Each video is billed for min(actual duration, truncation limit). Images are free.
| Number of references | Truncation limit per video |
|---|---|
| 1 | 5s |
| 2 | 2.5s |
| 3 | 1.65s |
| 4 | 1.25s |
| 5 | 1s |
Example: with two videos and one image (three references, so a 1.65s limit per video), the billed input duration is min(video 1 duration, 1.65s) + min(video 2 duration, 1.65s). The image is not billed.
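The billing rule above can be sketched as a small calculation (an illustration of the truncation table, not an official pricing tool):

```python
# Per the truncation table above: the 5s input cap is split across all
# references (images and videos combined); images themselves are free.
TRUNCATION_LIMIT = {1: 5.0, 2: 2.5, 3: 1.65, 4: 1.25, 5: 1.0}

def billed_seconds(video_durations: list[float], image_count: int,
                   output_duration: float) -> float:
    """Total billed seconds = truncated input video time + output video time."""
    n_refs = len(video_durations) + image_count
    limit = TRUNCATION_LIMIT[n_refs]
    return sum(min(d, limit) for d in video_durations) + output_duration

# Two videos (8s and 1s) plus one image, with a 5s output video:
# min(8, 1.65) + min(1, 1.65) + 5 = 7.65 billed seconds
```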
API reference
FAQ
How do I set the video aspect ratio?
wan2.7: Use the ratio parameter (16:9, 9:16, 1:1, 4:3, 3:4). When a first_frame image is provided, the ratio is inferred from the image.
wan2.6: Use the size parameter. Each resolution maps to a fixed aspect ratio -- for example, size=1280*720 produces a 16:9 video.
How do I reference characters in the prompt?
wan2.7: Use Video N or Image N identifiers; images and videos are counted separately.
wan2.6: Use character1, character2, and so on -- the identifiers map directly to the order of URLs in reference_urls.
What happens when a task fails?
Poll the task status endpoint. If task_status is FAILED, the message field in the response describes the error. Common causes include invalid reference URLs, unsupported file formats, or exceeding rate limits. Failed tasks are not billed.
