
Reference-to-video

Replicate motion and look

  • Character portrayal: Replicate a character's appearance from a reference image or video. If the reference is a video, the model can also replicate voice timbre.
  • Multi-character interaction: Compose scenes with up to five characters for natural dialogue and interaction.
  • Multi-shot narrative: Maintain character consistency across shots with intelligent multi-shot scheduling.
  • Voice cloning (wan2.7): Provide a reference_voice audio file to set the voice timbre independently from reference videos.
Quick links: API reference: wan2.7, wan2.6 | Prompt guide

Supported models

| Model | Features | Input | Output |
|---|---|---|---|
| wan2.7-r2v (Recommended) | Audio sync, multi-character, voice cloning, first-frame control, media array input | Text, image, video, audio | 720P or 1080P, 2-15s, 30 fps, MP4 (H.264) |
| wan2.6-r2v-flash | Video with or without audio, single or multi-character, multi-shot narrative, audio-video sync. Fast and cost-effective. | Text, image, video | 720P or 1080P, 2-10s, 30 fps, MP4 (H.264) |
| wan2.6-r2v | Video with audio, multi-role reference-to-video, multi-shot narrative, audio-video sync | Text, image, video | 720P or 1080P, 2-10s, 30 fps, MP4 (H.264) |

Prerequisites

Get a DashScope API key from Qwen Cloud. Set it as an environment variable:
export DASHSCOPE_API_KEY="YOUR_API_KEY"

How it works

Reference-to-video tasks run asynchronously:
1. Submit a task: POST your model, prompt, reference URLs, and parameters. The API returns a task_id.
2. Poll for results: GET the task status using the task_id. When the status is SUCCEEDED, the response includes a URL to the generated video.
| Task status | Description |
|---|---|
| PENDING | Task is queued |
| RUNNING | Video is being generated |
| SUCCEEDED | Generation complete; video URL available in output.video_url |
| FAILED | Generation failed; check message for details |
The output video URL expires after 24 hours. Download or transfer it before expiration.
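The status semantics above can be sketched as a small dispatch helper. This is a minimal sketch; `next_action` is a hypothetical name, not part of the DashScope SDK.

```python
# Minimal sketch of the polling state machine described above.
# next_action is a hypothetical helper, not a DashScope SDK call.

def next_action(task_status: str) -> str:
    """Map a task_status value to what the client should do next."""
    if task_status == "SUCCEEDED":
        return "download"  # output.video_url is ready; fetch it within 24 hours
    if task_status == "FAILED":
        return "inspect_message"  # read the message field for the error cause
    if task_status in ("PENDING", "RUNNING"):
        return "wait"  # poll again after a short delay
    return "unknown"  # defensive default for unexpected statuses
```

The same logic appears inline in the full Python polling loop later in this page.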

Getting started

Submit a reference-to-video task and poll for the result.
  • curl (wan2.7)
  • curl (wan2.6)
Wan 2.7 uses the media array to provide references. In the prompt, refer to each item as Video N or Image N, where N is the item's position in the media array.

Step 1: Submit the task
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
  -H 'X-DashScope-Async: enable' \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "wan2.7-r2v",
  "input": {
    "prompt": "Video 2 holds Image 3 and plays a soothing American country ballad in a coffee shop, while Video 1 smiles, watches Video 2, and slowly walks towards him",
    "media": [
      {"type": "reference_video", "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260129/hfugmr/wan-r2v-role1.mp4"},
      {"type": "reference_video", "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260129/qigswt/wan-r2v-role2.mp4"},
      {"type": "reference_image", "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260129/qpzxps/wan-r2v-object4.png"}
    ]
  },
  "parameters": {
    "resolution": "720P",
    "duration": 10,
    "prompt_extend": false,
    "watermark": true
  }
}'
Step 2: Get the result using the task ID

Replace {task_id} with the task_id value returned by the previous API call.
curl -X GET 'https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}' \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY"
Key differences from wan2.6: wan2.7 uses media array instead of reference_urls, characters are referenced as Video 1/Image 1 (not character1), uses resolution+ratio instead of size, supports reference_voice for voice cloning, and watermark defaults to false.
Sample output

When the task succeeds, the response includes the video URL:
{
  "output": {
    "task_id": "your-task-id",
    "task_status": "SUCCEEDED",
    "submit_time": "2026-01-01 12:00:00.000",
    "scheduled_time": "2026-01-01 12:00:01.000",
    "end_time": "2026-01-01 12:01:00.000",
    "orig_prompt": "...",
    "video_url": "https://dashscope-result.aliyuncs.com/..."
  },
  "usage": {
    "duration": 10,
    "input_video_duration": 5,
    "output_video_duration": 10,
    "video_count": 1,
    "audio": true,
    "SR": 720
  },
  "request_id": "abc123"
}
For a complete Python example with built-in polling, see Multi-character interaction.

Parameters

wan2.7-r2v

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | "wan2.7-r2v" |
| input.prompt | string | Yes | Up to 5,000 characters. Use Video N/Image N to reference items by their position in the media array. |
| input.media | array | Yes | 1-5 items. Each item has a type (reference_image, reference_video, or first_frame) and a url. |
| input.negative_prompt | string | No | Up to 500 characters. Content to exclude. |
| input.reference_voice | string | No | URL to an audio file (WAV/MP3, 1-10s, max 15 MB) for voice cloning. Overrides audio from reference videos. |
| parameters.resolution | string | No | "720P" or "1080P" (default). |
| parameters.ratio | string | No | "16:9" (default), "9:16", "1:1", "4:3", "3:4". Ignored when first_frame is provided. |
| parameters.duration | integer | No | Duration in seconds: 2-15 when only images are referenced, 2-10 when any video reference is used. Default: 5. |
| parameters.prompt_extend | boolean | No | Rewrite the prompt via LLM. Default: true. |
| parameters.watermark | boolean | No | Add a watermark. Default: false. |
| parameters.seed | integer | No | 0 to 2,147,483,647. |

Character referencing (wan2.7)

Each media item maps to an identifier based on its type and its position in the media array:
"media": [
  {"type": "reference_video", "url": "https://example.com/person-a.mp4"},   // Video 1
  {"type": "reference_video", "url": "https://example.com/person-b.mp4"},   // Video 2
  {"type": "reference_image", "url": "https://example.com/guitar.png"}      // Image 3
]
Use these identifiers in your prompt: "Video 1 plays guitar while Video 2 sings along, holding Image 3."
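The numbering rule can be sketched as a small helper, matching the worked example above (Image 3 follows Video 2). `media_labels` is a hypothetical name, not part of the DashScope SDK.

```python
# Sketch of the wan2.7 identifier rule, matching the example above:
# the prefix comes from the item type, the number from the item's
# 1-based position in the media array. Hypothetical helper, not an API call.

def media_labels(media: list[dict]) -> list[str]:
    """Return the prompt identifier for each media item, in order."""
    prefix = {"reference_video": "Video", "reference_image": "Image"}
    return [
        f"{prefix[item['type']]} {pos}"
        for pos, item in enumerate(media, start=1)
    ]

media = [
    {"type": "reference_video", "url": "https://example.com/person-a.mp4"},
    {"type": "reference_video", "url": "https://example.com/person-b.mp4"},
    {"type": "reference_image", "url": "https://example.com/guitar.png"},
]
# media_labels(media) -> ["Video 1", "Video 2", "Image 3"]
```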

wan2.6 models

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | "wan2.6-r2v-flash" or "wan2.6-r2v" |
| input.prompt | string | Yes | Text prompt describing the scene. Reference characters as character1, character2, etc., matching the order of reference_urls. |
| input.reference_urls | array | Yes | Up to 5 URLs pointing to reference images or videos. Each reference must contain a single character. |
| parameters.size | string | No | Output resolution. Example: "1280*720" (16:9). See the API reference for all options. |
| parameters.duration | integer | No | Output video length in seconds. |
| parameters.audio | boolean | No | true for audio, false for silent video. Default: true. Silent video is supported only by wan2.6-r2v-flash. |
| parameters.shot_type | string | No | "multi" for multi-shot switching or "single" for a fixed perspective. |
| parameters.watermark | boolean | No | true to add a watermark. |

Reference input limits

| Type | Maximum count | Notes |
|---|---|---|
| Images | 5 | People, objects, or backgrounds |
| Videos | 3 | Best for character or object references. Avoid videos of backgrounds or empty scenes. |
| Total (images + videos) | 5 | Combined limit across all references |
Input method: public URL (HTTP or HTTPS).
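The limits above can be checked client-side before submitting a task. This is a minimal sketch; `check_reference_limits` is a hypothetical helper, not part of the DashScope SDK.

```python
# Sketch of the reference limits above (hypothetical helper, not an API call):
# at most 5 images, at most 3 videos, at most 5 references in total.

def check_reference_limits(n_images: int, n_videos: int) -> list[str]:
    """Return a list of limit violations (empty if the mix is allowed)."""
    errors = []
    if n_images > 5:
        errors.append("too many images (max 5)")
    if n_videos > 3:
        errors.append("too many videos (max 3)")
    if n_images + n_videos > 5:
        errors.append("too many references in total (max 5)")
    return errors
```

For example, 3 images plus 3 videos passes each per-type limit but exceeds the combined limit of 5.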

Character referencing (wan2.6)

Each reference URL maps to a character identifier based on its position in the reference_urls array:
"reference_urls": [
  "https://example.com/person-a.mp4",   // character1
  "https://example.com/person-b.mp4",   // character2
  "https://example.com/guitar.png"      // character3
]
Use these identifiers in your prompt: "character1 plays guitar while character2 sings along, holding character3."

Multi-character interaction

Compose scenes with up to five characters for natural dialogue and interactions -- interviews, conversations, tutorials, and more. Supported models: All models. Set shot_type to multi for dynamic multi-shot switching, or single for a fixed perspective.

Example: Four-reference scene

Prompt: "character2 sits on a chair by the window, holding character3, and plays a soothing American country folk song next to character4. character1 says to character2: 'that sounds great'"
| Input | Type | Reference |
|---|---|---|
| wan-r2v-role1.mp4 | Video | character1 (person) |
| wan-r2v-role2.mp4 | Video | character2 (person) |
| wan-r2v-object4.png | Image | character3 (object) |
| wan-r2v-backgroud5.png | Image | character4 (background) |
Output: Multi-shot video with audio

Example: Two-character dialogue

Prompt: "character1 says to character2: 'I'll rely on you tomorrow morning!' character2 replies: 'You can count on me!'"
| Input | Type | Reference |
|---|---|---|
| compressed-video (1).mp4 | Video | character1 |
| compressed-video.mp4 | Video | character2 |
Output: Multi-shot video with audio

Sample code (curl)

All examples use the X-DashScope-Async: enable header to submit asynchronous tasks.

Step 1: Submit the task
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
  -H 'X-DashScope-Async: enable' \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "wan2.6-r2v-flash",
  "input": {
    "prompt": "Character2 sits on a chair by the window, holding character3, and plays a soothing American country folk song next to character4. Character1 says to Character2: \"that sounds great\"",
    "reference_urls": [
      "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20260205/aacgyk/wan-r2v-role1.mp4",
      "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20260205/mmizqq/wan-r2v-role2.mp4",
      "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260129/qpzxps/wan-r2v-object4.png",
      "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260129/wfjikw/wan-r2v-backgroud5.png"
    ]
  },
  "parameters": {
    "size": "1280*720",
    "duration": 10,
    "audio": true,
    "shot_type": "multi",
    "watermark": true
  }
}'
Step 2: Retrieve the result

Replace <task-id> with the task_id from the Step 1 response.
curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/<task-id> \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY"

Sample code (Python)

This example submits a multi-character task and polls until the result is ready.
import os
import time
import requests

API_KEY = os.environ.get("DASHSCOPE_API_KEY")
BASE_URL = "https://dashscope-intl.aliyuncs.com/api/v1"

headers = {
  "Authorization": f"Bearer {API_KEY}",
  "Content-Type": "application/json",
  "X-DashScope-Async": "enable",
}

# Step 1: Submit the task
payload = {
  "model": "wan2.6-r2v-flash",
  "input": {
    "prompt": (
      'Character2 sits on a chair by the window, holding character3, '
      'and plays a soothing American country folk song next to character4. '
      'Character1 says to Character2: "that sounds great"'
    ),
    "reference_urls": [
      "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20260205/aacgyk/wan-r2v-role1.mp4",
      "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20260205/mmizqq/wan-r2v-role2.mp4",
      "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260129/qpzxps/wan-r2v-object4.png",
      "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260129/wfjikw/wan-r2v-backgroud5.png",
    ],
  },
  "parameters": {
    "size": "1280*720",
    "duration": 10,
    "audio": True,
    "shot_type": "multi",
    "watermark": True,
  },
}

response = requests.post(
  f"{BASE_URL}/services/aigc/video-generation/video-synthesis",
  headers=headers,
  json=payload,
)
result = response.json()
task_id = result["output"]["task_id"]
print(f"Task submitted: {task_id}")

# Step 2: Poll for the result
poll_headers = {"Authorization": f"Bearer {API_KEY}"}

while True:
  status_response = requests.get(
    f"{BASE_URL}/tasks/{task_id}", headers=poll_headers
  )
  status = status_response.json()
  task_status = status["output"]["task_status"]

  if task_status == "SUCCEEDED":
    video_url = status["output"]["video_url"]
    print(f"Video ready: {video_url}")
    break
  elif task_status == "FAILED":
    print(f"Task failed: {status['output'].get('message', 'Unknown error')}")
    break
  else:
    print(f"Status: {task_status} -- waiting 10s...")
    time.sleep(10)

Single-character performance

Create a complete character performance across different scenes from a single reference video or image -- personal branding, product endorsements, educational training, and more. Supported models: All models. Pass a single URL in reference_urls and use character1 in the prompt. Setting shot_type to multi is recommended.

Example: Holiday unboxing video

Prompt: "Create a festive holiday unboxing experience. Shot 1 [0-2s]: Character1 sits by a beautifully decorated Christmas tree with twinkling lights, holding a wrapped gift box with elegant red and gold wrapping. Shot 2 [2-4s]: Close-up as Character1 carefully unwraps the gift, revealing premium skincare products inside. Shot 3 [4-6s]: Character1 applies the product with delight, saying: 'This holiday glow is exactly what I wanted!' Shot 4 [6-10s]: Character1 admires their radiant skin in a handheld mirror, surrounded by festive decorations, ending with a warm smile to camera."

Sample code (curl)

Step 1: Submit the task
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
  -H 'X-DashScope-Async: enable' \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "wan2.6-r2v-flash",
  "input": {
    "prompt": "Create a festive holiday unboxing experience.Shot 1 [0-2s]: Character1 sits by a beautifully decorated Christmas tree with twinkling lights, holding a wrapped gift box with elegant red and gold wrapping. Shot 2 [2-4s]: Close-up as Character1 carefully unwraps the gift, revealing premium skincare products inside. Shot 3 [4-6s]: Character1 applies the product with delight, saying: \"This holiday glow is exactly what I wanted!\" Shot 4 [6-10s]: Character1 admires their radiant skin in a handheld mirror, surrounded by festive decorations, ending with a warm smile to camera.",
    "reference_urls":["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20260205/mjgmzx/wan-r2v-role-4.mp4"]
  },
  "parameters": {
    "size": "1280*720",
    "duration": 10,
    "shot_type":"multi",
    "watermark": true
  }
}'
Step 2: Retrieve the result

Replace <task-id> with the task_id from the Step 1 response.
curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/<task-id> \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY"

Silent video generation

Create visual-only videos without audio -- animated posters, silent short videos, and similar use cases. Supported model: wan2.6-r2v-flash only. Set audio to false.

Example: Silent dance video

Prompt: "character1 drinks bubble tea while dancing spontaneously to the music."
| Input | Output |
|---|---|
| wan-r2v-role-1.mp4 (reference video) | Silent video |

Sample code (curl)

Step 1: Submit the task
curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis' \
  -H 'X-DashScope-Async: enable' \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "wan2.6-r2v-flash",
  "input": {
    "prompt": "character1 drinks bubble tea while dancing spontaneously to the music.",
    "reference_urls":["https://cdn.wanx.aliyuncs.com/static/demo-wan26/vace.mp4"]
  },
  "parameters": {
    "size": "1280*720",
    "duration": 5,
    "shot_type":"multi",
    "audio": false,
    "watermark": true
  }
}'
Step 2: Retrieve the result

Replace <task-id> with the task_id from the Step 1 response.
curl -X GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/<task-id> \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY"

Output specifications

| Property | Details |
|---|---|
| Number of videos | 1 per task |
| Format | MP4 (H.264, 30 fps) |
| Resolution | wan2.7: set by resolution + ratio. wan2.6: set by the size parameter (e.g. 1280*720). |
| URL expiration | 24 hours |

Response fields

| Field | Description |
|---|---|
| output.task_id | Task identifier for polling status. |
| output.task_status | PENDING, RUNNING, SUCCEEDED, or FAILED. |
| output.video_url | URL to the generated video. Available when task_status is SUCCEEDED. |
| output.orig_prompt | The original input prompt. |
| usage.duration | Output video duration in seconds. |
| usage.input_video_duration | Total duration of input videos in seconds. |
| usage.output_video_duration | Output video duration in seconds. |
| usage.video_count | Number of videos generated (always 1). |
| usage.audio | Whether the output video has audio. |
| usage.SR | Resolution tier (720 or 1080). |

Billing and rate limits

Billing rules

| Item | Billed? | Unit |
|---|---|---|
| Input images | No | -- |
| Input videos | Yes | Per second |
| Output videos | Yes | Per second |
| Failed tasks | No | Not billed; failed tasks also do not consume the new-user free quota |

Videos with audio and silent videos are priced differently; for example, wan2.6-r2v-flash has separate rates for each.

Total billed duration = input video duration (capped at 5s) + output video duration

How input video duration is billed

The 5-second cap is distributed evenly across all references (images and videos combined). Each video is billed for min(actual duration, truncation limit). Images are free.
| Number of references | Truncation limit per video |
|---|---|
| 1 | 5s |
| 2 | 2.5s |
| 3 | 1.65s |
| 4 | 1.25s |
| 5 | 1s |
Example: 3 references (1 image + 2 videos) with a 1.65s truncation limit per video: Billed input duration = min(video 1 duration, 1.65s) + min(video 2 duration, 1.65s). The image is not billed.
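The rule can be sketched numerically. The truncation limits below are taken from the table above; `billed_input_duration` is a hypothetical helper, not a billing API.

```python
# Sketch of the input-duration billing rule above. The per-video truncation
# limits come from the table; billed_input_duration is a hypothetical helper.

TRUNCATION_LIMITS = {1: 5.0, 2: 2.5, 3: 1.65, 4: 1.25, 5: 1.0}

def billed_input_duration(video_durations: list[float], n_references: int) -> float:
    """Sum of min(duration, limit) over input videos; images are free."""
    limit = TRUNCATION_LIMITS[n_references]
    return sum(min(d, limit) for d in video_durations)
```

For the worked example (1 image + 2 videos, so 3 references and a 1.65s limit), videos of 3s and 1s would bill min(3, 1.65) + min(1, 1.65) = 2.65 seconds of input.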

API reference

FAQ

How do I set the video aspect ratio?

wan2.7: Use the ratio parameter (16:9, 9:16, 1:1, 4:3, 3:4). When a first_frame image is provided, the ratio is inferred from the image.

wan2.6: Use the size parameter. Each resolution maps to a fixed aspect ratio -- for example, size=1280*720 produces a 16:9 video.

How do I reference characters in the prompt?

wan2.7: Use Video N or Image N identifiers, where N is the item's position in the media array:
"media": [
  {"type": "reference_video", "url": "https://example.com/girl.mp4"},   // Video 1
  {"type": "reference_image", "url": "https://example.com/clock.png"}   // Image 2
]
wan2.6: Use character1, character2, and so on -- the identifiers map directly to the order of URLs in reference_urls:
"reference_urls": [
  "https://example.com/girl.mp4",   // character1
  "https://example.com/clock.png"   // character2
]

What happens when a task fails?

Poll the task status endpoint. If task_status is FAILED, the message field in the response describes the error. Common causes include invalid reference URLs, unsupported file formats, or exceeding rate limits. Failed tasks are not billed.