Voice design generates custom voices from text descriptions. After creating a voice, use the returned voice name with Qwen TTS or Realtime streaming TTS.
The target_model in voice design must match the model in synthesis. Mismatched models cause failures.
How it works
- Write a voice description (voice_prompt) and preview text (preview_text).
- Send a Create voice request with your target_model.
- The API returns a voice name and Base64-encoded preview audio. Decode the Base64 string to get the audio file (WAV format).
- Listen to the preview. If satisfied, use the voice name for synthesis. Otherwise, create a new voice.
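The decoding step above can be sketched in Python as follows. This is a minimal sketch, assuming the response body uses the output.voice and output.preview_audio.data fields shown in the Quick start examples. The helper names extract_preview and looks_like_wav are hypothetical and not part of any SDK.

```python
import base64


def extract_preview(result: dict) -> tuple[str, bytes]:
    """Return (voice_name, wav_bytes) from a Create voice response body.

    Assumes the layout used in the Quick start examples:
    result["output"]["voice"] and result["output"]["preview_audio"]["data"].
    """
    output = result["output"]
    voice_name = output["voice"]
    # validate=True rejects characters outside the Base64 alphabet
    wav_bytes = base64.b64decode(output["preview_audio"]["data"], validate=True)
    return voice_name, wav_bytes


def looks_like_wav(data: bytes) -> bool:
    """Cheap sanity check: a WAV file starts with a RIFF header tagged WAVE."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"
```

Running looks_like_wav on the decoded bytes before writing them to disk catches a truncated or malformed payload early.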
Quick start
Prerequisites
- Get an API key and set the DASHSCOPE_API_KEY environment variable.
Endpoint
All voice design operations use a single endpoint:
POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
Create a voice
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-voice-design",
"input": {
"action": "create",
"target_model": "qwen3-tts-vd-2026-01-26",
"voice_prompt": "A calm young female voice with clear articulation and gentle tone, suitable for audiobook narration.",
"preview_text": "Hello, welcome to our program. Today we will explore the wonders of nature.",
"preferred_name": "narrator",
"language": "en"
},
"parameters": {
"sample_rate": 24000,
"response_format": "wav"
}
}'
import requests
import base64
import os

# Create a voice from a text description
response = requests.post(
    "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
    headers={
        "Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}",
        "Content-Type": "application/json"
    },
    json={
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-2026-01-26",
            "voice_prompt": "A calm young female voice with clear articulation "
                            "and gentle tone, suitable for audiobook narration.",
            "preview_text": "Hello, welcome to our program. "
                            "Today we will explore the wonders of nature.",
            "preferred_name": "narrator",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    },
    timeout=60
)
result = response.json()
voice_name = result["output"]["voice"]
print(f"Voice created: {voice_name}")

# Decode and save preview audio
audio_bytes = base64.b64decode(result["output"]["preview_audio"]["data"])
with open(f"{voice_name}_preview.wav", "wb") as f:
    f.write(audio_bytes)
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class VoiceDesign {
    public static void main(String[] args) {
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
        try {
            // target_model must match the model used later for synthesis
            String body = "{"
                    + "\"model\": \"qwen-voice-design\","
                    + "\"input\": {"
                    + "\"action\": \"create\","
                    + "\"target_model\": \"qwen3-tts-vd-2026-01-26\","
                    + "\"voice_prompt\": \"A calm young female voice with clear articulation "
                    + "and gentle tone, suitable for audiobook narration.\","
                    + "\"preview_text\": \"Hello, welcome to our program. "
                    + "Today we will explore the wonders of nature.\","
                    + "\"preferred_name\": \"narrator\","
                    + "\"language\": \"en\""
                    + "},"
                    + "\"parameters\": {"
                    + "\"sample_rate\": 24000,"
                    + "\"response_format\": \"wav\""
                    + "}"
                    + "}";
            HttpURLConnection conn = (HttpURLConnection) new URL(apiUrl).openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Authorization", "Bearer " + apiKey);
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            try (OutputStream os = conn.getOutputStream()) {
                os.write(body.getBytes("UTF-8"));
            }
            // Read the success body or the error body, depending on the status
            int status = conn.getResponseCode();
            InputStream is = (status >= 200 && status < 300)
                    ? conn.getInputStream()
                    : conn.getErrorStream();
            StringBuilder sb = new StringBuilder();
            try (BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"))) {
                String line;
                while ((line = br.readLine()) != null) {
                    sb.append(line);
                }
            }
            if (status == 200) {
                Gson gson = new Gson();
                JsonObject result = gson.fromJson(sb.toString(), JsonObject.class);
                JsonObject output = result.getAsJsonObject("output");
                String voiceName = output.get("voice").getAsString();
                System.out.println("Voice created: " + voiceName);
                // Decode and save preview audio
                String audioData = output.getAsJsonObject("preview_audio").get("data").getAsString();
                byte[] audioBytes = Base64.getDecoder().decode(audioData);
                try (FileOutputStream fos = new FileOutputStream(voiceName + "_preview.wav")) {
                    fos.write(audioBytes);
                }
                System.out.println("Preview saved: " + voiceName + "_preview.wav");
            } else {
                System.err.println("Error " + status + ": " + sb.toString());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The response includes the voice name and Base64-encoded preview audio. Decode the Base64 string to get the WAV file and listen to the preview.
Use the voice for synthesis
Use the returned voice name with the matching synthesis model. The model in synthesis must match the target_model used during voice creation.
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-tts-vd-2026-01-26",
"input": {
"text": "Welcome to our audiobook. Let me take you on a journey through the wonders of nature.",
"voice": "VOICE_NAME"
}
}'
Replace VOICE_NAME with the voice name returned from the create step. The response contains an output.audio.url field with a download link (valid for 24 hours).
import requests
import os

voice_name = "VOICE_NAME"  # <-- from the create step

# The model here must match the target_model used when the voice was created
response = requests.post(
    "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation",
    headers={
        "Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}",
        "Content-Type": "application/json"
    },
    json={
        "model": "qwen3-tts-vd-2026-01-26",
        "input": {
            "text": "Welcome to our audiobook. "
                    "Let me take you on a journey through the wonders of nature.",
            "voice": voice_name
        }
    },
    timeout=60
)
result = response.json()
audio_url = result["output"]["audio"]["url"]
print(f"Audio URL: {audio_url}")
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

public class VoiceDesignSynthesize {
    public static void main(String[] args) {
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        String voiceName = "VOICE_NAME"; // <-- from the create step
        try {
            // The model here must match the target_model used when the voice was created
            String body = "{"
                    + "\"model\": \"qwen3-tts-vd-2026-01-26\","
                    + "\"input\": {"
                    + "\"text\": \"Welcome to our audiobook. "
                    + "Let me take you on a journey through the wonders of nature.\","
                    + "\"voice\": \"" + voiceName + "\""
                    + "}"
                    + "}";
            HttpURLConnection conn = (HttpURLConnection) new URL(
                    "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation"
            ).openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Authorization", "Bearer " + apiKey);
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            try (OutputStream os = conn.getOutputStream()) {
                os.write(body.getBytes("UTF-8"));
            }
            // Read the success body or the error body, depending on the status
            int status = conn.getResponseCode();
            InputStream is = (status >= 200 && status < 300)
                    ? conn.getInputStream()
                    : conn.getErrorStream();
            StringBuilder sb = new StringBuilder();
            try (BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"))) {
                String line;
                while ((line = br.readLine()) != null) {
                    sb.append(line);
                }
            }
            if (status == 200) {
                Gson gson = new Gson();
                JsonObject result = gson.fromJson(sb.toString(), JsonObject.class);
                String audioUrl = result.getAsJsonObject("output")
                        .getAsJsonObject("audio").get("url").getAsString();
                System.out.println("Audio URL: " + audioUrl);
                // Download the audio file
                try (InputStream in = new URL(audioUrl).openStream();
                     FileOutputStream out = new FileOutputStream("synthesis_output.wav")) {
                    byte[] buffer = new byte[4096];
                    int bytesRead;
                    while ((bytesRead = in.read(buffer)) != -1) {
                        out.write(buffer, 0, bytesRead);
                    }
                }
                System.out.println("Audio saved: synthesis_output.wav");
            } else {
                System.err.println("Error " + status + ": " + sb.toString());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
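Because the download link in output.audio.url expires, you may want to save the audio as soon as synthesis returns. The following is a minimal Python sketch using only the standard library. The download_audio helper name and the destination path are hypothetical, and the .wav extension is an assumption carried over from the Java example above.

```python
import shutil
import urllib.request


def download_audio(audio_url: str, dest_path: str) -> str:
    """Stream the audio at audio_url to dest_path and return dest_path."""
    # Copy in chunks so the whole file is never held in memory at once
    with urllib.request.urlopen(audio_url, timeout=60) as resp, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest_path
```

For example, call download_audio(audio_url, "synthesis_output.wav") with the audio_url value read from the synthesis response.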
For real-time streaming synthesis with custom voices, see Realtime streaming TTS. For complete API parameters and more operations (list, query, delete), see the Voice design API reference.
Supported models
Voice design uses two models: a design model and a target synthesis model.
| Model | Value | Use with |
|---|---|---|
| Voice design model | qwen-voice-design | All voice design operations (fixed value) |
| Real-time synthesis target | qwen3-tts-vd-realtime-2026-01-15 | Realtime streaming TTS |
| Real-time synthesis target (earlier version) | qwen3-tts-vd-realtime-2025-12-16 | Realtime streaming TTS |
| Non-real-time synthesis target | qwen3-tts-vd-2026-01-26 | Qwen TTS |
Voice design models (qwen3-tts-vd-*) only support custom-designed voices. They do not support system voices (Chelsie, Serena, Ethan, Cherry).
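The pairing rules in this section can be checked on the client before a synthesis call. The sketch below is illustrative only: the model names and system voice names are copied from this page, the system voice list may not be exhaustive, and the check_synthesis_request helper is hypothetical.

```python
# Target models and the synthesis service each one pairs with (from the table above)
TARGET_MODELS = {
    "qwen3-tts-vd-realtime-2026-01-15": "Realtime streaming TTS",
    "qwen3-tts-vd-realtime-2025-12-16": "Realtime streaming TTS",
    "qwen3-tts-vd-2026-01-26": "Qwen TTS",
}

# System voices named on this page; voice design models do not accept them
SYSTEM_VOICES = {"Chelsie", "Serena", "Ethan", "Cherry"}


def check_synthesis_request(target_model: str, synthesis_model: str, voice: str) -> str:
    """Validate a planned synthesis call and return the service to use.

    Raises ValueError for an unknown target model, a model mismatch,
    or a system voice.
    """
    if target_model not in TARGET_MODELS:
        raise ValueError(f"Unknown target_model: {target_model}")
    if synthesis_model != target_model:
        raise ValueError(
            f"Synthesis model {synthesis_model} does not match target_model {target_model}"
        )
    if voice in SYSTEM_VOICES:
        raise ValueError(f"System voice {voice} is not supported by voice design models")
    return TARGET_MODELS[target_model]
```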
Supported languages
| Code | Language |
|---|---|
| zh | Chinese |
| en | English |
| de | German |
| it | Italian |
| pt | Portuguese |
| es | Spanish |
| ja | Japanese |
| ko | Korean |
| fr | French |
| ru | Russian |
voice_prompt supports Chinese and English only. The language parameter must match the preview_text language.
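One way to catch an unsupported language code before sending a request is to validate it while building the input object. This is a minimal sketch: the field names come from the Create a voice examples on this page, while the build_create_input helper is hypothetical. It does not verify that preview_text is actually written in the given language.

```python
# Language codes from the table above
SUPPORTED_LANGUAGES = {
    "zh": "Chinese", "en": "English", "de": "German", "it": "Italian",
    "pt": "Portuguese", "es": "Spanish", "ja": "Japanese", "ko": "Korean",
    "fr": "French", "ru": "Russian",
}


def build_create_input(voice_prompt: str, preview_text: str, language: str,
                       target_model: str, preferred_name: str) -> dict:
    """Build the "input" object of a Create voice request.

    Raises ValueError if the language code is not in the supported list.
    """
    if language not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {language!r}; use one of: {supported}")
    return {
        "action": "create",
        "target_model": target_model,
        "voice_prompt": voice_prompt,
        "preview_text": preview_text,
        "preferred_name": preferred_name,
        "language": language,
    }
```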
Write effective voice descriptions
A voice description (voice_prompt) tells the model what voice to generate. Combine gender, age, tone, and use case to define a distinctive voice.
Constraints
- Max length: 2,048 characters.
- Languages: Chinese and English only.
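The length limit can be enforced locally before calling the API. The sketch below assumes that "characters" means Unicode code points as counted by Python's len(); the service may count differently, so treat this as a pre-check rather than a guarantee. The validate_voice_prompt helper is hypothetical.

```python
MAX_VOICE_PROMPT_CHARS = 2048  # limit stated above


def validate_voice_prompt(voice_prompt: str) -> str:
    """Return the trimmed prompt, or raise ValueError if it is empty or too long."""
    prompt = voice_prompt.strip()
    if not prompt:
        raise ValueError("voice_prompt must not be empty")
    if len(prompt) > MAX_VOICE_PROMPT_CHARS:
        raise ValueError(
            f"voice_prompt has {len(prompt)} characters; "
            f"the limit is {MAX_VOICE_PROMPT_CHARS}"
        )
    return prompt
```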
Description dimensions
| Dimension | Examples |
|---|---|
| Gender | Male, female, neutral |
| Age | Child (5–12), teenager (13–18), young adult (19–35), middle-aged (36–55), elderly (55+) |
| Pitch | High, medium, low, high-pitched, low-pitched |
| Pace | Fast, medium, slow, fast-paced, slow-paced |
| Emotion | Cheerful, calm, gentle, serious, lively, composed, soothing |
| Characteristics | Magnetic, crisp, hoarse, mellow, sweet, rich, powerful |
| Use case | News broadcast, ad voice-over, audiobook, animation character, voice assistant, documentary narration |
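The dimensions above can be combined programmatically when you generate many descriptions. This is an illustrative sketch, not a required format: the compose_voice_prompt helper and its sentence template are hypothetical, and hand-written descriptions work equally well.

```python
def compose_voice_prompt(gender: str, age: str, emotion: str, use_case: str,
                         pitch: str = "", pace: str = "",
                         characteristics: str = "") -> str:
    """Combine description dimensions into a single English voice_prompt."""
    traits = []
    if pitch:
        traits.append(f"{pitch} pitch")
    if pace:
        traits.append(f"{pace} pace")
    if characteristics:
        traits.append(f"{characteristics} tone")
    prompt = f"A {emotion}, {age} {gender} voice"
    if traits:
        # Join optional traits as "with a X and a Y"
        prompt += " with a " + " and a ".join(traits)
    return prompt + f", suitable for {use_case}."


print(compose_voice_prompt("male", "middle-aged", "calm", "documentary narration",
                           pace="slow", characteristics="deep, magnetic"))
# A calm, middle-aged male voice with a slow pace and a deep, magnetic tone, suitable for documentary narration.
```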
Tips
- Be specific. Use concrete qualities like "deep," "crisp," or "fast-paced." Avoid vague terms like "nice" or "normal."
- Use multiple dimensions. Combine gender, age, emotion, and use case. "Female voice" alone is too broad.
- Be objective. Focus on physical and perceptual features. Write "high-pitched and energetic" instead of "my favorite voice."
- Be original. Describe voice qualities directly. Celebrity imitation is not supported and involves copyright risks.
- Be concise. Every word should serve a purpose. Avoid synonyms and meaningless intensifiers.
Examples
Good descriptions:
- "A young, lively female voice with a fast pace and noticeable upward inflection, suitable for fashion product introductions."
- "A calm, middle-aged male voice with a slow pace and deep, magnetic tone, suitable for news or documentary narration."
- "A cute child's voice, around 8 years old, with a slightly childish tone, suitable for animation character voice-overs."
Ineffective descriptions:
| Description | Issue | Improvement |
|---|---|---|
| "A nice voice" | Too vague | "A young female voice with a clear vocal line and gentle tone." |
| "A voice like a certain celebrity" | Celebrity imitation not supported | "A mature, magnetic male voice with a calm pace." |
| "A very, very, very nice female voice" | Redundant repetition | "A female voice, 20–24 years old, with a light tone and sweet quality." |
Voice quota and cleanup
- Account limit: 1,000 voices per account. Check the total_count field in the List voices response.
- Automatic cleanup: Voices unused for synthesis in the past year are deleted automatically.
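Once you have read total_count from the List voices response, the remaining quota is a simple subtraction. The sketch below takes the count as a plain integer because the exact location of total_count in the response is not shown on this page. The remaining_voice_quota helper is hypothetical.

```python
VOICE_LIMIT = 1000  # account limit stated above


def remaining_voice_quota(total_count: int, limit: int = VOICE_LIMIT) -> int:
    """Return how many more voices can be created, never less than zero."""
    if total_count < 0:
        raise ValueError("total_count must not be negative")
    return max(limit - total_count, 0)
```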
Error codes
If a call fails, see Error messages.
Common voice design errors:
| HTTP status | Error code | Cause | Resolution |
|---|---|---|---|
| 400 | BadRequest.VoiceNotFound | The specified voice does not exist (in voice design or synthesis operations) | Verify the voice name with List voices or Query a voice. If the voice does not exist, create a new voice with Create a voice. |
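A client can detect this error and fall back to creating a new voice. The sketch below assumes the error code is returned in a top-level code field of the JSON error body; that field name is an assumption, so confirm it against Error messages. The is_voice_not_found helper is hypothetical.

```python
VOICE_NOT_FOUND = "BadRequest.VoiceNotFound"


def is_voice_not_found(status: int, body: dict) -> bool:
    """Return True if the response reports a missing voice.

    Assumes the error code sits in body["code"] (an assumption, see above).
    """
    return status == 400 and body.get("code") == VOICE_NOT_FOUND
```

When this returns True, verify the voice name with List voices or Query a voice before creating a replacement.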
Next steps