Choose a model for voice conversation, speech translation, and more.
S2S vs pipeline
Two ways to build voice-enabled apps:
| S2S | Pipeline (ASR + LLM + TTS) | |
|---|---|---|
| Latency | Low — single model, streaming | Higher — 3 sequential hops |
| Audio understanding | End-to-end — hears tone, emotion, responds in kind | Transcribes to text first — audio nuance lost |
| Voice customization | Preset voices via system prompt | Voice cloning, voice design (CosyVoice) |
- Use S2S when interactive conversation, low latency, and audio-aware responses matter. Continue reading this page.
- Use Pipeline when you need custom voices or want to mix-and-match the best ASR, LLM, and TTS for each stage.
Real-time or file-based?
-
Real-time (WebSocket) — Use for live voice interfaces: voice assistants, call centers, simultaneous interpretation. Audio streams in, speech streams out. Model names contain
-realtime. - File-based (HTTP) — Use when you can trade latency for better results: video dubbing, podcast translation, offline content processing. Unlocks function calling (Qwen3.5-Omni, Qwen3-Omni-Flash), web search (Qwen3.5-Omni), thinking mode (Qwen3-Omni-Flash), and video context (Livetranslate).
Function calling
Let the model take actions based on what it hears and sees — check a knowledge base, query a schedule, trigger a workflow. Use qwen3.5-omni-plus (HTTP), qwen3.5-omni-flash (HTTP), or qwen3-omni-flash (HTTP). Not available on realtime or Livetranslate models.
Web search
Let the model retrieve real-time information to answer questions about current events, stock prices, weather, and more. Use qwen3.5-omni-plus (HTTP) or qwen3.5-omni-plus-realtime (WebSocket). The model autonomously decides whether to search. Not available on Qwen3-Omni-Flash or Livetranslate models.
Thinking mode
Use qwen3-omni-flash (HTTP) when answer quality matters more than latency. The model reasons step-by-step before producing speech — useful for technical support, complex Q&A, or multi-step instructions. Not available on Qwen3.5-Omni models.
Translation
All three model families can translate speech:
- Qwen3-Livetranslate — 18 languages + 5 Chinese dialects, ~3-second latency, out of the box. File-based variant accepts video for context-aware accuracy. 7 languages output text only (no audio).
- Qwen3.5-Omni — 29 output languages + 7 Chinese dialects. Superior audio-video understanding and web search. Inject terminology and domain context via system prompt. Both realtime and file-based.
- Qwen3-Omni-Flash — 11 output languages + 8 Chinese dialects. Inject terminology and domain context via system prompt for specialized fields. Both realtime and file-based. Lower cost.
Livetranslate for quick setup; Qwen3.5-Omni for best quality and broadest language coverage; Qwen3-Omni-Flash for cost-sensitive scenarios.
Supported languages
Supported languages
| Language | Qwen3-Livetranslate | Qwen3.5-Omni | Qwen3-Omni-Flash |
|---|---|---|---|
| English | ✓ | ✓ | ✓ |
| Chinese (Mandarin) | ✓ | ✓ | ✓ |
| + Cantonese | ✓ | ✓ | ✓ |
| + Sichuanese | ✓ | ✓ | ✓ |
| + Shanghainese | ✓ | ✓ | ✓ |
| + Beijing | ✓ | ✓ | ✓ |
| + Tianjin | ✓ | ✓ | ✓ |
| + Nanjing | — | ✓ | ✓ |
| + Shaanxi | — | ✓ | ✓ |
| + Hokkien | — | ✓ | ✓ |
| French | ✓ | ✓ | ✓ |
| German | ✓ | ✓ | ✓ |
| Russian | ✓ | ✓ | ✓ |
| Italian | ✓ | ✓ | ✓ |
| Spanish | ✓ | ✓ | ✓ |
| Portuguese | ✓ | ✓ | ✓ |
| Japanese | ✓ | ✓ | ✓ |
| Korean | ✓ | ✓ | ✓ |
| Thai | Text only | ✓ | ✓ |
| Indonesian | Text only | ✓ | — |
| Vietnamese | Text only | ✓ | — |
| Arabic | Text only | ✓ | — |
| Hindi | Text only | ✓ | — |
| Turkish | Text only | ✓ | — |
| Finnish | — | ✓ | — |
| Polish | — | ✓ | — |
| Dutch | — | ✓ | — |
| Czech | — | ✓ | — |
| Urdu | — | ✓ | — |
| Tagalog | — | ✓ | — |
| Swedish | — | ✓ | — |
| Danish | — | ✓ | — |
| Hebrew | — | ✓ | — |
| Icelandic | — | ✓ | — |
| Malay | — | ✓ | — |
| Norwegian | — | ✓ | — |
| Persian | — | ✓ | — |
| Greek | Text only | — | — |
qwen-omni-turbo supports Chinese and English only.Recommended models
| Model | API | Input | Function calling | Web search | Thinking | Batch |
|---|---|---|---|---|---|---|
qwen3.5-omni-plus-realtime | WebSocket | Text, audio, image, video | — | ✓ | — | — |
qwen3.5-omni-plus | HTTP | Text, audio, image, video | ✓ | ✓ | — | — |
qwen3.5-omni-flash-realtime | WebSocket | Text, audio, image, video | — | ✓ | — | — |
qwen3.5-omni-flash | HTTP | Text, audio, image, video | ✓ | ✓ | — | — |
qwen3-omni-flash-realtime | WebSocket | Text, audio, image, video | — | — | — | — |
qwen3-omni-flash | HTTP | Text, audio, image, video | ✓ | — | ✓ | — |
qwen3-livetranslate-flash-realtime | WebSocket | Audio | — | — | — | — |
qwen3-livetranslate-flash | HTTP | Audio, video | — | — | — | — |
All models
Qwen3.5-Omni
Qwen3.5-Omni
| Model | API | Input | Function calling | Web search | Thinking | Batch |
|---|---|---|---|---|---|---|
qwen3.5-omni-plus-realtime | WebSocket | Text, audio, image, video | — | ✓ | — | — |
qwen3.5-omni-plus-realtime-2026-03-15 | WebSocket | Text, audio, image, video | — | ✓ | — | — |
qwen3.5-omni-flash-realtime | WebSocket | Text, audio, image, video | — | ✓ | — | — |
qwen3.5-omni-flash-realtime-2026-03-15 | WebSocket | Text, audio, image, video | — | ✓ | — | — |
qwen3.5-omni-plus | HTTP | Text, audio, image, video | ✓ | ✓ | — | — |
qwen3.5-omni-plus-2026-03-15 | HTTP | Text, audio, image, video | ✓ | ✓ | — | — |
qwen3.5-omni-flash | HTTP | Text, audio, image, video | ✓ | ✓ | — | — |
qwen3.5-omni-flash-2026-03-15 | HTTP | Text, audio, image, video | ✓ | ✓ | — | — |
Qwen3-Omni-Flash
Qwen3-Omni-Flash
| Model | API | Input | Function calling | Web search | Thinking | Batch |
|---|---|---|---|---|---|---|
qwen3-omni-flash-realtime | WebSocket | Text, audio, image, video | — | — | — | — |
qwen3-omni-flash-realtime-2025-12-01 | WebSocket | Text, audio, image, video | — | — | — | — |
qwen3-omni-flash-realtime-2025-09-15 | WebSocket | Text, audio, image, video | — | — | — | — |
qwen3-omni-flash | HTTP | Text, audio, image, video | ✓ | — | ✓ | — |
qwen3-omni-flash-2025-12-01 | HTTP | Text, audio, image, video | ✓ | — | ✓ | — |
qwen3-omni-flash-2025-09-15 | HTTP | Text, audio, image, video | ✓ | — | ✓ | — |
Qwen3-Livetranslate
Qwen3-Livetranslate
| Model | API | Input | Languages |
|---|---|---|---|
qwen3-livetranslate-flash-realtime | WebSocket | Audio | 18 |
qwen3-livetranslate-flash-realtime-2025-09-22 | WebSocket | Audio | 18 |
qwen3-livetranslate-flash | HTTP | Audio, video | 18 |
qwen3-livetranslate-flash-2025-12-01 | HTTP | Audio, video | 18 |
Legacy
Legacy
These models are no longer updated. Use Qwen3.5-Omni or Qwen3-Omni-Flash for new projects.
| Model | Input | API |
|---|---|---|
qwen2.5-omni-7b | Text, audio, image, video | HTTP |
qwen-omni-turbo | Text, audio, image, video | HTTP |
qwen-omni-turbo-latest | Text, audio, image, video | HTTP |
qwen-omni-turbo-2025-03-26 | Text, audio, image, video | HTTP |
qwen-omni-turbo-realtime | Text, audio | WebSocket |
qwen-omni-turbo-realtime-latest | Text, audio | WebSocket |
qwen-omni-turbo-realtime-2025-05-08 | Text, audio | WebSocket |