Alibaba releases open-source Qwen3-Next model with 10x power, long context and lower costs
Alibaba has introduced Qwen3-Next, a new generation of open-source large language models that the company claims is ten times more powerful and roughly ten times cheaper to train than its predecessor. The release includes two variants, Qwen3-Next-Instruct and Qwen3-Next-Thinking, both under the Apache 2.0 license and available on Hugging Face, ModelScope, Kaggle, Alibaba Cloud, and Qwen Chat.
Qwen3-Next uses a hybrid architecture combining Gated DeltaNet for fast processing and Gated Attention for reasoning accuracy. Only 3 billion of its 80 billion parameters are activated per token, improving efficiency. It also expands its Mixture-of-Experts design to 512 experts, up from 128 in Qwen3, balancing performance and cost. The models support a native 256,000-token context window and can handle up to one million tokens with RoPE scaling, while offering at least 25 percent lower pricing than Qwen3-235B on Alibaba Cloud.
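The efficiency claim comes down to sparse activation arithmetic. A minimal sketch, using only the figures reported above (80 billion total parameters, 3 billion active per token, 512 experts, 256,000-token native context extended to one million); the per-token routed-expert count of 10 is an assumption for illustration, not stated in the article:

```python
# Back-of-envelope view of Qwen3-Next's sparse Mixture-of-Experts activation,
# based on the figures reported in the article.

TOTAL_PARAMS_B = 80    # total parameters, in billions
ACTIVE_PARAMS_B = 3    # parameters activated per token, in billions
NUM_EXPERTS = 512      # expert pool size (up from 128 in Qwen3)
TOP_K = 10             # assumed experts routed per token (illustrative only)

NATIVE_CTX = 256_000   # native context window, tokens
EXTENDED_CTX = 1_000_000  # context reachable with RoPE scaling

active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B   # 0.0375
expert_fraction = TOP_K / NUM_EXPERTS                # ~0.0195
rope_scale = EXTENDED_CTX / NATIVE_CTX               # ~3.9x extension

print(f"Parameters active per token: {active_fraction:.2%}")
print(f"Experts used per token:      {expert_fraction:.2%}")
print(f"RoPE context scaling factor: {rope_scale:.1f}x")
```

The takeaway: although the model stores 80 billion parameters, each token touches under 4 percent of them, which is where the claimed inference-cost advantage over dense models of similar size comes from.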
Additional features include native multi-token prediction for faster inference, updated normalization for more stable training, and major throughput improvements on long contexts. The models integrate with Hugging Face Transformers, SGLang, vLLM, and Qwen-Agent, and the full 80-billion-parameter model can run on a single Nvidia H200 GPU.

