On 29 April 2025, Alibaba Cloud’s Qwen team released Qwen 3, an open-weight family of large-language models whose flagship checkpoint packs 235 billion parameters in a mixture-of-experts architecture. The launch also delivers a 30 billion-parameter MoE and six dense variants, all licensed under Apache-2.0 and downloadable from platforms like Hugging Face, ModelScope, and Kaggle.
Qwen 3 debuts a user-switchable “thinking mode” that can be toggled for step-by-step reasoning or rapid direct answers. The flagship model offers a 32 K native context window — extendable to 128 K with YaRN — and activates only eight of 128 experts per token, reducing inference cost versus comparable dense systems.
Pre-training used about 36 trillion tokens spanning 119 languages, twice the corpus of Qwen 2.5 and enriched with synthetic math and code data. Deployment guidance points to vLLM and SGLang for servers and to Ollama or llama.cpp for local setups, signaling support for both cloud and edge developers.
Community feedback is buoyant, with Reddit’s r/LocalLLaMA users calling the 235 B MoE a “local-LLM game-changer,” and Tech Buzz China’s Substack dubbing the move proof Alibaba is “all-in on AI.” Analysts add that earlier QwQ-32B and Qwen 2.5 announcements briefly lifted Alibaba shares eight percent in March, underscoring the strategic weight the company places on open models.
 
 
		                         
		                         
		                         
