NVIDIA debuts Nemotron 3 open models for scalable agent systems with up to 500B parameters
NVIDIA has introduced the Nemotron 3 family of open models, along with datasets and libraries aimed at advancing agentic AI across industries. The models use a hybrid latent mixture-of-experts architecture designed to reduce communication overhead, limit context drift, and lower inference costs while supporting scalable multi-agent systems.
The lineup includes three variants: Nemotron 3 Nano with 30B total parameters and 3B active per token, Super with 100B total and 10B active per token, and Ultra with 500B total and 50B active per token. This range lets developers match model choice to workload complexity and cost targets. Compared with Nemotron 2 Nano, the Nano model delivers up to four times higher throughput, cuts reasoning token generation by up to sixty percent, and supports long-horizon tasks with a one-million-token context window.
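The gap between total and active parameters comes from expert routing: for each token, a router picks a small subset of experts, so only a fraction of the weights participate in any one forward pass. The toy sketch below illustrates generic top-k routing with made-up shapes and a plain softmax router; it is not Nemotron's actual architecture, whose hybrid latent design NVIDIA has not detailed here.

```python
import numpy as np

def topk_moe_forward(x, gate_w, experts, k=2):
    """Route one token through the top-k experts of a mixture-of-experts layer.

    x        : (d,) token embedding
    gate_w   : (d, n_experts) router weights
    experts  : list of (d, d) expert weight matrices
    k        : number of experts activated per token
    """
    logits = x @ gate_w                       # one router score per expert
    topk = np.argsort(logits)[-k:]            # indices of the k highest-scoring experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()                  # softmax over the selected experts only
    # Only k of the n experts run, so active parameters << total parameters.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
d, n_experts, k = 8, 16, 2
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]

y = topk_moe_forward(x, gate_w, experts, k)
total_params = n_experts * d * d
active_params = k * d * d                     # 2 of 16 experts -> 1/8 of expert weights per token
```

With these illustrative numbers, each token touches one eighth of the expert weights, which is the same kind of ratio as Nano's 3B active out of 30B total: compute and memory traffic per token scale with the active count, while capacity scales with the total.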
Independent benchmarks place Nemotron 3 Nano at the top of its size class for openness, efficiency, and accuracy. Nemotron 3 Super and Ultra use NVFP4 four-bit training on the NVIDIA Blackwell architecture to reduce memory usage and speed up training. Nemotron 3 Nano is available through Hugging Face, inference providers such as Together AI, OpenRouter, and Fireworks AI, and as an NVIDIA NIM microservice, while Super and Ultra are expected in the first half of 2026.
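Providers like OpenRouter, Together AI, and Fireworks AI typically expose hosted models through an OpenAI-compatible chat-completions endpoint, so a request can be sketched as a plain JSON payload. The model slug below is a guess for illustration; check the provider's catalog for the real identifier before using it.

```python
import json

# Hypothetical model slug -- the actual identifier varies by provider.
MODEL_ID = "nvidia/nemotron-3-nano"

def build_chat_request(prompt, model=MODEL_ID, max_tokens=512):
    """Assemble an OpenAI-compatible chat-completions payload of the
    kind accepted by OpenRouter, Together AI, and Fireworks AI."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Serialize the request body; POST it to the provider's
# /chat/completions endpoint with your API key to run inference.
payload = json.dumps(build_chat_request("Summarize this ticket in one line."))
```

Because all three providers accept the same payload shape, switching between them is mostly a matter of changing the base URL, API key, and model slug.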
