OpenAI unveils o3 and o4-mini models with advanced reasoning and tool access

OpenAI has introduced two advanced reasoning models, o3 and o4-mini, which have achieved state-of-the-art results in AI benchmarks. These models are notable for having full access to external tools, including web browsing and a Python interpreter, marking a first for OpenAI.

The o3 model is particularly powerful, outperforming previous models in benchmarks like Codeforces, SWE-bench, and MMMU, and is capable of analyzing visual inputs through image uploads. Evaluations show that o3 reduces significant errors by 20% compared to its predecessor, o1, on complex tasks. The o4-mini model is designed for efficiency, optimized for high-volume reasoning tasks, and performs comparably to o3 across math, coding, and visual domains. It scored 99.5% on the AIME 2025 math benchmark when used with a Python interpreter. Both models are trained through reinforcement learning to effectively use tools and are described as more natural and conversational, with features supporting memory and prior context.
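The tool-use loop described above can be sketched conceptually. This is a minimal illustration with hypothetical names, not OpenAI's actual implementation, which runs server-side: the model interleaves reasoning steps with tool calls, and each tool's output is fed back into its working context.

```python
# Conceptual sketch of a reasoning model's tool-use loop.
# All names here are hypothetical illustrations, not OpenAI internals.

def run_python(code: str) -> str:
    """Stand-in for a sandboxed Python interpreter tool."""
    local_vars: dict = {}
    exec(code, {}, local_vars)
    return str(local_vars.get("result", ""))

TOOLS = {"python": run_python}

def reasoning_loop(steps):
    """Each step is ("think", text) or ("tool", name, arg).
    Tool outputs are appended back into the context, mirroring how
    a reasoning model interleaves tool calls with its chain of thought."""
    context = []
    for step in steps:
        if step[0] == "think":
            context.append(step[1])
        else:
            _, name, arg = step
            context.append(f"[{name} output] " + TOOLS[name](arg))
    return context

trace = reasoning_loop([
    ("think", "Need to compute 17 * 23 exactly."),
    ("tool", "python", "result = 17 * 23"),
    ("think", "Use the tool output as the final answer."),
])
```

The point of the sketch is the feedback edge: because tool results re-enter the context mid-reasoning, the model can decide its next step based on what the tool returned, which is what reinforcement learning on tool use optimizes for.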

Available to ChatGPT Plus, Pro, and Team users, the models replace previous versions, with Enterprise and Edu users gaining access soon. OpenAI also introduced Codex CLI, a new command-line tool that serves as a lightweight coding assistant developers can run directly on their local machines. Future updates will integrate additional tools, such as web search and the code interpreter, into the models' reasoning process via the API.

by Mauricio B. Holguin

Comments

UserPower
5

OpenAI plays with words here, since Codex CLI is not an agent but an orchestrator (it fetches data and doesn't do any computing locally); all the AI "thinking" (and there is a lot of it) happens on OpenAI's servers through API calls, so query limits can be reached very fast. As for raw performance, nothing very impressive: since OpenAI skipped "o2" (if there is any logic in the naming; hard to believe they burn $300M on marketing each year), this is pretty much a moderate evolution, judging from the o3-mini to o4-mini performance gain. Still, these models are expensive to train and very, very expensive to run (even if most users don't care, since they don't pay per task, at least not yet, though OpenAI is thinking about it). And since we're talking about OpenAI, which still believes bigger = better models (and is not afraid to spend $19B on a data center with money it doesn't yet have...), o3/o4 may be the last reasoning models it offers under a fairly cheap subscription.

Gu