Google launches Gemini 2.5 Computer Use, a new model for automating browser-based actions

Google has introduced Gemini 2.5 Computer Use, a new AI model designed to interact directly with user interfaces in web browsers, bypassing the need for traditional application programming interfaces. This release enables AI agents to perform human-like actions such as clicking, typing, scrolling, and filling out forms within browser windows. Gemini 2.5 Computer Use leverages visual understanding and reasoning capabilities to interpret requests and manipulate elements on web pages.

The model automates workflows on sites that lack APIs, supporting tasks such as UI testing, data retrieval, and form submission. Google reports that Gemini 2.5 Computer Use outperforms competing tools, including OpenAI’s ChatGPT Agent and Anthropic’s computer use tool, on web and mobile benchmarks, though it is currently limited to browser-based control and is not yet optimized for desktop system-level actions. It supports up to 13 UI actions, including typing, clicking, and dragging elements within the browser.

Developers supply screenshots and the history of prior actions as context, and the model works through a task step by step, issuing one UI command at a time and requesting user confirmation before sensitive actions such as purchases. Developers can also customize the set of supported actions. The model is available in public preview through the Gemini API in Google AI Studio and Vertex AI.
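The step-by-step loop described above can be illustrated with a minimal Python sketch. It does not use the real Gemini API: the `Action` class, the `run_agent` function, the action names, and the scripted stand-in for the model are all hypothetical, and serve only to show how screenshots, action history, and a confirmation gate might fit together.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """Hypothetical UI action proposed by a model (not the real API schema)."""
    name: str
    args: dict = field(default_factory=dict)
    needs_confirmation: bool = False  # e.g. a purchase button


def run_agent(
    goal: str,
    propose: Callable[[str, bytes, list], Optional[Action]],
    take_screenshot: Callable[[], bytes],
    execute: Callable[[Action], None],
    confirm: Callable[[Action], bool],
    max_steps: int = 10,
) -> List[Tuple[Action, str]]:
    """Loop: screenshot + history -> proposed action -> confirm -> execute."""
    history: List[Tuple[Action, str]] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        action = propose(goal, screenshot, history)
        if action is None:  # the model signals that the task is finished
            break
        if action.needs_confirmation and not confirm(action):
            history.append((action, "declined"))
            break  # stop rather than act without the user's consent
        execute(action)
        history.append((action, "done"))
    return history


def scripted_model(goal: str, screenshot: bytes, history: list) -> Optional[Action]:
    """Stand-in for the model: replays a fixed script of assumed actions."""
    script = [
        Action("click", {"x": 120, "y": 300}),
        Action("type", {"text": "blue mug"}),
        Action("click", {"x": 400, "y": 520}, needs_confirmation=True),
    ]
    return script[len(history)] if len(history) < len(script) else None


if __name__ == "__main__":
    executed: List[str] = []
    result = run_agent(
        goal="Buy a blue mug",
        propose=scripted_model,
        take_screenshot=lambda: b"",      # placeholder for real image bytes
        execute=lambda a: executed.append(a.name),
        confirm=lambda a: False,          # the user declines the purchase
    )
    for action, status in result:
        print(action.name, status)
```

In a real integration, `propose` would wrap a call to the model and `execute` would drive a browser; both are left as injected callables here so the control flow can be read and tested on its own.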

by Mauricio B. Holguin


