
OpenAI Launches ChatGPT Agent with Computer Control Capabilities

OpenAI has released ChatGPT Agent, a new artificial intelligence tool that can perform complex tasks by controlling a virtual computer, marking the company's most significant step toward autonomous AI systems that can interact with the real world.

The new capability, announced today and rolling out to paid subscribers, allows ChatGPT to browse websites, fill out forms, run code, edit spreadsheets, and create presentations while operating through its own computer interface. Users can request tasks like "plan and buy ingredients to make a Japanese breakfast for four" or "analyze three competitors and create a slide deck," and the system will complete them independently.

The company acknowledged in its announcement that the tool comes with risks.

"This introduces new risks, particularly because ChatGPT agent can work directly with your data," OpenAI said, noting the expanded capabilities require stronger safety measures than previous versions.

Combining Previous Tools
ChatGPT Agent integrates capabilities from two existing OpenAI products: Operator, which can interact with websites, and Deep Research, which excels at analyzing information. The unified system can now switch among visual browsing, text-based web queries, terminal access, and direct API connections depending on the task at hand.

The tool is powered by a new model explicitly trained for multi-step tasks using reinforcement learning. The development team includes 20 to 35 people across the product and research divisions.

Performance Benchmarks
OpenAI reported significant performance improvements across multiple evaluation metrics. On Humanity's Last Exam, which measures AI performance on expert-level questions, the model achieved a score of 41.6, rising to 44.4 when running multiple attempts simultaneously.

The system also demonstrated superior performance on FrontierMath, reaching 27.4% accuracy on problems that typically require expert mathematicians hours or days to solve. On internal benchmarks measuring knowledge-work tasks, ChatGPT Agent matched or exceeded human performance in roughly half the cases across various completion times.

On SpreadsheetBench, which evaluates spreadsheet editing capabilities, ChatGPT Agent scored 45.5% compared to Microsoft Copilot's 20%. The system also set new records on BrowseComp, a web browsing benchmark, with 68.9% accuracy.

Safety Measures and Restrictions
Due to the expanded capabilities, OpenAI has classified ChatGPT Agent under its "High Biological and Chemical Capabilities" framework, implementing comprehensive safety measures, including threat modeling, refusal training, and monitoring systems. The company emphasized it lacks definitive evidence that the model could help novices create biological weapons, but is exercising caution.

The system requires explicit user permission before taking consequential actions, such as making purchases or sending emails. Financial transactions are currently restricted, and "Watch Mode" activates when users navigate to sensitive sites, such as banking platforms, requiring continuous supervision.
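The permission requirement described above can be pictured as a simple confirmation gate. The following Python sketch is purely illustrative and is not OpenAI's implementation: the `Action` type, the `CONSEQUENTIAL` set, and the `run_action` function are hypothetical names, and the two action kinds are taken only from the examples the company gave (purchases and emails).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical categories, based only on the examples cited in the announcement.
CONSEQUENTIAL = {"purchase", "send_email"}


@dataclass
class Action:
    kind: str          # e.g. "browse", "purchase", "send_email"
    description: str   # human-readable summary shown to the user


def run_action(action: Action, confirm: Callable[[Action], bool]) -> str:
    """Carry out an action, asking the user first if it is consequential."""
    if action.kind in CONSEQUENTIAL and not confirm(action):
        # The user declined, so the agent does not proceed.
        return f"skipped: {action.description}"
    # Non-consequential actions, or approved ones, go ahead.
    return f"done: {action.description}"
```

In this sketch, `confirm` stands in for whatever prompt the product shows the user; an ordinary browsing step never triggers it, while a purchase proceeds only when it returns `True`.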

To address prompt injection risks, where malicious websites could attempt to manipulate the AI's behavior, OpenAI has implemented specialized training and monitoring systems. Users can delete all browsing data and log out of active sessions with a single click.

Industry Context
The launch positions OpenAI in the growing AI agent market, where companies are developing systems that move beyond conversation to complete real-world tasks. Anthropic released a similar "Computer Use" feature in October 2024, while Google has been hiring talent specifically for agentic AI projects.

The AI agent concept gained prominence after fintech company Klarna announced in February 2024 that its AI agent handled two-thirds of customer service chats in one month, equivalent to 700 full-time workers. Major tech companies, including Amazon, Meta, and Google, have since highlighted AI agents as strategic priorities.

Availability and Limitations
ChatGPT Agent is rolling out first to Pro, Plus, and Team subscribers, with Enterprise and Education access planned for the coming weeks. Pro users receive 400 messages per month, while other paid tiers receive 40, with additional usage available through credit-based options.

The European Economic Area and Switzerland currently lack access, with OpenAI working on compliance issues. The company plans to sunset its separate Operator research preview site within weeks.

OpenAI acknowledged current limitations, particularly in slideshow generation, which remains in beta with "rudimentary" formatting and occasional discrepancies between preview and exported files. The company is training improved versions to address these issues.

"ChatGPT agent is still in its early stages," the company stated. "It's capable of taking on a range of complex tasks, but it can still make mistakes."

The release represents OpenAI's most ambitious attempt to create AI systems that can perform practical tasks in the real world. However, the company emphasized that users maintain control and can intervene at any point during task execution.

About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at [email protected].
