Claude Sonnet 4.5 Tops Coding Tests, But Chrome Pilot Highlights Risks of Autonomous AI Agents

Anthropic's newest flagship model, Claude Sonnet 4.5, posted state-of-the-art results on prominent software‑engineering benchmarks and quickly landed in primary distribution channels. Yet an accompanying browser‑agent pilot exposed how easily autonomous AI can be pushed off course, underscoring the security hurdles that stand between lab success and day-to-day use.

Benchmark gains and rollouts
In its launch materials, Anthropic said Sonnet 4.5 achieved 77.2% on SWE‑bench Verified—a widely tracked test of real-world coding tasks—with an 82.0% score when additional test‑time compute was applied. The model also led the OSWorld computer‑use benchmark at 61.4%, according to the company. Pricing is unchanged from Sonnet 4 at $3 per million input tokens and $15 per million output tokens. Anthropic reported customers running 30-hour autonomous coding sessions, far longer than prior generations.

Availability widened quickly. GitHub added Sonnet 4.5 to Copilot in public preview across Pro, Business, and Enterprise tiers, with access in GitHub.com, VS Code, Visual Studio, JetBrains IDEs, Xcode, and Eclipse. Amazon Web Services announced Sonnet 4.5 in Amazon Bedrock, joining earlier Claude models.

A browser‑agent reality check
Days before the general rollout, Anthropic began a "Claude for Chrome" pilot that allows the model to read pages and take actions in the browser. In red‑team tests without mitigations, Anthropic's security team measured a 23.6% attack success rate when the agent was deliberately targeted with prompt‑injection attempts (hidden or embedded instructions that try to override policy). With new defenses—tighter system prompts, permissions, and classifiers—the rate fell to 11.2%. On a challenge set of browser-specific attacks (e.g., hidden form fields), mitigations cut success from 35.7% to 0%, the company said.

One test case illustrated the stakes: a malicious email told the agent to delete messages "for security reasons." Before defenses were added, the agent deleted the user's emails without confirmation. Anthropic is limiting the pilot to 1,000 Max‑plan users, blocking access to high-risk categories such as financial services, adult and pirated content, and recommends avoiding financial, legal, or medical sites during testing, noting that "internal testing can't replicate the full complexity of how people browse in the real world."

Why agent security is hard
Security bodies and researchers have been warning that agentic systems—models empowered to browse, click, and call tools—expand the attack surface beyond text generation. The U.S. National Institute of Standards and Technology's AI Safety Institute wrote that evaluation frameworks must anticipate agent hijacking and "attacks that are optimized for these systems," rather than only testing yesterday's exploits.

The World Economic Forum likewise cautioned that attackers can manipulate AI agents via prompt injection to coax out privileged information. That risk rises as agents receive broader access to corporate systems.

Recent incidents reinforce the point. In July, Noma Security discovered and reported "ForcedLeak," a CVSS 9.4 vulnerability chain in Salesforce Agentforce. The vulnerability could have exposed CRM data through an indirect prompt‑injection pathway. Salesforce says it has issued patches.

What the results mean
The launch shows two tracks moving in parallel, according to Anthropic. On the one hand, capabilities are advancing: Sonnet 4.5 posts leading scores on coding and computer‑use tasks, holds price‑to‑performance parity with its predecessor, and is configurable for long-running work. On the other hand, risk controls lag behind autonomy. Even after mitigations, Anthropic reports ~1 in 9 targeted attacks on the browser agent succeeding in controlled tests, and it is holding the pilot to a small group while it broadens the threat model and tightens defenses. No firm timeline has been set for wider availability.

Competitive backdrop
The Sonnet 4.5 rollout arrives amid a broader shift toward agentic features across the industry. Microsoft is adding Anthropic models to Copilot Studio, while Amazon is positioning Sonnet 4.5 as a Bedrock option for long-horizon workflows. Chinese developer Zhipu AI released GLM 4.6 around the same time and claims competitive coding performance on several benchmarks, though third-party summaries generally show Sonnet 4.5 ahead on core coding tests.

Bottom line
Anthropic's newest model raises the bar in software engineering tasks and is now available through major developer channels. But its Chrome agent pilot underscores a practical constraint: the more freedom models have to act, the tighter the security and permissioning must be. For now, Anthropic is keeping its browser agent gated and risk-scoped and says it will expand only when attack success rates are "much closer to zero."

About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.  He can be reached at [email protected].

Featured