Claude 4 Performance Records

News

Claude, Write Complex Code

Anthropic's Claude 4 Models Can Write Complex Code for You

Are software engineers out of a job? Opus 4 and Sonnet 4 aim to set 'new standards' for AI-powered coding and autonomous task completion.

Ars Technica · 2d

New Claude 4 AI model refactored code for 7 hours straight

In particular, that marathon refactoring claim reportedly comes from Rakuten, a Japanese tech services conglomerate that "validated [Claude's] capabilities with a demanding open-source refactor running independently for 7 hours with sustained performance," Anthropic said in a news release.

MacRumors · 2d

Claude 4 Debuts with Two New Models Focused on Coding and Reasoning

AI company Anthropic today announced the launch of two new Claude models, Claude Opus 4 and Claude Sonnet 4. Anthropic says that the models

Anthropic overtakes OpenAI: Claude Opus 4 codes seven hours nonstop, sets record SWE-Bench score and reshapes enterprise AI

Anthropic's Claude Opus 4 outperforms OpenAI's GPT-4.1 with unprecedented seven-hour autonomous coding sessions and ...

Investing2d

Anthropic unveils Claude 4 models, set benchmark in AI performance

Claude Opus 4, now alleged as the world’s best coding model, delivers a record-setting 72.5% on SWE-bench and 43.2% on Terminal-bench, outperforming all competitors on long-running and complex ...

WinBuzzer1d

Anthropic Releases Claude 4 Opus and Sonnet AI Models With Top-Coding, Agent Skills & ASL-3 Safety

Anthropic launches its Claude 4 series, featuring Opus 4 and Sonnet 4, delivering new AI benchmarks in coding, advanced ...

Decrypt1d

Anthropic's Claude 4 Arrives, Obliterating AI Rivals—And Budgets Too

Anthropic's latest Claude models promise coding marathons and superior reasoning. But you'll pay premium rates for the ...

Neowin7mon

Anthropic launches updated Claude 3.5 Sonnet model that beats GPT-4o and Gemini 1.5 Pro1 1

SWE-bench Verified score increased from 33.4% to 49.0% ... The new Claude 3.5 Haiku model will be available later this month. The improved performance and affordability of these new Claude ...

Ars Technica1y

The AI wars heat up with Claude 3, claimed to have “near-human” abilities

With Claude 3, Anthropic has perhaps finally caught up with OpenAI's released models in terms of performance ... deal—no other model has beaten GPT-4 on a range of widely used benchmarks ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results