- Anthropic's Claude Opus 4.5 AI model outperformed all humans on the company's own coding test.
- The two-hour engineering exam measures technical ability and judgment under time pressure.
- The new release is another notch for Anthropic in the AI coding tools space.
Anthropic's new AI model is outperforming humans in coding, the company said of its latest release.
On Monday, the company introduced Claude Opus 4.5, describing it as its most advanced AI model to date and saying the new model "scored higher than any human candidate ever" on "a notoriously difficult take-home exam" that the company gives prospective engineering candidates.
In a blog post on Monday, Anthropic said that the two-hour take-home test is designed to assess technical ability and judgment under time pressure, and though it doesn't reflect all skills an engineer needs to possess, the fact that an AI model "outperforms strong candidates on important technical skills" is raising questions about "how AI will change engineering as a profession."
In its methodology, the company said that this result came from giving the model several chances to solve each problem and then picking its best answer.
Little is publicly known about what the engineering test consists of. A 2024 interview review published on Glassdoor said the test has four levels and asks candidates to implement a specific system and then add functionality to it. It is unclear whether the test given to Claude Opus 4.5 was similar. Anthropic didn't provide further details in its blog post and did not respond to a request for comment.
Claude Opus 4.5 arrives just three months after the rollout of its predecessor. Beyond coding, the new model also brings upgrades in generating professional documents, including Excel spreadsheets and PowerPoint presentations.
The new release further solidifies Anthropic's position in AI coding. Even Mark Zuckerberg's Meta uses Claude to support Devmate, its internal coding assistant, despite the two companies being rivals in the AI race.
The company has kept its training methods a secret. Eric Simons, the CEO of Stackblitz, the startup behind the vibe coding service Bolt.new, previously told Business Insider that he believes Anthropic had its AI models write and launch code on their own, with the company then reviewing the results using both people and AI tools. Dianne Penn, Anthropic's head of product management, research and frontiers, said this description was "generally true."
In October, Anthropic CEO Dario Amodei said at the Dreamforce conference that Claude is already writing 90% of the code for most teams at the company, though he said he would not replace any software engineers with the bot.
"If Claude is writing 90% of the code, what that means, usually, is, you need just as many software engineers. You might need more, because they can then be more leverage," said Amodei. "They can focus on the 10% that's editing the code or writing the 10% that's the hardest, or supervising a group of AI models."