GPT-4o and Gemini 1.5 Pro just got beat in the AI race


There’s a new leader, technically, in the race for AI assistant dominance: Anthropic’s Claude 3.5 Sonnet. The newly released model outperforms both Google’s Gemini 1.5 Pro and OpenAI’s GPT-4o across a range of benchmark tests, the company announced on Thursday.

This new iteration of Sonnet is the first in Anthropic’s upcoming line of 3.5 models. It significantly outperforms the larger Claude 3 Opus model and does so at a fraction of that model’s energy cost. Compute efficiency is becoming an increasingly important aspect of AI system design, especially as the cost of powering and cooling AI data centers soars and the infrastructure pushes into the gigawatt range.

Claude 3.5 Sonnet for vision

“Claude 3.5 Sonnet operates at twice the speed of Claude 3 Opus,” the Anthropic team wrote in a blog post. “This performance boost, combined with cost-effective pricing, makes Claude 3.5 Sonnet ideal for complex tasks such as context-sensitive customer support and orchestrating multistep workflows.”

The new model has reportedly set new high scores on three standardized benchmarks: graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding proficiency (HumanEval). It beat out Google’s Gemini 1.5 Pro, Meta’s Llama-400b, and OpenAI’s GPT-4o, though typically only by a couple of percentage points.

A table showing Claude 3.5 Sonnet's performance compared to other leading AI systems.
Anthropic

Claude 3.5 Sonnet is being billed as Anthropic’s “strongest vision model yet.” It handles a number of vision-based tasks, such as interpreting charts and graphs or transcribing text from imperfect image sources like screenshots and scanned receipts, more accurately than Claude 3 Opus. In fact, it beat Claude 3 Opus by anywhere from 6 to 17 points across industry-standard vision benchmarks. The new model is also reportedly much better at handling humor and can converse in a much more lifelike manner.

Claude 3.5 Sonnet will also be the first Anthropic model to offer the Artifacts feature. Rather than generating images or code snippets directly in the flow of the conversation, Claude will place that content in a dedicated space to the side of the chat. This gives users “a dynamic workspace where they can see, edit, and build upon Claude’s creations in real time, seamlessly integrating AI-generated content into their projects and workflows,” the Anthropic team claims. The company also announced that Claude will soon support team collaboration, in which a company can store its data, documents, and projects in a single, central silo, with Claude acting as an on-demand assistant.

You can try Claude 3.5 Sonnet today for free on the Claude.ai website and in the Claude iOS app; a Claude Pro or Team subscription gets you significantly higher rate limits. The model is also available to developers through the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. Claude 3.5 Haiku and Claude 3.5 Opus are scheduled for release later in the year.

Andrew Tarantola
Former Computing Writer