GPT-4o and Gemini 1.5 Pro just got beat in the AI race

a screenshot of claude 3.5 sonnet, with an 8-bit crab
Anthropic

There’s technically a new leader in the race for AI assistant dominance: Anthropic’s new Claude 3.5 Sonnet. The newly released model outperforms both Google’s Gemini 1.5 Pro and OpenAI’s GPT-4o across a range of benchmark tests, the company announced on Thursday.

This new iteration of Sonnet is the first in Anthropic’s upcoming line of 3.5 models. It significantly outperforms the larger Claude 3 Opus model, and does so at a fraction of that model’s energy cost. Compute efficiency is becoming an increasingly important aspect of AI system design, especially as the cost of powering and cooling AI data centers soars and the infrastructure pushes into the gigawatt range.

“Claude 3.5 Sonnet operates at twice the speed of Claude 3 Opus,” the Anthropic team wrote in a blog post. “This performance boost, combined with cost-effective pricing, makes Claude 3.5 Sonnet ideal for complex tasks such as context-sensitive customer support and orchestrating multistep workflows.”

The new model has reportedly set new benchmark highs on three standardized tests: graduate-level reasoning with GPQA, undergraduate-level knowledge with MMLU, and coding proficiency with HumanEval. It beat out Google’s Gemini 1.5 Pro, Meta’s Llama-400b, and OpenAI’s GPT-4o, though not by a huge margin, typically only a couple of percentage points.

A table showing Claude 3.5 Sonnet's performance compared to other leading AI systems.
Anthropic

Claude 3.5 Sonnet is being billed as Anthropic’s “strongest vision model yet.” It performs a number of vision-based tasks, such as interpreting charts and graphs or transcribing text from imperfect image sources like screenshots or scanned receipts, more accurately than Claude 3 Opus. In fact, it beat Claude 3 Opus by anywhere from 6 to 17 points across industry-standard vision benchmarks. The new model is also reportedly much more competent at handling humor and can converse in a much more lifelike manner.

Claude 3.5 Sonnet will also be the first Anthropic model to offer the Artifacts feature to users. Rather than generating images or code snippets directly in the flow of the conversation, Artifacts will create that content in a dedicated space to the side of the chat. This gives users “a dynamic workspace where they can see, edit, and build upon Claude’s creations in real time, seamlessly integrating AI-generated content into their projects and workflows,” the Anthropic team claims. The company also announced that Claude will soon support team collaboration, in which a company can store its data, documents, and projects in a single, central silo, with Claude acting as an on-demand assistant.

You can try out Claude 3.5 Sonnet today for free on the Claude.ai website and in the Claude iOS app (a Claude Pro or Team subscription will get you significantly higher rate limits). Third-party integration is also available through the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. Claude 3.5 Haiku and Claude 3.5 Opus are scheduled for release later in the year.

Andrew Tarantola
Former Computing Writer