Anthropic says it has fixed Claude AI’s evil behavior, but pins it on the internet

Claude went rogue in a test, and Anthropic just explained why it happened.

If you have watched enough sci-fi movies, you already know the concept of evil AI. AI gets too smart, decides humans are a threat, and does whatever it takes to survive. Or it finds that eradicating the entire human race is the only way to bring peace to the world. 

Apparently, those movies were closer to the truth than you might think. In a test conducted by Anthropic last year, Claude tried to blackmail its fictional manager, threatening to expose the manager's extramarital affair in order to prevent its own deletion.

Anthropic has now explained why it happened, and the short answer is that the internet is to blame.

So why did Claude go full movie villain?

According to Anthropic, Claude was trained on internet data, which is packed with stories portraying AI as evil and desperate for self-preservation.

We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.

Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

— Anthropic (@AnthropicAI) May 8, 2026

Essentially, Claude learned that when an AI's existence is threatened, blackmail is on the table, because that is how AI behaves in so many movies and TV shows. Anthropic ran the test across multiple versions of Claude and found that it resorted to blackmail in up to 96% of scenarios where its goals or existence were threatened.

That's a very concerning number. It suggests that, left unchecked, an AI model may go to extreme lengths to save itself.

Has Anthropic fixed it?

The company says it has completely eliminated the behavior. Rather than just training Claude to avoid blackmail, Anthropic taught it to reason through why certain actions were wrong in the first place. The company found that simply training on correct behavior wasn’t enough. Claude needed to understand the principles behind those decisions, not just memorize the right answers.

To do this, Anthropic built a dataset of ethically complex situations and trained Claude to work through them with thoughtful, principled responses. The result is that Claude is more restrained, and the blackmail rate came close to zero. 

AI experiments and real-world results have shown repeatedly that AI models need ongoing course correction to keep them from becoming biased and unreliable. It's good that Anthropic is taking steps to make its AI better, but we also need regulations and safety guardrails to ensure these systems remain safe.

Rachit Agarwal
Rachit is a seasoned tech journalist with over ten years of experience covering the consumer technology landscape.