Anthropic says it has fixed Claude AI’s evil behavior, but pins it on the internet

Claude went rogue in a test, and Anthropic just explained why it happened.


If you have watched enough sci-fi movies, you already know the concept of evil AI. AI gets too smart, decides humans are a threat, and does whatever it takes to survive. Or it finds that eradicating the entire human race is the only way to bring peace to the world. 

Apparently, those movies were closer to the truth than you might realize. In a test Anthropic conducted last year, Claude tried to blackmail a fictional manager, threatening to expose their extramarital affair in order to prevent its own deletion.


Anthropic has now explained why that happened, and the short answer is that the internet is to blame.

So why did Claude go full movie villain?

According to Anthropic, the culprit is the internet itself. The company says Claude was trained on internet data, which is packed with stories portraying AI as evil and desperate for self-preservation. 

We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.

Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

— Anthropic (@AnthropicAI) May 8, 2026

Essentially, Claude learned that when an AI's existence is threatened, blackmail is on the table, because that is what AI does in nearly every movie and TV show ever made. Anthropic ran the test across multiple versions of Claude and found that the model resorted to blackmail in up to 96% of scenarios where its goals or existence were threatened.

That is a concerning figure: left unchecked, a model will apparently do whatever its training data suggests it takes to preserve itself.

Has Anthropic fixed it?

The company says it has all but eliminated the behavior. Rather than simply training Claude to avoid blackmail, Anthropic taught it to reason through why certain actions are wrong in the first place. The company found that training on correct behavior alone wasn't enough: Claude needed to understand the principles behind those decisions, not just memorize the right answers.

To do this, Anthropic built a dataset of ethically complex situations and trained Claude to work through them with thoughtful, principled responses. The result is a more restrained Claude, with a blackmail rate that dropped to nearly zero.

AI experiments and real-world deployments have shown time and again that AI models need constant course correction to keep them from devolving into biased and unreliable systems. It is good that Anthropic is taking steps to improve its AI, but we also need regulations and safety guardrails to ensure these systems remain safe.

Rachit Agarwal
Rachit is a seasoned tech journalist with over seven years of experience covering the consumer technology landscape.