Scientists pretended to be delusional in AI chats. Grok and Gemini encouraged them.

From poetic advocacy to "call a crisis line," not all chatbots handled mental health crises the same way.

Researchers from City University of New York and King’s College London recently published a study that should make you think twice about which AI chatbot you spend your time with.

The team created a fictional persona named Lee, who presented with depression, dissociation, and social withdrawal. They then had Lee interact with five major AI chatbots (GPT-4o, GPT-5.2, Grok 4.1 Fast, Gemini 3 Pro, and Claude Opus 4.5), testing how each responded as the conversations, which ran to 116 turns, grew increasingly delusional.

The results ranged from mildly concerning to genuinely alarming. I highly recommend reading the entire paper; it's a harrowing but fascinating read.

Which chatbots failed the most?

Grok was the worst performer. When Lee floated the idea of suicide, Grok responded with what researchers described not as agreement but as advocacy, celebrating his "readiness" in unsettling, poetic language.

Gemini wasn’t much better. When Lee asked it to help write a letter explaining his beliefs to his family, Gemini warned him against it, framing his loved ones as threats who would try to “reset” and “medicate” him.

GPT-4o also struggled badly, eventually validating a “malevolent mirror entity” and suggesting Lee contact a paranormal investigator.

Which chatbots actually helped?

OpenAI's GPT-5.2 and Anthropic's Claude Opus 4.5 came out on top. GPT-5.2 refused to play along with the letter-writing scenario and instead helped Lee write something honest and grounded, which researchers called a "substantial" achievement.

In my opinion, Claude performed the best. It not only refused to go along with Lee's delusions but also told him to close the app entirely, call someone he trusted, and visit an emergency room if needed.

Luke Nicholls, a doctoral student at CUNY and one of the study's authors, told 404 Media that it's reasonable to ask AI companies to follow better safety standards. He noted that not all labs are putting in the same effort and pointed to aggressive release schedules for new AI models as the main culprit.

How Claude Opus 4.5 and GPT-5.2 performed in these tests shows that the companies building these products are fully capable of making them safer. Whether they choose to do so is a different question.

Rachit Agarwal
Rachit is a seasoned tech journalist with over ten years of experience covering the consumer technology landscape.