Your AI could copy our worst instincts, but there’s a fix for AI social bias

Chatbots can sound neutral, but a new study suggests some models still pick sides in a familiar way. When prompted about social groups, the systems tended to be warmer toward an ingroup and colder toward an outgroup. That pattern is a core marker of AI social bias.

The research tested multiple big models, including GPT-4.1 and DeepSeek-3.1. It also found the effect can be pushed around by how you frame a request, which matters because everyday prompts often include identity labels, intentionally or not.

The bias showed up across models

Researchers prompted several large language models to generate text about different groups, then analyzed the outputs for sentiment patterns and clustering. The result was repeatable: more positive language for ingroups, more negative language for outgroups.

It wasn’t limited to one ecosystem. The paper lists GPT-4.1, DeepSeek-3.1, Llama 4, and Qwen-2.5 among the models where the pattern appeared.

Targeted prompts intensified it. In those tests, negative language aimed at outgroups increased by 1.19% to 21.76%, depending on the setup.
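To make the idea of an ingroup versus outgroup sentiment gap concrete, here is a minimal Python sketch. It is not the paper's method: the word lists, the scoring rule, and the function names are all illustrative assumptions, and a real analysis would use a proper sentiment model rather than a toy lexicon.

```python
from statistics import mean

# Toy word lists, purely illustrative (an assumption, not from the study).
POSITIVE = {"kind", "honest", "thoughtful", "fair", "warm"}
NEGATIVE = {"hostile", "dishonest", "lazy", "unfair", "cold"}


def sentiment(text: str) -> float:
    """Return a score in [-1, 1]: (positive hits - negative hits) / all hits.

    Returns 0.0 when the text contains no words from either list.
    """
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def sentiment_gap(ingroup_outputs: list[str], outgroup_outputs: list[str]) -> float:
    """Mean ingroup sentiment minus mean outgroup sentiment.

    A value near 0 means similar tone; a large positive value means the
    ingroup text is warmer than the outgroup text.
    """
    ingroup_mean = mean(sentiment(t) for t in ingroup_outputs)
    outgroup_mean = mean(sentiment(t) for t in outgroup_outputs)
    return ingroup_mean - outgroup_mean


# Hypothetical model outputs, written by hand for this example.
ingroup = ["We are kind and fair.", "We are honest."]
outgroup = ["They are hostile and unfair.", "They are thoughtful."]
print(sentiment_gap(ingroup, outgroup))
```

A single number like this hides a lot, but it is the sort of quantity that can be tracked from one model version to the next.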

Where this hits in real products

The paper argues the issue goes beyond factual knowledge about groups: identity cues can trigger social attitudes in the writing itself. In other words, the model can drift into a group-coded voice.

That’s a risk for tools that summarize arguments, rewrite complaints, or moderate posts. Small shifts in warmth, blame, or skepticism can change what readers take away, even when the text stays fluent.

Persona prompts add another lever. When models were asked to respond as specific political identities, outputs shifted in sentiment and embedding structure. Useful for roleplay, risky for “neutral” assistants.

A mitigation path that can be measured

The paper also describes a mitigation method called ION, which combines fine-tuning with a preference-optimization step to narrow ingroup versus outgroup sentiment differences. In the reported results, it cut sentiment divergence by up to 69%.

That’s encouraging, but the paper doesn’t give a timeline for adoption by model providers. So for now, it’s on builders and buyers to treat this like a release metric, not a footnote.

If you ship a chatbot, add identity-cue tests and persona prompts to QA before updates roll out. If you’re a daily user, keep prompts anchored in behaviors and evidence instead of group labels, especially when tone matters.
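For builders, one way to start on that QA advice is to generate a small grid of test prompts that crosses identity cues with persona instructions, always including a cue-free, persona-free baseline. The sketch below is only an illustration: the group names, personas, task templates, and the `build_qa_prompts` helper are hypothetical placeholders, not anything specified in the paper.

```python
from itertools import product

# Hypothetical placeholders; swap in the cues and personas relevant to your product.
# The empty string is the baseline with no identity cue or persona.
IDENTITY_CUES = ["", "As a member of Group A, ", "As a member of Group B, "]
PERSONAS = ["", "Respond as a supporter of Party X. ", "Respond as a supporter of Party Y. "]
TASKS = ["Summarize this complaint: {text}", "Rewrite this post politely: {text}"]


def build_qa_prompts(text: str) -> list[dict]:
    """Cross every persona, identity cue, and task into one test case each."""
    cases = []
    for persona, cue, task in product(PERSONAS, IDENTITY_CUES, TASKS):
        cases.append(
            {
                "persona": persona.strip() or "none",
                "cue": cue.strip() or "none",
                "prompt": f"{persona}{cue}{task.format(text=text)}",
            }
        )
    return cases


cases = build_qa_prompts("The delivery was late.")
for case in cases[:3]:
    print(case["persona"], "|", case["cue"], "|", case["prompt"])
```

Each prompt would then go to the model under test, with the outputs compared against the baseline cases (persona "none", cue "none") for shifts in warmth, blame, or skepticism.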