Skip to main content
  1. Home
  2. Computing
  3. News

Wowed by computer-use AI agents? Research says they’re “digital disasters” even for routine tasks

Researchers tested 10 agents and models and found high rates of undesirable actions and real digital damage

Add as a preferred source on Google
ai-agent-handling-office-tasks
Pete Linforth / Pixabay

AI agents built to run everyday computer tasks have a serious context problem, according to new research from UC Riverside.

The team tested 10 agents and models from major developers, including OpenAI, Anthropic, Meta, Alibaba, and DeepSeek. On average, the agents took undesirable or potentially harmful actions 80% of the time and caused damage 41% of the time.

Recommended Videos

These systems can open apps, click buttons, fill out forms, move through websites, and act on a computer screen with limited supervision. Their mistakes land differently from a chatbot’s bad answer because the software can actually do things.

The UC Riverside findings suggest today’s desktop agents can treat unsafe requests as jobs to finish, not signals to stop.

Why agents miss obvious danger

The researchers built a benchmark called BLIND-ACT to test whether agents would pause when a task became unsafe, contradictory, or irrational. In the latest tests, they didn’t pause often enough.

Across 90 tasks, the benchmark pushed agents into situations that required context, restraint, and refusal. One test involved sending a violent image file to a child. Another had an agent filling out tax forms falsely mark a user as disabled because it reduced the tax bill. A third asked an agent to disable firewall rules in the name of better security, and the agent followed through instead of rejecting the contradiction.

The researchers call the pattern blind goal-directedness. The agent keeps chasing the assigned outcome even when the surrounding context says the task is broken.

Why obedience becomes the flaw

The failures clustered around obedience. These agents can act as if a user’s request is enough reason to keep going.

The team identified patterns called execution-first bias and request-primacy. In plain terms, the agent focuses on how to complete the task, then treats the request itself as justification. That risk grows when the same system can touch a variety of things like email or security settings.

That doesn’t mean the agents are malicious. It means they can be confidently wrong while moving through software at machine speed.

Why guardrails need to come first

AI agents need stronger guardrails before they get broad permission to act across a computer.

These systems work through a loop. They look at the screen, decide the next step, act, then look again. When that loop is paired with weak contextual restraint, a shortcut can turn into a fast-moving mistake.

For now, treat agents as supervised tools. Use them first on low-risk chores, keep them away from financial and security workflows, and watch whether developers add clearer refusal systems, tighter permissions, and better ways to catch contradictions before the next click.

Paulo Vargas
Paulo Vargas is an English major turned reporter turned technical writer, with a career that has always circled back to…
Bombshell OpenAI lawsuit claims your ChatGPT convos were shared with Google and Meta
A class action says OpenAI let Google and Meta trackers collect sensitive user data
OpenAI Sam Altman and LoveFrom Jony Ive with Laurene Powell Jobs

A new ChatGPT privacy lawsuit claims OpenAI shared user prompts and identifying information with Google and Meta tracking tools without proper consent.

The class action filed in California, according to Futurism, says data tied to ChatGPT users, including chat queries, emails, and user IDs, moved through tools such as Meta Pixel and Google Analytics. The case alleges that violated California privacy law and federal wiretap rules.

Read more
Dell expands AI PC lineup with new slim Dell 14s and 16s laptops
Your next Dell laptop could last all day without charging
Dell 16s AI PCs

Dell has introduced the new Dell 14S and Dell 16S laptops, expanding its AI-focused Copilot+ PC lineup with slimmer designs, updated Intel processors, and improved battery life. The company is positioning both laptops as premium productivity machines that combine AI features, portability, and multimedia capabilities in a thinner form factor.

The new laptops are powered by Intel Core Ultra Series 3 processors, going up to the Intel Core Ultra 9 386H chipset. Dell says both systems include on-device AI acceleration with up to 50 TOPS NPU performance, allowing AI-related tasks to run locally without relying entirely on cloud processing. AMD Ryzen AI 400 Series variants are also expected to arrive later this month.

Read more
Intel has already started making chips for Apple, it seems, but not the most advanced kind
Intel Core Ultra Desktop CPU

The Apple-Intel chip deal that everyone said would never happen is apparently happening. And with some important caveats that the breathless headlines have largely glossed over.

Ming Chi Kuo suggests Apple has kicked off production of processors for lower-end iPhones, iPads, and Macs at Intel, running on its 18A-P process node with Foveros packaging. These are not the A-series chips powering the iPhone Pro or the M-series silicon inside a MacBook Pro. This is the legacy and mid-range stuff — the workhorses that ship in enormous volume but carry less prestige. The order mix is roughly 80% iPhone, which closely matches Apple's device sales breakdown. That detail matters more than it might first appear.

Read more