
New study shows AI isn’t ready for office work

Your job is safe for now as AI still struggles with real office tasks


It has been nearly two years since Microsoft CEO Satya Nadella predicted that generative AI would take over knowledge work, but if you look around a typical law firm or investment bank today, the human workforce is still very much in charge. Despite all the hype about “reasoning” and “planning,” a new study from training-data company Mercor explains exactly why the robot revolution has stalled: AI just can’t handle the messiness of real work.

A reality check for the “replacement” theory

Mercor released a new benchmark called APEX-Agents, and it is brutal. Unlike the usual tests that ask AI to write a poem or solve a math problem, this one uses actual queries from lawyers, consultants, and bankers. It asks the models to do complete, multi-step tasks that require jumping between different types of information.

The results? Even the best models on the market, Gemini 3 Flash and GPT-5.2, couldn’t crack 25% accuracy. Gemini led the pack at 24%, with GPT-5.2 right behind it at 23%. Most others were stuck in the teens.

Why AI is failing the “office test”

Mercor CEO Brendan Foody points out that the issue isn’t raw intelligence; it’s context. In the real world, answers aren’t served up on a silver platter. A lawyer has to check a Slack thread, read a PDF policy, look at a spreadsheet, and then synthesize all that to answer a question about GDPR compliance.

Humans do this context-switching naturally. AI, it turns out, is terrible at it. When you force these models to hunt for information across “scattered” sources, they either get confused, give the wrong answer, or just give up entirely.

The “Unreliable Intern”

For anyone worried about their job security, this is a bit of a relief. The study suggests that right now, AI functions less like a seasoned professional and more like an unreliable intern who gets things right about a quarter of the time.


That said, the progress is terrifyingly fast. Foody noted that just a year ago, these models were scoring between 5% and 10%. Now they are hitting 24%. So, while they aren’t ready to take the wheel yet, they are learning to drive much faster than we expected. For now, though, the “knowledge work” revolution is on hold until the bots learn how to multitask.

Moinak Pal
Moinak Pal has been working in the technology sector covering both consumer-centric tech and automotive technology for the…