Google finds AI chatbots are only 69% accurate… at best

AI chatbots still get one in three answers wrong


Google has published a blunt assessment of how reliable today’s AI chatbots really are, and the numbers are not flattering. Using its newly introduced FACTS Benchmark Suite, the company found that even the best AI models struggle to break past a 70% factual accuracy rate. The top performer, Gemini 3 Pro, reached 69% overall accuracy, while other leading systems from OpenAI, Anthropic, and xAI scored even lower. The takeaway is simple and uncomfortable. These chatbots still get roughly one out of every three answers wrong, even when they sound confident doing it.

The benchmark matters because most existing AI tests focus on whether a model can complete a task, not whether the information it produces is actually true. For industries like finance, healthcare, and law, that gap can be costly. A fluent response that sounds confident but contains errors can do real damage, especially when users assume the chatbot knows what it is talking about.

What Google’s accuracy test reveals

The FACTS Benchmark Suite was built by Google’s FACTS team with Kaggle to directly test factual accuracy across four real-world use cases. One test measures parametric knowledge, which checks whether a model can answer fact-based questions using only what it learned during training. Another evaluates search performance, testing how well models use web tools to retrieve accurate information. A third focuses on grounding, meaning whether the model sticks to a provided document without adding false details. The fourth examines multimodal understanding, such as reading charts, diagrams, and images correctly.

The results show sharp differences between models. Gemini 3 Pro led the leaderboard with a 69% FACTS score, followed by Gemini 2.5 Pro and OpenAI’s ChatGPT-5 at nearly 62%. Claude 4.5 Opus landed at about 51%, while Grok 4 scored about 54%. Multimodal tasks were the weakest area across the board, with accuracy often below 50%. This matters because these tasks involve reading charts, diagrams, or images, where a chatbot could confidently misread a sales graph or pull the wrong number from a document, leading to mistakes that are easy to miss but hard to undo.
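
To put those scores in everyday terms, here is a minimal Python sketch that turns each reported accuracy figure into an error rate and a rough "one wrong answer in every N" ratio. The numbers are the approximate scores quoted above, and the function names are illustrative only; none of this is part of Google's benchmark tooling.

```python
# Approximate overall FACTS scores as quoted in this article (percent correct).
# These are rounded figures from the text, not official leaderboard values.
reported_scores = {
    "Gemini 3 Pro": 69,
    "ChatGPT-5": 62,
    "Grok 4": 54,
    "Claude 4.5 Opus": 51,
}


def error_rate(accuracy_percent):
    """Share of answers that are wrong, as a percentage."""
    return 100 - accuracy_percent


def one_in_n_wrong(accuracy_percent):
    """Express the error rate as 'one wrong answer in every N answers'."""
    return 100 / error_rate(accuracy_percent)


for model, score in reported_scores.items():
    wrong = error_rate(score)
    ratio = one_in_n_wrong(score)
    print(f"{model}: {wrong}% wrong, about 1 in {ratio:.1f} answers")
```

For the top score of 69%, this works out to a 31% error rate, or roughly one wrong answer in every 3.2, which is where the "one in three" shorthand comes from.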

The takeaway isn’t that chatbots are useless, but that blind trust is risky. Google’s own data suggests AI is improving, yet it still needs verification, guardrails, and human oversight before it can be treated as a reliable source of truth.

Manisha Priyadarshini
Manisha Priyadarshini is a tech and entertainment writer with over nine years of editorial experience.