
Google finds AI chatbots are only 69% accurate… at best

AI chatbots still get one in three answers wrong

Solen Feyissa / Unsplash

Google has published a blunt assessment of how reliable today’s AI chatbots really are, and the numbers are not flattering. Using its newly introduced FACTS Benchmark Suite, the company found that even the best AI models struggle to break past a 70% factual accuracy rate. The top performer, Gemini 3 Pro, reached 69% overall accuracy, while other leading systems from OpenAI, Anthropic, and xAI scored even lower. The takeaway is simple and uncomfortable. These chatbots still get roughly one out of every three answers wrong, even when they sound confident doing it.

The benchmark matters because most existing AI tests focus on whether a model can complete a task, not whether the information it produces is actually true. For industries like finance, healthcare, and law, that gap can be costly. A fluent response that sounds confident but contains errors can do real damage, especially when users assume the chatbot knows what it is talking about.

What Google’s accuracy test reveals

The FACTS Benchmark Suite was built by Google’s FACTS team with Kaggle to directly test factual accuracy across four real-world use cases. One test measures parametric knowledge, which checks whether a model can answer fact-based questions using only what it learned during training. Another evaluates search performance, testing how well models use web tools to retrieve accurate information. A third focuses on grounding, meaning whether the model sticks to a provided document without adding false details. The fourth examines multimodal understanding, such as reading charts, diagrams, and images correctly.

The results show sharp differences between models. Gemini 3 Pro led the leaderboard with a 69% FACTS score, followed by Gemini 2.5 Pro and OpenAI’s ChatGPT-5 at nearly 62%. Grok 4 scored roughly 54%, while Claude 4.5 Opus landed at roughly 51%. Multimodal tasks were the weakest area across the board, with accuracy often below 50%. This matters because these tasks involve reading charts, diagrams, or images, where a chatbot could confidently misread a sales graph or pull the wrong number from a document, leading to mistakes that are easy to miss but hard to undo.

The takeaway isn’t that chatbots are useless, but that blind trust is risky. Google’s own data suggests AI is improving, yet it still needs verification, guardrails, and human oversight before it can be treated as a reliable source of truth.

Manisha Priyadarshini
Manisha Priyadarshini is a tech and entertainment writer with over nine years of editorial experience.