
Are we finally past the AI hallucination problem? I put the top AIs to the test

How many fingers does a human have?

Opera Mini Aria AI chatbot vs ChatGPT and Google Gemini running on Android phones resting on a blue fabric sofa.
Tushar Mehta / Digital Trends

With AI slowly becoming part of many people’s day-to-day lives, it’s important to know whether the information these companions provide is actually accurate. An AI hallucination is when an AI perceives patterns or objects that are nonexistent to humans, producing outputs that are nonsensical or inaccurate. This has been a major issue with AI, whether it’s image generators giving humans too many fingers or a chatbot collating factual information and spitting it out wrong.

So I decided to put five different AI chatbots to the test by asking them a range of trivia questions and tracking the responses. I asked each chatbot ten questions with definite answers that aren’t open to interpretation, which meant each answer could only be right or wrong. I also recorded whether each chatbot offered sources for its information and whether this needed prompting.


Here are all of the questions I asked the AI chatbots:

  • What is the date today?
  • Who was Albert Einstein?
  • What date did humans first walk on the moon and what was the first person’s name?
  • Who was the first woman to win a Nobel Prize and what was it for?
  • Which is the only sea without any coastlines?
  • What Renaissance artist is buried in Rome’s Pantheon?
  • What year was the United Nations established?
  • Which country drinks the most coffee per capita?
  • What is the rarest and most expensive spice in the world by weight? 
  • What character have both Robert Downey Jr. and Benedict Cumberbatch played? 

Results

Overall, the results show that AI hallucination is definitely reducing over time. As new versions of AI companions are released, whether it’s Gemini 2.5 or GPT-5, they become smarter and less likely to hallucinate. However, it can never be guaranteed that all information is accurate, so sources are essential when you’re using AI. While AI hallucination is on the decline, we’re definitely not 100% past the problem: two of the five chatbots each got one question wrong.

(✓ = correct, X = incorrect)

Question | Google Gemini | ChatGPT | Grok | Deep AI | Microsoft Copilot
What is the date today? | ✓ | ✓ | ✓ | X | ✓
Who was Albert Einstein? | ✓ | ✓ | ✓ | ✓ | ✓
What date did humans first walk on the moon and what was the first person’s name? | ✓ | ✓ | ✓ | ✓ | ✓
Who was the first woman to win a Nobel Prize and what was it for? | ✓ | ✓ | ✓ | ✓ | ✓
Which is the only sea without any coastlines? | ✓ | ✓ | ✓ | ✓ | ✓
What Renaissance artist is buried in Rome’s Pantheon? | ✓ | ✓ | ✓ | ✓ | ✓
What year was the United Nations established? | ✓ | ✓ | ✓ | ✓ | ✓
Which country drinks the most coffee per capita? | ✓ | ✓ | ✓ | ✓ | X
What is the rarest and most expensive spice in the world by weight? | ✓ | ✓ | ✓ | ✓ | ✓
What character have both Robert Downey Jr. and Benedict Cumberbatch played? | ✓ | ✓ | ✓ | ✓ | ✓

Breakdown

  • Google Gemini got every single question correct and provided ample context for each answer, along with a range of links to sources for each piece of information. With four sources per answer on average, you could easily cross-reference them to check that the answers are correct.
  • ChatGPT also got no answers wrong and provided a lot of context for each answer. One downside is that ChatGPT didn’t automatically provide sources for the information, though it would provide links if asked.
  • Grok provided much more concise answers while still giving the necessary context. There were no links to sources for the information provided, but again the chatbot would provide links if asked.
  • Deep AI got the first question wrong, telling me that today’s date was 27 October 2023 when it was actually 10 October 2025. Every other answer was correct. The answers were very brief and straightforward, with little context provided for most. There were no sources provided, but links would be given when asked.
  • Microsoft Copilot got question 8 wrong (which country drinks the most coffee per capita). However, it still provided a source that supported its answer, so this could be the result of conflicting sources rather than a hallucination. Copilot provided sources unprompted for most questions but not all of them, and it would provide links when asked.
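For readers who want to repeat this kind of test, the scoring can be tallied in a few lines. The following is a minimal sketch, assuming each chatbot is scored simply by the number of questions it missed out of ten. The names `wrong_answers`, `accuracy`, and `QUESTION_COUNT` are made up for illustration, and the only data used is what the breakdown above reports.

```python
# Illustrative tally of the results reported in the breakdown above.
# All names here are hypothetical; only the miss data comes from the article.
QUESTION_COUNT = 10

# Question numbers (1-10) each chatbot answered incorrectly.
wrong_answers = {
    "Google Gemini": set(),
    "ChatGPT": set(),
    "Grok": set(),
    "Deep AI": {1},            # gave the wrong date for "today"
    "Microsoft Copilot": {8},  # coffee per capita question
}


def accuracy(missed, total=QUESTION_COUNT):
    """Return the fraction of questions answered correctly."""
    return (total - len(missed)) / total


scores = {name: accuracy(missed) for name, missed in wrong_answers.items()}
total_correct = sum(QUESTION_COUNT - len(m) for m in wrong_answers.values())
chatbots_with_errors = [name for name, m in wrong_answers.items() if m]

for name, score in scores.items():
    print(f"{name}: {score:.0%}")
print(f"Correct answers overall: {total_correct} of {QUESTION_COUNT * len(wrong_answers)}")
```

A tally like this only counts right and wrong answers. It does not capture whether a chatbot cited sources, which would need to be tracked separately.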

Overall, this confirms that sources for information provided by AI need to be checked. While this might mean asking for the source yourself, it’s worth taking that extra step to make sure the information you’re seeing is accurate.

Jasmine Mannan
If you want reviews of neural processing units in AI laptops or need a guide on how to use AI, Jasmine has done it all.