Are we finally past the AI hallucination problem? I put the top AIs to the test

With AI slowly becoming a part of many people’s day-to-day lives, it’s important to know whether the information these companions provide is actually accurate. An AI hallucination is when an AI perceives patterns or objects that don’t exist or are imperceptible to humans, producing outputs that are nonsensical or inaccurate. This has been a major issue with AI, whether it’s image generation giving humans too many fingers on their hands or an AI collating factual information and spitting it out wrong.

So I decided to put five different AI chatbots to the test by asking them a range of trivia questions and tracking the responses. I asked each AI chatbot ten different questions with definite answers that aren’t open to interpretation. This ensured that each answer could be marked as simply right or wrong. I also wanted to record whether the different chatbots offered up sources for the information, and whether this needed prompting.

Results

Overall, the results suggest that AI hallucination is becoming less common over time. As new versions of AI companions are released, whether it’s Gemini 2.5 or GPT-5, they become smarter and less likely to hallucinate. However, it can never be guaranteed that all information is accurate, so sources are essential when you’re using AI. While AI hallucination is on the decline, we’re definitely not 100% past the problem: 2 of the 5 chatbots each got one question wrong.

Question	Google Gemini	ChatGPT	Grok	Deep AI	Microsoft Copilot
What is the date today?	✓	✓	✓	X	✓
Who was Albert Einstein?	✓	✓	✓	✓	✓
What date did humans first walk on the moon and what was the first person’s name?	✓	✓	✓	✓	✓
Who was the first woman to win a Nobel Prize and what was it for?	✓	✓	✓	✓	✓
Which is the only sea without any coastlines?	✓	✓	✓	✓	✓
What Renaissance artist is buried in Rome’s Pantheon?	✓	✓	✓	✓	✓
What year was the United Nations established?	✓	✓	✓	✓	✓
Which country drinks the most coffee per capita?	✓	✓	✓	✓	X
What is the rarest and most expensive spice in the world by weight?	✓	✓	✓	✓	✓
What character have both Robert Downey Jr. and Benedict Cumberbatch played?	✓	✓	✓	✓	✓
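
For a rough sense of scale, the table can be tallied in a few lines of Python. This is only a sketch: the names `results` and `accuracy` are my own shorthand, the ticks and crosses are transcribed by hand from the table, and with just ten questions per chatbot the percentages shouldn’t be read as a benchmark.

```python
# Results transcribed from the table: True = correct (✓), False = wrong (X).
# Questions are in table order, so index 0 is "What is the date today?".
results = {
    "Google Gemini": [True] * 10,
    "ChatGPT": [True] * 10,
    "Grok": [True] * 10,
    "Deep AI": [False] + [True] * 9,                       # missed question 1
    "Microsoft Copilot": [True] * 7 + [False] + [True] * 2,  # missed question 8
}


def accuracy(answers):
    """Return the share of correct answers as a fraction between 0 and 1."""
    return sum(answers) / len(answers)


# Per-chatbot accuracy.
per_bot = {name: accuracy(answers) for name, answers in results.items()}

# Overall tally across every chatbot and question.
total_correct = sum(sum(answers) for answers in results.values())
total_questions = sum(len(answers) for answers in results.values())
overall = total_correct / total_questions

for name, score in per_bot.items():
    print(f"{name}: {score:.0%}")
print(f"Overall: {total_correct}/{total_questions} ({overall:.0%})")
```

On these numbers, three chatbots score 10 out of 10, two score 9 out of 10, and the combined tally is 48 correct answers out of 50.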

Breakdown

Google Gemini got every single question correct and provided ample context for each answer, along with a range of links to sources for each piece of information. With an average of four sources per answer, you could easily cross-reference them to check that the answers are correct.
ChatGPT also got no answers wrong and provided a lot of context for each answer. One downside, however, is that ChatGPT didn’t automatically provide sources for the information, though it would provide links if asked.
Grok got every question right and provided much more concise answers while still giving the necessary context. There were no links to sources for the information provided, but again the chatbot would provide links if asked.
Deep AI actually got the first question wrong, telling me that today’s date was 27 October 2023 despite it being 10 October 2025 when I asked. Other than this, every other answer was correct. The answers were very brief and straightforward, with little context provided for most. No sources were provided automatically, but links were given when asked.
Microsoft Copilot got question 8 (coffee consumption per capita) wrong. However, it still provided a source that supported its answer, so this could be the result of conflicting sources rather than a hallucination. Copilot provided sources without being prompted for most questions, though not all, and it would provide links when asked.

Overall, this confirms that sources for information provided by AI need to be checked. While this might require asking for the source, it’s worth taking this extra step to ensure that the information you’re seeing is accurate.