Did you know that cats have been to the moon? That it is safe to stare at the sun for 15 minutes or even longer, as long as you have dark skin? Or that, to stay healthy, you should eat one small stone a day?
These are some of the latest pearls of wisdom that Google has offered to its US users (we’re not so lucky here in Britain yet). “Let Google search for you,” the search giant promised when it introduced a feature called AI Overviews earlier this month. This integrates Google’s generative AI model Gemini into the search engine. The answers it generates appear above the traditional list of ranked results. And you can’t get rid of them.
AI Overviews has not had the effect Google had hoped for, to say the least. It certainly went viral instantly, with people sharing their favorite answers: not because these are useful, but because they are so ridiculous. For example, if you ask AI Overviews for a list of fruits that end in “um,” it returns: “Applum, Strawberrum, and Coconut.” This is what in AI parlance is called a “hallucination.”
Despite a $2 trillion market cap and the ability to hire the world’s brightest minds, Google continues to stumble in AI. Its first attempt to join the generative AI gold rush, in February last year, was the ill-fated Bard chatbot, which had similar problems with stating falsehoods as fact. During its first live demonstration, Bard falsely declared that the James Webb Space Telescope, which wasn’t launched until 2021, had taken “the first photos” ever of a planet outside our solar system. The mistake wiped $100 billion off Google’s market value.
This February, Google tried again, this time with Gemini, an image and text generator. The problem was its very heavy-handed diversity guardrails. When asked to produce historically accurate images, it would instead produce black Nazi soldiers, Native American Founding Fathers, and a South Asian female Pope.
This was “a well-intentioned mistake,” pleaded The Economist. But Google cannot have been caught off guard by the problems inherent in generative AI. It will have been well aware of the technology’s possibilities and pitfalls.
Before the current AI mania really took off, analysts had already concluded that generative AI was unlikely to improve the user experience, and could even worsen it. This caution was abandoned as investors started piling in.
So why does Google’s AI produce such poor results? In fact, it works exactly as you would expect. Don’t be fooled by the “artificial intelligence” branding. Essentially, AI Overviews simply tries to guess the next word it should use, based on statistical probability, but without any grounding in reality. The algorithm can’t say “I don’t know” when asked a difficult question because it doesn’t “know” anything. It can’t even do simple math, as users have shown, because it has no underlying concept of numbers or valid arithmetic operations. Hence the hallucinations and omissions.
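To make the “guess the next word” idea concrete, here is a deliberately tiny sketch. It is not how Gemini or AI Overviews is built: real systems use large neural networks, whereas this toy simply counts which word follows which. The corpus and the function names (`train_bigrams`, `next_word`, `complete`) are invented for illustration. What it shows is that a purely statistical predictor repeats whatever its training text says most often, true or not.

```python
from collections import Counter, defaultdict

# Invented toy corpus: the joke appears twice, the sensible advice once.
CORPUS = [
    "you should eat one small stone a day",
    "you should eat one small stone a day",
    "you should eat one apple a day",
]


def train_bigrams(sentences):
    """Count, for every word, which words follow it and how often."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word(counts, word):
    """Return the most frequent follower of `word`, or None if it was never followed."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def complete(counts, start, max_words=10):
    """Extend `start` one word at a time, always taking the likeliest next word."""
    words = [start]
    while len(words) < max_words:
        nxt = next_word(counts, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)
```

Calling `complete(train_bigrams(CORPUS), "you")` yields the stone-eating advice, because that is the most common continuation in the data. Nothing in the code checks whether the sentence is true.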
This is less of an issue when the output doesn’t matter much, such as when AI processes an image and causes a minor glitch. Our phones use machine learning to process our photos every day, and we don’t notice or care about most of the problems. But Google advising us all to start eating stones is no small problem.
Such mistakes are more or less inevitable because of the way the AI is trained. Instead of learning from a curated dataset of accurate information, AI models are trained on a huge, virtually open dataset. Google’s AI and ChatGPT have already scraped as much as they can from the internet and, needless to say, much of what is on the internet is not true. Forums like Reddit are teeming with sarcasm and jokes, but the AI treats these as trustworthy, as sincere and correct answers to problems. Programmers have long used the phrase “GIGO” to describe what’s going on here: garbage in, garbage out.
AI’s hallucination problem is consistent across domains. It pretty much rules generative AI out of practical use in commercial and enterprise applications, where you would expect it to save a lot of time. A new study on generative AI in legal work finds that the extra verification steps now required to ensure the AI isn’t hallucinating negate the time saved by deploying it in the first place.
“[Programmers] still make the same blunt mistakes as before. No one has actually solved hallucinations with large language models and I don’t think we can,” cognitive scientist and veteran AI skeptic Professor Gary Marcus noted last week.
Another problem now comes into view. The AI makes an already bad job worse by generating false information, which then pollutes the rest of the internet. “Google learns what junk it encounters on the internet and nothing generates junk better than AI,” as one X user put it.
Last year, the leading AI companies acknowledged that, because they could no longer scrape fresh content from the internet, they had begun using synthetic training data – that is, data generated by the generative AI itself. A year ago, OpenAI’s Sam Altman said he was “pretty sure that soon all data will be synthetic data” made up by other AIs.
This is a huge problem. It actually causes the models to ‘collapse’ and no longer produce useful results. “Model collapse is when generative AI becomes unstable, unreliable, or stops functioning. It could happen when generative AI models are trained on content generated by AI rather than humans,” Professor Nigel Shadbolt of the Open Data Institute warned last December. One researcher, Jathan Sadowski, has called this phenomenon “Habsburg AI,” after the Spanish Habsburg dynasty, which became extinct in 1700 due to diseases caused by inbreeding.
You could argue that something like this is already happening without the help of AI, for example when a false fact is posted on Wikipedia, quoted in the media, and then the media quotes become the justification for its continued inclusion on Wikipedia.
AI simply automates and accelerates this process of generating falsehoods. This week the Telegraph gave the following example: “When Google claimed that there was no African country beginning with the letter K, the answer appeared to be based on a web discussion in which ChatGPT answered the same question incorrectly. In other words, AI is now using other AI fabrications as gospel.”
The most striking description of this phenomenon comes from some American researchers, who last year coined the term “Model Autophagy Disorder,” or MAD. The name is meant to evoke the practice of introducing bovine prions into the livestock food supply, which caused bovine spongiform encephalopathy, or mad cow disease. “Our key conclusion across all scenarios is that without enough new real data in each generation of an autophagous loop, future generative models are doomed to see their quality (precision) or diversity (recall) gradually decline,” they wrote.
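The self-feeding loop the researchers describe can be illustrated with a toy simulation. This is not their experiment, only a minimal sketch under strong assumptions: the “model” is nothing more than a bell curve (a mean and a spread), and each generation is fitted solely to a small sample drawn from the previous generation, with no new real data. The function name `autophagous_loop` and its parameters are hypothetical.

```python
import random
import statistics


def autophagous_loop(generations=200, sample_size=10, seed=0):
    """Repeatedly refit a Gaussian to samples drawn from its own previous fit.

    Returns the fitted spread (standard deviation) after each generation,
    starting with the spread of the original "real" distribution.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # assumed "real" data distribution
    history = [std]
    for _ in range(generations):
        # The current model generates the next generation's training data.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # The next model is fitted to that synthetic data alone.
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        history.append(std)
    return history
```

Comparing the first and last entries of the returned list shows the spread shrinking over the generations: the loop gradually loses the diversity of the original data, which is the kind of decline the quote warns about. Mixing fresh real samples into each generation would be the obvious variation to try.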
Very few people warned about the downsides of generative AI when OpenAI launched its ChatGPT tool in November 2022. Now ChatGPT has polluted the internet and poisoned itself and other AI tools. Cleaning this up will be a huge challenge. While the promised benefits of AI remain elusive, the costs are clearly starting to add up.
Andrew Orlowski is a weekly columnist at the Telegraph. Follow him on X: @AndrewOrlowski.