- Scientists claim that ChatGPT-4 is the first AI to pass a two-player Turing Test
- The AI was able to fool a human interlocutor in 54 percent of cases
Ever since the ‘Turing Test’ was first proposed in 1950, passing it has been seen as one of the loftiest goals in AI.
But now researchers claim that ChatGPT has become the first AI to pass this famous test of humanlike intelligence.
Proposed by computing pioneer Alan Turing, the test holds that for an AI to be considered truly intelligent, humans must be unable to tell whether they are talking to a human or a machine.
In a preprint paper, cognitive scientists at UC San Diego claim that ChatGPT-4 can fool human subjects more than half the time.
However, the researchers say this may reveal more about the Turing Test than about the intelligence of modern AI.
In 1950, British World War II codebreaker Alan Turing created what he believed would be the ultimate test of computer intelligence.
He envisioned a human participant sitting behind a screen and talking to a human or a computer through a text interface.
If the computer could not be distinguished from a human on a wide range of possible subjects, Turing reasoned that we would have to admit that it was as intelligent as a human.
To replicate this famous test, the researchers asked 500 participants to speak to four different agents: three AIs and one human.
The AIs were ChatGPT-4, ChatGPT-3.5 – two models behind OpenAI’s popular chatbot – and a 1960s chat program called ELIZA.
Participants were placed in a chat room and told either to convince their partner that they were human or to work out whether they were talking to a human or a machine.
They were allowed to chat for five minutes before being asked whether they thought they had been speaking to a computer or a human, why they thought so, and how confident they were.
While the vintage ELIZA program fooled participants only 22 percent of the time, ChatGPT-3.5 went undetected in 50 percent of conversations.
The more advanced version, ChatGPT-4, did even better, with participants labeling it as human 54 percent of the time.
This means that ChatGPT-4 can pass as human more often than chance alone would predict.
And while that margin may seem slim, it’s worth noting that participants correctly identified real humans as human in only 67 percent of conversations.
The researchers write that these results provide “the first robust empirical demonstration that any artificial system passes an interactive two-player Turing Test.”
It’s worth noting that this is a preprint paper, meaning it has not yet been peer-reviewed, so the results should be treated with a degree of caution.
However, if the results are supported, this would be the first strong evidence that an AI has ever passed the Turing Test, as envisioned by Alan Turing.
Nell Watson, an AI researcher at the Institute of Electrical and Electronics Engineers (IEEE), told LiveScience: ‘Machines can confabulate and piece together plausible ex-post-facto justifications for things, just like humans do.
‘All these elements mean that human weaknesses and quirks are reflected in AI systems, making them appear more human than previous approaches that had little more than a list of canned responses.’
Importantly, the poor performance of the ELIZA program also underscores the significance of these results.
While it may seem strange to include a program from the 1960s in a test of cutting-edge technology, this model was included to check for something called the “ELIZA effect.”
The ELIZA effect is the idea that people can attribute human-like characteristics to even very simple systems.
But the fact that people were fooled by ChatGPT and not ELIZA suggests that this result is ‘non-trivial’.
The researchers also point out that shifting public perceptions of AI may have changed what results we should expect from the Turing Test.
They write: ‘At first glance, the low success rate might be surprising. If the test measures human likeness, shouldn’t people be at 100 percent?’
In 1950, this assumption would have made perfect sense: in a world without advanced AI, anything that sounded human could safely be assumed to be human.
But as the public becomes more familiar with AI, and more alert to the possibility of talking to a machine, we grow more likely to misidentify humans as AI.
This could make the small gap between the success rates of humans and ChatGPT-4 even more compelling as evidence of computer intelligence.
In February this year, researchers at Stanford discovered that ChatGPT could pass a version of the Turing Test in which the AI answered a commonly used personality test.
While these researchers found that ChatGPT-4’s results were indistinguishable from those of humans, this latest paper marks one of the first times the AI has passed a robust two-player Turing Test based on conversations.
However, the researchers also acknowledge that there are long-standing and valid criticisms of the Turing Test.
The researchers point out that “stylistic and social-emotional factors play a greater role in passing the Turing Test than traditional notions of intelligence.”
Interrogators were far more likely to cite style, personality and tone as reasons for identifying their conversation partner as a robot than anything to do with intelligence.
Similarly, one of the most successful strategies for identifying robots was asking about human experiences, which worked 75 percent of the time.
This suggests that the Turing Test does not actually prove that a system is intelligent, but rather measures its ability to imitate or deceive humans.
At best, the researchers suggest this provides ‘probabilistic’ support for the claim that ChatGPT is intelligent.
But this doesn’t mean the Turing Test is worthless, as the researchers note that the ability to impersonate a human will have enormous economic and social consequences.
The researchers say that sufficiently persuasive AIs can “fill economically valuable, customer-facing roles historically reserved for human workers, mislead the general public or their own human operators, and erode social trust in authentic human interactions.”
Ultimately, the Turing Test may be just one part of what we need to assess when evaluating an AI system.
Watson says: ‘Raw intellect only goes so far. What really matters is being intelligent enough to understand a situation and the skills of others, and having the empathy to connect those elements.
‘Capabilities are only a small part of AI’s value – its ability to understand the values, preferences and boundaries of others is also essential.’