ChatGPT passes the famous ‘Turing test’ for human-like intelligence

  • Scientists claim that ChatGPT-4 is the first AI to pass a two-player Turing Test
  • The AI was able to fool a human interlocutor in 54 percent of cases



Since it was first proposed in 1950, passing the ‘Turing Test’ has been seen as one of the highest goals in AI.

But now researchers claim that ChatGPT has become the first AI to pass this famous test for human intelligence.

Proposed by computing pioneer Alan Turing, the test holds that an AI can be considered truly intelligent only if humans cannot tell whether they are talking to a person or a machine.

In a preprint article, cognitive scientists at UC San Diego claim that ChatGPT-4 can fool human subjects more than half the time.

However, the researchers say this could say more about the Turing Test than about the intelligence of modern AI.

ChatGPT-4 has passed the famous ‘Turing test’ designed to see if computers have human-like intelligence
Overview of the Turing Test: A human interrogator (C) puts questions to an AI (A) and another human (B) and evaluates the answers. The interrogator doesn’t know which is which. If the AI fools the interrogator into thinking its answers were generated by a human, it passes the test

What is the Turing Test?

The Turing Test was introduced in 1950 by World War II codebreaker Alan Turing.

He predicted that computers would one day be programmed to acquire skills that could rival human intelligence.

He proposed a test to determine whether a computer can think.

A person called the interrogator has a text-based conversation with another person and a computer – and must determine which is which.

If they cannot reliably tell the two apart, the computer has passed the test.

In 1950, British World War II codebreaker Alan Turing created what he believed would be the ultimate test of computer intelligence.

He envisioned a human participant sitting behind a screen and talking to a human or a computer through a text interface.

If the computer could not be distinguished from a human on a wide range of possible subjects, Turing reasoned that we would have to admit that it was as intelligent as a human.

To replicate this famous test, the researchers asked 500 participants to speak to four different agents: three AIs and one human.

The AIs were ChatGPT-4, ChatGPT-3.5 – two models behind OpenAI’s popular chatbot – and a 1960s chat program called ELIZA.

Participants were placed in a chat room and told they had to convince the other person they were human, or find out who they were talking to.

Participants were then allowed to chat for five minutes before being asked whether they thought they were speaking to a computer or a human, why they thought that and how confident they were.

Participants were placed in a chat room with a human or a computer and had to guess who they were talking to
The experiment was a replica of the test that Alan Turing (pictured) designed in 1950
Turing Test: Can You Tell the Difference? One of these conversations is with a human and the other three are with AIs. Read them carefully and take a guess. The answers are in the box below

While the vintage ELIZA program fooled participants only 22 percent of the time, ChatGPT-3.5 passed as human in 50 percent of conversations.

The more advanced version, ChatGPT-4, did even better, with participants labeling it as human 54 percent of the time.


This means that ChatGPT-4 passes as human more often than chance would suggest.

And while this margin may seem small, it’s worth noting that participants correctly identified real humans as human in only 67 percent of conversations.

The researchers write that these results provide “the first robust empirical demonstration that any artificial system passes an interactive two-player Turing Test.”

It’s worth noting that this is a preprint paper, meaning it is still awaiting peer review, so the results should be treated with a degree of caution.

However, if the results are supported, this would be the first strong evidence that an AI has ever passed the Turing Test, as envisioned by Alan Turing.

Nell Watson, an AI researcher at the Institute of Electrical and Electronics Engineers (IEEE), told Live Science: ‘Machines can confabulate and piece together plausible ex-post-facto justifications for things, just like humans do.

‘All these elements mean that human weaknesses and quirks are reflected in AI systems, making them appear more human than previous approaches that had little more than a list of canned responses.’

People were correctly identified as human in 67 percent of cases (blue bar), while ChatGPT-4 was able to fool its interlocutors in 54 percent of cases

Turing Test – Answers

Chat A: ChatGPT-4

Chat B: Human

Chat C: ChatGPT-3.5

Chat D: ELIZA

Importantly, the poor performance of the ELIZA program also underlines the significance of these results.

While it may seem strange to include a program from the 1960s in a test of cutting-edge technology, it was included to check for something called the “ELIZA effect.”

The ELIZA effect is the idea that people can attribute human-like characteristics to even very simple systems.

But the fact that people were fooled by ChatGPT and not ELIZA suggests that this result is ‘non-trivial’.

The researchers also point out that the public’s changing perception of AI may alter the results we should expect from the Turing Test.

They write: ‘At first glance, the low human success rate might be surprising.

‘If the test measures human likeness, shouldn’t people score 100 percent?’

According to the new study, this is the first time an AI has passed the test that Alan Turing invented in 1950. The early computer pioneer’s life and invention of the Turing Test were famously dramatized in The Imitation Game, starring Benedict Cumberbatch (pictured)


In 1950, this assumption would have made perfect sense, because in a world without advanced AI, anything that sounded human could safely be assumed to be human.

But as the public becomes more aware of AI and our trust in AI increases, we are more likely to misidentify humans as AI.

This could make the small gap between the success rates of humans and ChatGPT-4 even more compelling as evidence of computer intelligence.

In February this year, researchers at Stanford discovered that ChatGPT could pass a version of the Turing Test in which the AI answered a commonly used personality test.

While these researchers found that ChatGPT-4’s results were indistinguishable from those of humans, this latest paper marks one of the first times the AI has passed a robust two-player Turing Test based on conversations.

However, the researchers also acknowledge that there are long-standing and valid criticisms of the Turing Test.

The researchers point out that “stylistic and social-emotional factors play a greater role in passing the Turing Test than traditional notions of intelligence.”

The researchers say this doesn’t necessarily show that AI has become intelligent, just that it has become better at imitating humans (stock image)

Interrogators were far more likely to cite style, personality and tone as reasons for identifying their conversation partner as a robot than anything to do with intelligence.

Similarly, one of the most successful strategies for identifying robots was asking about human experiences, which worked 75 percent of the time.

This suggests that the Turing Test does not actually prove that a system is intelligent, but rather measures its ability to imitate or deceive humans.

At best, the researchers suggest this provides ‘probabilistic’ support for the claim that ChatGPT is intelligent.

Participants were more likely to identify the AI based on an assessment of personality and the details it gave about itself, rather than on intelligence


But this doesn’t mean the Turing Test is worthless, as the researchers note that the ability to impersonate a human will have enormous economic and social consequences.

The researchers say that sufficiently persuasive AIs can “fill economically valuable, customer-facing roles historically reserved for human workers, mislead the general public or their own human operators, and erode social trust in authentic human interactions.”

Ultimately, the Turing Test may be just one part of what we need to assess when developing an AI system.

Ms Watson says: ‘Raw intellect only goes so far. What really matters is being intelligent enough to understand a situation and the skills of others, and having the empathy to connect these elements.

‘Capabilities are only a small part of AI’s value – their ability to understand the values, preferences and boundaries of others is also essential.’
