Not only are we interacting with artificial intelligence (AI) online more than ever before, we may also be doing so more often than we realize. To probe this, researchers asked people to talk to four agents, including one human and three different types of AI model, to see whether they could tell the difference.
The ‘Turing Test’, first proposed as ‘the imitation game’ by computer scientist Alan Turing in 1950, assesses whether a machine can demonstrate intelligence indistinguishable from that of a human. To pass the Turing Test, a machine must be able to converse with a person and fool them into believing it is human.
Scientists set out to replicate this test by asking 500 people to speak with four respondents: a human, the 1960s AI program ELIZA, and GPT-3.5 and GPT-4, the AI models that power ChatGPT. The conversations lasted five minutes, after which participants had to say whether they believed they had been talking to a human or an AI. In the study, published May 9 on the preprint server arXiv, the scientists found that participants judged GPT-4 to be human 54% of the time.
ELIZA, a system that came pre-programmed with responses but had no large language model (LLM) or neural network architecture, was judged to be human only 22% of the time. GPT-3.5 scored 50%, while the human participant scored 67%.
“Machines can confabulate and piece together plausible ex-post-facto justifications for things, just like humans do,” Nell Watson, an AI researcher at the Institute of Electrical and Electronics Engineers (IEEE), told Live Science.
“They can be subject to cognitive biases, fooled and manipulated, and become increasingly deceptive. All these elements mean that human-like weaknesses and quirks are reflected in AI systems, making them seem more human than previous approaches that had little more than a list of standard answers.”
The study – which builds on decades of efforts to get AI agents to pass the Turing Test – echoes common concerns that AI systems judged to be human-like will have “widespread social and economic consequences.”
The scientists also argued that there are valid criticisms of the Turing Test’s overly simplistic approach, saying that “stylistic and social-emotional factors play a greater role in passing the Turing Test than traditional views of intelligence.” This suggests that we have been looking for machine intelligence in the wrong place.
“Raw intellect only goes so far. What really matters is being intelligent enough to understand a situation and the skills of others, and to have the empathy to connect those elements,” Watson said. “Capabilities are only a small part of AI’s value – their ability to understand the values, preferences and boundaries of others is also essential. It is these qualities that allow AI to serve as a faithful and reliable concierge for our lives.”
Watson added that the study poses a challenge for future human-machine interaction, and that we will become increasingly paranoid about the true nature of our interactions, especially in sensitive matters. She added that the research highlights how much AI has changed during the GPT era.
“ELIZA was limited to standard responses, which greatly limited its capabilities. It might fool someone for five minutes, but soon the limitations would become apparent,” she said. “Language models are endlessly flexible, able to synthesize answers to a wide range of topics, speak in certain languages or sociolects, and portray themselves with character-driven personality and values. It’s a huge step forward from something hand-programmed by a human being, no matter how smart and careful that person may be.”