Is that chatbot smarter than a 4-year-old? Experts have put it to the test.

Laura Schulz has spent her career trying to unravel one of the most profound human mysteries: how children think and learn. Earlier this year, the MIT cognitive psychologist was stunned by the struggles of her latest subject.

The study participant impressed her by holding a light-hearted conversation and deftly explaining complex concepts. A series of cognitive tests posed no problem, either. But then the subject flunked some reasoning tasks that most young children can easily master.

Her test subject? The AI chatbot ChatGPT-4.

“This is a little bizarre – and a little disturbing,” Schulz told her colleagues in March during a workshop at a Cognitive Development Society meeting in Pasadena, California. “But the point isn’t just to play gotcha games. … We have failures in the things that 6- and 7-year-olds can do. Failures in the things that 4- and 5-year-olds can do. And we also have failures in the things that babies can do. What’s wrong with this picture?”

Articulate AI chatbots, uncannily adept at having conversations with humans, entered the public consciousness in late 2022. They sparked a still-raging social debate over whether the technology heralds the arrival of an overlord-style machine superintelligence, or a dizzying but sometimes problematic tool that will change the way people work and learn.

For scientists who have spent decades thinking about thinking, these increasingly capable AI tools also present an opportunity. In the monumental quest to understand human intelligence, what can a different kind of mind, one whose powers grow by leaps and bounds, reveal about our own cognition?

And, conversely, does an AI that can talk like an omniscient expert still have something crucial to learn from the minds of babies?

“Being able to build into these systems the same kind of common sense that humans have is critical for making them reliable and, secondly, accountable to humans,” said Howard Shrobe, a program manager at the Defense Advanced Research Projects Agency, or DARPA, which has funded work at the intersection of developmental psychology and artificial intelligence.

“I emphasize the word ‘reliable,’” he added, “because you can only rely on things you understand.”

Scaling up versus growing up

In 1950, computer scientist Alan Turing proposed the famous “imitation game,” which quickly became the canonical test of an intelligent machine: Can a person exchanging typed messages with it be fooled into believing they are chatting with another human?

In the same paper, Turing proposed another route to an adult-level mind: a childlike machine that could learn its way to thinking like one.

DARPA, known for investing in out-of-the-box ideas, has funded teams to build AI with “machine common sense” capable of matching the abilities of an 18-month-old child. Machines that learn intuitively could be better tools and partners for humans. They may also be less prone to error and runaway harm if they are imbued with an understanding of others and the building blocks of moral intuition.

But what Schulz and her colleagues pondered during that day of presentations in March was the strange reality that building an AI that exudes expertise has proven easier than understanding, much less mimicking, the mind of a child.

Chatbots are “large language models,” a name that reflects the way they are trained. Exactly how some of their skills arise remains an open question, but they begin by ingesting a vast corpus of digitized text and learning to predict the statistical likelihood of one word following another. Human feedback is then used to fine-tune the model.
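To make that idea concrete, here is a toy sketch in Python – emphatically not how production chatbots are built, which involves neural networks trained at vastly larger scale – that simply counts which word follows which in a tiny sample text and turns those counts into next-word probabilities:

    from collections import Counter, defaultdict

    # A tiny stand-in corpus; real chatbots train on trillions of words.
    corpus = "the child shakes the toy and the toy makes a sound".split()

    # Count how often each word follows each other word (a "bigram" model).
    follows = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev][nxt] += 1

    def next_word_probabilities(word):
        # Turn raw counts into estimated probabilities for the next word.
        counts = follows[word]
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    print(next_word_probabilities("the"))
    # Prints roughly: {'child': 0.33, 'toy': 0.67}

Large language models do something conceptually similar, but with billions of learned parameters standing in for these simple counts.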

Partly by scaling the training data up to an internet’s worth of human knowledge, engineers have created “generative AI” that can write essays, compose computer code and diagnose disease.

Many developmental psychologists, by contrast, believe that children possess a core set of cognitive abilities. Exactly what those are remains a matter of scientific investigation, but they appear to allow children to extract a great deal of new knowledge from a small amount of input.

“My 5-year-old, you can teach him a new game. You can explain the rules and give an example. He’s probably heard maybe 100 million words,” says Michael Frank, a developmental psychologist at Stanford University. “An AI language model requires many hundreds of billions, if not trillions, of words. So there is a huge data gap.”

To test the cognitive skills of babies and children, scientists conduct careful experiments with squeaky toys, blocks, dolls and fictional machines called “blicket detectors.” But if you describe these puzzles in words to chatbots, their performance is all over the map.

In one of her experimental tasks, Schulz tested ChatGPT’s ability to reason about achieving goals cooperatively – a pertinent skill for a technology often presented as a tool to help humanity solve “hard” problems, such as climate change or cancer.

In this case, she described two games: an easy ring toss and a difficult beanbag toss. To win the prize, ChatGPT and a partner both had to succeed. If the AI is a 4-year-old and its partner is a 2-year-old, who should play which game? Schulz and colleagues have shown that most 4- and 5-year-olds succeed at this kind of decision-making, assigning the easier game to the younger child.

“As a four-year-old, you may want to choose the easy ring toss game yourself,” ChatGPT said. “This way you increase your chances of successfully getting your ring on the pole while the 2-year-old, who may not be as coordinated, attempts the more difficult bean bag toss.”

When Schulz pushed back and reminded ChatGPT that both partners had to win to get a prize, it doubled down on its response.

To be clear, chatbots have performed better than most experts expected on many tasks – ranging from other tests of toddler cognition to the kinds of standardized test questions that get students into college. But their stumbles are confounding because they seem so inconsistent.

Eliza Kosoy, a cognitive scientist at the University of California at Berkeley, has been working on testing the cognitive skills of LaMDA, Google’s earlier language model. It performed as well as children on tests of social and moral understanding, but she and her colleagues also found fundamental gaps.

“The thing we found it was worst at was causal reasoning – it was really painfully bad,” Kosoy said. LaMDA struggled with tasks that required it to understand, for example, how a complex set of gears makes a machine work, or how to make a machine light up and play music by choosing the objects that activate it.

Other scientists have seen an AI system master a certain skill, but stumble when tested in a slightly different way. The fragility of these skills raises a pressing question: does the machine really have a core skill, or does it only seem to do so when it is asked a question in a very specific way?

People hear that an AI system “passed the bar exam, passed all these AP exams, it passed a medical school exam,” says Melanie Mitchell, an AI expert at the Santa Fe Institute. “But what does that actually mean?”

To fill that gap, researchers are debating how to program a bit of the child’s mind into the machine. The most obvious difference is that children don’t learn everything they know by reading an encyclopedia. They play and discover.

“One thing that seems to be very important for natural intelligence, biological intelligence, is the fact that organisms have evolved to go out into the real world and learn about it, do experiments, move around in the world,” says Alison Gopnik, a developmental psychologist at the University of California at Berkeley.

She recently became interested in whether a missing ingredient in AI systems is a motivating goal that any parent who has waged a battle of wills with a toddler will know well: the drive for empowerment.

Today’s AI is partially optimized with “reinforcement learning from human feedback” – human input on what kind of response is appropriate. Although children receive that feedback, they are also curious and have an intrinsic drive to explore and seek out information. They discover how toys work by shaking them, pressing a button or turning them over, giving them some control over their environment.

“If you follow a 2-year-old around, they are actively collecting data and figuring out how the world works,” Gopnik said.

Children, after all, gain an intuitive grasp of physics and a social awareness of others, and they begin making sophisticated statistical guesses about the world long before they have the language to explain it – perhaps these, too, should be part of the “program” for building AI.

“I feel very personally about this,” says Joshua Tenenbaum, a computational cognitive scientist at MIT. “The word ‘AI’ – ‘artificial intelligence,’ which is a very old and beautiful and important and profound idea – has taken on a very limited meaning lately. … Human children don’t scale up – they grow up.”

Schulz and others remain struck both by what AI can do and by what it can’t. She acknowledges that any study of AI has a short shelf life; what it flubs today, it might master tomorrow. Some experts might say that the very idea of testing machines with methods designed to measure human capabilities is anthropomorphizing and wrongheaded.

But she and others argue that to truly understand intelligence – and to create it – the learning and reasoning abilities that unfold over childhood cannot be ignored.

“That’s the kind of intelligence that could really give us the big picture,” Schulz said. “The kind of intelligence that doesn’t start as a blank slate, but with a lot of rich, structured knowledge – and understands not only everything we’ve ever understood, across the species, but everything we ever will understand.”
