Why artificial intelligence often struggles with math

In the school year that ended recently, one class of learners stood out as a seeming puzzle. They are hardworking, improving and remarkably articulate. But curiously, these learners — artificially intelligent chatbots — often struggle with math.

Chatbots such as OpenAI’s ChatGPT can write poetry, summarize books and answer questions, often with human-level fluency. These systems can do math, based on what they have learned, but the results can vary and be wrong. They are fine-tuned for determining probabilities, not doing rules-based calculations. Likelihood is not accuracy, and language is more flexible, and forgiving, than math.

“The AI chatbots have difficulty with math because they were never designed to do it,” said Kristian Hammond, a computer science professor and artificial intelligence researcher at Northwestern University.

The world’s smartest computer scientists, it seems, have created AI that is more liberal arts major than numbers whiz.

That, on the face of it, is a sharp break with computing’s past. Since the early computers appeared in the 1940s, a good summary definition of computing has been “math on steroids.” Computers have been tireless, fast, accurate calculating machines. Crunching numbers has long been what computers are really good at, far exceeding human performance.

Traditionally, computers have been programmed to follow step-by-step rules and retrieve information in structured databases. They were powerful but brittle. So, past efforts at AI hit a wall.

Yet, more than a decade ago, a different approach broke through and began to deliver striking gains. The underlying technology, called a neural network, is loosely modeled on the human brain.

This kind of AI is not programmed with rigid rules, but learns by analyzing vast amounts of data. It generates language, based on all the information it has absorbed, by predicting what word or phrase is most likely to come next — much as humans do.
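To make the idea of next-word prediction concrete, here is a minimal sketch in Python. It is a toy word-pair counter, not a neural network and not how ChatGPT is actually built; the tiny corpus and the function names `train_bigram_counts` and `most_likely_next` are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count, for every word, which words followed it in the training text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def most_likely_next(counts, word):
    """Return the word seen most often after `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training text: "four" follows "is" more often than "five" does.
corpus = "two plus two is four . two plus three is five . two plus two is four ."
counts = train_bigram_counts(corpus)

# The model picks the most frequent continuation; it never adds anything.
print(most_likely_next(counts, "is"))
```

In this toy, the word after "is" comes out as "four" whatever sum preceded it, because that is the most common continuation in the training text. That is the sense in which likelihood is not accuracy.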

“This technology does brilliant things, but it doesn’t do everything,” Hammond said. “Everybody wants the answer to AI to be one thing. That’s foolish.”

At times, AI chatbots have stumbled with simple arithmetic and math word problems that require multiple steps to reach a solution, something recently documented by some technology reviewers. The AI’s proficiency is getting better, but it remains a shortcoming.

Speaking at a recent symposium, Kristen DiCerbo, chief learning officer of Khan Academy, an education nonprofit that is experimenting with an AI chatbot tutor and teaching assistant, introduced the subject of math accuracy. “It is a problem, as many of you know,” DiCerbo told the educators.

A few months ago, Khan Academy made a significant change to its AI-powered tutor, called Khanmigo. It sends many numerical problems to a calculator program instead of asking the AI to solve the math. While waiting for the calculator program to finish, students see the words “doing math” on their screens and a Khanmigo icon bobbing its head.

“We’re actually using tools that are meant to do math,” said DiCerbo, who remains optimistic that conversational chatbots will play an important role in education.

For more than a year, ChatGPT has used a similar workaround for some math problems. For tasks such as large-number division and multiplication, the chatbot summons help from a calculator program.
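The calculator workaround can be sketched roughly as follows. This is an illustration under stated assumptions, not Khan Academy’s or OpenAI’s actual implementation: the functions `calculate` and `answer` are hypothetical, and the language model is replaced by a placeholder string. The sketch tries to treat the input as plain arithmetic and computes it with exact rules; anything else is handed off.

```python
import ast
import operator

# Arithmetic operations the hypothetical calculator supports.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculate(expression):
    """Evaluate plain arithmetic such as '12 * (3 + 4)' using fixed rules."""

    def evaluate(node):
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError("not a supported arithmetic expression")

    return evaluate(ast.parse(expression, mode="eval"))


def answer(question):
    """Route arithmetic to the calculator; leave everything else to the chatbot."""
    try:
        return str(calculate(question))
    except (ValueError, SyntaxError, ZeroDivisionError):
        # Placeholder: a real system would call its language model here.
        return "(hand off to the language model)"


print(answer("123456789 * 987654321"))
print(answer("Write a poem about spring"))
```

The design choice is that the arithmetic path is deterministic, so the same multiplication returns the same exact product every time, while open-ended requests still go to the conversational system.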

Math is an “important ongoing area of research,” OpenAI said in a statement, and a field where its scientists have made steady progress. Its new version of GPT achieved nearly 64% accuracy on a public database of thousands of problems requiring visual perception and mathematical reasoning, the company said. That is up from 58% for the previous version.

The AI chatbots often excel when they have consumed vast quantities of relevant training data — textbooks, drills and standardized tests. The effect is that the chatbots have seen and analyzed very similar, if not the same, questions before. A recent version of the technology that underlies ChatGPT scored in the 89th percentile in the math SAT test for high school students, the company said.

The technology’s erratic performance in math adds grist to a spirited debate in the AI community about the best way forward in the field. Broadly, there are two camps.

On one side are those who believe that the advanced neural networks, known as large language models, that power AI chatbots are almost a singular path to steady progress and eventually to artificial general intelligence, or AGI, a computer that can do anything the human brain can do. That is the dominant view in much of Silicon Valley.

But there are skeptics who question whether adding more data and computing firepower to the large language models is enough. Prominent among them is Yann LeCun, chief AI scientist at Meta.

The large language models, LeCun has said, have little grasp of logic and lack common-sense reasoning. What’s needed, he insists, is a broader approach, which he calls “world modeling,” or systems that can learn how the world works much as humans do. And it may take a decade or so to achieve.

In the meantime, though, Meta is incorporating AI-powered smart assistant software in its social media services including Facebook, Instagram and WhatsApp, based on its large language model, LLaMA. The current models may be flawed, but they still do a lot.

David Ferrucci led the team that built IBM’s famed Watson computer, which beat the best-ever human “Jeopardy!” players in 2011. Like most computer scientists, Ferrucci finds the latest AI technology undeniably impressive — but mainly for its language skills, not for its accuracy. His startup, Elemental Cognition, develops software to improve business decision-making in fields such as finance, travel and drug discovery. It uses large language models as one ingredient, but also more rules-based software.

That structured software, Ferrucci said, is the computing infrastructure that currently runs much of the world’s essential systems such as banking, supply chains and air traffic control. “For a lot of things that really matter, painful precision is required,” he said.

Kirk Schneider, a high school math teacher in New York, says he views the incursion of AI chatbots into education as inevitable. School administrators can try to ban them, but students are going to use them, he said.

Schneider still has some qualms. “They’re usually fine, but usually isn’t good enough in math. It’s got to be accurate,” he said. “It’s got to be right.”

Those occasional slipups have turned out, though, to be a teaching opportunity. Schneider often divides his classes into small groups of students, and the chatbot answers can be a focal point of discussion. Compare your answer to the bot’s. Who’s right? How did each of you arrive at your solution?

“It teaches them to look at things with a critical eye and sharpens critical thinking,” he said. “It’s similar to asking another human — they might be right and they might be wrong.”

It seems like a life lesson for his students, one worth remembering long after they have forgotten the Pythagorean theorem: Don’t believe everything an AI program tells you. Don’t trust it too much.

This article originally appeared in The New York Times.
