New AI Method Improves Prediction Accuracy and Reliability

Summary: Researchers have developed a new approach to improve uncertainty estimates in machine learning models, increasing prediction accuracy. Their method, IF-COMP, uses the minimum description length principle to provide more reliable confidence measures for AI decision-making, which is critical in high-stakes settings like healthcare.

This scalable technique can be applied to large models, allowing non-experts to determine the reliability of AI predictions. The findings could lead to better decision-making in real-world applications.

Key Facts:

  1. Improved accuracy: IF-COMP improves the accuracy of uncertainty estimates for AI predictions.
  2. Scalability: Applicable to large, complex models in critical settings such as healthcare.
  3. User-friendly: Helps non-experts assess the reliability of AI decisions.

Source: MIT

Because machine learning models can make incorrect predictions, researchers often give them the ability to tell a user how confident they are about a particular decision. This is especially important in high-stakes situations, such as when models are used to identify diseases in medical images or to filter job applications.

But a model’s uncertainty quantifications are only useful if they are accurate. If a model says that it is 49% certain that a medical image shows a pleural effusion, then the model should be right 49% of the time.
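
The article does not include code, but this kind of calibration is commonly checked by binning predictions by confidence and comparing each bin's average confidence to its empirical accuracy, often summarized as the expected calibration error (ECE). The sketch below is a generic illustration of that check, not part of IF-COMP; the function name and simulated data are purely illustrative.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence and compare each bin's average confidence
    to its empirical accuracy (a standard calibration check, illustrative only;
    this is not part of the IF-COMP method itself)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += (mask.sum() / len(confidences)) * gap
    return ece

# Toy example: a model that claims ~49% confidence should be right ~49% of the time.
rng = np.random.default_rng(0)
conf = rng.uniform(0.4, 0.6, size=1000)
hits = rng.uniform(size=1000) < conf      # simulated, perfectly calibrated model
print(f"ECE ~ {expected_calibration_error(conf, hits):.3f}")  # close to zero
```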

The researchers tested their system on these three tasks and found it was faster and more accurate than other methods. Credit: Neuroscience News

MIT researchers have introduced a new approach that can improve uncertainty estimates in machine learning models. Their method not only generates more accurate uncertainty estimates than other techniques, but also does so more efficiently.

Furthermore, because the technique is scalable, it can be applied to large deep learning models that are increasingly being deployed in healthcare and other safety-critical situations.

This technique can provide end users, many of whom lack the necessary machine learning expertise, with better information to help them determine whether they can trust a model’s predictions or whether the model should be deployed for a particular task.

“It’s easy to see that these models do very well in scenarios where they’re very good, and then assume that they’ll be just as good in other scenarios.

“That’s why it’s especially important to promote this kind of work, which aims to better calibrate the uncertainty of these models to ensure that they match human perceptions of uncertainty,” said lead author Nathan Ng, a doctoral student at the University of Toronto who is a visiting scholar at MIT.

Ng co-wrote the paper with Roger Grosse, an assistant professor of computer science at the University of Toronto, and senior author Marzyeh Ghassemi, an associate professor in the Department of Electrical Engineering and Computer Science and a member of the Institute for Medical Engineering and Science and the Laboratory for Information and Decision Systems. The research is being presented at the International Conference on Machine Learning.

Quantifying uncertainty

Uncertainty quantification methods often require complex statistical calculations that do not scale well to machine learning models with millions of parameters. These methods also require users to make assumptions about the model and the data used to train it.

The MIT researchers took a different approach. They used the so-called minimum description length principle (MDL), which does not require assumptions that can hinder the accuracy of other methods. MDL is used to better quantify and calibrate uncertainty for test points that the model needs to label.

The technique the researchers developed, known as IF-COMP, makes MDL fast enough for use with the large deep learning models used in many real-world situations.

MDL involves considering all possible labels that a model can give to a test point. If there are many alternative labels for this point that fit well, the confidence in the chosen label should decrease accordingly.

“One way to understand how confident a model is is to give it some counterfactual information and see how likely the model is to believe you,” Ng says.

For example, consider a model that says that a medical image shows a pleural effusion. If the researchers tell the model that this image shows edema, and it is willing to revise its belief, then the model should be less certain of its original decision.

With MDL, if a model is certain when it labels a data point, it should use a very short code to describe that point. If it is uncertain about its decision because the point could have many other labels, it uses a longer code to capture those possibilities.

The amount of code used to label a data point is known as stochastic data complexity. If researchers ask the model how willing it is to update its belief about a data point based on counter-evidence, stochastic data complexity should decrease if the model is confident.

But testing every data point with MDL requires an enormous amount of computational effort.
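
As a rough illustration of this MDL-style scoring (a conceptual sketch, not the authors' IF-COMP implementation), the toy code below refits a simple classifier once for each candidate label of an ambiguous test point, asks how strongly the refitted model believes that label, normalizes the results, and reads off a "code length" in bits. The classifier, data, and all names are made up for illustration; the per-label refits also show why the exact computation is so expensive.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: two Gaussian blobs (labels 0 and 1). Everything here is
# illustrative; this sketches the general idea, not the paper's method.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)
x_test = np.array([[0.3, -0.1]])          # an ambiguous point near the boundary

scores = []
for candidate in (0, 1):
    # Refit the model with the test point tentatively given this candidate label.
    clf = LogisticRegression().fit(
        np.vstack([X_train, x_test]), np.append(y_train, candidate)
    )
    # How strongly does the refitted model believe the label it was just told?
    scores.append(clf.predict_proba(x_test)[0, candidate])

scores = np.array(scores)
normalized = scores / scores.sum()         # confidence spread over all candidate labels
code_length = -np.log2(normalized)         # "description length" per label, in bits
print("normalized probabilities:", np.round(normalized, 3))
print("code length (bits):      ", np.round(code_length, 3))
# If several labels fit the ambiguous point well, no label gets a short code,
# signalling low confidence. Repeating this refit for every test point and
# every label is what makes exact MDL-style scoring computationally heavy.
```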

Speeding up the process

With IF-COMP, the researchers developed an approximation technique that can accurately estimate stochastic data complexity using a special function known as an influence function. They also used a statistical technique called temperature scaling, which improves the calibration of the model outputs. This combination of influence functions and temperature scaling enables high-quality approximations of stochastic data complexity.
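
Temperature scaling on its own is a standard post-hoc calibration step: the model's logits are divided by a single scalar T fit on held-out data. The sketch below shows only that generic step, assuming you already have validation logits and labels; how IF-COMP combines it with influence functions is specific to the paper and is not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(val_logits, val_labels):
    """Find the scalar T > 0 that minimizes the negative log-likelihood of the
    validation labels when logits are divided by T (standard temperature
    scaling; illustrative only)."""
    def nll(t):
        probs = softmax(val_logits / t)
        return -np.log(probs[np.arange(len(val_labels)), val_labels] + 1e-12).mean()
    result = minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded")
    return result.x

# Toy usage with made-up logits: spread-out logits paired with random labels
# simulate an overconfident model, so the fitted temperature should be large.
rng = np.random.default_rng(1)
logits = rng.normal(0, 4, size=(500, 3))
labels = rng.integers(0, 3, size=500)
T = fit_temperature(logits, labels)
print(f"fitted temperature T = {T:.2f}")
calibrated_probs = softmax(logits / T)     # softened, better-calibrated outputs
```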

Ultimately, IF-COMP can efficiently produce well-calibrated uncertainty quantifications that reflect the true confidence of a model. The technique can also determine whether the model has mislabeled certain data points or reveal which data points are outliers.

The researchers tested their system on these three tasks and found that it was faster and more accurate than other methods.

“It’s really important to have some confidence that a model is well-calibrated, and there’s a growing need to detect when a specific prediction doesn’t look quite right. Auditing tools are becoming increasingly necessary for machine learning problems, because we’re using large amounts of unexamined data to build models that are applied to problems that people face,” Ghassemi says.

IF-COMP is model agnostic, so it can provide accurate uncertainty quantifications for many types of machine learning models. This could allow it to be implemented in a wider range of real-world settings, ultimately helping more professionals make better decisions.

“People need to understand that these systems are very fallible and can make things up as they go. A model can look very confident, but there are a lot of different things it is willing to believe, given evidence to the contrary,” Ng says.

In the future, the researchers plan to apply their approach to large language models and study other possible application cases for the minimum description length principle.

About this AI research news

Author: Melanie Grados
Source: MIT
Contact: Melanie Grados – MIT
Image: The image is attributed to Neuroscience News

Original research: Closed access.
“Measuring stochastic data complexity with Boltzmann influence functions” by Nathan Ng et al. arXiv


Abstract

Measuring stochastic data complexity with Boltzmann influence functions

Estimating the uncertainty of a model’s prediction at a test point is crucial for ensuring reliability and calibration under distribution shift.

A minimum description length approach to this problem uses the predictive normalized maximum likelihood (pNML) distribution, which considers every possible label for a data point and reduces confidence in a prediction if other labels are also consistent with the model and the training data.

In this work, we propose IF-COMP, a scalable and efficient approximation to the pNML distribution that linearizes the model with a temperature-scaled Boltzmann influence function. IF-COMP can be used to produce well-calibrated predictions at test points and measure complexity in both labeled and unlabeled settings.

We experimentally validate IF-COMP on uncertainty calibration, mislabeling detection, and OOD detection tasks, where it consistently matches or outperforms strong baseline methods.
