
MIT and MIT-IBM Watson AI Lab researchers develop a technique to assess the reliability of foundation models before they are applied to specific tasks

Researchers from MIT and the MIT-IBM Watson AI Lab have developed a new technique to assess the reliability of foundation models before applying them to specific tasks, using an algorithm that measures how consistent the models' learned representations are. The approach can help reduce errors in safety-critical settings and enable better model selection without the need to test on real data.

Photo by: Domagoj Skledar / own archive

Researchers at MIT and the MIT-IBM Watson AI Lab have developed a technique to evaluate the reliability of foundation models before applying them to a specific task. They achieve this by analyzing a set of foundation models that slightly differ from each other. The algorithm assesses the consistency of the representations each model learns about the same test data. If the representations are consistent, the model is considered reliable.

When the researchers compared their technique with state-of-the-art baseline methods, they found that it is better at capturing the reliability of foundation models across a range of classification tasks.

This technique allows users to decide whether to apply a model in a specific setting without the need to test it on real data. That is especially useful when data are unavailable due to privacy concerns, as with health data. In addition, the technique can rank models by their reliability scores, allowing users to choose the best model for their task.

"All models can make mistakes, but models that know when they are wrong are more useful. The problem of quantifying uncertainty or reliability is more challenging for these foundation models because their abstract representations are difficult to compare. Our method allows quantifying how reliable a model's representation is for any input data," says senior author Navid Azizan, professor at MIT and a member of the Laboratory for Information and Decision Systems (LIDS).

He is joined on the paper by lead author Young-Jin Park, a PhD student at LIDS; Hao Wang, a research scientist at the MIT-IBM Watson AI Lab; and Shervin Ardeshir, a senior research scientist at Netflix. The work will be presented at the Conference on Uncertainty in Artificial Intelligence.

Measuring Consensus
Traditional machine learning models are trained to perform a specific task. These models typically give a concrete prediction based on an input. For example, a model might say whether a particular image contains a cat or a dog. In this case, assessing reliability can be as simple as checking how confident the model is in its final prediction.
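For a conventional classifier like this, one simple reliability proxy is the confidence the model attaches to its final prediction. The following is a minimal sketch of that idea, assuming a hypothetical two-class cat-vs-dog model; the logits are invented for illustration.

```python
import numpy as np

def softmax(logits):
    # Turn raw classifier outputs (logits) into probabilities.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Invented logits from a hypothetical cat-vs-dog classifier for one image.
logits = np.array([2.4, 0.3])          # [cat, dog]
probs = softmax(logits)

# Reliability check for a task-specific model: inspect the top prediction
# and how much probability mass it carries.
print("prediction:", ["cat", "dog"][int(probs.argmax())])
print("confidence:", round(float(probs.max()), 3))
```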

But foundation models are different. The model is pre-trained on general data, in a setting where its creators do not know all the tasks it will be applied to. Users adapt it to their specific tasks only after it has been trained.

To evaluate the reliability of foundation models, the researchers used an ensemble approach by training several models that share many characteristics but differ slightly.
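To make the setup concrete, here is a minimal stand-in for such an ensemble: several "encoders" that share one base projection (the shared characteristics) and differ only by a small random perturbation (the slight differences). This is an illustrative analogue, not the authors' training recipe; real ensemble members would be foundation models trained with, for example, different random seeds or data orderings.

```python
import numpy as np

def make_ensemble(n_models=5, in_dim=64, rep_dim=16, noise=0.05, seed=0):
    # Build stand-in encoders that share a base projection W_base and
    # differ by a small perturbation, mimicking "slightly different" models.
    rng = np.random.default_rng(seed)
    W_base = rng.normal(size=(in_dim, rep_dim))
    encoders = []
    for _ in range(n_models):
        W = W_base + noise * rng.normal(size=(in_dim, rep_dim))
        def encode(x, W=W):   # bind this member's W (default-arg closure)
            h = np.atleast_2d(x) @ W
            return h / np.linalg.norm(h, axis=-1, keepdims=True)  # unit norm
        encoders.append(encode)
    return encoders

ensemble = make_ensemble()   # five slightly different "models"
```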

"Our idea is like measuring consensus. If all these foundation models give consistent representations for any data in our dataset, then we can say that the model is reliable," says Park.

But they faced a problem: how to compare abstract representations?
"These models only give a vector, composed of some numbers, so we can't easily compare them," he adds.

They solved the problem using an idea called neighborhood consistency.

For their approach, the researchers prepare a set of reliable reference points to test on the ensemble of models. Then, for each model, they examine which reference points lie near that model's representation of the test point.

By looking at the consistency of neighboring points, they can assess the model's reliability.
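The sketch below puts those steps together, continuing with the same kind of stand-in ensemble as above: embed a set of reference points and a test point with each model, collect each model's k nearest reference points, and score reliability as the average pairwise overlap (Jaccard similarity) of those neighbor sets. The overlap measure and all names here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from itertools import combinations

# Stand-in ensemble as before: slightly perturbed projections of one base.
rng = np.random.default_rng(0)
W_base = rng.normal(size=(64, 16))
ensemble = [W_base + 0.05 * rng.normal(size=(64, 16)) for _ in range(5)]

def embed(W, x):
    # Map inputs into one model's representation space (unit-normalized).
    h = np.atleast_2d(x) @ W
    return h / np.linalg.norm(h, axis=-1, keepdims=True)

def neighbor_set(W, X_ref, x_test, k=10):
    # Indices of the k reference points closest to the test point,
    # measured in this model's representation space.
    d = np.linalg.norm(embed(W, X_ref) - embed(W, x_test)[0], axis=1)
    return set(np.argsort(d)[:k].tolist())

def neighborhood_consistency(models, X_ref, x_test, k=10):
    # Average pairwise Jaccard overlap of the neighbor sets: 1.0 when all
    # models agree on the neighborhood, near 0.0 when none of them do.
    sets_ = [neighbor_set(W, X_ref, x_test, k) for W in models]
    return float(np.mean([len(a & b) / len(a | b)
                          for a, b in combinations(sets_, 2)]))

X_ref = rng.normal(size=(200, 64))   # trusted reference points
x_test = rng.normal(size=64)         # input whose reliability we assess
score = neighborhood_consistency(ensemble, X_ref, x_test)
print(f"neighborhood consistency: {score:.2f}")
```

Because these stand-in models differ only slightly, their neighbor sets largely agree and the score comes out high; ensemble members that diverged more would pull it toward zero.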

Aligning Representations
Foundation models map data points into what is known as a representation space. One way to think about this space is as a sphere. Each model maps similar data points to the same place in its sphere, so images of cats go to one place, and images of dogs to another.

But each model would map animals differently in its sphere, so while cats might be grouped near the South Pole of one sphere, another model might map cats somewhere in the Northern Hemisphere.

The researchers use neighboring points as anchors to align these spheres so the representations can be compared. If a data point's neighbors are consistent across multiple representations, then one can be confident in the model's reliability for that point.
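The article does not name the exact alignment procedure, but one standard way to align two embedding spaces through shared anchor points is orthogonal Procrustes: find the rotation that best maps one model's anchor representations onto another's. Below is a minimal sketch on synthetic data, where the second model's "sphere" is by construction an exact rotation of the first, so a good alignment should recover it almost perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings of the SAME 50 anchor points under two models.
# Model B sees the same geometry as model A, just rotated differently.
anchors_a = rng.normal(size=(50, 16))
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal map
anchors_b = anchors_a @ rotation

# Orthogonal Procrustes: the rotation W minimizing ||anchors_a @ W - anchors_b||
# is U @ Vt, where U and Vt come from the SVD of anchors_a.T @ anchors_b.
U, _, Vt = np.linalg.svd(anchors_a.T @ anchors_b)
W = U @ Vt

# After alignment, the two models' anchor representations coincide (up to
# numerical precision), so neighborhoods can be compared directly.
print("alignment error:", float(np.linalg.norm(anchors_a @ W - anchors_b)))
```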

When they tested this approach on a wide range of classification tasks, they found it was much more consistent than baseline methods. It was also not tripped up by challenging test points that baffled other methods.

Moreover, their approach can be used to assess reliability for any input data, so it can evaluate how well the model works for a specific type of individual, such as a patient with certain characteristics.

"Even if all models have average performance, from an individual perspective, you will prefer the one that works best for that individual," says Wang.

One limitation comes from the need to train an ensemble of foundation models, which is computationally expensive. In the future, they plan to find more efficient ways to build multiple models, possibly using small perturbations of a single model.

"With the current trend of using foundation models for their representations to support various tasks — from fine-tuning to generation with retrieval-augmented approaches — the topic of quantifying uncertainty at the representation level is becoming increasingly important but challenging, as the representations themselves lack grounding. Instead, it's about how the representations of different inputs are related to each other, an idea that this work neatly encapsulates through the proposed neighborhood consistency score," says Marco Pavone, associate professor in the Department of Aeronautics and Astronautics at Stanford University, who was not involved in this work. "This is a promising step towards high-quality uncertainty quantification for representation models, and I am excited to see future extensions that can function without the need for model ensembling to truly enable this approach in foundation-sized models."

This work was partially funded by the MIT-IBM Watson AI Lab, MathWorks, and Amazon.


Creation time: 17 July 2024

Science & tech desk

Our Science and Technology Editorial Desk was born from a long-standing passion for exploring, interpreting, and bringing complex topics closer to everyday readers. It is written by employees and volunteers who have followed the development of science and technological innovation for decades, from laboratory discoveries to solutions that change daily life. Although we write in the plural, every article is authored by a real person with extensive editorial and journalistic experience, and deep respect for facts and verifiable information.

Our editorial team bases its work on the belief that science is strongest when it is accessible to everyone. That is why we strive for clarity, precision, and readability, without oversimplifying in a way that would compromise the quality of the content. We often spend hours studying research papers, technical documents, and expert sources in order to present each topic in a way that will interest rather than burden the reader. In every article, we aim to connect scientific insights with real life, showing how ideas from research centres, universities, and technology labs shape the world around us.

Our long experience in journalism allows us to recognize what is truly important for the reader, whether it is progress in artificial intelligence, medical breakthroughs, energy solutions, space missions, or devices that enter our everyday lives before we even imagine their possibilities. Our view of technology is not purely technical; we are also interested in the human stories behind major advances – researchers who spend years completing projects, engineers who turn ideas into functional systems, and visionaries who push the boundaries of what is possible.

A strong sense of responsibility guides our work as well. We want readers to trust the information we provide, so we verify sources, compare data, and avoid rushing to publish when something is not fully clear. Trust is built more slowly than news is written, but we believe that only such journalism has lasting value.

To us, technology is more than devices, and science is more than theory. These are fields that drive progress, shape society, and create new opportunities for everyone who wants to understand how the world works today and where it is heading tomorrow. That is why we approach every topic with seriousness but also with curiosity, because curiosity opens the door to the best stories.

Our mission is to bring readers closer to a world that is changing faster than ever before, with the conviction that quality journalism can be a bridge between experts, innovators, and all those who want to understand what happens behind the headlines. In this we see our true task: to transform the complex into the understandable, the distant into the familiar, and the unknown into the inspiring.

NOTE FOR OUR READERS
Karlobag.eu provides news, analyses and information on global events and topics of interest to readers worldwide. All published information is for informational purposes only.
We emphasize that we are not experts in scientific, medical, financial or legal fields. Therefore, before making any decisions based on the information from our portal, we recommend that you consult with qualified experts.
Karlobag.eu may contain links to external third-party sites, including affiliate links and sponsored content. If you purchase a product or service through these links, we may earn a commission. We have no control over the content or policies of these sites and assume no responsibility for their accuracy, availability or any transactions conducted through them.
If we publish information about events or ticket sales, please note that we do not sell tickets either directly or via intermediaries. Our portal solely informs readers about events and purchasing opportunities through external sales platforms. We connect readers with partners offering ticket sales services, but do not guarantee their availability, prices or purchase conditions. All ticket information is obtained from third parties and may be subject to change without prior notice. We recommend that you thoroughly check the sales conditions with the selected partner before any purchase, as the Karlobag.eu portal does not assume responsibility for transactions or ticket sale conditions.
All information on our portal is subject to change without prior notice. By using this portal, you agree to read the content at your own risk.