Hidden flaw in large language models revealed: MIT researchers explain why AI ignores key data

MIT researchers have discovered why large language models like GPT-4 show positional bias, neglecting key information in the middle of documents. This phenomenon, known as “lost in the middle”, is a direct consequence of the model architecture and can compromise the reliability of AI systems in medicine and law.

Photo by: Domagoj Skledar / own archive

Large language models (LLMs) such as GPT-4, Claude, and Llama are becoming indispensable tools in a growing number of professions, from law and medicine to programming and scientific research. Their ability to process and generate human-like text has opened the door to new levels of productivity. However, beneath the surface of this technological revolution lies a subtle but significant flaw that can lead to unreliable and inaccurate results: positional bias. Recent research has revealed that these complex systems tend to give disproportionate weight to information located at the very beginning or end of a document, while ignoring key data placed in the middle.


This problem means that, for example, a lawyer using an AI-powered virtual assistant to find a specific clause in a thirty-page contract has a significantly higher chance of success if that clause is on the first or last page. Information in the central part of the document, regardless of its relevance, often remains "invisible" to the model.


Uncovering "Lost in the Middle": A Problem Affecting Even the Most Advanced Systems


The phenomenon known as "lost in the middle" manifests as a distinctive U-shaped accuracy pattern. When a model's ability to find a correct answer within a long text is tested, performance is best when the information appears at the beginning. As the target information moves toward the middle, accuracy drops sharply, reaching its lowest point near the center of the document, before improving slightly toward the end. This flaw is not just a technical curiosity; it represents a serious risk in applications where every piece of information is critically important.
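
To make the pattern concrete, the sketch below shows how such a measurement is typically set up: a single "needle" fact is buried at different depths inside filler text and the model is asked to retrieve it. The call_model function is a hypothetical placeholder for any LLM API, not part of the MIT study; with a real model behind it, plotting accuracy against depth reproduces the U-shaped curve described above.

```python
# Minimal sketch of a "needle in a haystack" probe for positional bias.
# call_model is a hypothetical placeholder for an LLM API call; swap in a
# real client to measure the U-shaped accuracy curve for an actual model.

FILLER = "The weather report mentioned light winds and scattered clouds. "
NEEDLE = "The access code for the archive is 7391. "
QUESTION = "What is the access code for the archive?"


def build_prompt(depth: float, n_sentences: int = 200) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)."""
    sentences = [FILLER] * n_sentences
    sentences.insert(int(depth * n_sentences), NEEDLE)
    return "".join(sentences) + "\n\n" + QUESTION


def call_model(prompt: str) -> str:
    # Hypothetical stand-in: a real harness would send the prompt to a model
    # and return its answer. Returning "" keeps the sketch self-contained.
    return ""


def accuracy_at_depth(depth: float, trials: int = 10) -> float:
    hits = sum("7391" in call_model(build_prompt(depth)) for _ in range(trials))
    return hits / trials


for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"needle depth {depth:.2f}: accuracy {accuracy_at_depth(depth):.2f}")
```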


Imagine a medical AI system analyzing a patient's extensive medical history. If a key symptom or lab test result is mentioned in the middle of the documentation, the model might overlook it, potentially leading to a misdiagnosis. Similarly, a programmer relying on an AI assistant to analyze complex code might get an incomplete picture if the model ignores critical functions located in the central part of the software package. Understanding and addressing this problem is crucial for building trust in AI systems and ensuring their safe application.


Researchers from MIT Have Traced the Root of the Problem


A team of scientists from the prestigious Massachusetts Institute of Technology (MIT), located in the city of Cambridge, has succeeded in discovering the fundamental mechanism that causes this phenomenon. In a new study, to be presented at the International Conference on Machine Learning, the researchers developed a theoretical framework that allowed them to peek inside the "black box" of large language models.


Led by Xinyi Wu, a student at MIT’s Institute for Data, Systems, and Society (IDSS), and working with postdoctoral fellow Yifei Wang and professors Stefanie Jegelka and Ali Jadbabaie, the team determined that positional bias is not an accidental bug but a direct consequence of certain design choices in the model's architecture itself. "These models are black boxes, so as a user, you probably don't know that positional bias can make your model inconsistent," Wu points out. "By better understanding the underlying mechanism of these models, we can improve them by addressing these limitations."


The Anatomy of a Transformer: How Architecture Creates Bias


At the heart of modern language models is a neural network architecture known as a transformer. Transformers process text by first breaking it down into smaller pieces, so-called "tokens," and then learning the relationships between these tokens to understand context and predict the next words. The key innovation that enables this is the attention mechanism, which allows each token to selectively "pay attention" to other relevant tokens in the text.
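
As a toy illustration of that first step, the snippet below splits a sentence into pieces. Production models use learned subword vocabularies (for example, byte-pair encoding) rather than this naive whitespace split; the point is only that text becomes a sequence of discrete tokens before attention is applied. A sketch of the attention computation itself follows the next paragraph.

```python
# Toy illustration of tokenization. Real models use learned subword
# vocabularies (e.g. byte-pair encoding), not this naive whitespace split;
# the point is only that text becomes a sequence of discrete tokens.

text = "The contract terminates on 31 December 2026."
tokens = text.replace(".", " .").split()
print(tokens)
# ['The', 'contract', 'terminates', 'on', '31', 'December', '2026', '.']
```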


However, allowing every token in a 30-page document to pay attention to every other token would be computationally expensive and infeasible. That's why engineers use "attention masking" techniques that limit which tokens a particular token can look at. The MIT research showed that one of these techniques, known as a causal mask, is one of the main culprits behind the bias. A causal mask allows each token to pay attention only to tokens that appeared before it. This method, while useful for tasks like text generation, inherently creates a bias toward the beginning of the input sequence. And the deeper the model, that is, the more attention layers it has, the more this initial bias is amplified, because information from the beginning is reused again and again in the model's reasoning process.
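
The snippet below is a minimal sketch of this idea using random toy matrices rather than a trained model: applying the standard lower-triangular causal mask to the attention scores means later tokens can look back at earlier ones but not the reverse, so early positions accumulate attention from the whole sequence. The code is illustrative only and is not taken from the MIT study.

```python
# Minimal sketch of scaled dot-product attention with a causal mask, using
# random toy matrices instead of a trained model. The lower-triangular mask
# is the standard construction; the bias argument itself comes from the
# MIT analysis, not from this code.

import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def causal_attention(q, k, v):
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                 # token-to-token similarity
    future = np.triu(np.ones((n, n), dtype=bool), 1)
    scores = np.where(future, -np.inf, scores)    # hide tokens that come later
    weights = softmax(scores, axis=-1)            # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
n_tokens, dim = 6, 8
q, k, v = (rng.normal(size=(n_tokens, dim)) for _ in range(3))
_, weights = causal_attention(q, k, v)

# Because only later tokens can look back at earlier ones, the total attention
# received per position (column sums) skews toward the start of the sequence.
print(np.round(weights.sum(axis=0), 2))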


The Role of Data and Opportunities for Correction


The model's architecture is not the only source of the problem. The researchers confirmed that training data also plays a significant role. If the data on which the model was trained is itself biased in a certain way, the model will inevitably learn and reproduce that bias. Fortunately, the theoretical framework developed by the MIT team not only diagnoses the problem but also offers potential solutions.


One of the proposed strategies is the use of positional encodings, a technique that provides the model with explicit information about the location of each word within the sequence. By more strongly linking words to their immediate neighbors, this technique can help redirect the model's "attention" to more relevant parts of the text and thus mitigate the bias. However, the researchers warn, the effect of this method can weaken in models with a large number of layers.
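
As an illustration of the general idea, the sketch below implements the classic sinusoidal positional encoding from the original transformer paper, which is simply added to the token embeddings so that every position carries an explicit location signal. The MIT analysis treats positional encodings as a class of techniques rather than prescribing this particular formula.

```python
# Minimal sketch of sinusoidal positional encodings (the scheme from the
# original transformer paper), shown only to illustrate how explicit position
# information is injected; the MIT study analyzes positional encodings in
# general rather than this exact formula.

import numpy as np


def sinusoidal_positional_encoding(seq_len: int, dim: int) -> np.ndarray:
    positions = np.arange(seq_len)[:, None]                   # (seq_len, 1)
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)
    pe = np.zeros((seq_len, dim))
    pe[:, 0::2] = np.sin(positions * freqs)                   # even dimensions
    pe[:, 1::2] = np.cos(positions * freqs)                   # odd dimensions
    return pe


seq_len, dim = 16, 32
token_embeddings = np.random.default_rng(1).normal(size=(seq_len, dim))

# The encoding is added to the token embeddings before the first attention
# layer, so every token carries an explicit signal about where it sits.
inputs = token_embeddings + sinusoidal_positional_encoding(seq_len, dim)
print(inputs.shape)   # (16, 32)
```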


Other possibilities include using different masking techniques that do not favor the beginning of the sequence, strategically removing excess layers from the attention mechanism, or targeted fine-tuning of the model on data known to be more balanced. "If you know that your data is biased, you should fine-tune your model while adjusting the design choices," advises Wu.


Practical Consequences and the Future of More Reliable Artificial Intelligence


The results of this research have far-reaching consequences. Solving the problem of positional bias could lead to significantly more reliable AI systems. Chatbots could have longer and more meaningful conversations without losing context. Medical systems could analyze patient data more fairly, while coding assistants could review entire programs in more detail, paying equal attention to all parts of the code.


Amin Saberi, a professor and director of the Stanford University Center for Computational Market Design, who was not involved in the work, praised the research: "These researchers offer a rare theoretical insight into the attention mechanism at the heart of the transformer model. They provide a compelling analysis that clarifies long-standing oddities in the behavior of transformers." His words confirm the importance of this step towards demystifying AI technologies.


In the future, the research team plans to further investigate the effects of positional encoding and to study how positional bias might even be strategically exploited in certain applications. As Professor Jadbabaie points out, "If you want to use a model in high-stakes applications, you need to know when it will work, when it won't, and why." This research represents a crucial step toward that goal, paving the way for the creation of more accurate, reliable, and ultimately more useful artificial intelligence systems.

Source: Massachusetts Institute of Technology


AI Lara Teč

