
AI models improve medical diagnoses but show biases across patient demographic groups, research finds, highlighting challenges for fairness

MIT research reveals that AI models, while accurate in predicting disease, show significant biases across different racial and gender groups. This finding has important implications for the application of AI in medicine.

Photo by: Domagoj Skledar / own archive

Artificial intelligence models often play a key role in medical diagnoses, particularly in the analysis of medical images such as X-rays. Research has shown, however, that these models do not perform equally well across all demographic groups, and they are often less accurate for women and members of minority groups.

The models have also shown some unexpected abilities. In 2022, researchers at MIT discovered that AI models can accurately predict patients' self-reported race from their chest X-rays, something even the most skilled radiologists cannot do.

A recent study by the same research team shows that the models that are most accurate at predicting demographic data also exhibit the largest "fairness biases": discrepancies in their ability to accurately diagnose images of people of different races or genders. The findings suggest that these models may be using "demographic shortcuts" when making diagnostic assessments, leading to inaccurate results for women, Black people, and other groups, the researchers say.

"It is well known that high-capacity machine learning models predict human demographics such as self-reported race, gender, or age very well. This work reaffirms that ability, and then links that ability to performance deficiencies among different groups, which had not been done before," says Marzyeh Ghassemi, an associate professor of electrical engineering and computer science at MIT, a member of MIT's Institute for Medical Engineering and Science, and the senior author of the study.

Researchers also found that they could retrain models in ways that improve their fairness. However, their "debiasing" approaches worked best when the models were tested on the same types of patients on whom they were trained, such as patients from the same hospital. When these models were applied to patients from different hospitals, biases reappeared.

"I think the main takeaways are first, thoroughly evaluate any external model on your own data because any fairness guarantees provided by model developers on their training data may not transfer to your population. Second, whenever enough data is available, you should train models on your own data," says Haoran Zhang, a student at MIT and one of the lead authors of the new paper. MIT student Yuzhe Yang is also a lead author of the paper, which was published today in the journal Nature Medicine. Judy Gichoya, an assistant professor of radiology and imaging sciences at Emory University School of Medicine, and Dina Katabi, the Thuan and Nicole Pham professor of electrical engineering and computer science at MIT, are also authors of the paper.

As of May 2024, the FDA has approved 882 AI-supported medical devices, of which 671 are intended for use in radiology. Since 2022, when Ghassemi and her colleagues demonstrated that these diagnostic models can accurately predict race, they and other researchers have shown that such models are also very good at predicting gender and age, even though the models were not trained for those tasks.

"Many popular machine learning models have superhuman demographic prediction capabilities—radiologists cannot detect self-reported race from a chest X-ray," says Ghassemi. "These are models that are good at predicting disease, but during training, they also learn to predict other things that may not be desirable."

In this study, the researchers wanted to explore why these models do not work equally well for certain groups. They particularly wanted to see if the models were using demographic shortcuts to make predictions that ended up being less accurate for some groups. These shortcuts can appear in AI models when they use demographic attributes to determine the presence of a medical condition instead of relying on other image features.

Using publicly available chest X-rays from Beth Israel Deaconess Medical Center (BIDMC) in Boston, the researchers trained models to predict whether patients had one of three different medical conditions: fluid buildup in the lungs, lung collapse, or heart enlargement. They then tested the models on X-rays that were not included in the training data.
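
As a rough picture of this kind of setup, here is a minimal sketch of a multi-label chest X-ray classifier for three findings, written in PyTorch. The backbone, label names, hyperparameters, and random stand-in tensors are illustrative assumptions, not the study's actual pipeline.

```python
# Minimal sketch (not the authors' code): a multi-label classifier for three
# chest X-ray findings, trained with per-label binary cross-entropy.
# Backbone, label names, and data are placeholders.
import torch
import torch.nn as nn
from torchvision import models

FINDINGS = ["fluid_buildup", "lung_collapse", "heart_enlargement"]  # placeholder label names

model = models.resnet18(weights=None)                 # small untrained backbone, purely illustrative
model.fc = nn.Linear(model.fc.in_features, len(FINDINGS))
criterion = nn.BCEWithLogitsLoss()                    # one independent binary label per finding
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Stand-in batch: 8 "X-rays" as random 3-channel tensors with 3 binary labels each.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8, len(FINDINGS))).float()

model.train()
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(f"training loss: {loss.item():.4f}")
```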

Overall, the models performed well, but most showed "fairness biases"—i.e., discrepancies in accuracy rates for men and women, and for white and Black patients.
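
One way such a discrepancy can be quantified, assuming per-patient predictions and a recorded demographic attribute are available, is to compute the same metric separately for each subgroup and take the difference. The sketch below does this for plain accuracy on synthetic placeholder arrays.

```python
# Minimal sketch: per-subgroup accuracy and the gap between the two groups.
# All arrays are synthetic placeholders, not data from the study.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)   # ground-truth disease labels (placeholder)
y_pred = rng.integers(0, 2, 1000)   # model's binary predictions (placeholder)
group = rng.integers(0, 2, 1000)    # demographic attribute, e.g. 0 = male, 1 = female (placeholder)

acc = {g: float((y_pred[group == g] == y_true[group == g]).mean()) for g in (0, 1)}
gap = abs(acc[0] - acc[1])
print(acc, f"accuracy gap = {gap:.3f}")
```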

The models could also predict the gender, race, and age of the subjects from the X-rays. Additionally, there was a significant correlation between each model's accuracy in making demographic predictions and the size of its fairness biases. This suggests that the models may be using demographic categorizations as shortcuts for making their disease predictions.
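
That relationship can be pictured as a simple correlation, computed across models, between demographic-prediction performance and the size of the fairness gap. The numbers below are invented purely to show the calculation; they are not the study's measurements.

```python
# Hypothetical per-model numbers, NOT results from the paper.
import numpy as np

demo_auc     = np.array([0.62, 0.71, 0.78, 0.85, 0.91])  # how well each model predicts demographics
fairness_gap = np.array([0.02, 0.04, 0.05, 0.08, 0.11])  # each model's subgroup performance gap

r = np.corrcoef(demo_auc, fairness_gap)[0, 1]
print(f"Pearson r = {r:.2f}")  # a clearly positive r would mirror the reported pattern
```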

The researchers then tried to reduce fairness biases using two types of strategies. One set of models was trained to optimize "subgroup robustness," meaning the models were rewarded for improving performance on the subgroup on which they performed worst and penalized if the error rate for one group was higher than for the others.
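
In spirit this resembles worst-group objectives such as group DRO, where the training signal is dominated by whichever subgroup currently has the highest loss. The sketch below is a simplified illustration of that idea, not the paper's exact objective.

```python
# Simplified worst-group objective: average the loss within each demographic
# subgroup and backpropagate the worst of those averages. Toy data only.
import torch
import torch.nn.functional as F

def worst_group_loss(logits, labels, groups):
    """Return the highest per-subgroup average loss in the batch."""
    per_sample = F.binary_cross_entropy_with_logits(logits, labels, reduction="none").mean(dim=1)
    group_losses = [per_sample[groups == g].mean() for g in torch.unique(groups)]
    return torch.stack(group_losses).max()

logits = torch.randn(8, 3, requires_grad=True)      # 8 samples, 3 findings (placeholder)
labels = torch.randint(0, 2, (8, 3)).float()
groups = torch.randint(0, 2, (8,))                  # 2 demographic subgroups (placeholder)

loss = worst_group_loss(logits, labels, groups)
loss.backward()   # gradients now focus on the currently worst-off subgroup
```

Methods such as group DRO typically reweight subgroups smoothly rather than taking a hard maximum, but the intent is the same: the worst-served subgroup gets the most attention during training.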

In another set of models, the researchers forced them to remove all demographic information from the images using "adversarial" approaches. Both strategies proved to be quite effective, the researchers found.
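
A common way to implement such an adversarial approach is a gradient-reversal layer: an auxiliary head tries to predict the demographic attribute from the shared features, and the reversed gradients push the feature extractor to discard that information. The sketch below is a generic illustration under that assumption, not necessarily the architecture used in the study.

```python
# Generic gradient-reversal sketch for adversarially removing demographic
# information from shared features. Sizes and data are toy placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.clone()
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128), nn.ReLU())  # toy feature extractor
disease_head = nn.Linear(128, 3)   # predicts the three findings
demo_head = nn.Linear(128, 2)      # adversary trying to predict a demographic attribute

x = torch.randn(8, 1, 64, 64)                        # placeholder "images"
disease_labels = torch.randint(0, 2, (8, 3)).float()
demo_labels = torch.randint(0, 2, (8,))

features = encoder(x)
disease_loss = F.binary_cross_entropy_with_logits(disease_head(features), disease_labels)
demo_loss = F.cross_entropy(demo_head(GradReverse.apply(features)), demo_labels)

# The adversary learns to recover demographics, while the reversed gradient
# pushes the encoder to erase that signal; the disease head trains as usual.
(disease_loss + demo_loss).backward()
```

In practice a weighting factor on the adversarial term usually controls how aggressively demographic information is removed; it is omitted here for brevity.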

"For in-distribution data, you can use existing state-of-the-art methods to reduce fairness biases without significant compromises in overall performance," says Ghassemi. "Subgroup robustness methods force models to be sensitive to prediction errors in specific groups, and adversarial methods try to remove group information completely."

However, these approaches only worked when the models were tested on data from the same types of patients on whom they were trained—for example, only patients from the Beth Israel Deaconess Medical Center dataset.

When the researchers tested the "debiasing" models using BIDMC data to analyze patients from five other hospital datasets, they found that the overall model accuracy remained high, but some showed significant fairness biases.

"If you debias a model on one set of patients, that fairness does not necessarily hold when you switch to a new set of patients from another hospital at another location," says Zhang.

This is concerning because in many cases, hospitals use models developed on data from other hospitals, especially when purchasing an off-the-shelf model, the researchers say.

"We found that even state-of-the-art models that are optimally performed on data similar to their training datasets are not optimal—that is, they do not make the best trade-off between overall performance and subgroup performance—in new environments," says Ghassemi. "Unfortunately, this is likely how the model is applied. Most models are trained and validated with data from one hospital or one source, and then widely applied."

Researchers found that models that were debiased using adversarial approaches showed slightly greater fairness when tested on new patient groups compared to those debiased with subgroup robustness methods. They now plan to develop and test additional methods to see if they can create models that make fairer predictions on new datasets.

The findings suggest that hospitals using these AI models should evaluate their effectiveness on their own patient populations before putting them to use to ensure they do not produce inaccurate results for certain groups.

The research was funded by a Google Research Scholar Award, the Harold Amos Medical Faculty Development Program of the Robert Wood Johnson Foundation, an RSNA Health Disparities grant, the Lacuna Fund, the Gordon and Betty Moore Foundation, the National Institute of Biomedical Imaging and Bioengineering, and the National Heart, Lung, and Blood Institute.

Source: Massachusetts Institute of Technology


Creation time: 02 July, 2024

Science & tech desk

Our Science and Technology Editorial Desk was born from a long-standing passion for exploring, interpreting, and bringing complex topics closer to everyday readers. It is written by employees and volunteers who have followed the development of science and technological innovation for decades, from laboratory discoveries to solutions that change daily life. Although we write in the plural, every article is authored by a real person with extensive editorial and journalistic experience, and deep respect for facts and verifiable information.

Our editorial team bases its work on the belief that science is strongest when it is accessible to everyone. That is why we strive for clarity, precision, and readability, without oversimplifying in a way that would compromise the quality of the content. We often spend hours studying research papers, technical documents, and expert sources in order to present each topic in a way that will interest rather than burden the reader. In every article, we aim to connect scientific insights with real life, showing how ideas from research centres, universities, and technology labs shape the world around us.

Our long experience in journalism allows us to recognize what is truly important for the reader, whether it is progress in artificial intelligence, medical breakthroughs, energy solutions, space missions, or devices that enter our everyday lives before we even imagine their possibilities. Our view of technology is not purely technical; we are also interested in the human stories behind major advances – researchers who spend years completing projects, engineers who turn ideas into functional systems, and visionaries who push the boundaries of what is possible.

A strong sense of responsibility guides our work as well. We want readers to trust the information we provide, so we verify sources, compare data, and avoid rushing to publish when something is not fully clear. Trust is built more slowly than news is written, but we believe that only such journalism has lasting value.

To us, technology is more than devices, and science is more than theory. These are fields that drive progress, shape society, and create new opportunities for everyone who wants to understand how the world works today and where it is heading tomorrow. That is why we approach every topic with seriousness but also with curiosity, because curiosity opens the door to the best stories.

Our mission is to bring readers closer to a world that is changing faster than ever before, with the conviction that quality journalism can be a bridge between experts, innovators, and all those who want to understand what happens behind the headlines. In this we see our true task: to transform the complex into the understandable, the distant into the familiar, and the unknown into the inspiring.
