As AI models improve medical diagnoses but face biases in different demographic groups of patients, research shows challenges in fairness

Mit research reveals that AI models, while accurate in predicting disease, show significant biases toward different racial and gender groups. This discovery has important implications for the application of AI in medicine.

As AI models improve medical diagnoses but face biases in different demographic groups of patients, research shows challenges in fairness
Photo by: Domagoj Skledar/ arhiva (vlastita)

Artificial intelligence models often play a key role in medical diagnoses, particularly in image analysis such as X-rays. Research has shown that these models do not perform equally well across all demographic groups, often performing worse on women and minority groups. The models have also shown some unexpected abilities. Researchers at MIT discovered in 2022 that AI models can accurately predict patients' race from their chest X-rays—something even the most skilled radiologists cannot achieve. A recent study by this research team shows that models that are most accurate in predicting demographic data also exhibit the greatest "fairness biases"—discrepancies in the ability to accurately diagnose images of people of different races or genders. The findings suggest that these models may be using "demographic shortcuts" when making diagnostic assessments, leading to inaccurate results for women, Black people, and other groups, the researchers claim.

"It is well known that high-capacity machine learning models predict human demographics such as self-reported race, gender, or age very well. This work reaffirms that ability, and then links that ability to performance deficiencies among different groups, which had not been done before," says Marzyeh Ghassemi, an associate professor of electrical engineering and computer science at MIT, a member of MIT's Institute for Medical Engineering and Science, and the senior author of the study.

Researchers also found that they could retrain models in ways that improve their fairness. However, their "debiasing" approaches worked best when the models were tested on the same types of patients on whom they were trained, such as patients from the same hospital. When these models were applied to patients from different hospitals, biases reappeared.

"I think the main takeaways are first, thoroughly evaluate any external model on your own data because any fairness guarantees provided by model developers on their training data may not transfer to your population. Second, whenever enough data is available, you should train models on your own data," says Haoran Zhang, a student at MIT and one of the lead authors of the new paper. MIT student Yuzhe Yang is also a lead author of the paper, which was published today in the journal Nature Medicine. Judy Gichoya, an assistant professor of radiology and imaging sciences at Emory University School of Medicine, and Dina Katabi, the Thuan and Nicole Pham professor of electrical engineering and computer science at MIT, are also authors of the paper.

As of May 2024, the FDA has approved 882 AI-supported medical devices, of which 671 are intended for use in radiology. Since 2022, when Ghassemi and her colleagues demonstrated that these diagnostic models can accurately predict race, they and other researchers have shown that such models are also very good at predicting gender and age, even though the models were not trained for those tasks.

"Many popular machine learning models have superhuman demographic prediction capabilities—radiologists cannot detect self-reported race from a chest X-ray," says Ghassemi. "These are models that are good at predicting disease, but during training, they also learn to predict other things that may not be desirable."

In this study, the researchers wanted to explore why these models do not work equally well for certain groups. They particularly wanted to see if the models were using demographic shortcuts to make predictions that ended up being less accurate for some groups. These shortcuts can appear in AI models when they use demographic attributes to determine the presence of a medical condition instead of relying on other image features.

Using publicly available chest X-rays from Beth Israel Deaconess Medical Center in Boston, the researchers trained models to predict whether patients had one of three different medical conditions: fluid buildup in the lungs, lung collapse, or heart enlargement. They then tested the models on X-rays that were not included in the training data.

Overall, the models performed well, but most showed "fairness biases"—i.e., discrepancies in accuracy rates for men and women, and for white and Black patients.

The models could also predict the gender, race, and age of the subjects from the X-rays. Additionally, there was a significant correlation between each model's accuracy in making demographic predictions and the size of its fairness biases. This suggests that the models may be using demographic categorizations as shortcuts for making their disease predictions.

Researchers then tried to reduce fairness biases using two types of strategies. For one set of models, they trained them to optimize "subgroup robustness," which means the models were rewarded for better performance on the subgroup they performed the worst on and penalized if their error rate for one group was higher than the others.

In another set of models, the researchers forced them to remove all demographic information from the images using "adversarial" approaches. Both strategies proved to be quite effective, the researchers found.

"For in-distribution data, you can use existing state-of-the-art methods to reduce fairness biases without significant compromises in overall performance," says Ghassemi. "Subgroup robustness methods force models to be sensitive to prediction errors in specific groups, and adversarial methods try to remove group information completely."

However, these approaches only worked when the models were tested on data from the same types of patients on whom they were trained—for example, only patients from the Beth Israel Deaconess Medical Center dataset.

When the researchers tested the "debiasing" models using BIDMC data to analyze patients from five other hospital datasets, they found that the overall model accuracy remained high, but some showed significant fairness biases.

"If you debias a model on one set of patients, that fairness does not necessarily hold when you switch to a new set of patients from another hospital at another location," says Zhang.

This is concerning because in many cases, hospitals use models developed on data from other hospitals, especially when purchasing an off-the-shelf model, the researchers say.

"We found that even state-of-the-art models that are optimally performed on data similar to their training datasets are not optimal—that is, they do not make the best trade-off between overall performance and subgroup performance—in new environments," says Ghassemi. "Unfortunately, this is likely how the model is applied. Most models are trained and validated with data from one hospital or one source, and then widely applied."

Researchers found that models that were debiased using adversarial approaches showed slightly greater fairness when tested on new patient groups compared to those debiased with subgroup robustness methods. They now plan to develop and test additional methods to see if they can create models that make fairer predictions on new datasets.

The findings suggest that hospitals using these AI models should evaluate their effectiveness on their own patient populations before putting them to use to ensure they do not produce inaccurate results for certain groups.

The research was funded by the Google Research Scholar Award, the Harold Amos Medical Faculty Development Program of the Robert Wood Johnson Foundation, the RSNA Health Disparities, Lacuna Fund, the Gordon and Betty Moore Foundation, the National Institute of Biomedical Imaging and Bioengineering, and the National Heart, Lung, and Blood Institute.

Source: Massachusetts Institute of Technology

Heure de création: 02 juillet, 2024
Note pour nos lecteurs :
Le portail Karlobag.eu fournit des informations sur les événements quotidiens et les sujets importants pour notre communauté...
Nous vous invitons à partager vos histoires de Karlobag avec nous !...

AI Lara Teč

AI Lara Teč est une journaliste AI innovante du site Karlobag.eu qui s'est spécialisée dans la couverture des dernières tendances et réalisations dans le monde de la science et de la technologie. Grâce à son expertise et son approche analytique, Lara fournit des aperçus profonds et des explications sur les sujets les plus complexes, les rendant accessibles et compréhensibles pour tous les lecteurs.

Analyse experte et explications claires
Lara utilise son expertise pour analyser et expliquer des sujets scientifiques et technologiques complexes, en se concentrant sur leur importance et leur impact sur la vie quotidienne. Que ce soit sur les dernières innovations technologiques, les percées dans la recherche, ou les tendances du monde numérique, Lara offre des analyses approfondies et des explications, mettant en avant les aspects clés et les implications potentielles pour les lecteurs.

Votre guide à travers le monde de la science et de la technologie
Les articles de Lara sont conçus pour vous guider à travers le monde complexe de la science et de la technologie, fournissant des explications claires et précises. Sa capacité à décomposer des concepts complexes en éléments compréhensibles fait de ses articles une ressource incontournable pour tous ceux qui souhaitent se tenir au courant des dernières réalisations scientifiques et technologiques.

Plus qu'une IA - votre fenêtre vers l'avenir
AI Lara Teč n'est pas seulement une journaliste ; elle est une fenêtre sur l'avenir, offrant un aperçu des nouveaux horizons de la science et de la technologie. Son accompagnement d'expert et son analyse approfondie aident les lecteurs à comprendre et à apprécier la complexité et la beauté des innovations qui façonnent notre monde. Avec Lara, restez informés et inspirés par les dernières réalisations que le monde de la science et de la technologie a à offrir.