AI tools accelerated pregnancy-data analysis and the building of models that assess preterm-birth risk
Researchers from the University of California, San Francisco (UCSF) and Wayne State University in Detroit have published the results of an experiment in which generative AI chatbots were used to build computational “pipelines” and predictive models on large pregnancy datasets. The comparison was designed so that human teams and AI tools received the same task: from data on more than a thousand pregnant women, build algorithms that predict the risk of preterm birth and, in separate tasks, estimate gestational age from biological samples.
The finding that attracted the most attention was not only that some of the models were comparable to those previously developed by expert data-science teams, but also that the part of the work that in practice often slows biomedical research (writing and debugging code, joining tables, validating, and rerunning analyses) was in this case compressed from weeks or months to hours, even minutes. The authors emphasize that this is not about replacing human expertise but about changing the dynamics of the work: AI can take over the routine steps, giving researchers more time to verify results, interpret them, and ask better questions.
Why preterm birth remains a major public health problem
Preterm birth, defined as birth before 37 completed weeks of pregnancy, is associated with a higher risk of newborn mortality and a range of long-term consequences, including motor and cognitive difficulties. According to the World Health Organization, an estimated 13.4 million babies were born preterm worldwide in 2020, and complications of preterm birth are cited as the leading cause of death in children under five. In the United States, the problem is particularly visible in statistics that have hovered around “one in ten” births for years: the CDC reports that in 2022 preterm birth affected about 10.4% of births, with persistent differences among population groups, differences that point to a broader context of access to care and social inequality.
That is precisely why interest in reliable early risk indicators continues to grow. In an ideal scenario, high-risk pregnancies would be recognized earlier, monitoring would be intensified, and interventions would be targeted more precisely. However, science still does not fully understand the causes of preterm birth; it is a complex outcome in which infections, inflammatory responses, hormonal changes, comorbidities, environmental factors, and stress can intertwine, and the contribution of individual factors often depends on the population and the stage of pregnancy.
Data from multiple studies and experience from international competitions
The UCSF team has for years been building a repository of preterm-birth data, including information on the vaginal microbiome, the community of microorganisms that can influence inflammatory processes and the barrier functions of the mucosa. According to publicly available descriptions of earlier work within the international DREAM framework (Dialogue on Reverse Engineering Assessment and Methods), microbiome data were collected in multiple studies, and birth outcomes were tracked across nine studies, enabling analyses at the level of more than a thousand pregnant women.
DREAM challenges function as competitions in which organizers publish standardized datasets, and teams from around the world try to build the best predictive models within a given deadline. In earlier pregnancy-related challenges, more than a hundred groups participated, aiming to identify patterns in data that could indicate preterm birth or more precisely determine gestational age. Although models in such challenges are often developed relatively quickly, the research cycle then extends: approaches need to be aligned, results revalidated, and a scientific publication prepared, which can take years.
What generative AI chatbots did in the new experiment
In the current project, researchers decided to test whether popular generative AI tools—essentially systems used via natural language that can generate text and code—could take over part of the work that previously required many hours of programming and coordination. Eight different chatbots received carefully crafted, expert instructions in natural language. The goal was not only “build a model,” but also: load the data, clean it, engineer features, select and train algorithms, evaluate results, and produce code that can run on standard research infrastructure.
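To make the scope of such a prompt concrete, below is a minimal sketch of an end-to-end pipeline of that shape: load a table, clean it, derive features, train a classifier, and report a held-out metric. The data, column names, and model choice are illustrative assumptions, not the published setup; in the actual experiment the inputs were the shared DREAM datasets.

```python
# Minimal end-to-end pipeline sketch. All data here are synthetic
# stand-ins for a per-sample microbiome table with a binary preterm
# label; column names and the model choice are hypothetical.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# 1. Load: in the real study this step would read the shared dataset.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame(rng.dirichlet(np.ones(50), size=n),
                  columns=[f"taxon_{i}" for i in range(50)])
df["preterm"] = rng.integers(0, 2, size=n)  # stand-in outcome label

# 2. Clean: drop rows with a missing outcome.
df = df.dropna(subset=["preterm"])

# 3. Feature engineering: e.g., a per-sample diversity summary.
abundances = df.filter(like="taxon_")
df["shannon_diversity"] = -(abundances * np.log(abundances + 1e-12)).sum(axis=1)

# 4. Train and evaluate on a held-out split.
X, y = df.drop(columns="preterm"), df["preterm"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
model = RandomForestClassifier(n_estimators=300, random_state=0)
model.fit(X_train, y_train)
print("held-out AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```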
To make the test comparable, the same datasets and the same tasks as in the DREAM challenges were used: analysis of the vaginal microbiome to assess the risk of preterm birth, and analysis of blood or placental samples to estimate gestational age. In practice, gestational age often remains an estimate, and error in estimation can affect care planning, timing of additional check-ups, and preparation for delivery.
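The gestational-age task is a regression problem rather than a classification one: the model outputs a continuous value in weeks, and mean absolute error is a natural metric because the size of the error is exactly what affects care planning. A minimal sketch, with synthetic stand-ins for the molecular measurements:

```python
# Regression sketch for gestational-age estimation. The features are
# synthetic stand-ins for omics measurements; nothing here reflects
# the published models.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_samples, n_features = 600, 200
X = rng.normal(size=(n_samples, n_features))  # stand-in omics features
weights = rng.normal(size=n_features) * 0.1
# Simulated gestational age in weeks (roughly 24-42), with noise.
y = np.clip(33 + X @ weights + rng.normal(scale=1.5, size=n_samples), 24, 42)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
model = Ridge(alpha=10.0).fit(X_train, y_train)
print("MAE (weeks):", mean_absolute_error(y_test, model.predict(X_test)))
```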
The result was not “one AI that solves everything.” Only half of the tested tools produced code and models usable enough for further analysis, which the authors interpret as a reminder that generative AI is not reliable without human verification. With the more successful tools, however, the key advantage was speed: code that an experienced programmer would write in hours or days, the AI generated in minutes. That allowed less experienced researchers, including a graduate student and a high-school student, to reach working models under mentorship and with careful verification.
Comparison with human teams, and where AI really saved time
In scientific competitions and lab projects, human teams usually spend a large share of their time on technical steps that are necessary but often invisible outside the field: checking file formats, harmonizing variables across studies, choosing metrics, ensuring reproducibility, documenting package versions, and rerunning experiments after every correction. In this test, generative AI showed its greatest strength precisely there: it generated the skeleton of an analysis and portions of finished code, which researchers then ran, checked, corrected, and adapted.
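Much of that invisible work is plain data wrangling. As a small, hypothetical illustration, harmonizing two study tables that disagree on column names and units before pooling them might look like this (all names and values are invented):

```python
# Hypothetical harmonization of two study tables before pooling;
# column names, units, and values are invented for illustration.
import pandas as pd

study_a = pd.DataFrame({"ga_weeks": [28.0, 39.5], "bmi": [24.1, 30.2]})
study_b = pd.DataFrame({"gest_age_days": [196, 277], "BMI": [22.8, 27.5]})

# Map each study onto one shared schema (names and units).
study_a = study_a.rename(columns={"ga_weeks": "gestational_age_weeks"})
study_b = (study_b
           .rename(columns={"BMI": "bmi"})
           .assign(gestational_age_weeks=lambda d: d["gest_age_days"] / 7.0)
           .drop(columns="gest_age_days"))

# Tag each row's origin so later splits can respect study boundaries.
pooled = pd.concat([study_a.assign(study="A"), study_b.assign(study="B")],
                   ignore_index=True)
print(pooled)
```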
The authors also emphasize another aspect: faster prototyping of models can surface negative results sooner. If it becomes clear at an early stage that a certain type of feature or algorithm does not yield stable predictions, researchers can pivot to other hypotheses and measurement approaches earlier, instead of “grinding” in the same direction for months.
It is important to understand that “speed” does not automatically translate into “clinical readiness.” A predictive model may have good statistical accuracy on historical data, yet be impractical in a hospital setting if it requires samples that are hard to standardize, relies on rare laboratory parameters, or if the result cannot be explained to physicians and patients. In that sense, the project is best read as a demonstration of a way of working, not as a finished diagnostic test.
Open science as a prerequisite and the question of trust in results
A common thread in both the DREAM challenges and this AI experiment is more open sharing of data and methods. When data from multiple studies can be compared and reanalyzed, it is easier to test model robustness, uncover hidden biases, and avoid false “wins” that arise from the specifics of a single cohort. Published descriptions of earlier DREAM work also emphasize microbiome-data harmonization techniques and strict separation of training and validation sets to reduce the risk of information “leakage.”
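In code, that strict separation typically means fitting every preprocessing step only on training folds and splitting along study boundaries, so that nothing about a validation cohort influences training. A minimal sketch under those assumptions, with synthetic data standing in for the pooled cohort:

```python
# Leakage-safe evaluation sketch: the scaler is fit inside each
# training fold, and folds follow study boundaries. Data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(900, 30))
y = rng.integers(0, 2, size=900)
study = rng.integers(0, 9, size=900)  # 9 studies, as in the pooled cohort

# The Pipeline guarantees preprocessing sees only training data per
# fold; GroupKFold keeps each study entirely on one side of a split.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, groups=study,
                         cv=GroupKFold(n_splits=3), scoring="roc_auc")
print("per-fold AUC:", np.round(scores, 3))
```

Grouping folds by study is what makes such an estimate honest about generalizing to an unseen cohort rather than merely to an unseen sample.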
Generative AI in that context opens two opposing possibilities. On the one hand, it enables faster repetition of analyses and comparison of multiple approaches, which in science is often a path to more stable conclusions. On the other hand, it increases the risk that someone will rely on code that “looks convincing” but hides an error or misinterprets the data structure. The project’s authors therefore stress the need for continuous oversight: AI can make mistakes, can “hallucinate” functions that do not exist, or skip steps that are crucial for validation.
In practice, that means standards of reproducibility and transparency must be tightened, not relaxed. Code must be reviewable, versioned, and tested; success metrics must be clearly defined; and models, especially for sensitive health outcomes, must be validated across different populations. Without that, acceleration in development may result only in faster spread of unreliable conclusions.
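In practice, “reviewable, versioned, and tested” often comes down to small automated checks that run on every change to the code. A hypothetical example in the pytest style, where train_and_score() is an invented stand-in for a project's real training entry point:

```python
# Hypothetical reproducibility check of the kind a CI job could run:
# the same seed must yield exactly the same held-out score.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def train_and_score(seed: int) -> float:
    # Invented stand-in for a project's real training entry point.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(400, 20))
    y = rng.integers(0, 2, size=400)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    model = LogisticRegression(max_iter=500).fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

def test_training_is_reproducible():
    # If identical seeds do not produce identical metrics, reported
    # numbers cannot be trusted or reviewed.
    assert train_and_score(seed=42) == train_and_score(seed=42)
```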
What such an approach could change in pregnancy research
If generative AI tools prove stable in tasks like these, the change could be visible on several levels. First, smaller labs and younger researchers could test ideas faster without large budgets for engineering teams, potentially democratizing access to analytics. Second, competitive frameworks like DREAM could gain a “second phase,” in which human teams focus more on interpretation and biological meaning, while the technical part of the pipeline is standardized and automated. Third, it could speed the path to clinically relevant biomarkers—provided the results are confirmed in prospective studies and protocols for safe use are developed.
However, experts in the field remind us that predicting preterm birth is not only a mathematical question. Even a very good model will not be useful if there is no clear plan for what to do when an algorithm flags a pregnant woman as “high risk,” or if the health system is overloaded and lacks capacity for additional monitoring. That is why the literature on preterm birth increasingly emphasizes the need to combine biological signals with social determinants of health, an area in which even the fastest code will still require interdisciplinary work.
Limits and next steps: from demonstration to practice
The current work shows that generative AI can accelerate building and testing models on existing datasets, but also that quality depends on the tool, the instructions, and human verification. The next step, which will determine the real impact on patients and clinicians, is moving from retrospective analyses to studies that follow pregnant women in real time, with strict ethical standards, privacy protection, and clearly defined responsibility for decisions.
In the meantime, the results serve as a signal that biomedical analytics is changing: the skill of writing code remains important, but equally important is the ability to formulate precise questions, set up controls, understand the limitations of the data, and recognize the moment when the technology is doing the wrong job. In that sense, generative AI can be a powerful tool, but only if it remains within the framework of scientific discipline, where speed, transparency, and verifiability are equally mandatory.
Sources:
- UCSF News – report on the UCSF and Wayne State University experiment with generative AI chatbots and DREAM data (link)
- Cell Reports Medicine – article on the microbiome DREAM challenge and predicting preterm birth from the vaginal microbiome (link)
- CDC – overview of preterm birth indicators and the definition of preterm birth in the U.S. (updated November 8, 2024) (link)
- CDC/NCHS – provisional data on births and the preterm birth rate in the U.S. for 2024 (Vital Statistics Rapid Release, No. 038) (link)
- WHO – fact sheet on preterm birth and global estimates (fact sheet, May 10, 2023) (link)
- March of Dimes – 2025 Report Card for the United States showing the rate and regional differences in preterm births (link)
- Center for Data to Health (CD2H) – explanation of the DREAM framework and methodology (link)