In research that relies on data from maps and monitoring stations, it is often assumed that uncertainty is "resolved" as soon as a machine-learning model predicts the target values accurately. But in real analyses, scientists and decision-makers usually care not only about the forecast but also about the relationship itself: is a given exposure associated with an outcome, in which direction, and how strong is the effect? It is precisely here, in estimating associations between variables across space, that a team of researchers from MIT has shown that common methods for constructing confidence intervals can break down badly in spatial settings, producing intervals that look convincing but are wrong.
Let's imagine a scenario from public health: an environmental scientist in a county wants to estimate whether exposure to air pollution is associated with lower birth weights. In the era of large datasets, a natural step is to train a machine-learning model that captures complex, non-linear relationships, because such models often excel at prediction. The problem arises when the model is asked something else: not "how much will the baby weigh?" but "what is the association between exposure and birth weight, and with what certainty can we state it?"
Standard machine-learning methods can deliver estimates and, sometimes, uncertainty for the prediction itself. But when the goal is to establish an association between a variable (e.g., fine particulate matter in the air) and an outcome (e.g., birth weight), researchers rely on confidence intervals: a range of values expected to "cover" the true effect with a stated probability. In spatial problems, where the data depend on location, the MIT team warns that this range can be completely wrong, and in a way that leads the user to a false conclusion: the method may claim "high confidence" while the estimate has missed the actual value.
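As a refresher on what "coverage" means, here is a minimal Python simulation (all names and numbers are illustrative, not taken from the paper): it repeatedly draws independent data, builds a textbook 95% confidence interval for a mean, and counts how often the interval actually contains the true value. When the classical assumptions hold, the count lands near 95 percent.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, sigma, n, trials = 2.0, 1.0, 50, 10_000

covered = 0
for _ in range(trials):
    sample = rng.normal(true_mean, sigma, size=n)
    est = sample.mean()
    # Textbook 95% interval: estimate +/- 1.96 standard errors.
    half_width = 1.96 * sample.std(ddof=1) / np.sqrt(n)
    if est - half_width <= true_mean <= est + half_width:
        covered += 1

print(f"Empirical coverage: {covered / trials:.3f}")  # ~0.95 when i.i.d. holds
```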
Why “95% confident” sometimes doesn’t hold
Spatial association analysis deals with how a variable and an outcome are related within a geographical area. An example might be the relationship between tree canopy cover and elevation in the US, or the link between rainfall and crop yield. The researcher often has "source" data collected at specific locations and wants to estimate the relationship at another location where measurements do not exist or are sparse. Ideally, the model gives both an estimate and an interval that realistically expresses the uncertainty.
In practice, the authors warn, the opposite often happens: the method may claim to be, say, 95 percent sure that its interval has "captured" the true relationship, while the actual value lies nowhere within that range. In other words, the confidence interval looks authoritative but is simply wrong. Such falsely confident intervals are particularly risky when results feed into environmental policy, public-health recommendations, or estimates of economic effects on the ground, because the numbers can create an impression of solid evidence where there is none.
The key cause lies in the assumptions on which classical interval-construction procedures rest. In statistics, assumptions function as the rules of the game: if they hold, the conclusions are valid; if they do not, the numbers can mislead. With spatial data, several of the most common assumptions break at once.
Three assumptions that break in spatial data
1) i.i.d. assumption (independent and identically distributed)
Many methods start from the idea that observations are mutually independent and drawn from the "same" distribution. In the spatial world this is often untrue. A frequently cited example is the placement of monitoring stations: air-quality sensors are not sited at random but with regard to infrastructure, population density, industry, traffic, and the existing measurement network. Including one location in the data therefore strongly affects which other locations are represented.
2) assumption of a perfectly correct model
Some procedures for constructing confidence intervals implicitly assume that the model is "correct". But in real applications, models are approximations: they omit variables, simplify processes, and misdescribe the noise. When the model is off, intervals that rely on its correctness can be unrealistically narrow and overconfident.
3) similarity of source and target data
In spatial problems, there is often a mismatch between the data on which the model was trained and the place where one wants to draw inferences. For example, a model is trained on urban pollution measurements (because sensors are more common in cities) and is then used to estimate relationships in a rural area with no stations. Urbanization, traffic, and industry change the character of the air, so the "target" area is systematically different. Such a distribution shift can bias the association estimate and nullify the interval's nominal reliability.
In combination, these three cracks create room for a serious problem: the model can miss the effect while the interval continues to behave as if everything were fine, as the toy simulation below illustrates. For journalists and public institutions this is particularly sensitive, because in public communication confidence intervals are often translated into claims like "scientifically proven" or "with high certainty", without any check on whether the assumptions are even satisfied.
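A toy Python simulation (all quantities invented for illustration; this is not the paper's setup) can show all three cracks at once. Sensor locations crowd near one end of the map, the true association drifts smoothly with location, the fitted model wrongly assumes a single global slope, and the target sits outside the sampled region. The textbook 95% interval then almost never covers the association at the target location.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 200, 2_000
target_s = 1.0                      # "rural" location we actually care about
beta = lambda s: 1.0 + s            # true association drifts smoothly with location
covered = 0

for _ in range(trials):
    s = rng.beta(1, 8, size=n)      # sensors crowd near s=0 ("urban"): non-random sampling
    x = rng.normal(size=n)
    y = beta(s) * x + rng.normal(scale=0.5, size=n)
    # Misspecified model: one pooled OLS slope with its textbook standard error
    slope = (x * y).sum() / (x * x).sum()
    resid = y - slope * x
    se = np.sqrt((resid @ resid) / (n - 1) / (x * x).sum())
    lo, hi = slope - 1.96 * se, slope + 1.96 * se
    covered += lo <= beta(target_s) <= hi

print(f"Coverage at the target location: {covered / trials:.3f}")  # far below 0.95
```

The interval is narrow and stable, so it looks trustworthy, yet it is centered on the urban association rather than the rural one it is being used to describe.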
“Smoothness” as a more realistic assumption
Instead of insisting on i.i.d. data and on overlap between source and target locations, the authors adopt an assumption that is more intuitive for many spatial processes: that the data change smoothly across space. In mathematical language this is a Lipschitz condition: moving through space cannot produce an arbitrarily large jump in value, because there is an upper limit on how fast the relationship can change.
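In symbols, a function f is Lipschitz with constant L if, for any two locations s and s', the change in its value is bounded by L times the distance between them:

```latex
% f is L-Lipschitz: values cannot jump faster than L per unit of distance
|f(s) - f(s')| \le L \, \lVert s - s' \rVert \quad \text{for all locations } s, s'.
```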
For fine particulate matter, the example is almost tangible: we do not expect the pollution level on one city block to differ drastically from the next block over. Instead of jumps, the more realistic picture is a gradual decline as we move away from emission sources. Under such conditions, smoothness is closer to what actually happens in the environment than the i.i.d. idealization.
On this basis, the MIT team proposes a procedure that directly accounts for the bias introduced by non-random location selection and distribution shift. The goal is not merely an association estimate, but a confidence interval with meaningful coverage, one that truly contains the true value of the parameter of interest as often as it claims to.
What is new in the approach and why it matters
According to the paper, the new method constructs valid frequentist confidence intervals for spatial associations under minimal additional assumptions: a form of spatial smoothness and homoscedastic Gaussian noise. Just as important is what the method does not require: the authors emphasize that they rely neither on the model being fully correct nor on "covariate overlap" between the locations where the model is trained and the locations where the effect is estimated.
In practice, this means the method can be used even when measurements are crowded into cities and inference is sought for the periphery or rural areas, a scenario common in epidemiology and environmental studies. When the noise level is known, the authors state that the intervals are valid even in finite samples; when it is not, they offer a variance-estimation procedure that is asymptotically consistent.
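To make the flavor of such a procedure concrete, here is a deliberately simplified Python sketch. It is not the authors' estimator: the nearest-neighbor point estimate, the bias bound, and the plug-in noise estimate are all invented for illustration. The idea it demonstrates is the Lipschitz-driven one: bound the possible bias by the Lipschitz constant times the distance to the nearest measurement, and widen the usual Gaussian interval by that bound.

```python
import numpy as np

def lipschitz_interval(s_obs, y_obs, s_target, lip_const, level_z=1.96):
    """Toy Lipschitz-widened interval (illustrative, not the paper's method).

    Uses the nearest observation as a point estimate, bounds the bias by
    lip_const * distance, and adds a plug-in Gaussian noise term.
    """
    dists = np.abs(s_obs - s_target)
    i = dists.argmin()
    estimate = y_obs[i]
    bias_bound = lip_const * dists[i]          # smoothness caps how wrong we can be
    # Crude noise estimate from differences of neighboring observations
    sigma_hat = np.std(np.diff(y_obs), ddof=1) / np.sqrt(2)
    half_width = bias_bound + level_z * sigma_hat
    return estimate - half_width, estimate + half_width

# Example: dense "urban" measurements near 0, one far-away "rural" target
s_obs = np.sort(np.random.default_rng(2).uniform(0.0, 0.3, size=30))
y_obs = np.sin(3 * s_obs) + np.random.default_rng(3).normal(scale=0.1, size=30)
print(lipschitz_interval(s_obs, y_obs, s_target=1.0, lip_const=3.0))
```

The qualitative behavior matches the paper's motivation: the farther the target sits from any measurement, the wider the interval must become, so the procedure cannot feign confidence about under-sampled regions.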
In comparisons on simulated and real data, the authors report that their procedure is the only one that consistently delivers reliable intervals in situations where standard approaches can fail completely. In other words, this is not a cosmetic improvement but an attempt to fix the instrument routinely used to draw conclusions about relationships between variables across space.
From forecast to explanation: what this means for the environment, economy, and medicine
In the public eye, machine learning is often seen as a tool for more accurate forecasts. But in science and policy, a forecast is only the beginning. If a health system is deciding where to invest in prevention, if a city is planning traffic policy, or if the effect of afforestation on the microclimate is being assessed, the question becomes: how strong is the association, and how sure are we of that estimate?
Here the confidence interval becomes a practical trust filter. If it falsely suggests high certainty, decisions can rest on a wrong estimate of the effect, which can mean misdirected resources or mistaken interventions. Conversely, an interval that realistically reflects uncertainty allows more rational planning: when the effect is present, when it is small, and when the data are not yet sufficient for a confident conclusion.
The authors place their work within a wide range of applications: from the environmental sciences (pollution, rainfall, forest management) through epidemiology to economic analyses that rely on spatial data. All these areas share a common need: distinguishing a "model that predicts well" from a "model we can trust when it speaks about relationships".
NeurIPS 2025: from theory to community
The paper was presented at NeurIPS 2025, one of the world's most influential conferences in machine learning and artificial intelligence. The official program lists a poster presentation titled "Smooth Sailing: Lipschitz-Driven Uncertainty Quantification for Spatial Associations", by David Burt, Renato Berlinghieri, Stephen Bates, and Tamara Broderick, held on December 3, 2025, as part of the conference program.
A version of the paper is also available as a preprint on arXiv, marked as a NeurIPS 2025 paper, with the first version received on February 9, 2025, and later revisions. The authors have also published a reference code implementation, which is crucial for methodological papers so that results can be reproduced and verified on other datasets.
More information about the paper and related materials is available via the arXiv paper page, the official NeurIPS 2025 poster card, and the code repository.