
AI AnomalyMatch in the Hubble Telescope Archive: ESA astronomers extracted nearly 1400 anomalies in 2.5 days

Find out how the AnomalyMatch tool, developed by ESA researchers, searched nearly 100 million cutouts from the Hubble Legacy Archive and, in a short time, extracted more than 1300 rare phenomena, including hundreds of previously undocumented anomalies. We explain what this means for the search for gravitational lenses, galaxy collisions, and other cosmic 'needles in a haystack'.

Photo by: ESA/ArianeGroup

AI searched the Hubble archive and extracted nearly 1400 rare cosmic anomalies in two to three days

Astronomical archives are growing faster than humans can review them, which has made one question central to modern astronomy: how to find, in oceans of data, the rare phenomena that change our understanding of the universe. The NASA/ESA Hubble Space Telescope has been operating in orbit since 1990 and has decades of observations behind it. Much of that legacy is consolidated in the Hubble Legacy Archive (HLA), a repository of processed images and catalogs intended for searching and re-analysis.

In a study authored by David O’Ryan and Pablo Gómez, scientists associated with the European Space Agency (ESA), a method was demonstrated that combines artificial intelligence and expert verification to quickly extract unusual and scientifically interesting objects from vast sets of images. Their system, named AnomalyMatch, systematically ranked about 99.6 million image cutouts from the HLA, and the authors then reviewed the highest-ranked candidates. After verification, more than 1300 anomalies were confirmed, and more than 800 of them were marked as previously undocumented in the literature the authors used for checks.

What is considered an anomaly in astronomy and why is it important

In this context, “anomaly” does not necessarily mean something inexplicable, but anything that statistically deviates from the usual appearance in astronomical images. These are objects and configurations that appear rarely but carry significant information about extreme processes: colliding and interacting galaxies, ring galaxies, “jellyfish” galaxies with tails of gas and young stars, gravitational lenses creating arcs and rings, unusual jets, projection overlaps of sources, and other morphological extremes.

Such examples often serve as natural laboratories. Galaxy collisions reveal how gravity and gas together shape galactic evolution; gravitational lenses allow the study of very distant galaxies and the estimation of mass distribution, including the share of dark matter; and “jellyfish” show how the environment in galaxy clusters can “strip” gas and accelerate the quenching of star formation. The problem is that they are rare: if only a few images per thousand, out of millions, contain such an object, manual browsing becomes too slow and expensive, and interesting cases can simply get lost in the noise.

In the classical way of working, anomalies are found by targeted searches of smaller samples or by chance – a researcher “stumbles upon” an unusual object while doing something completely different. Given the explosion of data, this is becoming less and less sustainable. In the era of wide-field sky surveys and massive catalogs, the key question is no longer just “what do we know”, but also “what have we not even had time to look at yet”.

Why the Hubble Legacy Archive is an ideal testing ground

The Hubble Legacy Archive is particularly interesting for such experiments because it covers a long time span and a great diversity of targets, from nearby nebulae to deep fields with thousands of galaxies. According to official archive information, the HLA focuses on observations up to October 1, 2017, while for newer data it relies on connected systems and additional “high-level” products. This time boundary does not diminish the importance of the HLA: archival data often gain new value when new algorithms and new scientific questions appear, because they can be searched again with criteria different from those in use when the images were taken.

Until now, Hubble's archive was mainly used in a targeted way, for example to search for images of a known object or to assemble thematic samples. A fully systematic review of the entire archive, with one procedure and the same criteria, was logistically almost impossible. AnomalyMatch is therefore also important as a proof of concept: instead of being just storage, the archive becomes an active field for discovering what is “hidden in plain sight” in the data.

How AnomalyMatch works

Fewer labels, more data

Common machine learning approaches work best when they have many labeled examples. But rare classes in astronomy often lack thousands of confirmed cases, and sometimes only dozens are known. That is why AnomalyMatch combines semi-supervised learning (a few labeled examples plus many unlabeled ones) and active learning (an expert iteratively checks the model's suggestions and thereby improves it). The idea is practical: the model first makes a rough selection, and then learns from human feedback to reduce the number of “false alarms” and increase precision for what is truly interesting.
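
To make the "few labeled, many unlabeled" idea concrete, here is a minimal Python sketch of pseudo-labeling (self-training), one common form of semi-supervised learning. It is an illustration only and not the authors' implementation: the article does not say which semi-supervised technique AnomalyMatch uses, and the synthetic two-dimensional features, the nearest-centroid model, the function names, and the 0.95 confidence threshold are all assumptions made for this example.

```python
import numpy as np

def fit_centroids(X, y):
    """Mean feature vector of each class (0 = typical, 1 = anomaly)."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def anomaly_probability(centroids, X):
    """Softmax over negative squared distances to the two class centroids."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights[:, 1] / weights.sum(axis=1)

def self_train(X_lab, y_lab, X_unl, threshold=0.95, rounds=3):
    """Refit the model on true labels plus confident pseudo-labels."""
    X, y = X_lab, y_lab
    for _ in range(rounds):
        centroids = fit_centroids(X, y)
        p = anomaly_probability(centroids, X_unl)
        # Keep only unlabeled points the current model is confident about.
        confident = (p >= threshold) | (p <= 1.0 - threshold)
        pseudo = (p[confident] >= 0.5).astype(int)
        X = np.concatenate([X_lab, X_unl[confident]])
        y = np.concatenate([y_lab, pseudo])
    return fit_centroids(X, y)

# Synthetic stand-in for image features (assumption, not real HLA data).
rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (5, 2))])
y_lab = np.r_[np.zeros(20, dtype=int), np.ones(5, dtype=int)]
X_unl = np.vstack([rng.normal(0, 1, (1000, 2)), rng.normal(4, 1, (10, 2))])

centroids = self_train(X_lab, y_lab, X_unl)
probe = anomaly_probability(centroids, np.array([[0.0, 0.0], [4.0, 4.0]]))
print("anomaly probability at (0,0) and (4,4):", probe)
```

The point of the sketch is the data flow: 25 labeled examples anchor the model, while the roughly one thousand unlabeled ones refine it once the model is confident enough about them.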

From ranking list to catalog

In practice, the process looks like this:
  • The model is initially trained on a limited number of confirmed examples of rare morphologies and on a large amount of typical sources.
  • The neural network goes through a large set of images and assigns a rank to each cutout – how likely it is to be outside usual patterns.
  • The expert reviews the top of the list, confirms or rejects suggestions, and thereby creates better labels for the next cycle.
  • After several iterations, the model becomes more precise in separating real anomalies from processing artifacts, noise, edge cases, and projection overlaps.

An important message is that speed is not the same as confirmation. AI does not “explain” physics here, but saves time on selection. The final scientific interpretation still requires expert review, and often additional data such as spectroscopy, comparison with other instruments, or more detailed photometry.
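
The rank, review, and retrain cycle listed above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the AnomalyMatch code: the features are synthetic, the "model" is a simple distance-based score rather than a neural network, and `expert_review` is a hypothetical stand-in for the human check that simply returns the hidden true labels.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for image features: 5000 typical cutouts, 25 anomalies.
typical = rng.normal(0.0, 1.0, size=(5000, 2))
anomalous = rng.normal(3.5, 0.7, size=(25, 2))
features = np.vstack([typical, anomalous])
truth = np.r_[np.zeros(5000, dtype=int), np.ones(25, dtype=int)]

def expert_review(indices):
    """Hypothetical stand-in for the human check: returns the true labels."""
    return truth[indices]

def score(features, labeled_idx, labels):
    """Higher when a cutout is closer to confirmed anomalies than to typical sources."""
    anom = features[labeled_idx[labels == 1]].mean(axis=0)
    typ = features[labeled_idx[labels == 0]].mean(axis=0)
    d_anom = np.linalg.norm(features - anom, axis=1)
    d_typ = np.linalg.norm(features - typ, axis=1)
    return d_typ - d_anom

# Seed labels: 3 confirmed anomalies and 30 typical sources.
labeled_idx = np.r_[5000:5003, 0:30]
labels = truth[labeled_idx]

for cycle in range(4):
    s = score(features, labeled_idx, labels)
    s[labeled_idx] = -np.inf                # never re-show reviewed cutouts
    top = np.argsort(s)[::-1][:20]          # top of the ranking list
    verdicts = expert_review(top)           # confirm or reject each candidate
    labeled_idx = np.concatenate([labeled_idx, top])
    labels = np.concatenate([labels, verdicts])
    print(f"cycle {cycle}: {verdicts.sum()} of {len(top)} candidates confirmed")

confirmed = int(labels.sum())
print("confirmed anomalies in the labeled set:", confirmed)
```

Each pass reviews only 20 cutouts out of more than 5000, which mirrors the division of labor described above: the score decides what the expert sees, and the expert's verdicts decide what the next score is built from.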

What was found: more than 1300 confirmed anomalies

In the search of the HLA, the system processed approximately 99.6 million image cutouts. After ranking and expert verification, the authors confirmed more than 1300 anomalies, and marked more than 800 as previously undocumented in the literature used in the checks. The paper itself also lists examples by category, including a large number of interacting galaxies, candidates for gravitational lenses, and other rare morphologies.
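
A quick back-of-the-envelope calculation puts these figures in perspective. The sketch below uses only numbers quoted in this article (99.6 million cutouts, the lower bounds of 1300 and 800, and the 2.5 days from the headline); the one-second-per-image human review time is purely an assumption for comparison, and the article does not specify exactly which steps the 2.5 days cover.

```python
CUTOUTS = 99.6e6      # cutouts ranked, from the article
CONFIRMED = 1300      # lower bound ("more than 1300")
UNDOCUMENTED = 800    # lower bound ("more than 800")
RUN_DAYS = 2.5        # duration quoted in the headline

SECONDS_PER_DAY = 86_400
ASSUMED_SECONDS_PER_IMAGE = 1.0   # assumption: human attention per cutout

# Average rate needed to get through the archive in the quoted time.
throughput = CUTOUTS / (RUN_DAYS * SECONDS_PER_DAY)

# How rare the confirmed anomalies are among all cutouts.
per_million = CONFIRMED / CUTOUTS * 1e6

# Ratio of the two lower bounds, so only indicative.
undocumented_share = UNDOCUMENTED / CONFIRMED

# Non-stop manual review time under the assumed pace.
manual_years = CUTOUTS * ASSUMED_SECONDS_PER_IMAGE / SECONDS_PER_DAY / 365.25

print(f"throughput: about {throughput:.0f} cutouts per second")
print(f"confirmed anomalies: about {per_million:.0f} per million cutouts")
print(f"undocumented share (indicative): {undocumented_share:.0%}")
print(f"manual review at 1 s per image: about {manual_years:.1f} years non-stop")
```

Under these assumptions the run averages roughly 461 cutouts per second, confirmed anomalies amount to about 13 per million cutouts, and a single person looking at one image per second without pause would need a little over three years to see the whole set once.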

Among the extracted categories, the following stand out in particular:
  • candidates for gravitational lenses and lensing arcs, which serve as natural telescopes and as a tool for measuring lens mass
  • “jellyfish” galaxies, which indicate gas stripping and changes in star formation in dense environments
  • colliding galaxies and interacting galaxies, which enable statistical study of merging and its consequences
  • ring galaxies and other unusual morphologies associated with dynamic disturbances

Such diversity is not accidental: the goal was not to find one specific class, but to systematically highlight everything that deviates from the typical appearance of sources in the archive.

Gravitational lenses: rare systems with great scientific gain

Gravitational lensing is among the key phenomena of modern cosmology. A massive galaxy or cluster of galaxies can bend the light of a more distant object and create deformed images in the form of arcs, multiple images, or almost complete rings. In favorable cases, the lens magnifies the brightness of distant galaxies and allows the study of structures that would otherwise be too faint. At the same time, the geometry of lensing provides information about the mass distribution of the lens, including the contribution of dark matter.
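
As an illustration of how lensing geometry constrains mass, consider the idealized textbook case of a circularly symmetric lens with the source directly behind it. This relation is standard lensing theory and is not taken from the study itself. The angular radius of the resulting Einstein ring is

```latex
\theta_E = \sqrt{\frac{4 G M}{c^{2}} \, \frac{D_{LS}}{D_L \, D_S}}
\qquad\Longleftrightarrow\qquad
M = \frac{c^{2}\,\theta_E^{2}}{4 G} \, \frac{D_L \, D_S}{D_{LS}}
```

where M is the projected mass enclosed within the ring, G is the gravitational constant, c is the speed of light, and D_L, D_S, and D_LS are the angular-diameter distances to the lens, to the source, and from the lens to the source. Measuring the ring radius and the distances therefore gives the enclosed mass directly, whether that mass is luminous or dark.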

That is why new candidates are valuable even before final confirmation: they become a starting point for further checks and observations. But lenses are hard to find, especially in heterogeneous archival images where the object was not necessarily imaged with that intention. Arcs can be faint, distorted by background noise, or blended with other sources. Algorithmic ranking helps precisely here: it recognizes repeating visual patterns and extracts them from the mass of data, and then leaves the final decision to the expert.

“Jellyfish”, collisions, and rings: what they tell us about galaxy evolution

Jellyfish galaxies and gas stripping

“Jellyfish” galaxies are recognizable by tails of gas and young stars trailing behind the galaxy as it passes through a denser medium, for example within galaxy clusters. Such tails point to a process in which gas is “stripped”, which can dramatically change the future of the galaxy because gas represents fuel for the creation of new stars. Each new candidate is valuable for comparisons: in what environments do tails form, how long do they last, how does the rate of star formation change, and how dependent is the process on the mass of the galaxy and the speed of passage through the medium.

Collisions and interactions as a foundation of growth

Galaxies grow through interactions and mergers, but each collision has its own geometry, mass ratio, amount of gas, and gravitational environment. Therefore, a single “beautiful” example is not enough: samples are needed. Large collections of candidates help move from anecdotes to statistics, test how often collisions create tidal tails, when episodes of intense star formation are ignited, how morphology changes through merger phases, and how such processes affect the growth of central black holes and the distribution of stellar mass.

Ring galaxies and shock waves

Ring galaxies are often associated with the passage of another object through the disk, which can trigger a wave of gas compression and the formation of a ring of enhanced star formation. But a ring-like appearance can also arise due to projection or overlap of sources, so verification is necessary. The combination of AI ranking and human interpretation proves practical here: the algorithm narrows the search, and the astronomer then assesses whether it is a real structure or a visual trick, and determines what additional analyses are needed for confirmation.

Where the human fits in: AI speeds up the search but does not “conclude”

Scientists warn that unusual shapes in astronomical images can be a consequence of the instrument, processing, or noise, especially in the edge parts of detectors or with very weak signals. Therefore, human verification remains key, just like additional observations before an object enters the “confirmed rare” category with a clear physical interpretation. AnomalyMatch is thus best described as a time multiplier: instead of randomly browsing images for days, the system gives a ranking list and directs attention to the most likely cases, while the human retains control over assessment and conclusions.

Citizen science and the new role of algorithms

Citizen science projects, in which volunteers help classify galaxies, have already shown that human perception can be extremely effective, especially with morphologies that algorithms struggle to capture when signals are weak or complex. But the volume of modern archives is growing faster than human labor can keep up with, even with a large number of participants. In this sense, AI tools need not be a replacement but can act as a filter and partner: they can pre-select potentially interesting cases, and citizen scientists and experts can then confirm, reject, and supplement classifications. Such a “hybrid” approach opens the possibility for rare phenomena to be discovered faster, while retaining verifiability and quality.

Broader context: Euclid and incoming waves of data

This approach is developing at a time when astronomy is entering an era of massive surveys. ESA's Euclid mission is already generating large amounts of data for cosmology and the structure of the universe, and other wide-field projects face similar challenges. In that environment, the ability to quickly recognize rare objects becomes a strategic advantage: it enables faster follow-up of candidates, better planning of additional observations, and more efficient use of limited instrument time.

At the same time, work on the Hubble archive also shows another dimension: archives from past decades are not exhausted. On the contrary, as tools develop, so do the chances that objects which went unnoticed for years will come to light in already existing data. For science, this means that the value of a mission can be extended far beyond its “active” period, and for the public that discoveries do not happen only with new telescopes, but also in old images, when someone looks at them in a new way.

Open source and catalog as a call to the community

Publicly available code and data repositories accompany the paper on anomalies, which enables independent verification and further development. Such openness changes the dynamic: instead of results being kept within one team, the catalog can become a starting point for additional research, from detailed confirmation of gravitational lenses, through targeted searches for rare galaxy shapes, to training models for specific subclasses. This also accelerates the research cycle itself: rare objects reach the follow-up queue faster, and questions that once required months of manual work can turn into a problem of selection and priority, with clear human verification as the final step.

Sources:
- arXiv – abstract and full text of the paper “Identifying Astrophysical Anomalies in 99.6 Million Cutouts from the Hubble Legacy Archive Using AnomalyMatch”
- arXiv – methodological paper on AnomalyMatch and description of semi-supervised and active learning
- ESA (GitHub) – official repository of the AnomalyMatch project
- STScI – official description of the Hubble Legacy Archive and information on archive scope
- ESA Datalabs – platform for working with large scientific data sets



Science & tech desk
