Why do we effortlessly discern every word in our native language, while in a foreign language everything merges into a monotonous noise? This question, which plagues language learners and intrigues scientists, has now received a convincing neuroscientific answer. Two complementary studies from the University of California, San Francisco (UCSF) record, for the first time at this level of detail, how the superior temporal gyrus (STG) in the human brain "learns" the statistics and sound patterns of the language we are exposed to over time – and then, in a fraction of a second, marks where a word begins and where it ends. The research team led by neurosurgeon Edward Chang showed that the STG responds not only to basic sounds (vowels and consonants) but also to complete word forms and to boundaries between words. When we listen to languages we know well, specialized neural circuits light up in the STG; when we listen to an unknown language, the same circuits remain "dark".
The authors explain that when we speak at a natural pace, we do not leave pauses between words. Nevertheless, listeners hear clear word boundaries without effort. Until recently, it was assumed that boundaries were recognized by parts of the brain that serve meaning comprehension rather than primary sound processing. Newer findings shift the focus to the STG – an auditory-language hub on the upper surface of the temporal lobe – which was traditionally associated with recognizing sounds (vowels and consonants) and phonetic features. Now, however, it has been shown that the phonotactic regularities of a language (what is allowed in real speech and what is not), its typical rhythms, and word frequencies become "imprinted" in the STG over years of exposure. When such regularities exist in the STG's memory, boundaries between words emerge almost "automatically".
What exactly did the researchers measure, and whom did they focus on?
In the larger of the two studies, brain activity was recorded in 34 volunteers who already had electrodes implanted for clinical monitoring of epilepsy. Most spoke English, Spanish, or Mandarin Chinese as their native language, and eight participants were bilingual. All listened to sentences in three languages – some familiar, some completely unknown – while researchers analyzed activity patterns in the STG using machine learning. When the language was known, the STG showed enhanced responses aligned with word-related features: word boundaries, word frequency, and language-specific sound sequences. These responses did not occur when participants listened to a language they did not know. In other words, the STG processes universal acoustic-phonetic features in all languages, but only experience with a specific language "amplifies" the signals that accompany the words of that language.
The second study goes a step deeper: how exactly does the STG mark the beginning and end of a word? High-temporal-resolution recordings show a characteristic "reset" – a short, sharp drop in activity at the moment a word ends – after which neural populations instantly shift into a state ready for the next word. This "reboot" must happen several times per second, because fluent speech typically packs multiple words into a single second. Precisely these dynamics explain how a listener can follow speech without slowing down or losing the thread, even when words are short or glued together by coarticulatory transitions.
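To make the idea concrete, here is a minimal sketch (in Python, and emphatically not the authors' analysis code) of how such a "reset" might be detected in a recorded activity envelope: flag moments where the signal falls unusually fast, spaced at least a fraction of a second apart. The sampling rate, thresholds, and the toy sawtooth signal are assumptions chosen for illustration.

```python
# Minimal sketch, not the authors' code: flag sharp drops ("resets")
# in a neural activity envelope. All thresholds are illustrative.
import numpy as np

def find_resets(envelope, fs=100, drop_z=-2.0, min_gap_s=0.15):
    """Return sample indices where the envelope falls unusually fast."""
    slope = np.gradient(envelope) * fs              # change per second
    z = (slope - slope.mean()) / slope.std()        # standardized slope
    candidates = np.where(z < drop_z)[0]            # unusually steep drops
    resets, last = [], -np.inf
    for idx in candidates:
        if idx - last >= min_gap_s * fs:            # keep detections well separated
            resets.append(idx)
            last = idx
    return np.array(resets)

# Toy demo: activity that ramps up and drops abruptly ~2.5 times per second,
# mimicking a boundary-locked pattern at a plausible speech rate.
fs = 100
t = np.arange(0, 3, 1 / fs)
envelope = (t * 2.5) % 1.0                          # sawtooth: rise, then sharp drop
print(find_resets(envelope, fs) / fs)               # approximate drop times in seconds
```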
Why this is important: STG as a bridge between sound and lexicon
In classical models of speech listening, it was assumed that the STG handles the "lower" levels – acoustics and phonetics – while word and meaning recognition belongs to "higher" language areas. The new findings strongly support a different, distributed view: information about complete word forms emerges in the STG on a very early time scale. In other words, the brain does not wait for semantics to decide what a word is; populations of neurons whose activity coincides with boundaries and whole words already exist in the STG, and this recognition stems from experience with the sound of the language. Segmentation is therefore not just a consequence of "understanding content" but also the result of years of learning sound patterns.
This insight neatly explains the stark difference between a native and a foreign language. In the native language, the brain is "trained" by millions of exposures: it recognizes typical combinations of consonants and vowels, distributions of syllable lengths, and even the frequencies of individual words, which makes segmentation fast and efficient. In a foreign language, none of these parameters has been stably learned, so the STG does not amplify signals at the places where boundaries should be. The result is the experience of a continuous audio tape.
Bilingualism and language experience: can the brain have two "sets of rules"?
Participants who spoke two languages fluently showed enhanced boundary signals in both languages – but not in a third, unknown one. This suggests that the STG learns language-specific statistics in parallel for multiple languages, without necessarily mixing them, provided exposure is sufficient and long-term. In practice, this explains why advanced bilingual speakers "hear" words equally well in both languages, even though their phonotactic patterns (the rules about permitted sound sequences) can be very different. For bilingualism researchers, these data are valuable because they offer a neurophysiological measure of progress: instead of relying exclusively on comprehension tests, one could monitor "boundary enhancement" in the STG as an objective biomarker of language acquisition.
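Purely as an illustration of what such a biomarker could look like (the window size, normalization, and synthetic signals below are assumptions, not the metric used in the papers), one could compare average activity in short windows after word boundaries with the overall signal level, separately for a known and an unknown language:

```python
# Hedged sketch of a possible "boundary enhancement" score; the window
# size and normalization are illustrative assumptions, not the papers' metric.
import numpy as np

def boundary_enhancement(envelope, boundary_times, fs=100, win_s=0.1):
    """Mean activity in short windows after word boundaries, expressed
    relative to the overall mean in units of the signal's std."""
    win = int(win_s * fs)
    starts = [int(t * fs) for t in boundary_times if int(t * fs) + win <= len(envelope)]
    at_boundaries = np.mean([envelope[s:s + win].mean() for s in starts])
    return (at_boundaries - envelope.mean()) / envelope.std()

# Synthetic demo: a flat-noise signal vs. one with bumps locked to boundaries.
rng = np.random.default_rng(1)
noise = rng.normal(0, 1, 300)                     # 3 s at 100 Hz
bumped = noise.copy()
for t in (0.5, 1.2, 2.0):                         # add boundary-locked responses
    s = int(t * 100)
    bumped[s:s + 10] += 1.0
print(boundary_enhancement(noise,  (0.5, 1.2, 2.0)))   # near 0: no enhancement
print(boundary_enhancement(bumped, (0.5, 1.2, 2.0)))   # clearly positive
```

The comparison the studies' logic predicts: a larger value for the language the listener knows than for the unknown one.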
Methodology: from ECoG to machine learning models
The precision of these findings rests on two technological innovations. First, intracranial recordings of brain activity (ECoG and related techniques) were used in patients who were under clinical supervision anyway. These recordings offer temporal resolution at the millisecond level and spatial resolution of cortical millimeters – incomparably more detailed than non-invasive methods. Second, the analysis relied on machine learning models that extracted, from the recordings, patterns associated with word segmentation and with the specific sound sequences of known languages. Together, these two methodological pillars made it possible to capture fine dynamics: the moment of the activity drop at the end of a word, the speed of the "reset", and the strength of the response to frequent words and typical sound combinations.
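In the same spirit, though not the authors' actual pipeline, a decoding analysis can be reduced to a simple question: can a linear classifier tell word-boundary time points from non-boundary ones using multi-electrode features? A minimal sketch with scikit-learn, with synthetic data standing in for real recordings:

```python
# Minimal decoding sketch in the spirit of the analysis described above
# (not the authors' pipeline); the "neural" data here are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_timepoints, n_electrodes = 2000, 64

# Mark a subset of time points as word-boundary moments and give them a
# small added response on a few "boundary-sensitive" electrodes.
is_boundary = rng.random(n_timepoints) < 0.15
X = rng.normal(size=(n_timepoints, n_electrodes))
X[is_boundary, :10] += 0.8

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, is_boundary, cv=5, scoring="roc_auc")
print(f"cross-validated decoding AUC: {scores.mean():.2f}")  # ~0.5 would mean chance
```

Real analyses would of course control for acoustic confounds and test on held-out sentences, but the logic of the comparison – decoding succeeds only when boundary information is present in the signal – is the same.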
It is particularly significant that the STG – a region often described as the "auditory hub" for language – showed a dual role: universal phonetic processing and specific traces of lexical segmentation. The fact that these traces intensify only when we listen to a known language is a strong argument that segmentation is a consequence of learning, not a rigid, innate characteristic.
"Reset" between words: dynamics that enable fluent listening
In the second study, the authors document the rhythm at which the STG "resets" its activity at the end of a word. The recordings show a sharp drop – a sort of boundary marker – followed by a rapid rise in activity at the beginning of the next word. These dynamics are best imagined as a trigger that ensures processing does not spill over from one word into the next. Without such resetting, boundaries would be "blurred" and the listener would quickly lose the thread. Since fluent speech typically contains two to three words per second, the neural segmentation system must be extremely swift and stable at the same time.
By using natural narratives rather than just isolated words or syllables, the researchers confirmed that the same patterns appear under realistic listening conditions. At the level of neural populations, the STG showed sensitivity to properties of complete words – their length, frequency, and position in the sentence – which contradicts simplified models that assume purely sound-by-sound, "phoneme-to-word" processing.
From the laboratory to life: implications for language learning, clinic, and technology
Language learning: If the STG learns sound statistics and word boundaries from exposure, it is reasonable to expect that continuous listening to the target language – especially in the form of natural speech – will accelerate segmentation. Practically, this means that audiobooks, podcasts, and conversations with native speakers "feed" the STG with the data it needs to distinguish words. The point is not just vocabulary; it is rhythm, prosody, and the language's typical sound sequences.
Clinic: The findings shed light on why damage to temporal regions – even with preserved hearing – can result in serious speech comprehension difficulties. If the STG fails to segment the signal, a person can "hear" but not "grasp" speech. This can explain the symptoms of certain aphasias and help in planning neurosurgical procedures and rehabilitation.
Speech recognition technology: A comparison with today's automatic speech recognition (ASR) models arises naturally. Modern neural networks increasingly rely on composition – from sound to phonemes, from phonemes to words – but the best systems also learn direct representations of words. The findings from the STG suggest that ASR systems could benefit from explicit "reset" mechanisms at word boundaries and from learning language-specific phonotactic rules, just as the human brain does.
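As a toy illustration of why direct word representations help with segmentation (this is not a real ASR system; the lexicon and frequency counts are invented), an unsegmented symbol stream can be cut into words by maximizing unigram log-probability over a word list with dynamic programming:

```python
# Toy illustration, not a real ASR system: segment an unsegmented symbol
# stream into words by maximizing unigram log-probability over a small
# lexicon. The lexicon and its frequency counts are invented.
import math

LEXICON = {"the": 500, "cat": 40, "sat": 30, "on": 200, "mat": 20, "a": 300}
TOTAL = sum(LEXICON.values())

def segment(stream, max_len=8):
    """Return the most probable word sequence covering `stream`, or None."""
    # best[i] = (log-probability, word list) for the best parse of stream[:i]
    best = [(0.0, [])] + [(-math.inf, None)] * len(stream)
    for end in range(1, len(stream) + 1):
        for start in range(max(0, end - max_len), end):
            word = stream[start:end]
            if word in LEXICON and best[start][1] is not None:
                score = best[start][0] + math.log(LEXICON[word] / TOTAL)
                if score > best[end][0]:
                    best[end] = (score, best[start][1] + [word])
    return best[len(stream)][1]

print(segment("thecatsatonthemat"))   # ['the', 'cat', 'sat', 'on', 'the', 'mat']
```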
How does the brain "know" where the word boundary is? A small school of phonotactics
Word boundaries are not simply a function of pauses – often there are no pauses at all. Instead, segmentation relies on a set of rules and regularities. For example, in many languages certain consonant combinations almost never begin a word but often appear inside one; under the influence of experience, the STG begins to amplify signals precisely where a boundary is statistically most probable. Word frequency plays a similar role (frequent words "pop out" faster), as does prosody – stress and rhythm – which helps predict boundaries at a physiological level.
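A hedged sketch of the statistical idea behind this (not the model used in the studies): estimate during "exposure" how often one syllable follows another, then posit boundaries wherever the transition probability dips. The syllable inventory below is invented, loosely modeled on classic statistical-learning experiments:

```python
# Hedged sketch of the statistical-learning idea (not the studies' model):
# learn syllable-to-syllable transition probabilities from "exposure" and
# guess boundaries where the probability dips. The syllables are invented.
from collections import Counter

exposure = ["golabu", "tupiro", "bidaku", "golabu", "bidaku", "tupiro"] * 50
syllables = [w[i:i + 2] for w in exposure for i in range(0, len(w), 2)]

pair_counts = Counter(zip(syllables, syllables[1:]))
first_counts = Counter(syllables[:-1])

def transition_prob(a, b):
    return pair_counts[(a, b)] / first_counts[a]

def guess_boundaries(stream, dip=0.75):
    """Place a boundary after position i when P(next | current) is low.
    In this toy corpus, within-word transitions are ~1.0 and cross-word
    transitions are ~0.5, so a dip threshold of 0.75 separates them."""
    return [i + 1 for i in range(len(stream) - 1)
            if transition_prob(stream[i], stream[i + 1]) < dip]

stream = ["go", "la", "bu", "tu", "pi", "ro", "bi", "da", "ku"]
print(guess_boundaries(stream))   # [3, 6]: boundaries after "bu" and after "ro"
```

With more varied exposure, the learned probabilities become a statistical map of the language – roughly what the text above calls the STG's "statistical literacy".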
Such "statistical literacy" of the STG does not mean that segmentation is exclusively bottom-up. On the contrary, the authors emphasize that early acoustic processing and higher linguistic processes occur in a loop. But the key novelty is that information about complete words is already present at the STG level, which does not depend on meaning, but on the sound pattern the brain has learned through years of exposure.
Why a foreign language "sounds like one long word" – and how to bridge the gap
When we listen to a foreign language for the first time, we have no reliable map of permitted sequences and typical boundaries. As a consequence, the STG does not amplify signals at the "right" places, and we hear a continuous stream that resists being "cut" into words. The good news: as exposure grows, the STG adjusts – it acquires the new phonotactic statistics and begins to mark boundaries. Hence a practical recommendation for language learning: abundant, varied, and regular listening to authentic material, even without full understanding of the meaning, can speed up segmentation and, in turn, make vocabulary learning easier.
Word boundaries across languages: what is common, and what is different
English, Spanish, and Mandarin were chosen because they offer an interesting spectrum of phonological and prosodic properties. English is known for its complex consonant clusters and variable stress; Spanish is more rhythmic, with clearer syllable boundaries; Mandarin is a tonal language in which pitch carries distinctive information. Despite these differences, the STG showed a shared sensitivity to basic phonetic features in all languages – but the enhancement at boundaries and words appeared only when the listener knew the language. In bilingual participants, the enhancement was visible in both of their languages, confirming that the brain can maintain multiple "sets of rules" without mutual conflict.
Derived lessons for teaching and curricula
Pedagogically speaking, the findings suggest that listening instruction should emphasize steps that support segmentation. This includes working with short, natural clips, progressively reducing support (transcripts, visual cues), and exercises focused on the typical sound sequences and prosodic patterns of the target language. Two-phase activities are also useful: first "listening without understanding" to calibrate the STG, then processing the meaning. This supports both components – statistical sound learning and semantic understanding.
From November 7 to November 19, 2025: timeline of publications
The findings come from two publications released in November 2025: an article in the journal Neuron (November 7, 2025) documenting the dynamics of encoding complete word forms and of resetting at boundaries, and an article in Nature (November 19, 2025) that separates shared and language-specific processing components in the STG, including amplified signals at word boundaries in the native (or well-known) language. Both papers come out of the ambitious research program led by neurosurgeon Edward Chang, and their release is accompanied by summaries on university pages and science news services.
Who these findings can help right now
- Clinicians who plan and perform surgeries near the STG, because a more precise map of functions reduces the risk of postoperative difficulties with speech comprehension.
- Speech therapists and rehabilitation teams designing interventions for patients with damage to temporal regions.
- Methodologists and language teachers who structure listening exercises with an emphasis on segmentation.
- Engineers designing speech recognition systems and translation tools, because the STG offers biological inspiration for better algorithms.
What we still don't know – and where the next steps are going
Although the results are strong, open questions remain: how universal is the "reset" across different types of speakers and recording conditions? How does a child's STG acquire these rules in the early years – is the path the same as for an adult learning a foreign language, or is there a critical period? How quickly can the STG "retrain" to a new phonotactics during intensive immersion in a language? And finally, can targeted training (for example, peripheral nerve stimulation, with which there are already experiments) accelerate the acquisition of segmentation?
Practical tips in light of new insights
- Increase exposure to the sound of the language. Daily listening to natural speech (podcasts, radio, conversations) "feeds" the STG with the patterns needed for segmentation.
- Practice with transcripts, but gradually phase them out. First listen with text to stabilize the pattern, then remove the support and test "hearing only".
- Focus on rhythm and typical sound sequences. Short exercises in recognizing typical word beginnings/endings enhance sensitivity to boundaries.
- Use multiple speakers and registers. Variety "trains" the STG to distinguish invariant rules from idiosyncratic styles.
In short: the new papers provide a neurological basis for an experience we all share – we hear our native language as a series of clear words because our STG has spent years learning the statistics of its sound. A foreign language is different not because it is illogical or "difficult", but because our brain has not yet learned its segmentation rules. Fortunately, the STG is plastic: with enough exposure, that language too begins to "unravel" into recognizable words – often faster than we think.