Overtone focusing in biphonic Tuvan throat singing
- Christopher Bergevin,
- Chandan Narayan,
- Joy Williams,
- Natasha Mhatre,
- Jennifer KE Steeves,
- Joshua GW Bernstein,
- Brad Story
- Physics and Astronomy, York University, Canada;
- Centre for Vision Research, York University, Canada;
- Fields Institute for Research in Mathematical Sciences, Canada;
- Kavli Institute of Theoretical Physics, University of California, United States;
- Languages, Literatures and Linguistics, York University, Canada;
- York MRI Facility, York University, Canada;
- Biology, Western University, Canada;
- Psychology, York University, Canada;
- National Military Audiology & Speech Pathology Center, Walter Reed National Military Medical Center, United States;
- Speech, Language, and Hearing Sciences, University of Arizona, United States
Research Article Feb 12, 2020
Cite as: eLife 2020;9:e50476 doi: 10.7554/eLife.50476
Khoomei is a unique singing style originating from the republic of Tuva in central Asia. Singers produce two pitches simultaneously: a booming low-frequency rumble alongside a hovering high-pitched whistle-like tone. The biomechanics of this biphonation are not well understood. Here, we use sound analysis, dynamic magnetic resonance imaging, and vocal tract modeling to demonstrate how biphonation is achieved by modulating vocal tract morphology. Tuvan singers show remarkable control in shaping their vocal tract to narrowly focus the harmonics (or overtones) emanating from their vocal cords. The biphonic sound is a combination of the fundamental pitch and a focused filter state, which lies at the higher pitch (1–2 kHz) and is formed by merging two formants, thereby greatly enhancing sound production in a very narrow frequency range. Most importantly, we demonstrate that this biphonation is a phenomenon arising from linear filtering rather than from a nonlinear source.
eLife digest
The republic of Tuva, a remote territory in southern Russia located on the border with Mongolia, is perhaps best known for its vast mountainous geography and the unique cultural practice of “throat singing”. Throat singers simultaneously create two different pitches: a low-pitched drone, along with a hovering whistle above it. This practice has deep cultural roots and has now been shared more broadly via world music performances and the 1999 documentary Genghis Blues.
Despite many scientists being fascinated by throat singing, it was unclear precisely how throat singers could create two unique pitches. Singing and speaking in general involve making sounds by vibrating the vocal cords found deep in the throat, and then shaping those sounds with the tongue, teeth and lips as they move up the vocal tract and out of the body. Previous studies using static images taken with magnetic resonance imaging (MRI) suggested how Tuvan singers might produce the two pitches, but a mechanistic understanding of throat singing was far from complete.
Now, Bergevin et al. have better pinpointed how throat singers can produce their unique sound. The analysis involved high-quality audio recordings of three Tuvan singers and dynamic MRI recordings of the movements of one of those singers. The images showed changes in the singer’s vocal tract as they sang inside an MRI scanner, providing key information needed to create a computer model of the process.
This approach revealed that Tuvan singers can create two pitches simultaneously by forming precise constrictions in their vocal tract. One key constriction occurs when the tip of the tongue nearly touches a ridge on the roof of the mouth, and a second constriction is formed by the base of the tongue. The computer model helped explain that these two constrictions produce the distinctive sounds of throat singing by selectively amplifying a narrow set of high-frequency notes that are made by the vocal cords. Together, these discoveries show how very small, targeted movements of the tongue can produce distinctive sounds.
Introduction
In the years preceding his death, Richard Feynman had been attempting to visit the small republic of Tuva, located in the geographic center of Asia (Leighton, 2000). A key catalyst came from Kip Thorne, who had given him a record called Melody tuvy, featuring a Tuvan singer performing in a style known as Khoomei, or Xöömij. Although he never succeeded in visiting Tuva, Feynman was nonetheless captivated by Khoomei, which can best be described as a high-pitched tone, similar to a whistle carrying a melody, hovering above a constant booming low-frequency rumble. This is a form of biphonation, or in Feynman’s own words, “a man with two voices”. Khoomei, now part of the UNESCO Intangible Cultural Heritage of Humanity, is characterized as “the simultaneous performance by one singer of a held pitch in the lower register and a melody … in the higher register” (Aksenov, 1973). How, indeed, does one singer produce two pitches at one time? Even today, the biophysical underpinnings of this biphonic human vocal style are not fully understood.
Normally, when a singer voices a song or speech, their vocal folds vibrate at a fundamental frequency (f0), generating an oscillating airflow that forms the so-called source. This vibration is not, however, simply sinusoidal: it also produces a series of harmonic tones (i.e., integer multiples of f0) (Figure 1). Harmonics above f0 are called overtones. Upon emanating from the vocal folds, these components are then sculpted by the vocal tract, which acts as a spectral filter. The vocal-tract filter has multiple resonances that accentuate certain clusters of overtones, creating formants. When speaking, we change the shape of our vocal tract to shift formants in systematic ways characteristic of vowel and consonant sounds. Indeed, singing largely uses vowel-like sounds (Story, 2016). In most singing, the listener perceives only a single pitch associated with the f0 of the vocal production, with the formant resonances determining the timbre. Khoomei has two strongly emphasized pitches: a low-pitch drone associated with the f0, plus a melody carried by variation in a higher-frequency formant that can change independently (Kob, 2004). The biphonic property could therefore originate in the source, in the filter, or in both.
Figure 1
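The source-filter picture above, together with the abstract's statement that the focused state is formed by merging two formants, can be illustrated numerically. The sketch below is a toy, not the authors' vocal tract model: each formant is an idealized two-pole resonance, and the 150 Hz fundamental, the 1/k² source roll-off (about -12 dB/octave), the 80 Hz bandwidths and all formant frequencies are assumed values chosen for illustration. The helper names (`formant_gain`, `tract_gain`, `harmonic_levels_db`, `focus_db`) are hypothetical.

```python
import numpy as np


def formant_gain(f, fc, bw):
    """Gain of one idealized two-pole resonance (a 'formant').

    fc is the centre frequency and bw the -3 dB bandwidth, both in Hz.
    The gain is normalized to 1 at 0 Hz.
    """
    s = 2j * np.pi * np.asarray(f, dtype=float)   # points on the j*omega axis
    pole = -np.pi * bw + 2j * np.pi * fc           # upper-half-plane pole
    num = np.abs(pole) ** 2                        # |pole|^2 gives unit gain at 0 Hz
    return np.abs(num / ((s - pole) * (s - np.conj(pole))))


def tract_gain(f, formants):
    """Cascade (product) of resonances: a linear filter with several formants."""
    gain = np.ones_like(np.asarray(f, dtype=float))
    for fc, bw in formants:
        gain = gain * formant_gain(f, fc, bw)
    return gain


def harmonic_levels_db(f0, formants, n_harmonics=20):
    """Source-filter output level (dB) of the harmonics k*f0.

    Assumes a source whose k-th harmonic has amplitude 1/k**2.
    """
    k = np.arange(1, n_harmonics + 1)
    freqs = k * f0
    out = (1.0 / k**2) * tract_gain(freqs, formants)
    return freqs, 20 * np.log10(out)


def focus_db(freqs, levels, target_hz, band=(1000.0, 2000.0)):
    """Level of the harmonic nearest target_hz minus the loudest other harmonic
    inside `band`; a large positive value means a sharply focused peak."""
    i = int(np.argmin(np.abs(freqs - target_hz)))
    others = (freqs >= band[0]) & (freqs <= band[1])
    others[i] = False
    return float(levels[i] - levels[others].max())


F0 = 150.0                                        # assumed drone pitch, Hz
SEPARATED = [(500, 80), (1200, 80), (1600, 80)]   # assumed F1, F2, F3 (fc, bw)
MERGED = [(500, 80), (1450, 80), (1500, 80)]      # F2 and F3 pulled together

for name, formants in [("separated", SEPARATED), ("merged", MERGED)]:
    freqs, levels = harmonic_levels_db(F0, formants)
    gain = float(tract_gain(1500.0, formants))
    print(f"{name:>9}: filter gain at 1500 Hz = {gain:6.2f}, "
          f"focus of 10th harmonic = {focus_db(freqs, levels, 1500.0):+.1f} dB")
```

With these assumed values, pulling F2 and F3 together near 1.5 kHz raises the filter gain at the 10th harmonic by more than an order of magnitude, and that harmonic then stands well clear of its neighbors; with separated formants it is not even the loudest harmonic in the 1–2 kHz band. Because every step is a linear filter acting on fixed harmonics, the "whistle" here is only a reweighting of overtones already present in the source.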
A source-based explanation could involve different mechanisms, such as the two vibrating nonlinear sound sources in the syrinx of birds, which produce multiple notes that are harmonically unrelated (Fee et al., 1998; Zollinger et al., 2008). Humans, however, are generally considered to have only a single source, the vocal folds. But there are alternative possibilities: for instance, the source could be nonlinear and produce harmonically unrelated sounds. For example, aerodynamic instabilities are known to produce biphonation (Mahrt et al., 2016). Further, Khoomei often involves dramatic and sudden transitions from simple tonal singing to biphonation (see Figure 1 and the Appendix for associated audio samples). Such abrupt changes are often considered hallmarks of physiological nonlinearity (Goldberger et al., 2002), and vocal production can generally be nonlinear in nature (Herzel and Reuter, 1996; Mergell and Herzel, 1997; Fitch et al., 2002; Suthers et al., 2006). Therefore, it remains possible that biphonation arises from a nonlinear source.
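The distinction drawn above, between harmonically related and harmonically unrelated components, suggests a simple diagnostic: a linear, time-invariant filter acting on a periodic source can only reweight the harmonics k·f0 already present; it cannot create energy at other frequencies. The sketch below illustrates this under toy assumptions (a 150 Hz source with 1/k harmonics, a single resonance at 1.5 kHz, and a 1730 Hz tone standing in for a second, independent oscillator). It is not the analysis used in this study, and all names and values are hypothetical.

```python
import numpy as np

FS = 16000   # sample rate in Hz (assumed)
F0 = 150.0   # hypothetical drone fundamental in Hz


def harmonic_source(t, f0, n_harmonics=20):
    """Periodic toy source: harmonics k*f0 with amplitude 1/k."""
    return sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, n_harmonics + 1))


def resonant_filter(x, fc, bw, fs, n_taps=2000):
    """Linear time-invariant filter: convolution with a decaying cosine,
    i.e. a single resonance centred at fc with a bandwidth of roughly bw."""
    n = np.arange(n_taps)
    h = np.exp(-np.pi * bw * n / fs) * np.cos(2 * np.pi * fc * n / fs)
    return np.convolve(x, h)[: len(x)]


def off_harmonic_fraction(y, fs, f0):
    """Share of spectral power in FFT bins that are not integer multiples of f0.

    Uses the final 1 s of y, so bins are 1 Hz apart and every harmonic of a
    150 Hz source falls exactly on a bin.
    """
    seg = y[-fs:]
    power = np.abs(np.fft.rfft(seg)) ** 2
    freqs = np.fft.rfftfreq(len(seg), d=1.0 / fs)
    on_grid = np.isclose(np.round(freqs / f0) * f0, freqs)
    return float(power[~on_grid].sum() / power.sum())


t = np.arange(2 * FS) / FS            # 2 s; the first second lets the filter settle
single = harmonic_source(t, F0)
# Toy stand-in for a second, independent oscillator (as in the bird syrinx):
# a 1730 Hz tone, which is not an integer multiple of F0.
two_sources = single + 0.5 * np.sin(2 * np.pi * 1730.0 * t)

for name, x in [("single source", single), ("two sources", two_sources)]:
    y = resonant_filter(x, fc=1500.0, bw=80.0, fs=FS)
    print(f"{name}: off-harmonic power fraction = {off_harmonic_fraction(y, FS, F0):.2e}")
```

For the single source, the off-harmonic fraction stays at floating-point round-off level, however strongly the resonance reshapes the harmonics; the two-source signal keeps a substantial share of its power off the f0 grid. With real recordings, finite windows and pitch drift blur the harmonic grid, so a practical test would need a tolerance band around each harmonic.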
Vocal tract shaping, a filter-based framework, provides an alternative explanation for biphonation. In one seminal study of Tuvan throat singing, Levin and Edgerton examined a wide variety of song types and suggested that three components were at play. The first two (‘tuning a harmonic’ relative to the filter and lengthening the closed phase of the vocal fold vibration) represented a coupling between source and filter. But it was the third, narrowing of the formant, that appeared crucial. Yet the authors offered little empirical justification for how the vocal tract shapes in the presented radiographs produce these effects. Thus, it remains unclear how the high-pitched formant in Khoomei is formed (Grawunder, 2009). Another study (Adachi and Yamada, 1999) examined a throat singer using magnetic resonance imaging (MRI) and captured static images of the vocal tract shape during singing. These images were then used in a computational model to produce synthesized song. Adachi and Yamada argued that a “rear cavity” was formed in the vocal tract and that its resonance was essential to biphonation. However, their MRI data reveal limited detail, since they were static images of singers already in the biphonation state. Small variations in vocal tract geometry can have pronounced effects on the produced song (Story et al., 1996), and static MRI data would reveal little about which parts of the vocal tract change shape, and how, as singers transition from simple tonal song to biphonation. To understand which features of vocal tract morphology are crucial to biphonation, a dynamic description of vocal tract morphology would be required.
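How strongly formants depend on geometry can be made concrete with the textbook uniform-tube idealization: a tube of length L, closed at the glottis and open at the lips. This is an assumed simplification for illustration only; the singers' vocal tracts, with their constrictions, are far from uniform, and this is not the computational model used in this study. The speed of sound c ≈ 350 m/s and tract length L ≈ 17.5 cm are typical assumed values.

```latex
% Resonances of a uniform tube closed at one end (quarter-wavelength resonator)
F_n = \frac{(2n-1)\,c}{4L}, \qquad n = 1, 2, 3, \ldots

% Assumed values: c = 350\ \mathrm{m/s}, \ L = 0.175\ \mathrm{m}
F_1 = \frac{350}{4(0.175)} = 500\ \mathrm{Hz}, \qquad
F_2 = 1500\ \mathrm{Hz}, \qquad
F_3 = 2500\ \mathrm{Hz}

% Sensitivity to length
\frac{\partial F_n}{\partial L} = -\frac{F_n}{L}
\quad\Longrightarrow\quad
\frac{\Delta F_n}{F_n} \approx -\frac{\Delta L}{L}

% Shortening the tube by 1 cm (L = 0.165 m)
F_2 = \frac{3(350)}{4(0.165)} \approx 1591\ \mathrm{Hz} \qquad (\text{about } +91\ \mathrm{Hz})
```

In this crude model, a 1 cm change (under 6% of the length) shifts every formant by about 6%. A uniform tube cannot represent local constrictions, which shift individual formants selectively; capturing that kind of targeted reshaping is what a dynamic, geometry-resolved description of the vocal tract is needed for.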
Here we study the dynamic changes in the vocal tracts of multiple expert practitioners from Tuva as they produce Khoomei. We use MRI to acquire the volumetric 3D shape of the vocal tract of a singer during biphonation. Then, we capture the dynamic changes in a midsagittal slice of the vocal tract as singers transition from tonal to biphonic singing while making simultaneous audio recordings of the song. We use these empirical data to guide a computational model, which allows us to gain insight into which features of vocal tract morphology are responsible for the singing phonetics observed during biphonic Khoomei song (e.g., Story, 2016). We focus specifically on the Sygyt (or Sigit) style of Khoomei (Aksenov, 1973).
Categories and tags
- Research Article
- Physics of Living Systems
- Tuvan throat singing
- acoustic phonetics
- speech biomechanics