M. Castellengo and N. Henrich Bernardoni: Interplay between harmonics and formants in singing: when vowels become music

Interplay between harmonics and formants in singing: when vowels become music

M. Castellengo (a) and N. Henrich Bernardoni (b)
(a) LAM/d’Alembert, 11 rue de Lourmel, 75015 Paris, France
(b) GIPSA-lab, 11 rue des Mathématiques, 38402 Grenoble, France

Nathalie Henrich-Bernardoni

In human speech, vowels are produced by strengthening specific areas of the harmonic spectrum, known as formants, through the adjustment of vocal-tract acoustical resonances with articulators such as the tongue, lips, velum, jaw, and larynx. In singing, a compromise is often sought between the frequencies of the harmonics and the resonance frequencies, sometimes at the expense of vowel perception. In some vocal cultures, this link between harmonic frequency and resonance frequency is skilfully adjusted, and a melody is generated independently of the tonal melody related to vocal-fold vibrations. This is the case of harmonic singing, overtone singing or Xhoomij, practiced in Central Asia, and also of singing by Xhosa women in South Africa. In this paper, the adjustments between harmonics and formants are explored across a wide range of commercial singing recordings and experimental laboratory recordings. Three main strategies are described from both acoustical and musical points of view. In the first case, the spectral melody is produced by a play on the first formant (F1). The first harmonic frequency is often kept constant and at low values due to period doubling induced by a ventricular vibration. In the second case, the spectral melody is produced by a play on the second formant (F2), with a higher frequency of the first harmonic. A complex spectral melody can also be developed by a vocal game on the first two formants. In particular, we will illustrate and discuss the cases where the first two formants evolve while remaining in an octave ratio (F2 = 2F1).

1 Introduction

When producing vowels in speech and singing, the fluid-structure interaction between the air expelled from the lungs and the moving walls induces vocal-fold vibration. This vibration generates a harmonic acoustic source, which propagates through the vocal tract (laryngeal and pharyngeal cavities, mouth and nasal cavities). The vocal-tract area function from glottis to lips is controlled by the speech articulators (tongue, lips, jaw, velum, larynx), which contribute to the adjustment of the vocal-tract resonances (Ri). The resonances shape the harmonic voiced sound spectrum by boosting acoustical energy in frequency bands designated in acoustics by the term formants (Fi). The frequency ratio between the first two formants F1 and F2 is perceptually coded into vowels.

Figure 1: Mean values of formant frequencies F1 (blue) and F2 (red) on a musical scale. On the left panel, the vowels for which the two formants vary conjointly have been grouped.

Several singing techniques illustrate harmonic-resonance adjustments. Possible interactions depending on sung pitch are shown in Figure 1, which presents the mean values of the first two formant frequencies for a male speaking voice. The vowel location on the diagram is only indicative. It depends on individual peculiarities and on the chosen language. Besides, values are given for male speech, as the songs studied here are mainly produced by male singers.

The first formant F1 ranges from 300 Hz (/i/) to 800 Hz (/a/), which corresponds on a musical scale to E4-G5. It covers the high range in male voices, and the medium and high range in female voices. In western classical singing, a tuning between the vocal-fold vibratory frequency (f0 = H1) and the vocal-tract first-resonance frequency (R1) is sometimes mandatory to allow a loud and comfortable voice production, such as in the soprano high range [1, 2, 3] or, more generally, when the sung pitch gets close to R1 [3]. Finding a good balance between resonance adjustments and clarity of vowels constitutes a great part of the classical singer’s training. Such singers have to be able to sing a text on a wide range of pitches. In traditional Croatian folk singing [4], in Bulgarian women’s singing [5] or in Broadway musicals [6], a systematic tuning is observed between the second harmonic (H2 = 2f0) and R1 for those vowels whose first-resonance frequency is not too low. This practice gives power and clarity to the voice. It is produced by means of the vowels /o/, /ɔ/, /ɛ/ and /a/ in a limited pitch range: 220 to 320 Hz for male singers, 350 to 500 Hz for female singers (see Figure 2).

Figure 2: Illustration on a musical scale of vowels and pitches for which a tuning R1:2f0 is possible. The blue notes present the musical pitches.

The second formant F2 ranges from 600 Hz for vowel /u/ to 2400 Hz for vowel /i/, within the musical range E5-E7 (see Figure 1). The glottal fundamental frequency may come close to the resonance frequency only for low-F2 vowels such as /u/ and /o/. In most cases, F2 lies well above f0 and contributes globally to the voice quality. F2:f0 tunings have been observed in the soprano high range [2]. But most F2:Hi (i>1) tunings observed in the literature are reported for techniques of harmonic singing, which we shall now address. The literature will first be briefly reviewed. The tuning strategies will then be discussed on the basis of a wide range of commercial recordings. These observations will be supplemented by a case study of a Mongolian singer by means of simultaneous acoustical recordings and ultrasound observations of tongue motion.

2 Harmonic singing: the state of the art

A spectral melody and low-pitch tone – In the singing techniques mentioned above, a melody is produced by varying the vocal-fold vibratory frequency, and the resonances are tuned depending on vowel and sound quality. Roles are reversed in harmonic singing.
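The harmonic-resonance tunings described above (R1:f0, R1:2f0, F2:Hi) can be illustrated with a little arithmetic. The sketch below is not taken from the paper: the function names are hypothetical, and the drone values of 65 Hz and 200 Hz are illustrative assumptions. It lists which harmonics of a fixed drone fall inside the F1 band (300 to 800 Hz) or the F2 band (600 to 2400 Hz) quoted in the text, and finds the harmonic closest to a given resonance.

```python
import math


def harmonics_in_band(f0_hz, band_lo_hz, band_hi_hz):
    """Return the harmonic numbers n such that n * f0 lies inside the band."""
    first = max(1, math.ceil(band_lo_hz / f0_hz))
    last = math.floor(band_hi_hz / f0_hz)
    return list(range(first, last + 1))


def nearest_harmonic(f0_hz, resonance_hz):
    """Return (n, deviation) for the harmonic closest to the resonance.

    The deviation is given in cents (100 cents = one semitone); a positive
    value means the harmonic lies above the resonance frequency.
    """
    n = max(1, round(resonance_hz / f0_hz))
    deviation_cents = 1200.0 * math.log2(n * f0_hz / resonance_hz)
    return n, deviation_cents


# Assumed low drone (65 Hz): harmonics available to a melody played on F1.
print(harmonics_in_band(65, 300, 800))
# Assumed higher drone (200 Hz): harmonics available to a melody played on F2.
print(harmonics_in_band(200, 600, 2400))
# R1:2f0 tuning: a sung pitch of 250 Hz with R1 at 500 Hz selects harmonic 2.
print(nearest_harmonic(250, 500))
```

Under these assumptions, a low drone places many harmonics inside the F1 band, whereas a higher drone leaves a usable series of harmonics only in the F2 band, which is consistent with the first two strategies outlined in the abstract.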


Discography

CD “Inédit Mongolie” – Auvidis, W 260009 (1989), tracks: 4 (X1); 5 (X2; X7); 6 (X3).

CD “Voices from the center of Asia” – Smithsonian Folkways, SF 400017 (1990), tracks: 1 (K5); 4 (X5); 9 (K11); 14 (K10; X6); 18 (K4).

CD “Les voix du monde”, CNRS-Harmonia mundi, CMX 374 1010.12 (1996), CD-II-37 (K3).

CD “The Heart of Dharma”, Ellipsis Arts (1996), track 2 (K3).

Dave Dargie demonstration tape, track A-1 (F).

Alash Ensemble – Singers: Bady Dorzhu-Ondar (K6; K7; K8); Kongar-ool Ondar (X4).

Bayarbaatar Davaasuren, (2013), Gipsa-Lab (K9).

Data from H. Smith (1967), lama from the Gyutu Monastery near Dalhousie, recorded in 1964 (K2).

BIBLIOGRAPHY

References

[1] E. Joliveau, J. Smith and J. Wolfe, “Vocal tract resonances in singing: The soprano voice”, J. Acoust. Soc. Am. 116 (4), 2434-2439 (2004)
[2] M. Garnier, N. Henrich, J. Smith and J. Wolfe, “Vocal tract adjustments in the high soprano range”, J. Acoust. Soc. Am. 127 (6), 3771-3780 (2010)
[3] N. Henrich, J. Smith and J. Wolfe, “Vocal tract resonances in singing: Strategies used by sopranos, altos, tenors, and baritones”, J. Acoust. Soc. Am. 129 (2), 1024-1035 (2011)
[4] P. Boersma and G. Kovacic, “Spectral characteristics of three styles of Croatian folk singing”, J. Acoust. Soc. Am. 119 (3), 1805-1816 (2006)
[5] N. Henrich, M. Kiek, J. Smith and J. Wolfe, “Resonance strategies in Bulgarian women’s singing”, Logopedics Phoniatrics Vocology 32, 171-177 (2007)
[6] T. Bourne and M. Garnier, “Physiological and acoustic characteristics of the female music theater voice”, J. Acoust. Soc. Am. 131 (2), 1586-1594 (2012)
[7] M. Garcia jr, “Mémoire sur la voix humaine; réimpression augmentée de quelques observations nouvelles sur les sons simultanés”, p. 24, Paris: Duverger (1840)
[8] H. Smith, K.N. Stevens and R.S. Tomlinson, “On an unusual mode of chanting by certain Tibetan lamas”, J. Acoust. Soc. Am. 41 (5), 1262-1264 (1967)
[9] G. Bloothooft, E. Bringmann, M. Van Cappellen, J.B. Van Luippen, et al., “Acoustics and perception of overtone singing”, J. Acoust. Soc. Am. 92 (4), 1827-1836 (1992)
[10] F. Klingholz, “Overtone singing: productive mechanisms and acoustic data”, J. of Voice 7 (2), 118-122 (1993)
[11] H. K. Schutte, D.G. Miller and J.G. Švec, “Measurement of formant frequencies and bandwidth in singing”, J. of Voice 9 (3), 290-296 (1995)
[12] L. Dmitriev, B. Chernov and V. Maslow, “Functioning of the Voice Mechanism in Double Voice Touvinian Singing”, Folia Phoniatrica 36, 193-197 (1983)
[13] L. Fuks, B. Hammarberg and J. Sundberg, “A self-sustained vocal-ventricular phonation mode: acoustical, aerodynamic and glottographic evidences”, TMH-QPSR 3, 49-59 (1998)
[14] J. G. Švec, H. K. Schutte and D. G. Miller, “A subharmonic vibratory pattern in normal vocal folds”, J. of Speech and Hearing Research 39, 135-143 (1996)
[15] L. Bailly, N. Henrich and X. Pelorson, “Vocal fold and ventricular vocal fold vibration in period-doubling phonation: physiological description and aerodynamic modeling”, J. Acoust. Soc. Am. 127 (5), 3212-3222 (2010)
[16] A.N. Aksenov, “Tuvin folk music”, Asian Music 4 (2), 7-18 (1973)
[17] D. Dargie, “Xhosa music: its techniques and instruments, with a collection of songs”, Cape Town: David Philip
[18] H. Zemp and T. Q. Hai, “Recherches expérimentales sur le chant diphonique”, Cahiers d’ethnomusicologie 4, 27-68 (1991)
[19] T. C. Levin and M. E. Edgerton, “The Throat Singers of Tuva”, Scientific American 218 (3), 70-77 (1999) and related video files (X-rays)
[20] J. Curtet, “La transmission du höömij, un art du timbre vocal : ethnomusicologie et histoire du chant diphonique mongol”, Thèse de doctorat, Université de Rennes 2
[21] M. Kob, “Analysis and modeling of overtone singing in the sygyt style”, Applied Acoustics 65 (12), 1249-1259 (2004)
[22] C. Tsai, Y. Shau and T. Hsiao, “False vocal fold surface waves during Sygyt singing: A hypothesis”, Proc. ICVBP (2004)
[23] S. Adachi and M. Yamada, “An acoustical study of sound production in biphonic singing, Xöömij”, J. Acoust. Soc. Am. 105 (5), 2920-2932 (1999)
[24] K.-I. Sakakibara, H. Imagawa, T. Konishi, K. Kondo et al., “Vocal fold and false vocal fold vibrations in throat singing and synthesis of Khöömei”, Proc. ICMC (2001)
[25] P. Lindestad, M. Södersten, B. Merker and S. Granqvist, “Voice source characteristics in Mongolian “throat singing” studied with high-speed imaging technique, acoustic spectra, and inverse filtering”, J. of Voice 15 (1), 78-85 (2001)
[26] P. Cosi and G. Tisato, “On the magic of overtone singing”, Voce, Parlato. Studi in onore di Franco Ferrero, 83-100 (2003)
[27] T. Hueber, G. Chollet, B. Denby and M. Stone, “Acquisition of ultrasound, video and acoustic speech data for a silent-speech interface application”, Proc. of ISSP, 365-369 (2008)
[28] H. Zemp and T.Q. Hai, “Le chant des harmoniques”, film 16 mm, Paris: Musée de l’Homme and CNRS-AV, http://videotheque.cnrs.fr/doc=606






Acoustics and Perception of Overtone Singing. Gerrit Bloothooft, Eldrid Bringmann, Marieke van Capellen, Jolanda B. van Luipen and Koen P. Thomassen in Journal of the Acoustical Society of America, Vol. 92, No. 4, Part 1, pages 1827–1836; October 1992.

Reise ins Asiatische Tuwa. Otto J. Mänchen-Helfen. Verlag Der Bucherkreis, 1931. Published in English as Journey to Tuva: An Eyewitness Account of Tannu-Tuva in 1929. Translated by Alan Leighton. Ethnographics Press, University of Southern California, 1992.

Principles of Voice Production. Ingo R. Titze. Prentice Hall, 1994.

A Tuvan Perspective on Throat Singing. Mark van Tongeren in Oideion: The Performing Arts Worldwide, Vol. 2, pages 293–312. Edited by Wim van Zanten and Marjolijn van Roon. Centre of Non-Western Studies, University of Leiden, 1995.

The Hundred Thousand Fools of God: Musical Travels in Central Asia (and Queens, New York). Theodore Levin. Indiana University Press, 1997.


J Acoust Soc Am. 1992 Oct;92(4 Pt 1):1827-36.

Acoustics and perception of overtone singing.


Overtone singing, a technique of Asian origin, is a special type of voice production resulting in a very pronounced, high and separate tone that can be heard over a more or less constant drone. An acoustic analysis is presented of the phenomenon and the results are described in terms of the classical theory of speech production. The overtone sound may be interpreted as the result of an interaction of closely spaced formants. For the lower overtones, these may be the first and second formant, separated from the lower harmonics by a nasal pole-zero pair, as the result of a nasalized articulation shifting from /ɔ/ to /a/, or, as an alternative, the second formant alone, separated from the first formant by the nasal pole-zero pair, again as the result of a nasalized articulation around /ɔ/. For overtones with a frequency higher than 800 Hz, the overtone sound can be explained as a combination of the second and third formant as the result of a careful, retroflex, and rounded articulation from /ɔ/, via schwa /ə/ to /y/ and /i/ for the highest overtones. The results indicate a firm and relatively long closure of the glottis during overtone phonation. The corresponding short open duration of the glottis introduces a glottal formant that may enhance the amplitude of the intended overtone. Perception experiments showed that listeners categorized the overtone sounds differently from normally sung vowels, which possibly has its basis in an independent perception of the small bandwidth of the resonance underlying the overtone. Their verbal judgments were in agreement with the presented phonetic-acoustic explanation.


Journal of Voice

Volume 7, Issue 2, June 1993, Pages 118-122

Overtone singing: Productive mechanisms and acoustic data

Department of Phoniatrics, Ludwig-Maximilians-University, Munich, Germany

Accepted 29 May 1992, Available online 4 March 2006.


Overtone singing is where one person sings in two voices, the first voice represented by the fundamental and the second by an enhanced harmonic. Overtone singing is performed in chest register. Tuning of the first or second formant and a reduction of the formant bandwidth down to 20 Hz make harmonics prominent. Narrowing the pharynx, velar constriction, variation of the small mouth opening, and a tension of the walls of the mouth cavity are used. Changing prominent harmonics has the effect of creating an overtone melody with sustained tones, tone steps, and trillos.
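The effect of a narrow formant bandwidth on the prominence of a single harmonic can be sketched numerically. The example below is only an illustration, not the method of the article: it assumes a simple second-order resonance as a stand-in for one formant, a drone at 200 Hz, and a resonance centred on harmonic 5; the function names are hypothetical. It compares how far the tuned harmonic stands out from its upper neighbour for a 20 Hz bandwidth and for a wider 100 Hz bandwidth chosen here for comparison.

```python
import math


def resonance_gain(f_hz, center_hz, bandwidth_hz):
    """Magnitude response of an assumed second-order resonance.

    The quality factor is center_hz / bandwidth_hz, so bandwidth_hz is
    approximately the -3 dB bandwidth when the resonance is narrow.
    """
    numerator = center_hz ** 2
    denominator = math.sqrt(
        (center_hz ** 2 - f_hz ** 2) ** 2 + (bandwidth_hz * f_hz) ** 2
    )
    return numerator / denominator


def harmonic_contrast_db(f0_hz, harmonic, bandwidth_hz):
    """Level difference (dB) between the tuned harmonic and the next one up."""
    center_hz = harmonic * f0_hz  # resonance tuned exactly on the harmonic
    tuned = resonance_gain(center_hz, center_hz, bandwidth_hz)
    neighbour = resonance_gain((harmonic + 1) * f0_hz, center_hz, bandwidth_hz)
    return 20.0 * math.log10(tuned / neighbour)


# Illustrative values only: 200 Hz drone, resonance on harmonic 5 (1000 Hz).
narrow = harmonic_contrast_db(200, 5, 20)
wide = harmonic_contrast_db(200, 5, 100)
print(f"contrast with 20 Hz bandwidth:  {narrow:.1f} dB")
print(f"contrast with 100 Hz bandwidth: {wide:.1f} dB")
```

In this simplified model the narrower resonance gives the larger contrast, which is in line with the statement that reducing the formant bandwidth makes a harmonic prominent. A real vocal tract has several interacting resonances, so the figures should be read as qualitative.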

Key Words

Singing voice
Formant tuning
Overtone enhancement
Voice quality





Theodore C. Levin and Michael E. Edgerton: THE THROAT SINGERS OF TUVA


Ted Levin


Michael E. Edgerton

Testing the limits of vocal ingenuity, throat-singers can create sounds unlike anything in ordinary speech and song—carrying two musical lines simultaneously, say, or harmonizing with a waterfall

From atop one of the rocky escarpments that criss-cross the south Siberian grasslands and taiga forests of Tuva, one’s first impression is of an unalloyed silence as vast as the land itself. Gradually the ear habituates to the absence of human activity. Silence dissolves into a subtle symphony of buzzing, bleating, burbling, cheeping, whistling: our onomatopoeic shorthand for the sounds of insects, beasts, water, birds, wind. The polyphony unfolds slowly, its colors and rhythms by turns damped and reverberant as they wash over the land’s shifting contours.

For the seminomadic herders who call Tuva home, the soundscape inspires a form of music that mingles with these ambient murmurings. Ringed by mountains, far from major trade routes and overwhelmingly rural, Tuva is like a musical Olduvai Gorge, a living record of a protomusical world, where natural and human-made sounds blend.

Among the many ways the pastoralists interact with and represent their aural environment, one stands out for its sheer ingenuity: a remarkable singing technique in which a single vocalist produces two distinct tones simultaneously. One tone is a low, sustained fundamental pitch, similar to the drone of a bagpipe. The second is a series of flutelike harmonics, which resonate high above the drone and may be musically stylized to represent such sounds as the whistle of a bird, the syncopated rhythms of a mountain stream or the lilt of a cantering horse.

In the local languages, the general term for this singing is khöömei or khoomii, from the Mongolian word for “throat.” In English it is commonly referred to as throat-singing. Some contemporary Western musicians also have mastered the practice and call it overtone singing, harmonic singing or harmonic chant. Such music is at once a part of an expressive culture and an artifact of the acoustics of the human voice. Trying to understand both these aspects has been a challenge for Western students of music, and each of us, one a musical ethnographer (Levin), the other a composer with an interest in extended vocal techniques (Edgerton), has had to traverse the unfamiliar territory of the other.

Sound Mimesis

In Tuva, legends about the origins of throat-singing assert that humankind learned to sing in such a way long ago. The very first throat-singers, it is said, sought to duplicate natural sounds whose timbres, or tonal colors, are rich in harmonics, such as gurgling water and swishing winds. Although the true genesis of throat-singing as practiced today is obscure, Tuvan pastoral music is intimately connected to an ancient tradition of animism, the belief that natural objects and phenomena have souls or are inhabited by spirits.

According to Tuvan animism, the spirituality of mountains and rivers is manifested not only through their physical shape and location but also through the sounds they produce or can

Scientific American, September 1999, page 80. The Throat-Singers of Tuva.

VOICE OF A HORSE in Tuvan music, the igil (played here by Andrei Chuldum-ool on the grasslands of southern Siberia, also above) is a two-stringed upright fiddle made from horse hide, hair and gut and used to re-create equine sounds. Sound mimicry, the cultural basis of Tuvan music, reaches its culmination in throat-singing.
