Ken-ichi Sakakibara, Hiroshi Imagawa, Seiji Niimi: Vocal fold and false vocal fold vibrations in throat singing and synthesis of khoomei



Vocal fold and false vocal fold vibrations in throat singing and synthesis of khöömei
Ken-Ichi Sakakibara, Hiroshi Imagawa, Tomoko Konishi, Kazumasa Kondo, Emi Zuiki Murano, Masanobu Kumada, and Seiji Niimi
NTT Communication Science Laboratories,
The University of Tokyo,
National Rehabilitation Center for the Disabled,
International University of Health and Welfare
We observed laryngeal movements in throat singing using physiological methods: the simultaneous recording of singing sounds, EGG, and high-speed digital images. We observed vocal fold and false vocal fold vibrations and estimated the vibration patterns. We also estimated the laryngeal voices by using an inverse filtering method and simulated the vibration patterns using a new physical model: the 2×2-mass model. From these observations, we propose a laryngeal voice model for throat singing and a synthesis system for throat singing.
1 Introduction
Throat singing is a traditional singing style of people who live around the Altai mountains. Khöömei in Tyva and Khöömij in Mongolia are representative styles of throat singing. Throat singing is sometimes called biphonic singing, multiphonic singing, overtone singing, or harmonic singing because two or more distinct pitches (musical lines) are produced simultaneously in one tone. One is a low sustained fundamental pitch, called a drone, and the second is a whistle-like harmonic that resonates high (in the range from 1 kHz to 3 kHz) above the drone. Many variations of singing styles in throat singing are classified according to singers and regions. However, it is possible to classify these variations objectively in terms of the source-filter model of speech production.
The laryngeal voices of throat singing can be classified into (i) a pressed voice and (ii) a kargyraa voice based on listeners' impressions, acoustical characteristics, and the singers' personal observations on voice production. The pressed voice is the basic laryngeal voice in throat singing and is used as the drone. The kargyraa voice is a very low-pitched voice that lies outside the range of the modal register.
The production of the high-pitched overtone is mainly due to the pipe resonance of the cavity from the larynx to the point of articulation in the vocal tract [1]. In Tyvan khöömei, sygit is a style where singers articulate by touching the tongue to the palate, and khöömei is one where they articulate by pursing the lips.
We have physiologically observed the two different laryngeal voices and estimated the patterns of the vocal fold and false vocal fold vibrations [6]. We have also simulated the vibration patterns by physical modeling of the larynx: the 2×2-mass model. Based on the physiological observations and the simulation, we propose a new laryngeal voice model and synthesis system for throat singing.
2 Physiological observations
2.1 Methods
We observed laryngeal movements in throat singing directly and indirectly by simultaneous recording of high-speed digital images, EGG (electroglottography) waveforms, and sound waveforms (Fig. 1). The high-speed digital images were captured at 4501 frames/s through a fiberscope inserted into the nasal cavity of a singer. Sound and EGG waveforms were sampled at 12 bits per sample and a sampling frequency of 18 kHz [4]. Two singers with normal voices participated as subjects. One studied khöömei in Tyva and the other studied khöömij in Mongolia.
Fig. 1: High-speed digital image system.
2.2 Results
Common laryngeal movements were observed in the two singers for each of the two laryngeal voices.
Contact: K.-I. Sakakibara, NTT Communication Science Labs, 3-1, Morinosato Wakamiya, Atsugi-shi, 243-0198, Japan
Pressed voice
In pressed-voice production, the following features of the laryngeal movements were observed. (1) Overall constriction of the supra-structures of the glottis was observed; thus it was difficult to directly observe vibrations of the vocal folds (VFs). (2) Vibration of the supra-structures of the glottis, whose edges are presumably the false vocal folds (FVFs), was observed in the high-speed digital images. (3) The period of the FVF vibrations was almost equal to the period of the EGG waveform. (4) The slope of the EGG curve changed at the beginning of the closed phase of the FVFs; the impedance of the EGG reached its maximal value when the FVFs were open and its minimal value when they were closed (Fig. 2). The graph at the bottom of Fig. 2 depicts the locus of the edges of the FVFs. The upper line is the locus of the left edge of the FVFs and the lower line is that of the right edge.
Kargyraa voice
In kargyraa-voice production, the following features of the laryngeal movements were observed. (1) Overall constriction at the supra-structures of the glottis was observed. (2) The constriction was looser than that in the case of the pressed voice. (3) Vibration of the supra-structures of the glottis, whose edges are presumably the FVFs, was observed. (4) The phases of the FVF vibrations were observed to alternate between almost completely closed and open. (5) Vibration of the VFs was observed during the open period of the FVFs. (6) The double period of the FVF vibration was equal to the period of the sound waveform. (7) When the FVFs were almost completely closed, the power of the sound became weaker. (8) In the EGG waveform, two different shapes alternated, and the period of the EGG waveform was equal to that of the sound waveform (Fig. 3).
Fig. 2: Pressed voice (from above: sound, EGG, edges of FVF).
Fig. 3: Kargyraa voice (from above: sound, EGG, edges of FVF).
2.3 Discussion
Two common features were observed in the mechanisms of the two different laryngeal voice productions: (1) overall constriction of the supra-structures of the glottis and (2) vibration of the supra-structures of the glottis, which presumably are the FVFs. These features are not observed in vowel production in ordinary speech. The differences between the two laryngeal voice productions are (1) the narrowness of the constriction and (2) the manner of FVF vibration.
The EGG waveforms for the pressed voice and kargyraa voice represent the contact area of the supra-structures of the glottis as well as that of the VFs. However, taking into account the high-speed digital images and sound waveforms, the EGG waveforms can be assumed to mainly represent the contact area of the VFs. Thus, we can conclude that the VF vibrations and FVF vibrations have opposite phases in the pressed-voice case. In the kargyraa voice, the FVFs can be assumed to close once for every two periods of closure of the VFs, and this closing blocks airflow and contributes to the generation of the subharmonic tone of kargyraa.
In a previous study, the open quotient (OQ) in throat singing was estimated to be smaller on the basis of acoustical features [2]. However, for both the pressed and kargyraa voices, our physiological observations suggest that the OQ is difficult to estimate because of the contribution of the supra-structures of the glottis. Therefore, the OQ was not estimated.
In the synthesis of throat singing sounds, as pointed out in [1], glottal source modeling is needed to reproduce the timbre. Our physiological observations suggest that the glottal source model of throat singing should include the FVF vibrations as well as the VF vibrations [7].
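The subharmonic mechanism discussed above for kargyraa (the FVFs closing once for every two VF periods) can be illustrated with a toy signal. The following is a minimal sketch, not the measurement or model used in this study: the sampling rate, the 100 Hz VF frequency, the half-wave rectified sine pulse, and the FVF gate with an attenuation of 0.2 are all illustrative assumptions.

```python
import math

FS = 16000  # sampling rate in Hz (assumed for illustration)
F0 = 100    # VF vibration frequency in Hz (assumed for illustration)

def vf_pulse(n):
    """Toy VF airflow: a half-wave rectified sine, one pulse per VF period."""
    return max(0.0, math.sin(2.0 * math.pi * F0 * n / FS))

def fvf_gate(n, closed_gain=0.2):
    """Toy FVF gate: passes one VF period and attenuates the next one."""
    vf_period_index = (n * F0) // FS  # integer index of the current VF period
    return closed_gain if vf_period_index % 2 else 1.0

def magnitude_at(x, freq_hz):
    """Normalized DFT magnitude of x evaluated at a single frequency."""
    re = sum(v * math.cos(2.0 * math.pi * freq_hz * n / FS) for n, v in enumerate(x))
    im = sum(v * math.sin(2.0 * math.pi * freq_hz * n / FS) for n, v in enumerate(x))
    return math.hypot(re, im) / len(x)

n_samples = FS // 2  # 0.5 s, an even number of VF periods
plain = [vf_pulse(n) for n in range(n_samples)]
gated = [vf_pulse(n) * fvf_gate(n) for n in range(n_samples)]

print("component at F0/2 without gate:", magnitude_at(plain, F0 / 2))
print("component at F0/2 with gate:   ", magnitude_at(gated, F0 / 2))
```

In this toy signal, the ungated pulse train has essentially no energy at half the VF frequency, while the gated one does, which is the period-doubling effect attributed to the FVFs in the discussion.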
3 Laryngeal voice model of throat singing
In this paper, we define the glottal airflow as the airflow through the glottis to the area between the FVFs, and the laryngeal airflow as the airflow through the area between the FVFs to the pharynx.
Glottal airflow estimation
From recorded sounds, we estimated the laryngeal airflow using the inverse filtering technique. In the pressed voice, the estimated laryngeal airflow curve had a small notch just after the curve reached a peak, and the closing of the VFs was apparently not complete (Fig. 4). In the kargyraa voice, the estimated laryngeal airflow curve had two peaks in each period. According to our physiological observations, the VFs vibrate twice in each period of the FVF vibration, and the estimated laryngeal airflow curve showed that in one of the two vibrations of the VFs, the closing of the VFs was not complete (Fig. 5).
Fig. 4: Inverse filtered laryngeal airflow of pressed voices for two singers.
Fig. 5: Inverse filtered laryngeal airflow of kargyraa voices for two singers.
All the power spectra of the estimated glottal airflows showed an increase in power in the range from 1 to 3 kHz, which is where the second formant, corresponding to the whistle-like overtone, appears in throat singing (Figs. 6–8).
Fig. 6: Inverse filtered airflow spectrum of normal voice for two singers.
Fig. 7: Inverse filtered airflow spectrum of pressed voice for two singers.
Fig. 8: Inverse filtered airflow spectrum of kargyraa voice for two singers.
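The text does not specify which inverse filtering procedure was used. As a hedged illustration of the general idea, the sketch below estimates an all-pole filter by linear prediction (autocorrelation method with the Levinson-Durbin recursion) and applies the inverse filter A(z) to recover the excitation. The use of LPC, the function names, the filter order, and the test signal are assumptions made for this example; practical glottal inverse filtering usually adds steps such as lip-radiation compensation that are omitted here.

```python
import math
import random

def all_pole_filter(excitation, a):
    """Synthesis filter 1/A(z): y[n] = e[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the whole signal."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns a[0..order] with a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def inverse_filter(x, a):
    """Analysis filter A(z): residual[n] = sum_k a[k] * x[n-k]."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

# Toy example: noise excitation through one resonance (pole radius 0.9, angle pi/4).
rng = random.Random(0)
excitation = [rng.gauss(0.0, 1.0) for _ in range(4000)]
a_true = [1.0, -2.0 * 0.9 * math.cos(math.pi / 4), 0.9 ** 2]
sound = all_pole_filter(excitation, a_true)

a_est = levinson_durbin(autocorrelation(sound, 2), 2)
residual = inverse_filter(sound, a_est)  # estimate of the excitation
print("true coefficients:     ", a_true)
print("estimated coefficients:", a_est)
```

When the estimated coefficients are close to the true ones, the residual approximates the excitation, which plays the role of the estimated airflow in this toy setting.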
A 2×2-mass model
For a physical simulation of the VF and FVF vibrations, we propose a 2×2-mass model as a self-oscillating model of VF and FVF vibrations (Fig. 9). This model was devised by adding a two-mass model for the FVFs to the ordinary two-mass model for the VFs. The mechanical transmission of vibrations between the VFs and FVFs was not considered. The laryngeal ventricle is modeled as a non-deforming cylinder with a uniform sectional area of 5 cm² and a height of 16 cm. In the simulation, the 2×2-mass model oscillated stably. The simulation of laryngeal movements using the 2×2-mass model agreed with the above assumptions for the two laryngeal movement patterns of throat singing, for both the pressed and kargyraa voices (Fig. 10). The 2×2-mass model can simulate an ordinary glottal source in the same way as the two-mass model by setting suitable model parameters [3].
Fig. 9: 2×2-mass model for the VFs and FVFs (figure labels: vocal folds, false vocal folds, laryngeal ventricle, vocal tract, trachea).
Fig. 10: Laryngeal airflow obtained by using the 2×2-mass model (left: pressed voice, right: kargyraa voice; traces: sound waveform, laryngeal airflow; scale: 1000 cc/s).
Laryngeal voice model
From the physiological observations and estimated laryngeal voices, we assume that (1) in pressed-voice production, the VFs and FVFs vibrate in almost opposite phase; and (2) in kargyraa-voice production, two closed phases of the VFs appear in one period of the glottal volume flow waveform, and the VFs are incompletely closed at one of the two closed phases. Under these assumptions, we propose a laryngeal voice model for throat singing and synthesize throat singing sounds.
Our proposed laryngeal voice model is obtained as follows. We generate an almost sine-shaped glottal airflow, because Fig. 4 suggests that the glottal flow in throat singing is nearly symmetric (Step 1). The glottal airflow is modulated by the vibration of the FVFs (Step 2). Turbulent noise is added according to the open width of the FVFs (Step 3). The output is filtered by the transfer function of the laryngeal ventricle (Step 4) [3].
Fig. 11: Block diagram for laryngeal voice model (blocks: glottal airflow, glottal area Ag, false glottal area, laryngeal ventricle resonance, laryngeal airflow).
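Steps 1 to 4 of the laryngeal voice model can be sketched as a short signal chain. This is only an illustrative reading of the description, under several assumptions that are not taken from the paper: the raised-cosine shapes of the glottal flow and of the FVF opening, a noise amplitude proportional to the FVF opening, a single two-pole resonator standing in for the laryngeal ventricle transfer function, and all numeric values (frequencies, bandwidth, gains). All function names are hypothetical.

```python
import math
import random

def modulated_flow(n_samples, fs, f0, f_fvf, fvf_phase, min_open=0.1):
    """Steps 1-2: sine-shaped glottal flow multiplied by a normalized FVF opening."""
    flow, opening = [], []
    for n in range(n_samples):
        t = n / fs
        glottal = 0.5 * (1.0 - math.cos(2.0 * math.pi * f0 * t))      # Step 1
        width = min_open + (1.0 - min_open) * 0.5 * (
            1.0 - math.cos(2.0 * math.pi * f_fvf * t + fvf_phase))    # FVF opening
        flow.append(glottal * width)                                  # Step 2
        opening.append(width)
    return flow, opening

def add_turbulence(flow, opening, noise_gain, rng):
    """Step 3: add noise scaled by the FVF opening (assumed proportional)."""
    return [f + noise_gain * w * rng.gauss(0.0, 1.0) for f, w in zip(flow, opening)]

def ventricle_resonance(x, fs, freq_hz, bw_hz):
    """Step 4: two-pole resonator used as a stand-in ventricle transfer function."""
    c = -math.exp(-2.0 * math.pi * bw_hz / fs)
    b = 2.0 * math.exp(-math.pi * bw_hz / fs) * math.cos(2.0 * math.pi * freq_hz / fs)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for v in x:
        y = a * v + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def laryngeal_voice(n_samples, fs, f0, f_fvf, fvf_phase,
                    noise_gain=0.02, res_hz=2000.0, res_bw=300.0, seed=0):
    """Run Steps 1-4 in sequence; parameter values are placeholders."""
    flow, opening = modulated_flow(n_samples, fs, f0, f_fvf, fvf_phase)
    noisy = add_turbulence(flow, opening, noise_gain, random.Random(seed))
    return ventricle_resonance(noisy, fs, res_hz, res_bw)

# Pressed voice: FVFs at the VF frequency with an assumed phase offset of pi.
pressed = laryngeal_voice(1600, 16000, 100.0, 100.0, math.pi)
# Kargyraa voice: FVFs at half the VF frequency.
kargyraa = laryngeal_voice(1600, 16000, 100.0, 50.0, 0.0)
```

In the kargyraa setting, the modulated flow repeats every two VF periods, so the source already carries the subharmonic before any vocal tract filtering is applied.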
4 Synthesis of throat singing
Based on the Klatt synthesizer [5], we propose a synthesis model for throat singing, which uses the proposed laryngeal voice model as the source and time-varying formants obtained from recorded throat singing sounds as resonating filters (Fig. 12). Compared with an ordinary glottal airflow model, some improvements in the timbre were observed.
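The resonating filters of a Klatt-style cascade synthesizer can be sketched with the second-order digital resonator described in [5], y[n] = A x[n] + B y[n-1] + C y[n-2], with coefficients recomputed per sample so that formants can vary in time. The code below is a minimal sketch rather than the synthesis system of this paper: the source signal, the formant tracks (a fixed first formant and a second formant gliding from 1 kHz to 3 kHz), and the bandwidths are placeholder values, not measurements.

```python
import math

def klatt_coeffs(freq_hz, bw_hz, fs):
    """Resonator coefficients for y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c

def resonate(x, freq_track, bw_hz, fs):
    """Filter x with one resonator whose center frequency may change per sample."""
    y1 = y2 = 0.0
    out = []
    for sample, freq in zip(x, freq_track):
        a, b, c = klatt_coeffs(freq, bw_hz, fs)
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def cascade(source, formant_tracks, bandwidths, fs):
    """Pass the source through the formant resonators connected in series."""
    signal = list(source)
    for track, bw in zip(formant_tracks, bandwidths):
        signal = resonate(signal, track, bw, fs)
    return signal

# Placeholder demonstration: raised-cosine source at 100 Hz, F2 gliding 1 kHz -> 3 kHz.
fs = 16000
n = fs  # one second
source = [0.5 * (1.0 - math.cos(2.0 * math.pi * 100.0 * i / fs)) for i in range(n)]
f1_track = [500.0] * n
f2_track = [1000.0 + 2000.0 * i / (n - 1) for i in range(n)]
sound = cascade(source, [f1_track, f2_track], [80.0, 100.0], fs)
```

In a fuller system, the placeholder source would be replaced by the output of the laryngeal voice model and the tracks by formant frequencies measured from recordings.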
We observed the laryngeal movements in throat singing. The VF and FVF vibrations were observed. The FVF vibrations contribute to the production of both laryngeal voices of throat singing. We also estimated the laryngeal voice source and simulated the laryngeal movements by using a 2×2-mass model. Based on these observations, we proposed a laryngeal source model and a synthesis model for throat singing. These models can also simulate the normal voice. The power spectra of all the simulated glottal airflows showed an increase in power in the range below 3 kHz, where the second formant, which corresponds to the whistle-like overtone in throat singing, appears. Our study indicates that the glottal source, as well as the articulation of the tongue and lips, contributes to the production of the whistle-like overtone.
Fig. 12: Block diagram of khöömei synthesizer.
Fig. 13: Synthesized laryngeal airflows, sounds synthesized by the khöömei synthesis system, and power spectra of the synthesized sounds (left: pressed voice, right: kargyraa voice).
We wish to thank Seiji Adachi, Zoya Kyrgys, Koichi Makigami, Naotoshi Osaka, Yoshinao Shiraki, and Masahiko Todoriki for their help and useful discussion.
[1] S. Adachi and M. Yamada. An acoustical study of sound production in biphonic singing xöömij. J. Acoust. Soc. Am., 105(5):2920–2932, 1999.
[2] G. Bloothooft, E. Bringmann, M. van Cappellen, J. B. van Luipen, and K. P. Thomassen. Acoustics and perception of overtone singing. J. Acoust. Soc. Am., 92(4):1827–1836, 1992.
[3] H. Imagawa, K.-I. Sakakibara, T. Konishi, E. Z. Murano, and S. Niimi. Throat singing synthesis by a laryngeal voice model based on vocal fold and false vocal fold vibrations. Tech. Rep. IEICE, SP2000-140:71–78, Feb. In Japanese.
[4] S. Kiritani, H. Imagawa, and H. Hirose. Vocal cord vibration in the production of consonants: observation by means of high-speed digital imaging using a fiberscope. J. Acoust. Soc. Jpn. (E), 17:1–8, 1996.
[5] D. H. Klatt. Software for a cascade/parallel formant synthesizer. J. Acoust. Soc. Am., 67(3):971–995, 1980.
[6] T. C. Levin and M. E. Edgerton. The throat singers of Tuva. Scientific American, (Sep. 1999):80–87, 1999.
[7] K.-I. Sakakibara, S. Adachi, T. Konishi, K. Kondo, E. Z. Murano, M. Kumada, M. Todoriki, H. Imagawa, and S. Niimi. Observation of vocal fold vibrations in Tyvan and Mongolian throat singing. Tech. Rep. Musical Acoust., Acoust. Soc. Jpn., 19-4:41–48, Sep. 2000. In Japanese.