Seiji ADACHI & Masashi YAMADA: An acoustical study of sound production in biphonic singing, Xöömij

This paper proposes that the high melody pitch of biphonic singing, Xöömij, is produced by the pipe resonance of the rear cavity in the vocal tract; the front cavity resonance is not critical to producing the melody pitch. The theory is derived from acoustic investigations of several three-dimensional vocal tract shapes of a Xöömij singer, measured by magnetic resonance imaging. Four vocal tract shapes are examined, with which the melody pitches F6, G6, A6, and C7 are sung over an F3 drone in a characteristic pressed voice. The second formant frequency calculated from each tract shape lies within 36 cents of the corresponding melody pitch. Sounds are synthesized by convolving a glottal source waveform given by the Rosenberg model with transfer functions calculated from the vocal tract shapes. When the synthesized sounds are played back, two pitches are successfully perceived. Below 2 kHz, the spectra of the synthesized sounds strongly resemble those of the sounds actually sung. The synthesized sounds, however, fail to replicate the harmonic clustering at 4–5 kHz observed in the actual sounds. This clustering is speculated to originate from the glottal source specific to the “pressed” timbre of the drone.
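The two quantitative steps in the abstract, comparing a formant frequency with a melody pitch in cents and synthesizing sound by convolving a glottal source with a vocal tract response, can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the authors' method: the tuning reference (A4 = 440 Hz, equal temperament), the trigonometric form and timing fractions of the Rosenberg-style pulse, the sampling rate, and above all the single damped-sinusoid resonance, which is only a stand-in for the transfer functions the paper calculates from MRI-measured tract shapes. For orientation, a deviation of 36 cents corresponds to a frequency ratio of 2^(36/1200), or roughly 2.1%.

```python
import math

A4_HZ = 440.0  # assumed tuning reference; the abstract does not state one
SEMITONES_FROM_A = {"C": -9, "D": -7, "E": -5, "F": -4, "G": -2, "A": 0, "B": 2}


def note_frequency(name, octave):
    """Equal-tempered frequency in Hz of a natural note such as ("F", 6)."""
    semitones = SEMITONES_FROM_A[name] + 12 * (octave - 4)
    return A4_HZ * 2.0 ** (semitones / 12.0)


def cents(f_hz, f_ref_hz):
    """Signed interval from f_ref_hz to f_hz in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_hz / f_ref_hz)


def rosenberg_pulse(n_period, open_frac=0.4, close_frac=0.16):
    """One period of a trigonometric Rosenberg-style glottal flow pulse.

    open_frac and close_frac are illustrative values, not taken from the paper.
    """
    n_open = int(round(open_frac * n_period))
    n_close = int(round(close_frac * n_period))
    pulse = []
    for n in range(n_period):
        if n < n_open:  # opening phase: raised-cosine rise from 0 towards 1
            pulse.append(0.5 * (1.0 - math.cos(math.pi * n / n_open)))
        elif n < n_open + n_close:  # closing phase: quarter-cosine fall
            pulse.append(math.cos(0.5 * math.pi * (n - n_open) / n_close))
        else:  # closed phase: no flow
            pulse.append(0.0)
    return pulse


def resonance_impulse_response(f_hz, bandwidth_hz, fs, n_samples):
    """Damped sinusoid standing in for one vocal tract resonance (hypothetical)."""
    return [
        math.exp(-math.pi * bandwidth_hz * n / fs)
        * math.sin(2.0 * math.pi * f_hz * n / fs)
        for n in range(n_samples)
    ]


def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # closed-phase samples contribute nothing
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


fs = 16000                          # assumed sampling rate in Hz
drone_hz = note_frequency("F", 3)   # drone pitch named in the abstract
melody_hz = note_frequency("F", 6)  # lowest of the four melody pitches

period = int(round(fs / drone_hz))  # samples per glottal cycle
source = rosenberg_pulse(period) * 20  # 20 cycles of the drone
response = resonance_impulse_response(melody_hz, 50.0, fs, 400)
sound = convolve(source, response)  # source-filter synthesis in the time domain

# Deviation of a hypothetical calculated formant (1420 Hz) from the F6 target.
deviation_cents = cents(1420.0, melody_hz)
```

In this sketch the convolution is carried out in the time domain, so the tract is represented by an impulse response rather than a frequency-domain transfer function. Replacing `response` with an impulse response derived from a measured area function would bring the sketch closer to the procedure the abstract describes.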
© 1999 Acoustical Society of America.

The Journal of the Acoustical Society of America 105, 2920 (1999)