WIKIPEDIA: Overtone singing (chant diphonique)



Overtone singing (chant diphonique, literally "diphonic singing") is a vocal technique that lets a single person produce a vocal timbre made of two notes of different frequencies. It is therefore a form of polyphonic (multi-voice) singing performed with a single vocal organ, combining on the one hand various voice types (chest voice, head voice, etc.) and on the other hand various positions of the tongue or lips. The second voice, or harmonic, stands in an exact frequency ratio to the base voice, also called H1, the drone or the fundamental: its frequency can be twice that of the drone (H2), three times (H3), four times (H4), and so on.
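The frequency relationship described above can be sketched in a few lines of Python. The helper `harmonic_series` is a hypothetical name introduced for illustration, and the 110 Hz drone is an arbitrary example value; any fundamental works the same way.

```python
def harmonic_series(f0: float, count: int) -> list[float]:
    """Frequencies of H1..H{count} for a drone of f0 Hz.

    H1 is the fundamental (drone) itself; the n-th harmonic is n * f0.
    """
    return [n * f0 for n in range(1, count + 1)]


if __name__ == "__main__":
    drone = 110.0  # example fundamental in Hz (assumed value)
    for n, freq in enumerate(harmonic_series(drone, 6), start=1):
        print(f"H{n}: {freq:.1f} Hz")  # H1: 110.0 Hz ... H6: 660.0 Hz
```

Because every harmonic is an integer multiple of the same drone, the ratios H2/H1, H3/H1 and so on are exact, which is what the paragraph means by an "exact frequency ratio".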

Although most diphonic techniques rely on a particular arrangement or use of the oral and nasal cavities, one also distinguishes throat singing, or harmonic singing, which likewise produces several sounds at once: a low drone is produced in the throat while high harmonics are produced simultaneously through amplification and resonance.

This type of singing has long been practiced in various traditional musics of the world, especially in Inner Asia (notably among the Mongols, Tuvans, Khakas, Bashkirs, Altaians and the Mongols of Tibet; see Khöömei), and more marginally among the Sardinians of Italy, the Rajasthanis of India and the Xhosa of South Africa.

Varieties of overtone singing

In Asia, the Tuvans have four main techniques, whose drones range from lowest to highest in the styles kargyraa, borbannadyr, ezengileer and sygyt:

  • in the kargyraa style, the fundamental has a distinctive timbre (reminiscent of a hunting horn), with a frequency between 55 Hz (A1 in scientific pitch notation) and 65 Hz (C2); the harmonics move among H6, H7, H8, H9, H10 and H12, and each harmonic corresponds to a specific vowel;
  • in the borbannadyr style, the fundamental (around 110 Hz) stays fixed and has a softer timbre than in kargyraa. The singer can produce two harmonic formants above the fundamental. Because kargyraa and borbannadyr are technically related, a singer can alternate between the two styles within the same piece;
  • the sygyt style has a higher fundamental, between 165 Hz (E3) and 220 Hz (A3) depending on the singer. Its harmonic melody uses harmonics H9, H10 and H12 (up to a maximum of 2,640 Hz);
  • the ezengileer style is a variant of sygyt, characterized by a particular dynamic rhythm that comes from the rider's feet pressing periodically on the stirrups.
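To relate these figures, the sketch below multiplies each quoted drone by the harmonic numbers named for its style and converts the results to note names. It assumes twelve-tone equal temperament with A4 = 440 Hz and scientific pitch notation; the drone values (55 Hz for kargyraa, 220 Hz for the top of the sygyt range) come from the list above, while `melody_range` and `note_name` are hypothetical helpers.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(freq_hz: float) -> str:
    """Nearest equal-tempered note (A4 = 440 Hz), in scientific pitch notation."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def melody_range(f0: float, harmonics: list[int]) -> tuple[float, float]:
    """Lowest and highest melody frequencies for a drone f0 and harmonic numbers."""
    freqs = [n * f0 for n in harmonics]
    return min(freqs), max(freqs)


# Drone values and harmonic numbers as quoted in the text for each style.
STYLES = {
    "kargyraa": (55.0, [6, 7, 8, 9, 10, 12]),
    "sygyt": (220.0, [9, 10, 12]),
}

if __name__ == "__main__":
    for style, (f0, harmonics) in STYLES.items():
        low, high = melody_range(f0, harmonics)
        print(f"{style}: drone {f0:.0f} Hz ({note_name(f0)}), "
              f"melody {low:.0f}-{high:.0f} Hz ({note_name(low)}-{note_name(high)})")
```

With a 220 Hz sygyt drone, H12 lands exactly on the 2,640 Hz maximum quoted in the text.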

The Tuvan styles of overtone singing rest on the same principles of sound production as the jaw harp. The melody is made of the harmonics of a fundamental, brought out by the Helmholtz resonator formed by the mouth cavity, whose dimensions the performer changes. In the jaw harp, the vibrating reed drives the resonator; in overtone singing, it is the vocal folds, which can be set to different pitches, producing several fundamentals and therefore several series of harmonics.
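For an idealized Helmholtz resonator, the resonance frequency is f = (c / 2π) · √(A / (V · L)), where c is the speed of sound, A the cross-sectional area of the neck (here, the mouth opening), L the neck length and V the cavity volume. The sketch below evaluates this textbook formula without end corrections; the dimensions are hypothetical round numbers, not measurements, and a real vocal tract is far less regular than this model. It only shows the trend the paragraph relies on: changing the cavity's dimensions moves its resonance, with a smaller cavity resonating higher.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, dry air at about 20 °C (assumed)


def helmholtz_frequency(area_m2: float, volume_m3: float, neck_m: float,
                        c: float = SPEED_OF_SOUND) -> float:
    """Resonance of an ideal Helmholtz resonator (no end corrections), in Hz."""
    return c / (2 * math.pi) * math.sqrt(area_m2 / (volume_m3 * neck_m))


if __name__ == "__main__":
    opening = 1e-4   # 1 cm² mouth opening (hypothetical)
    neck = 0.01      # 1 cm effective neck length (hypothetical)
    for volume_cm3 in (60, 30, 15):  # cavity made progressively smaller
        f = helmholtz_frequency(opening, volume_cm3 * 1e-6, neck)
        print(f"V = {volume_cm3:>2} cm³ -> {f:6.0f} Hz")
```

Since f varies as 1/√V, dividing the cavity volume by four doubles the resonance frequency.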

Other secondary or lesser-known techniques have been "rediscovered": middle sygyt, steppe kargyraa and mountain kargyraa, and the Oidupa style (derived from kargyraa and named after its creator, it is considered the first urban style).

Among the Mongols there are six different techniques of overtone singing, or khöömii (хөөмий): khamryn khöömii (Хамрын хөөмий, nasal khöömii), bagalzuuryn khöömii (Багалзуурын хөөмий, pharyngeal khöömii), tseejin khöndii khöömii (цээжин хөндий хөөмий, chest-cavity khöömii), khevliin khöömii (хэвлийн хөндий, abdominal khöömii), khargiraa khöömii (narrative khöömii with a very low fundamental) and isgerex (dental flute voice). The singers D. Sundui and Tserendavaa are well known.

The Khakas use the xaj style. Before Russian rule, the Khakas had overtone styles close to those of the Tuvans, namely sygyrtyp (like sygyt), kuveder or kylenge (like ezengileer) and kargirar (like kargyraa).

The Altaians have a similar style, kaj, used to accompany epic songs, in addition to the kiomioi, karkira and sibiski styles (corresponding respectively to ezengileer, kargyraa and sygyt).

The Bashkirs have the uzlau style, close to ezengileer.

In the Tibetan monasteries of Gyuto and Gyume, tantras and mantras are chanted using throat singing. The tradition traces back to a group of Indian masters, the best known being the yogin Padmasambhava, who visited Tibet in the 8th century, and, more recently, to Tsongkhapa (1357–1419), founder of one of the four schools of Tibetan Buddhism, who is said to have introduced overtone chanting together with a style of meditation. He is said to have received this chant from his protective deity, Maha Bhairava, who, although an emanation of the Bodhisattva of compassion (Avalokiteśvara), had a terrifying aspect. Maha Bhairava's central face is that of an angry buffalo, and to this day the masters of this school like to compare their chanting to the bellowing of a bull.

There are several ways of reciting prayers: recitation of sacred texts in a low register at moderate or fast speed, and chants in three styles (ta, sung with clearly pronounced words on a pentatonic scale; gur, with a slow tempo, used in major ceremonies and during processions; and yang, with an extremely low voice on vowels that produce the harmonic effect, to communicate with the gods). The monks of Gyuto monastery produce an extremely low drone and a 10th harmonic (H10), which lies a major third above the third octave of the drone, whereas the monks of Gyume monastery produce a low drone and a 12th harmonic (H12), a fifth above the third octave of the drone. The chant of the Gyuto monks is said to correspond to the element Fire, and that of the Gyume monks to the element Water. The monks obtain this harmonic effect by singing the vowel O with an elongated mouth and rounded lips.
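The interval claims above can be checked with integer arithmetic: the 8th harmonic lies exactly three octaves above the drone, so H10 sits 10/8 = 5/4 above that octave (a just major third) and H12 sits 12/8 = 3/2 above it (a just fifth). The sketch below, with the hypothetical helper `reduce_to_octave`, performs this reduction for any harmonic number and expresses the remainder in cents.

```python
import math
from fractions import Fraction


def reduce_to_octave(n: int) -> tuple[int, Fraction, float]:
    """Split harmonic n into (octaves above the drone, ratio within that octave, cents)."""
    octaves = n.bit_length() - 1          # largest k with 2**k <= n
    ratio = Fraction(n, 2 ** octaves)     # 1 <= ratio < 2
    cents = 1200 * math.log2(float(ratio))
    return octaves, ratio, cents


if __name__ == "__main__":
    for n in (8, 10, 12):
        octaves, ratio, cents = reduce_to_octave(n)
        print(f"H{n}: {octaves} octaves + {ratio} ({cents:.1f} cents)")
```

Using `bit_length` and `Fraction` keeps the octave count and the ratio exact; only the cents value is a floating-point approximation.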

Among the Inuit there is Inuit throat singing.

In Formosa (Taiwan), the Bunun, one of the ethnic minorities, sing vowels in a very tense voice and bring out a few harmonics in a song performed for the millet harvest (pasi but but).

In India, a Rajasthani man recorded in 1967 used an overtone technique close to the sygyt style to imitate the jaw harp and the satârâ double flute. This unique recording is the only trace of overtone singing in Rajasthan.

In South Africa, the Xhosa practice overtone singing (first identified in 1983), mostly among women. The technique, called umngqokolo ngomqangi, imitates the umrhube musical bow; ngomqangi is the name of a beetle. According to one singer, this two-voice technique was inspired by the sound of a beetle held in front of the mouth: its buzzing serves as a drone while the singer varies the mouth cavity to change the harmonics produced.

A distinction should be made between overtone singing (singing that creates a melody out of harmonics) and harmonic-resonance singing (singing accompanied at times by harmonic effects).

In some types of singing, vowels are emitted with strong resonance, which lets singers create a second formant, either unintentionally (the Japanese Buddhist chant shōmyō, some polyphonic songs of Eastern Europe) or intentionally (the quintina phenomenon in Sardinian sacred songs, a virtual fifth voice that results from the fusion of the four voices sung together).

In Sardinia, Italy, in the Barbagia region, there are two styles of polyphonic singing. Cuncordu is a form of sacred music that uses normal voices, whereas canto a tenore is secular music with some characteristics of overtone singing. Canto a tenore is performed by a group of four singers, each with a distinct role.

Acoustics

There are several overtone techniques, such as kargyraa, which makes certain tissues above the vocal folds vibrate and produces a low note (an octave below the sung note) reminiscent of some Tibetan sacred chants; the throat technique called sygyt; and others.

This voice is characterized by the simultaneous emission of two sounds. One, called the "fundamental" or drone, is held at the same pitch for the whole of an exhalation, while the other, higher one, called the "harmonic sound", varies at the singer's will. The harmonic sound is one of the natural harmonics of the fundamental, brought out by a formant that moves through the spectrum to produce a melody. Its timbre is close to that of a flute (flute voice) or of a jaw harp (jaw-harp voice).

The drone is relatively easy to demonstrate on sonagrams. In classical singing, the spacing between the harmonic lines doubles when the voice rises an octave. In overtone singing, the lines stay equally spaced (as expected, since the drone remains constant) while the melody moves through an octave; what changes instead is the position of the formant. Since the distance between the lines can easily be measured for each sound, the melody of overtone singing must be perceived through the displacement of the formant in the spectrum. This is only really possible if the formant is concentrated in the high register. The sound energy is mainly divided between the drone and the second voice, which consists of two, at most three, harmonics.
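The contrast between the two kinds of sonagram can be illustrated numerically. In this simplified sketch the helpers are hypothetical and the formant is reduced to a single centre frequency: the harmonic lines of a fixed drone keep the same spacing while the formant moves, so only the harmonic nearest the formant changes, whereas raising the fundamental by an octave, as in classical singing, doubles the spacing. The 220 Hz drone and the formant positions are assumed example values.

```python
def spectral_lines(f0: float, fmax: float) -> list[float]:
    """Frequencies of the harmonic lines of f0 up to fmax (Hz)."""
    return [n * f0 for n in range(1, int(fmax // f0) + 1)]


def mean_spacing(lines: list[float]) -> float:
    """Average distance between adjacent lines, as read off a sonagram."""
    gaps = [b - a for a, b in zip(lines, lines[1:])]
    return sum(gaps) / len(gaps)


def formant_harmonic(f0: float, formant_hz: float) -> int:
    """Harmonic number closest to the formant centre: the note heard as melody."""
    return max(1, round(formant_hz / f0))


if __name__ == "__main__":
    drone = 220.0  # hypothetical fixed drone (Hz)
    lines = spectral_lines(drone, 3000.0)
    # Overtone singing: the drone stays put and the formant moves.
    for formant in (1320.0, 1540.0, 1760.0, 2640.0):
        print(f"formant {formant:.0f} Hz: spacing {mean_spacing(lines):.0f} Hz, "
              f"melody on H{formant_harmonic(drone, formant)}")
    # Classical singing up an octave: the fundamental doubles, and so does the spacing.
    for f0 in (220.0, 440.0):
        print(f"f0 {f0:.0f} Hz: spacing {mean_spacing(spectral_lines(f0, 3000.0)):.0f} Hz")
```

In the overtone case the melody climbs from H6 to H12, a full octave, while the line spacing never leaves 220 Hz.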

It has sometimes been claimed that a third voice can be produced with Tuvan techniques, but there is no way to establish that this third voice is controlled. A parallel can be drawn with the jaw harp, which, like overtone singing, produces several distinct "voices": the drone, the melody and a countermelody. This third voice resembles a countermelody that may be deliberate but is probably not controlled.

Field of freedom

In terms of field of freedom (the range of performance, covering the possible musical forms in intensity, pitch and timbre), overtone singing is equivalent to normal singing except for its range (ambitus). How long a phrase can last obviously depends on the singer's rib cage and breathing, but also on the sound intensity, which is related to the airflow.

The field of freedom for intensity, on the other hand, is relatively limited, and the level of the harmonics is tied to the level of the drone. The singer should keep the drone at a sufficient intensity to bring out as many harmonics as possible. The narrower and more intense the formant, the clearer the harmonics.

It is generally accepted that, with a suitable key (depending on the performer and the piece), a singer can modulate or choose among harmonics 3 to 13. The range depends on the key. If the key is C3 (do2 in French notation), the melody uses the 15 harmonics from the 6th to the 20th, spanning an octave and a sixth. If the key is higher, for example C4 (do3), the choice is among harmonics 3 to 10, i.e. 8 harmonics, which again spans an octave and a sixth.
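The arithmetic behind these figures can be checked directly. Because harmonic n of a fundamental f0 lies at n · f0, the span between two harmonics depends only on the ratio of their numbers: 20/6 and 10/3 both equal 2 × 5/3, an octave plus a just major sixth. The sketch below uses hypothetical helper names and assumes equal temperament with A4 = 440 Hz, so that French do2 is C3 (about 130.8 Hz) and do3 is C4 (about 261.6 Hz); it also shows that the highest harmonic lands near the same frequency, about 2.6 kHz, in both keys.

```python
import math


def ambitus_cents(low_harmonic: int, high_harmonic: int) -> float:
    """Interval between two harmonics of the same drone, in cents."""
    return 1200 * math.log2(high_harmonic / low_harmonic)


def midi_to_hz(midi: int) -> float:
    """Equal-tempered frequency with A4 (MIDI 69) = 440 Hz."""
    return 440.0 * 2 ** ((midi - 69) / 12)


# Key (scientific notation, French name) -> (MIDI number, lowest, highest harmonic)
KEYS = {"C3 (do2)": (48, 6, 20), "C4 (do3)": (60, 3, 10)}

if __name__ == "__main__":
    for key, (midi, low, high) in KEYS.items():
        cents = ambitus_cents(low, high)
        top = high * midi_to_hz(midi)
        print(f"{key}: H{low}-H{high}, {high - low + 1} harmonics, "
              f"{cents:.0f} cents = octave + {cents - 1200:.0f} cents, top {top:.0f} Hz")
```

The matching top frequencies are consistent with the idea that the singer always selects harmonics in the same region of the spectrum, whatever the key.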

The range of overtone singing is narrower than that of normal singing. Although in theory the singer can choose any key between C3 and C4, in practice he instinctively strikes a compromise between the clarity of the second voice and the range of the melody. If the key is high, for example C4, the choice of harmonics is limited, but the second voice is very clear. With a key of C3, the second voice is more blurred, while the range is at its maximum. The difference in clarity can be explained by the fact that in the first case the singer can select only one harmonic, whereas in the second case he can select almost two. As for range, the action of the oral resonators is independent of the pitch produced by the vocal folds: the singer always selects harmonics in the same region of the spectrum, whether they are widely or closely spaced.

The singer instinctively chooses the key so as to obtain both the maximum range and the maximum clarity, the best compromise lying between C3 and A3: from a fundamental between C3 and A3, the harmonics can produce a melody spanning up to two octaves.

Pitch perception

Pitch is more a matter of psychoacoustics than of physics. The Sonagraph provides an image of the sound under study. Acoustics textbooks state that the pitch of a harmonic sound, made of a fundamental of frequency F and a series of harmonics F1, F2… that are multiples of F, is given by the frequency of the fundamental. This is not quite accurate, because the fundamental can be removed electronically without changing the subjective pitch of the perceived sound. If that theory were correct, an audio system that does not reproduce the extreme low end would change the pitch of sounds. It does not: the timbre changes, but the pitch stays the same. Some researchers propose a more consistent theory: pitch is given by the spacing of the harmonic lines, that is, the frequency difference between two adjacent harmonics. What then becomes of pitch for so-called "partial" spectra (partials here being components that are not integer multiples of the fundamental)? In that case, the listener perceives an average of the line spacing in the region of interest.
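The spacing theory can be sketched as a toy pitch estimator. This is a deliberately simplified illustration, not a validated model of auditory pitch perception: `spacing_pitch` (a hypothetical name) takes the partial frequencies in the region of interest and returns the mean distance between adjacent lines. For a harmonic spectrum whose fundamental has been removed it still recovers the missing fundamental, unlike the textbook rule of taking the lowest component; for slightly inharmonic partials it returns an average spacing, as the paragraph describes. The frequencies are invented examples.

```python
def spacing_pitch(partials_hz: list[float]) -> float:
    """Pitch estimate (Hz) as the mean spacing of adjacent spectral lines."""
    lines = sorted(partials_hz)
    if len(lines) < 2:
        raise ValueError("need at least two partials to measure a spacing")
    # The mean of adjacent gaps telescopes to (highest - lowest) / (count - 1).
    return (lines[-1] - lines[0]) / (len(lines) - 1)


if __name__ == "__main__":
    # Harmonics 3-6 of a 200 Hz tone; the 200 Hz fundamental itself is absent.
    missing_fundamental = [600.0, 800.0, 1000.0, 1200.0]
    print("lowest component:", min(missing_fundamental), "Hz")
    print("spacing estimate:", spacing_pitch(missing_fundamental), "Hz")  # 200.0 Hz
    # Slightly inharmonic partials: the estimate is an average spacing.
    print("inharmonic:", round(spacing_pitch([610.0, 800.0, 1005.0, 1190.0]), 1), "Hz")
```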

A "formant spectrum" is one in which a group of harmonics is reinforced in intensity to form a formant, that is, a frequency region where the energy is high. The existence of such a formant brings in a second notion of pitch perception: it has been observed that the position of the formant in the spectrum gives rise to the perception of a new pitch. This theory must be qualified, however, because certain conditions apply.

Removing the formant does not change the pitch of the sound. Pitch perception based on formant position is only possible if the formant is very sharp, that is, if its energy is spread over only two or three harmonics. If the formant's energy density is high and the formant is narrow, it conveys pitch information in addition to the overall key of the piece, which makes overtone (diphonic, diplophonic or biformantic) singing technically possible.

Sound production mechanisms

A resonator is a cavity that can resonate over a range of frequencies. The source system (the pharynx and the vocal folds) emits a harmonic spectrum, which the resonators amplify. A good singer can choose these frequencies: when projecting the voice to be heard in a large hall, a singer adjusts the resonators (the volume of the mouth cavity, the size of the mouth opening and the position of the lips) to radiate as much energy as possible.

Overtone singing requires two voices. The first, the drone, survives because it is intense at the source and is not filtered by the resonators. Its intensity, higher than that of the harmonics, lets it persist through radiation from the mouth and nose. When the nasal cavity is closed, the drone loses intensity: on the one hand one radiating outlet is shut, and on the other hand the airflow is reduced, and with it the sound intensity produced at the vocal folds.

Having several cavities is essential: only the coupling of several cavities can produce the sharp formant that overtone singing requires. The tone of the sound rises when the mouth is wide open. To show how a sharp formant forms, two kinds of overtone singing were tried: one with the tongue at rest, making the mouth a single large cavity, and one with the tip of the tongue raised to touch the palate, dividing the mouth into two cavities. In the first case, the sounds are not clear: the drone is clearly audible, but the second voice is hard to hear and the melody barely comes through. With a single oral cavity, the formant's energy is spread over three or four harmonics, the sensation of a second voice becomes much weaker, and the diphonic effect disappears. When the tongue divides the mouth into two cavities, however, the sharp, intense formant reappears.

Overtone singing requires a network of selective resonators that lets through only the harmonics the singer wants. When the two cavities are tightly coupled, they produce a single, very sharp resonance. If the coupling loosens, the formant is less intense and the sound energy spreads across the spectrum. If the cavities merge into a single cavity, the pointed resonance curve becomes rounder still, which leads back to the first experiment above: very blurred overtone singing with the tongue in the "rest" position.
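The effect of formant sharpness can be sketched with a toy resonance curve. The model is an assumption made for illustration, not an acoustic model of the vocal tract: each harmonic of a 200 Hz drone is weighted by a bell-shaped (Lorentzian) power response centred on a 1,600 Hz formant, with a narrow bandwidth standing in for tightly coupled cavities and a wide one for a single cavity. The helper names, frequencies and bandwidths are hypothetical; the point is only that a narrower formant puts fewer harmonics above the half-power (-3 dB) level and concentrates more of the energy in one of them.

```python
def formant_power(freq: float, centre: float, bandwidth: float) -> float:
    """Relative power gain of a Lorentzian resonance; 0.5 at centre ± bandwidth/2."""
    return 1.0 / (1.0 + ((freq - centre) / (bandwidth / 2)) ** 2)


def formant_stats(f0: float, centre: float, bandwidth: float,
                  n_harmonics: int = 20) -> tuple[int, float]:
    """Count harmonics above half power and the energy share of the strongest one."""
    powers = [formant_power(n * f0, centre, bandwidth) for n in range(1, n_harmonics + 1)]
    above_half = sum(1 for p in powers if p >= 0.5)
    share = max(powers) / sum(powers)
    return above_half, share


if __name__ == "__main__":
    for label, bw in (("tight coupling (narrow)", 150.0), ("single cavity (wide)", 700.0)):
        count, share = formant_stats(200.0, 1600.0, bw)
        print(f"{label}: {count} harmonic(s) above -3 dB, "
              f"strongest holds {share:.0%} of the energy")
```

With these assumed values, the wide formant spreads its energy over three harmonics, in line with the blurred single-cavity case described above.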

Producing overtone singing

The two simultaneous sounds can be produced by three distinct methods:

  • with one oral cavity: the tongue may lie flat, at rest, or its root may be raised slightly without ever touching the soft palate. Only the mouth and lips move. By varying the mouth cavity while pronouncing the two vowels ü and i linked without interruption (as when saying "oui" in French), one hears a faint melody of harmonics that hardly goes beyond the 8th harmonic.
  • with two oral cavities: sing with the throat voice and pronounce the letter L; as soon as the tip of the tongue touches the center of the palate, hold that position. Then pronounce the vowel Ü, still keeping the tip of the tongue pressed firmly against the junction of the hard and soft palate. Tense the neck and abdominal muscles while singing, as if trying to lift a very heavy object, and give the sound a strongly nasal timbre by amplifying it through the nasal cavities. Then pronounce the two vowels I and Ü (or O and A), linked but alternating, several times in succession. This yields the drone together with harmonics rising and falling as the singer wishes. Varying the position of the lips or the tongue modulates the harmonic melody. Strong muscular tension increases harmonic clarity.
  • with the root of the tongue raised and gripped by the upper molars while the throat sound is produced on the two vowels I and Ü, linked and repeated several times to create a series of descending and ascending harmonics. This series lies between 2 kHz and 3.5 kHz. This method does not allow control of the formant melody and is only an experimental demonstration of the possibilities of harmonic timbre.
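Which harmonics fall into the 2 kHz to 3.5 kHz band mentioned for the third method depends on the drone. The sketch below lists them for a few drones; the drone values and the helper `harmonics_in_band` are hypothetical, chosen only to show that a lower drone places more, and higher-numbered, harmonics in the same band.

```python
import math


def harmonics_in_band(f0: float, low_hz: float = 2000.0,
                      high_hz: float = 3500.0) -> list[int]:
    """Harmonic numbers n with low_hz <= n * f0 <= high_hz."""
    first = math.ceil(low_hz / f0)
    last = math.floor(high_hz / f0)
    return list(range(first, last + 1))


if __name__ == "__main__":
    for drone in (110.0, 165.0, 220.0):  # example drones in Hz (assumed)
        ns = harmonics_in_band(drone)
        print(f"drone {drone:.0f} Hz: H{ns[0]}-H{ns[-1]} ({len(ns)} harmonics)")
```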

In the 1980s, comparative analysis of fiberscopic, stroboscopic and laryngoscopic images and of Sona-Graph spectrograms made it possible, for the first time, to classify the different overtone styles of Asia and South Africa according to their resonators, muscle contractions and ornamentation:

  • by bringing out a harmonic drone and a fundamental melody, the reverse of the initial principle of traditional overtone singing;
  • by crossing the two melodies (fundamental and harmonic) and exploring triphonic singing;
  • by bringing out three harmonic zones over the same fundamental.

Therapeutic use of overtone singing?

Dr Tomatis developed a theory according to which there may be a relationship between harmony and (mental or physical) health. Some musicians have sought to make overtone singing a new tool for therapeutic applications (Trần Quang Hải, Jill Purce, Jonathan Goldman, Dominique Bertrand, Véronique and Denis Fargeot, Philippe Barraqué, Bernard Dubreuil, Emmanuel Comte, Catherine Darbord).

The supposed power of the singing depends on the melody, the harmonic qualities of the voice and the strength of the fundamental. When overtone singing is used for therapeutic purposes, its main aims are to restore concentration and psychological balance (see voice therapy). These aims are also found in some shamanic practices and in the chanting of Tibetan monks[citation needed].

Jill Purce (United Kingdom), for example, offers work based on breathing and overtone singing for people who stutter, feel blockages in the throat, are frightened by their own voice, or suffer from inhibition, breathing disorders, anxiety or fatigue[citation needed].

Overtone singing has also been used in an attempt to reduce physical pain during childbirth, but no study has confirmed that this method is effective.

History

The discovery and study of overtone singing date back to the 19th century. M. Rollin, a professor at the Paris Conservatoire in the 19th century, reported that at the court of Charles the Bold a minstrel sang in two simultaneous voices, the second a fifth above the first. Manuel Garcia Jr., in his Mémoire sur la voix humaine presented to the Académie des Sciences on 16 November 1840, described the two-voice phenomenon among Russian peasants. Several travelers reported in their accounts that the voice was split in two during certain mantra recitations in Tibet, but these observations were not taken seriously.

In 1934, Russian researchers recorded 78 rpm discs of Tuvan overtone singing. Aksenov studied them in an article (published in the USSR in 1964, translated into German in 1967 and into English in 1973) that is considered the first scientifically significant study of overtone singing. Since then, many researchers, acousticians and ethnomusicologists have tried to "unveil" its mysteries, among them Lajos Vargyas (Hungary, 1967), Emile Leipp (France, 1971), Gilles Léothaud (France, 1971), Roberte Hamayon and Mireille Helffer (France, 1973), Suzanne Borel-Maisonny (France, 1974), Trần Quang Hải (France, 1974), Richard Walcott (United States, 1974), Sumi Gunji (Japan, 1980), Roberto Laneri (1983), Lauri Harvilahti (Finland, 1983), Alain Desjacques (France, 1984), Ted Levin (United States, 1988), Carole Pegg (Great Britain, 1988), Graziano Tisato (Italy, 1988), Hugo Zemp (France, 1989) and Mark Van Tongeren (Netherlands, 1993).

French researchers proposed various names over the last thirty years: "chant diphonique" (diphonic singing; Emile Leipp and Gilles Léothaud in 1971, Tran Quang Hai in 1974), "voix guimbarde" (jaw-harp voice; Roberte Hamayon and Mireille Helffer, 1973), "chant diphonique solo" (solo diphonic singing; Claudie Marcel-Dubois, 1978), "chant diplophonique" (diplophonic singing; Tran Quang Hai, 1993; from the Greek diplo, "double": diplophonia, a medical term, denotes the simultaneous production of two sounds of different pitch in the larynx) and "chant biformantique" (biformantic singing, i.e. singing with two formants; Tran Quang Hai, 1994). The term "chant harmonique" (harmonic singing) is more problematic, because every kind of singing, whatever the voice type, is made of a series of harmonics reinforced in different ways and selected by the singer to create a melody.

Singers and composers such as Trần Quang Hải (France, 1975), Demetrio Stratos (Italy, 1977), Roberto Laneri (Italy, 1978), David Hykes and his Harmonic Choir (United States, 1983), Joan La Barbara (United States, 1985), Meredith Monk (United States, 1980), Michael Vetter (Germany, 1985), Christian Bollmann (Germany, 1985), Michael Reimann (Germany, 1986), Noah Pikes (England, 1985), Tamia (France, 1987), Quatuor Nomad (France, 1989), Valentin Clastrier (France, 1990), Bodjo Pinek (Yugoslavia, 1987), Josephine Truman (Australia, 1987), Iegor Reznikoff (France, 1989), Rollin Rachelle (Netherlands, 1990), Thomas Clements (France, 1990), Sarah Hopkins (Australia, 1990), Mauro Bagella (Italy, 1995), Lê Tuân Hùng (Australia, 1996) and Véronique and Denis Fargeot (2003, 2008, 2013) have introduced the overtone effect into contemporary music (world music, new age, etc.) and into electroacoustic music.

Music therapists such as Jill Purce (United Kingdom), Dominique Bertrand (France), Catherine Darbord (France) and Philippe Barraqué (France) have used overtone singing as a therapeutic tool drawing on shamanic tradition, sometimes combined with holistic gymnastics, with the aim of healing people through harmonic vibrations and body movement.

Films

Maîtres de chant diphonique (Masters of Mongolian Overtone Singing, Mongol khöömiich), a documentary by Jean-François Castell, Les Films du Rocher / La Curieuse, 2010. Awards and selections:
  • Bartók Prize of the Société française d'ethnomusicologie for the best ethnomusicological film, 30th Jean Rouch International Festival (Bilan du film ethnographique), 2011;
  • Prix Vague Émeraude, Festival 7e art et science, Noirmoutier, 2012;
  • Prix Coup de pouce, Festival du film de chercheur, Nancy, 2011;
  • Best documentary, Festival Aux quatre coins du monde, 2010;
  • "Coup de cœur" selection, Festival Écrans de l'aventure.
DVD (March 2012): Maîtres de chant diphonique, 53 minutes (+ 30 minutes of bonus material), French, English and Mongolian versions, co-produced by Les Films du Rocher / La Curieuse.

Le Chant des harmoniques (The Song of Harmonics), co-authors: Tran Quang Hai and Hugo Zemp; director: Hugo Zemp; CNRS Audio-visuel, 16 mm film, 38 minutes, color, 1989. Grand Prix for scientific film, Pärnu (Estonia), 1990; Special Prize for Scientific Research, Palaiseau, 1990; Grand Prix for scientific film, Montreal, 1991 (reissued on DVD in 2005 in French and in 2006 in English).

Le Chant diphonique, co-authors: Tran Quang Hai and Luc Souvet, DVD, 28 minutes, CRDP, Saint-Denis, Réunion Island, 2004.

Bibliography

  • Jonathan Goldman, Healing Sounds: The Power of Harmonics (in English).
  • Philippe Barraqué, À la source du chant sacré, Éditions Diamantel, 1999.
  • Véronique and Denis Fargeot, La Voix tibétaine – Chants harmoniques sacrés (CD), Reliance collection, 2003.
  • Philippe Barraqué, La Guérison harmonique (techniques de chant diphonique), Éditions Jouvence, 2004.
  • Alberto Ezzu, Il Canto degli Armonici – Storia e tecniche del canto difonico, Musica Practica, Turin, 2009.
  • Catherine Darbord, Chant harmonique, résonance intérieure : méthode d'apprentissage (CD), Prikosnovenie, 2011.
  • Cyprien Bole, Chanter seul à deux voix, méthode complète de chant diphonique (book and CD), Les 2 oreilles, 2012, pp. 1–134 (ISBN 978-2-7466-5068-8).
  • Véronique and Denis Fargeot, Chant harmonique – Voix tibétaine 2 (CD), Reliance collection, 2013.
  • Emmanuel Comte, Le Son d'Harmonie (book with CD), Medson, 2012 (ISBN 978-2-9810345-2-6).

External links

 

Investigazioni (Diplofonie e Triplofonie) – Demetrio Stratos

Added 8 Oct. 2010

Source: Cantare la voce (1978, Cramps) “Voice in today’s music is a transmission channel that does not transmit anything. The western vocal hypertrophy has rendered almost insensitive the modern singer to the various aspects of the vocality, isolating him in the fencing of determined linguistic structures”

Bayarbaatar Davaasuren (overtone singing ‘khöömii’ & horsehead fiddle ‘morin-khuur’)

Added 16 Dec. 2011

Traditional Mongolian music recorded on Friday 20 January 2011 at Le Toboggan (Décines, Rhône), as part of the "12/14", a short musical piece complementing the show ‘Contes de la terre du ciel bleue’ by the Groupe Musiques Vivantes de Lyon.

PIERO COSI, GRAZIANO TISATO : ON THE MAGIC OF OVERTONE SINGING

Posted on January 6, 2015 by haidiphonie
ON THE MAGIC OF OVERTONE SINGING
Piero Cosi, Graziano Tisato
*ISTC-SFD – (ex IFD) CNR
Istituto di Scienze e Tecnologie della Cognizione – Sezione di Fonetica e Dialettologia
(ex Istituto di Fonetica e Dialettologia) – Consiglio Nazionale delle Ricerche
e-mail: cosi@csrf.pd.cnr.it tisato@tin.it
www: http://nts.csrf.pd.cnr.it/Ifd
I really like to remember that Franco was the first person I met when I approached the “Centro di Studio per le Ricerche di Fonetica”, and I still have a very pleasant and happy memory of our first warm and unexpectedly informal talk. It may sound obvious, even rhetorical, to say that I will never forget a man like Franco, but it is true, and that is, apart from his quite relevant scientific work, mostly because of his great heart and sincere friendship.
1. ABSTRACT
For “special people”, scientific interests sometimes coincide with personal “hobbies”. I remember Franco talking to me about the “magic atmosphere” created by the voices of Demetrio Stratos, David Hykes or Tuvan Khomei¹ singers, and I still clearly remember Franco’s attitude towards these “strange harmonic sounds”. It was more than a hobby, but also more than a scientific interest. I have to admit that Franco inspired my training in Overtone Singing², which has remained “almost hidden” apart from a few very close, “desperate” family members. This overview of this wonderful musical art does not aim to be a complete scientific work; it is meant as a small descriptive contribution to honor and remember Franco’s wonderful friendship.
2. THE THROAT-SINGING TRADITION
“Khomei” or “Throat-Singing” is the name used in Tuva and Mongolia to describe a large family of singing styles and techniques in which a single vocalist simultaneously produces two (or more) distinct tones. The lower one is the usual fundamental tone of the voice and sounds like a sustained drone or a Scottish bagpipe. The second corresponds to one of the harmonic partials and resembles a resonating whistle in a high or very high register. For convenience, we will call it the “diphonic” sound, and this kind of phenomenon “diphonia”.
Throat-Singing remained an almost entirely unknown art form until rumours about Tuva and its peculiar musical culture spread in the West, especially in North America, thanks to Richard Feynman [1]³, a distinguished American physicist who was an ardent devotee of Tuvan matters.
¹ We transcribe the Tuvan term in the simplest way, since the different authors do not agree: Khomei, Khöömii, Ho-Mi, Hö-Mi, Chöömej, Chöömij, Xöömij.
² This is the term used in the musical context to indicate the diphonic vocal techniques.
This singing tradition is mostly practiced in Central Asian regions, including Bashkortostan or Bashkiria (near the Ural Mountains), Kazakhstan, Uzbekistan, Altai and Tuva (two autonomous republics of the Russian Federation), Khakassia and Mongolia (Fig. 1), but examples can be found worldwide: among Xhosa women in South Africa [3], in Tibetan Buddhist chant and in Rajasthan.
The Tuvan people developed numerous different styles. The most important are: Kargyraa (chant with very low fundamentals), Khomei (the name generally used for Throat-Singing as a whole and also for a particular type of singing), Borbangnadyr (similar to Kargyraa, with higher fundamentals), Ezengileer (recognizable by quick rhythmical shifts between the diphonic harmonics) and Sygyt (like a whistle, with a weak fundamental) [4]. According to Tuvan tradition, all things have a soul or are inhabited by spiritual entities. Legends narrate that the Tuvans learnt to sing Khomei in order to establish contact with these entities and assimilate their power through the imitation of natural sounds. Tuvan people believe, in fact, that sound is the means preferred by the spirits of nature to reveal themselves and to communicate with other living beings.

Figure 1. Diffusion of the Throat-Singing in Central Asia regions.
In Mongolia, most Throat-Singing styles take their names from the part of the body where the singers are supposed to feel the vibratory resonance: Xamryn Xöömi (nasal Xöömi), Bagalzuuryn Xöömi (throat Xöömi), Tseedznii Xöömi (chest Xöömi), Kevliin Xöömi (ventral Xöömi, see Fig. 13), Xarkiraa Xöömi (similar to the Tuvan Kargyraa) and Isgerex (a rarely used style that sounds like a flute). The singers themselves sometimes confuse the different styles [5]. Some very famous Mongol artists (Sundui and Ganbold, for example) use a deep vibrato, which is not traditional, perhaps to imitate Western singers (Fig. 13).
The Khakas people practice three types of Throat-Singing (Kargirar, Kuveder or Kilenge, and Sigirtip), equivalent to the Tuvan styles Kargyraa, Ezengileer and Sygyt. The same styles are found again among the peoples of the Altai Mountains under the names Karkira, Kiomioi and Sibiski. The Bashkir musical tradition uses Throat-Singing (called Uzlau, similar to the Tuvan Ezengileer) to accompany epic chants. In Uzbekistan, Kazakhstan and Karakalpakstan there are forms of oral poetry with diphonic harmonics [6].
³ Today, partly because of Feynman’s influence, a society called “Friends of Tuva” exists in California, which circulates news about Tuva in the West [2].
The Tibetan Gyuto monks also have a tradition of diphonic chant, related to their religious beliefs about the vibratory reality of the universe. They chant in a very low register, in a way that resembles the Tuvan Kargyraa method (the difference is explained later). The aim of this tradition is mystical and consists in isolating the 5th or the 10th harmonic partial of the vocal sound. In this way they produce intervals of a 3rd or a 5th (relative to the fundamental), which have a symbolic relation with the elements of fire and water (Fig. 14) [4].

Figure 2. Spectral section of a normal vocal sound (top) and a diphonic vocal sound (bottom).
3. SEPARATION OF THE AUDITORY IMAGE IN THROAT-SINGING
What is so wonderful about Throat-Singing? It is the emergence of one of the harmonic partials, which discloses the secret musical nature of every sound. When the voice splits into two different sounds in Throat-Singing, we experience the unusual sensation of a pure, discarnate sine wave emerging from the sound. It is the same astonishment we feel when we see, for the first time, a rainbow emerging from white light, or a laser beam.
Natural sounds have a complex structure of harmonic or inharmonic sinusoidal partials, called “overtones” (Fig. 2). These overtones are not heard as distinct sounds, but their relative intensities shape our perception of all the parameters of sound (intensity, pitch, timbre, duration). The pitch corresponds to the common frequency spacing between the partials, while the timbre depends on all the partials taken as a whole. The temporal evolution of these components is what makes the sound of each voice or instrument unique and identifiable.
In harmonic sounds such as the voice, the components are equally spaced in frequency: each frequency is an integer multiple of the fundamental (Fig. 2). If the fundamental frequency is 100 Hz, the 2nd harmonic is at 200 Hz, the 3rd at 300 Hz, and so on. The harmonic partials of a sound form a natural musical scale of unequal temperament, like those in use during the Renaissance [7]. If we consider only the harmonics that are easy to produce (and also to perceive), i.e. from the 5th to the 13th, and if for convenience we take C3 (131 Hz) as the starting pitch, we obtain the following musical notes:
Harm. no.   Freq. (Hz)   Note   Interval with C3
5           655          E5     3rd
6           786          G5     5th
7           917          A5+    6th+
8           1048         C6     Octave
9           1179         D6     2nd
10          1310         E6     3rd
11          1441         F6+    4th+
12          1572         G6     5th
13          1703         A6-    6th-
The series of the 8th, 9th, 10th, 12th and 13th harmonics and the series from the 6th to the 10th are two pentatonic scales that can be played. Note that the frequency differences between these scales and the tempered scale are on the order of 1/8 of a tone (about 1.5%).
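The table can be recomputed with a short sketch. It assumes 12-tone equal temperament referenced to A4 = 440 Hz (an assumption, since the text does not state its reference). For each harmonic of a 131 Hz fundamental, it reports the nearest tempered note and the deviation in cents. One tone is 200 cents, so 1/8 of a tone is 25 cents, about 1.5% in frequency. The 7th, 11th and 13th harmonics lie close to a quarter-tone between two tempered notes. For these three, the nearest tempered note may differ from the "+"/"-" spelling used in the table.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_tempered_note(freq_hz, a4_hz=440.0):
    """Return (note name, deviation in cents) of the nearest 12-TET note."""
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)   # fractional MIDI note number
    nearest = round(midi)
    cents = 100 * (midi - nearest)                 # + means sharp, - means flat
    name = NOTE_NAMES[nearest % 12] + str(nearest // 12 - 1)
    return name, cents


def harmonic_table(f0_hz, first=5, last=13):
    """Frequency, nearest tempered note and cents deviation of harmonics first..last."""
    rows = []
    for n in range(first, last + 1):
        freq = n * f0_hz                           # n-th harmonic is n times the fundamental
        name, cents = nearest_tempered_note(freq)
        rows.append((n, freq, name, cents))
    return rows


if __name__ == "__main__":
    for n, freq, name, cents in harmonic_table(131.0):
        print(f"H{n:<3} {freq:7.1f} Hz  {name:<4} {cents:+6.1f} cents")
```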
Throat-Singing thus makes it possible to extract the notes of a natural melody from the body of the sound itself.
The spectral envelope of the overtones is essential for speech comprehension. The glottal sound is filtered by the articulation of the vocal tract, which shapes the partials of the voice with characteristic zones of resonance (called formants), where the components are intensified, and zones of anti-resonance, where the partials are attenuated (Fig. 2-3). The overtones therefore allow us to tell the different vocal sounds apart. For example, the vowels /a/, /e/, /i/, /o/, etc., uttered or sung at the same pitch, nevertheless sound different to our ears because of the different energy distribution of their formants (Fig. 2).
The auditory mechanisms “fuse” the partials into one single “image”, which we identify as a voice, a musical instrument, a noise, etc. [8]. In the same way, visual processing tends to group separate dots into simple shapes (circle, triangle, square, etc.). Forming auditory images serves to single out the sound sources around us and to give them meaning.
The hearing mechanisms organize the stream of perceptual data belonging to the components of different sounds according to psychoacoustic and Gestalt principles. “Grouping by harmonicity”, for example, fuses into the same sound the frequency partials that are multiples of a common fundamental. The “common fate” principle states that we integrate the components of a complex sound that show the same amplitude and frequency behaviour (i.e. similar modulation and microvariation, similar attack and decay, similar vibrato, etc.) [8]. If one of these partials shows a particular evolution (e.g. it is mistuned, or does not share the same frequency and amplitude modulation), it will be heard as a separate sound. Throat-Singing is thus a marvelous example for understanding the illusory nature of perception and the musical structure of sound.
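The mistuning case of "grouping by harmonicity" can be illustrated with a toy harmonic sieve. This is a minimal sketch, not a model of the auditory system. It assumes the fundamental is already known. It then uses a hypothetical 3% tolerance to decide whether each partial fuses with the harmonic series or stands apart.

```python
def group_by_harmonicity(partials_hz, f0_hz, tolerance=0.03):
    """Split partials into those that fit the harmonic series of f0_hz and those that do not.

    A partial is 'fused' if it lies within `tolerance` (relative error) of the
    nearest integer multiple of f0_hz; otherwise it is 'segregated'.
    The 3% default is an illustrative assumption, not a perceptual constant.
    """
    fused, segregated = [], []
    for f in partials_hz:
        n = max(1, round(f / f0_hz))              # nearest harmonic number
        relative_error = abs(f / (n * f0_hz) - 1.0)
        if relative_error <= tolerance:
            fused.append(f)
        else:
            segregated.append(f)
    return fused, segregated


if __name__ == "__main__":
    # A 100 Hz harmonic series with the 5th partial mistuned from 500 Hz to 530 Hz.
    partials = [100.0, 200.0, 300.0, 400.0, 530.0, 600.0]
    fused, segregated = group_by_harmonicity(partials, 100.0)
    print("fused:", fused)
    print("heard separately:", segregated)
```

In Throat-Singing the emerging partial is not mistuned. It separates mainly through its own amplitude fluctuations, which is the "common fate" cue. This sieve therefore only shows the harmonicity half of the argument.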

Figure 3. Resonance envelope for a uniform vocal tract (left). A constriction in the pharynx moves the formants so that the intensity of the partials in the 2500-3500 Hz region increases (right).
4. FUNDAMENTAL TECHNIQUES IN THROAT-SINGING
In Throat-Singing, the singer learns to articulate the vocal tract so that one of the formants (usually the first or the second) coincides with the desired harmonic. This gives that harmonic a considerable amplitude increase (even more than 30 dB; see the 10th harmonic in Fig. 2) and makes it perceptible. Unlike in normal speech, the diphonic harmonic can be much more intense than the lower partials (Fig. 2). Soprano singers use a similar skill when they want to sing a high note: with the proper articulation (i.e. the proper opening of the mouth), they tune the 1st formant to the fundamental [9].
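How much a single formant can lift one harmonic above its neighbours can be estimated numerically. The sketch below is a simplified source-filter model built on three assumptions:
- The glottal source falls off as 1/n² (about -12 dB per octave).
- The vocal tract is reduced to one second-order resonance tuned exactly to the 10th harmonic of a 131 Hz fundamental.
- The two bandwidths, 40 Hz and 150 Hz, are hypothetical values chosen to contrast a narrow formant with a broader one.

The example illustrates why a narrow formant bandwidth, the feature reported by the LPC studies in Section 9, makes one harmonic stand out.

```python
import math


def resonance_gain(f_hz, fc_hz, bw_hz):
    """Magnitude of a second-order resonance centred at fc_hz with bandwidth bw_hz.

    The gain is 1 at 0 Hz and fc_hz / bw_hz at the centre frequency.
    """
    return fc_hz**2 / math.sqrt((fc_hz**2 - f_hz**2) ** 2 + (f_hz * bw_hz) ** 2)


def harmonic_levels_db(f0_hz, fc_hz, bw_hz, n_max=16):
    """Levels (dB) of harmonics 1..n_max: assumed 1/n^2 source times one formant."""
    levels = {}
    for n in range(1, n_max + 1):
        amplitude = (1.0 / n**2) * resonance_gain(n * f0_hz, fc_hz, bw_hz)
        levels[n] = 20 * math.log10(amplitude)
    return levels


def prominence_db(levels, n):
    """How many dB harmonic n stands above the louder of its two neighbours."""
    return levels[n] - max(levels[n - 1], levels[n + 1])


if __name__ == "__main__":
    f0 = 131.0        # fundamental (C3), as in the table of Section 3
    fc = 10 * f0      # formant tuned onto the 10th harmonic (1310 Hz)
    for bw in (40.0, 150.0):
        levels = harmonic_levels_db(f0, fc, bw)
        print(f"bandwidth {bw:5.0f} Hz: 10th harmonic is "
              f"{prominence_db(levels, 10):5.1f} dB above its neighbours")
```

With these assumptions, the fundamental remains the strongest component. This is consistent with the text's remark that singers also attenuate the lower partials, for example by nasalization.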
There are many different methods to produce the diphonic sound [5-6]. Following the proposal of Tran Quang Hai [4], they can be summarized in two categories, the “single cavity method” and the “two cavities method”, distinguished by whether or not the tongue is used.
4.1 SINGLE CAVITY METHOD
In this method the tongue does not move and remains flat or slightly curved, without touching the palate. The vocal tract then behaves like a continuous tube (Fig. 3). The diphonic harmonic is selected by the appropriate opening of the mouth and the lips. As a result, the formant frequencies rise when the vocal tract shortens (for example with an /i/) and fall when it lengthens (for example with a /u/). With this technique, it is the movement of the 1st formant that selects the partials. As Fig. 4 shows, this method cannot go beyond 1200 Hz. The diphonic harmonic is generally feeble and masked by the fundamental and the lower partials, so singers nasalize the sound to reduce the intensity of these lower components [10-11].
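The link between tube length and formant frequency can be made concrete with the textbook approximation of a uniform vocal tract (see Fant [12] and Stevens [11]). The tract is treated as a tube closed at the glottis and open at the lips, whose resonances are odd multiples of c/(4L). The speed of sound (35,000 cm/s) and the tract lengths below are illustrative assumptions, not measurements from this study.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, humid air


def uniform_tube_formants(length_cm, count=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances of a uniform tube closed at one end (glottis) and open at the other (lips).

    F_k = (2k - 1) * c / (4 * L), i.e. odd multiples of the quarter-wave frequency.
    """
    return [(2 * k - 1) * c / (4 * length_cm) for k in range(1, count + 1)]


if __name__ == "__main__":
    for length in (15.0, 17.5, 20.0):   # shorter tract -> higher formants
        formants = ", ".join(f"{f:.0f}" for f in uniform_tube_formants(length))
        print(f"L = {length:4.1f} cm: formants {formants} Hz")
```

For a 17.5 cm tube this gives the familiar 500, 1500 and 2500 Hz pattern of a neutral vowel, and every formant scales inversely with length.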

Figure 4. Opening the mouth controls the 1st formant position. The movement of the tongue affects the 2nd formant and allows the harmonic selection in a large frequency range.
4.2 TWO CAVITIES METHODS
In this method the tongue is raised so as to divide the vocal tract into two main resonators, each tuned to a particular resonance. With appropriate control, two separate harmonics can be tuned, making perceptible not one but two (or more) pitches at the same time (Fig. 9-12).
There are three possible variants of this technique:
The first corresponds to the Khomei style: to select the desired harmonic, the tip and the body of the tongue move forward (higher pitch) and backward (lower pitch) along the palate.
The second is characteristic of the Sygyt style: the tip of the tongue remains fixed behind the upper teeth while the tongue body rises to select the harmonics.
In the third variant, the movement of the tongue root selects the diphonic harmonic. Moving the base of the tongue close to the posterior wall of the throat produces the lower harmonics; moving it forward, on the contrary, brings out the higher harmonics [6].
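The first variant says that moving the tongue forward raises the selected harmonic. This can be sketched with a very crude model. The model assumes that the cavity between the tongue constriction and the lips behaves as a quarter-wave tube, closed at the constriction and open at the lips. It ignores the back cavity and the acoustic coupling between the two resonators. The reinforced harmonic is then taken to be the one closest to the front-cavity resonance. The cavity lengths, the 35,000 cm/s speed of sound and the 131 Hz fundamental are illustrative values.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, humid air


def front_cavity_resonance(front_cm, c=SPEED_OF_SOUND_CM_S):
    """Quarter-wave resonance of a tube closed at the tongue constriction and open at the lips."""
    return c / (4 * front_cm)


def selected_harmonic(resonance_hz, f0_hz):
    """Number of the harmonic whose frequency lies closest to the resonance."""
    return max(1, round(resonance_hz / f0_hz))


if __name__ == "__main__":
    f0 = 131.0
    # Shorter front cavity = constriction moved forward along the palate.
    for front_cm in (6.0, 5.0, 4.0):
        res = front_cavity_resonance(front_cm)
        n = selected_harmonic(res, f0)
        print(f"front cavity {front_cm} cm: resonance {res:.0f} Hz, "
              f"harmonic {n} ({n * f0:.0f} Hz)")
```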
A different method has been proposed by Tran Quang Hai to produce very high diphonic harmonics (but not to control the selection of the desired component). It consists in keeping the tongue pressed by the molars while singing the vowels /u/ and /i/, and maintaining a strong contraction of the abdominal and throat muscles [4].
The advantage of the two cavities techniques is that the 2nd formant can be used to reinforce the harmonics in the zone of best audibility. In this case the diphonic harmonic can reach 2600 Hz (Fig. 4). Furthermore, the movement of the tongue displaces the formants in opposite directions. The separation of the 1st and 2nd formants produces a strong anti-resonance between them (Fig. 2), which helps the perception of the diphonic harmonic.
In all these methods, a slight, discreet movement of the lips is useful to adjust the formant positions.
5. REINFORCING THE DIPHONIC SOUND
Three main mechanisms are required to reinforce the segregation of the diphonic sound:
• Appropriate movements of the lips, tongue, jaw, soft palate and throat, which produce a fluctuation in the amplitude of the selected harmonic so that it stands out from the other partials, which remain static. The auditory mechanisms are tuned to capture the subtlest changes in the stream of auditory information, which are useful to discriminate between different sounds [8].
• The nasalization of the sound. This creates an anti-resonance at low frequency (<400 Hz) that attenuates the lower partials responsible for masking the higher components [10-11]. Nasalization also attenuates the third formant [12], which improves the perception of the diphonic harmonic (Fig. 2).
• The constriction of the pharyngeal region (false ventricular folds, arytenoids, root of the epiglottis), which increases the amplitude of the overtones in the 2000-4000 Hz region (Fig. 2). This is also what happens in the “singer’s formant”, which singers use to reinforce the partials in the zone of best audibility. It keeps the voice from being masked by the orchestra, which is generally very strong in the low frequency range [9]. For this reason, Throat-Singing requires extremely precise and selective tuning, in order to avoid amplifying a whole group of harmonic partials, as happens in the “singer’s formant”.
6. VOICE MULTIPHONICS
In this paper we disregard the polyphonic singing that can produce some diphonic effects. One example is the quintina in Sardinian religious singing, where the coincidence of the harmonics of 4 real voices produces the perception of a 5th, virtual voice (Fig. 5) [13].
There are in the literature many terms to indicate the presence of different perceptible sounds in a single voice: Khomei, Throat-Singing, Overtone Singing, Diphonic Singing, Biphonic Singing, Overtoning, Harmonic Singing, Formantic Singing, Chant, Harmonic Chant, Multiphonic Singing, bitonality, diplophonia, vocal fry, etc.
This classification follows the pioneering work on vocal sounds by the Extended Vocal Techniques Ensemble (EVTE) of San Diego University, bearing in mind that there is little agreement regarding classifications [4], [14-15]. On that basis, the best criterion for distinguishing diphonia seems to be the characterization of the sound sources that produce the perception of the diphonic or multiphonic sound [16].
Following this principle, we can distinguish between Bitonality and Diphonia:
• Bitonality: In this case there are two distinct sound sources that produce two sounds. The pitches of the two sounds may or may not be in a harmonic relationship. This category includes diplophonia, bitonality and vocal fry.
• Diphonia: The reinforcement of one (or more) harmonic partial(s) splits the voice into two (or more) sounds. This category includes: Khomei, Throat-Singing, Overtone Singing, Diphonic Singing, Biphonic Singing, Overtoning, Harmonic Singing, Chant, Harmonic Chant.

Fig. 5. Sardinian religious folk singing. The pitches of the 4 voices of the choir are F1 88 Hz, C2 131 Hz, F2 176 Hz, A3# 230 Hz. The 8th harmonic of the F1, the 6th of the C2, the 4th of the F2 and the 3rd of the A# coincide at 700 Hz and produce the perception of a 5th voice.
6.1 BITONALITY
Diplophonia: The vibration of the vocal folds is asymmetrical. After a normal oscillatory period, the vibration amplitude of the following period is reduced. The voice does not split into two sounds; instead, the pitch drops by one octave and the timbre takes on a typical roughness. For example, with a fundamental pitch of C3 (130.8 Hz), the resulting pitch will be C2 (65.4 Hz). If the amplitude reduction occurs after two regular vibrations, the actual period triples and the pitch drops by an octave and a 5th. Diplophonic voice is frequent in pathologies of the larynx (as in unilateral vocal cord paralysis), but it can also be produced deliberately for artistic effect (Demetrio Stratos was an expert in this technique) [16-18].
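The arithmetic of the period multiplication described above is easy to check. If the waveform repeats only every k glottal cycles, the perceived pitch is f0/k and the drop is 1200·log2(k) cents. Here 1200 cents is one octave, and 1902 cents is roughly an octave plus a just 5th. A minimal sketch:

```python
import math


def subharmonic_pitch(f0_hz, period_multiple):
    """Pitch heard when the waveform repeats only every `period_multiple` glottal cycles.

    Returns (perceived pitch in Hz, downward interval in cents).
    """
    pitch_hz = f0_hz / period_multiple
    drop_cents = 1200 * math.log2(period_multiple)
    return pitch_hz, drop_cents


if __name__ == "__main__":
    for k in (2, 3):   # period doubling and tripling
        pitch, drop = subharmonic_pitch(130.8, k)
        print(f"period x{k}: {pitch:.1f} Hz, {drop:.0f} cents lower")
```

With the C3 example of the text, period doubling gives 65.4 Hz (1200 cents lower) and tripling gives 43.6 Hz (about 1902 cents lower). The same relation describes the half-rate supraglottal vibration of the Kargyraa style discussed below.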
Bitonality: The two sound sources are due to the vibration of two different parts of the glottal cleft. This technique requires strong laryngeal tension [16-17]. In this case there is not necessarily a harmonic relationship between the fundamentals of the two sounds. In the Tuvan Kargyraa style, the second sound source is due to the vibration of the supraglottal structures. These include the false folds, the arytenoids, the aryepiglottic folds (which connect the arytenoids and the epiglottis) and the epiglottis root. In this case there is generally (but not always) a 2:1 frequency ratio between the vocal fold closures and the supraglottal closures. As with Diplophonia, the pitch drops by one octave (or more) [19-21].
Vocal fry: In this case the second sound is due to the periodic repetition of a glottal pulsation at a different frequency [14]. It sounds like the opening of a creaky door (another common name is “creaky voice”). The pulse rate of vocal fry can be controlled. It ranges from very slow single clicks to a stream of clicks so rapid that it is perceived as a distinct pitch. Vocal fry is therefore a special case of bitonality: the perception of a second sound depends on the rate of a pulse train and not on the spectral composition of the single sound.
6.2 DIPHONIA
Diphonic and Biphonic refer to any singing that sounds like two (or more) simultaneous pitches, regardless of technique. Use of these terms is largely limited to academic sources. In the scientific literature, the preferred term for Throat-Singing is Diphonic Singing.
Multiphonic Singing indicates a complex cluster of non-harmonically related pitches that sounds like vocal fry or creaky voice [14]. The cluster may be produced while exhaling normally, or while inhaling.
Throat Singing is any technique that includes the manipulation of the throat to produce a melody with the harmonics. Generally, this involves applying tension to the region surrounding the vocal cords and the manipulation of the various cavities of the throat, including the ventricular folds, the arytenoids, and the pharynx.
Chant generally refers to religious singing in different traditions (Gregorian, Buddhist, Hindu chant, etc.). As regards diphonia, it is worth mentioning the low singing practiced by the Tibetan Buddhist monks of the Gyuto sect. As explained before, they reinforce the 5th or the 10th harmonic partial of the vocal sound for mystical and symbolic purposes (Fig. 14). This kind of real diphonia must be distinguished from the resonance effects (enhancement of some uncontrolled overtones) that can be heard in Japanese Shomyo Chant [4] and also in Gregorian Chant.
Harmonic Singing is the term introduced by David Hykes to refer to any technique that reinforces a single harmonic or harmonic cluster. The sound may or may not split into two or more notes. It is used as a synonym of Overtone Singing, Overtoning, Harmonic Chant and also Throat-Singing.
Overtone Singing can be considered harmonic singing with an intentional emphasis on the melody of the overtones. This is the name used by Western artists who use vowels, mouth shaping and upper-throat manipulations to produce melodies and textures. It is used as a synonym of Harmonic Singing, Overtoning, Harmonic Chant and also Throat-Singing.

Fig. 6 Tuvan Khomei Style. The fundamental is a weak F#3+ 189 Hz. The diphonic harmonics are the 6th (C#6+ 1134 Hz), 7th (E6 1323 Hz), 8th (F#6+ 1512 Hz), 9th (G#6+ 1701 Hz), 10th (A#6+ 1890 Hz) and 12th (C#7+ 2268 Hz).
7. KHOMEI STYLES
Although there is no widespread agreement, Khomei comprises three major basic Throat-Singing methods, called Khomei, Kargyraa and Sygyt, two main sub-methods, called Borbangnadyr and Ezengileer, and various other sub-styles.
Khomei means “throat” or “pharynx”. As noted above, it is not only the generic name given to all Throat-Singing styles in Central Asia but also the name of a particular style of singing. Khomei is the easiest technique to learn and the most practiced in the West. It produces clear and mild harmonics, with a fundamental usually within the medium range of the singer’s voice (Fig. 6). In Khomei style two (or more) notes are clearly audible. Technically, the stomach remains relaxed and there is a low level of tension in the larynx and ventricular folds, whereas Sygyt style requires a very strong constriction of these organs (Fig. 7). The tongue either rests flat behind the lower teeth, as in the single cavity technique, or rises and moves, as in the two cavities techniques. The desired harmonic is selected mainly by a combination of lip, tongue and throat movements.
Sygyt means “whistle” and indeed sounds like a flute. This style is characterized by a strong, even piercing, harmonic and can be used to perform complex and very distinct melodies (Fig. 10). It has its roots in the Khomei method and has the same range for the fundamental. Sygyt is sung with a half-open mouth and the tip of the tongue placed behind the front teeth, as if pronouncing the letter “L”. The tongue tip is kept in this position while the tongue body moves to select the harmonic. This is the same technique described above for the Khomei method. The difference lies in the timbre of the sound, which lacks energy in the low frequencies. To produce the crystal-clear, flute-like overtone characteristic of the Sygyt style, it is necessary to learn how to filter out the lower harmonic components, which usually mask the overtone sensation.

Figure 7. Position of the arytenoids in Khomei (left) and Sygyt (right) style [21].
Crucial for achieving this goal is considerable pressure from the belly/diaphragm, which acts as a bellows to force the air through the throat. Significant tension is required in the throat as well, to bring the arytenoids near the root of the epiglottis (Fig. 7). This displaces the first 3 formants towards the high-frequency zone (Fig. 3). As a result, the fundamental and the lower harmonics are so attenuated that they are barely audible (Fig. 10).
Sygyt can be sung either directly through the center of the mouth or, by tilting the tongue, to one side or the other. Many of the best Sygyt singers “sing to the side”: directing the sound along the hard surfaces of the teeth enhances its bright, focused quality.
The Kargyraa style produces an extremely low sound that resembles the roaring of a lion, the howling of a wolf, the croaking of a frog, or all of these mixed together (Fig. 9). Kargyraa means “hoarse voice”. Like the hawking used to clear the throat before speaking, Kargyraa is nothing other than a deep and continuous hawking. This hawking must rise from the deepest part of the windpipe, so that low tones start resonating in the chest. Overtones are amplified by varying the shape of the mouth cavity and the position of the tongue. Kargyraa is closely linked to vowel sounds: the selection of the diphonic harmonic corresponds to the articulation of a particular vowel (/u/, /o/, //, /a/, etc.), which the singer has learnt to associate with the desired note.
This technique is a mixture of Diphonia and Bitonality (see 6.1): the supraglottal structures start to vibrate together with the vocal folds, but at half their rate. The arytenoids can also vibrate against the root of the epiglottis, hiding the vocal folds and forming a second “glottic” source [21]. The perceived pitch is one octave lower than normal (Fig. 9), but it can also be one octave and a 5th lower [20]. In the case of Tran Quang Hai’s voice, fibroendoscopy reveals the vibration and strong constriction of the arytenoids, which completely hide the vocal folds (Fig. 8).
This technique must be distinguished from Tibetan Buddhist chant, which is produced with the vocal folds as relaxed as possible and without any supraglottal vibration. Tibetan chant is closer to the Tuvan Borbangnadyr style with low fundamentals.

Figure 8. Simulation of the Kargyraa style by Tran Quang Hai: the arytenoids move against the root of the epiglottis and hide the vocal folds [21].
Borbangnadyr is not really a style, as are Khomei, Sygyt and Kargyraa, but rather a combination of effects applied to one of the other styles. The name comes from the Tuvan word for “rolling”, because this style features highly acrobatic trills and warbles, reminiscent of birds, babbling brooks, etc. While the name Borbangnadyr is currently most often used to describe a warbling applied to Sygyt, it is also applied to some lower-pitched singing styles, especially in older texts. The Borbangnadyr style with low fundamentals sounds like the Tibetan Buddhist chant.
Rather than on the pitch movement of a melody, Borbangnadyr generally focuses attention on three different harmonics, the 8th, 9th and 10th, which periodically take turns in prominence (Fig. 11). In this style the singer can easily create a triphonic effect. Its three elements are the fundamental, a second sound corresponding to the 3rd harmonic at an interval of a 5th, and the tremolo effect on the higher harmonics.
Ezengileer comes from a word meaning “stirrup” and features rhythmic harmonic oscillations intended to mimic the sound of metal stirrups, clinking to the beat of a galloping horse (Fig. 12). Ezengileer is a variant of Sygyt style and differs considerably from singer to singer, the common element being the “horse-rhythm” of the harmonics.
8. OVERTONE SINGING IN THE WEST
In the West, the Overtone Singing technique has unexpectedly become very popular, first in musical contexts and very soon also in mystical, spiritual and even therapeutic applications. The first to use a diphonic vocal technique in music was Karlheinz Stockhausen, in Stimmung [22]. He was followed by numerous artists, among them: the EVTE (Extended Vocal Techniques Ensemble) group at San Diego University in 1972, Laneri and his Prima Materia group in 1973, Tran Quang Hai in 1975, Demetrio Stratos in 1977 [17-18], Meredith Monk in 1980, David Hykes and his Harmonic Choir in 1983 [23], Joan La Barbara in 1985, Michael Vetter in 1985, Christian Bollmann in 1985, Noah Pikes in 1985, Michael Reimann in 1986, Tamia in 1987, Bodjo Pinek in 1987, Josephine Truman in 1987, Quatuor Nomad in 1989, Iegor Reznikoff in 1989, Valentin Clastrier in 1990, Rollin Rachele in 1990 [24], Thomas Clements in 1990, Sarah Hopkins in 1990, Les Voix Diphoniques in 1997.

Figure 9. Vasili Chazir sings “Artii-sayir” in the Kargyraa Tuvan style. The fundamental pitch is B1 61.2 Hz. The diphonic harmonics are the 6th (F#4- 367 Hz), 8th (B4 490 Hz), 9th (C#5 550 Hz), 10th (D#5- 612 Hz) and 12th (F#5- 734 Hz). The harmonics 12th-24th, which are diphonic but not perceptible, lie an octave above the previous ones. In the 2600-2700 Hz region, a steady formant amplifies the 43rd and 44th harmonics.

Figure 10. Tuvan Sygyt style. The fundamental is a weak E3+ 167 Hz. The melody uses the 8th (E6+ 1336 Hz), 9th (F#6+ 1503 Hz), 10th (G#6+ 1670 Hz) and 12th (B6+ 2004 Hz) harmonics. There is a rhythmic shift between contiguous harmonics every 900 ms. In the 3000-3200 Hz zone, a second resonance region is visible.

Figure 11. Tuvan Borbangnadyr style. The fundamental is a weak F#2 92 Hz. Harmonics 7-11 show the effect of a periodic formant shift (about 6 Hz).

Figure 12. Tuvan Ezengileer style. The fundamental is A#2 117 Hz.
The most famous proponent of this type of singing is David Hykes. Hykes experimented with numerous innovations in many works produced with the Harmonic Choir of New York (Fig. 15) [23]. These include changing the fundamental (moveable drone) while keeping the diphonic formant fixed, introducing text, glissando effects, etc.
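The effect of a moveable drone under a fixed formant can be sketched numerically. If the formant stays put while the fundamental moves, a different harmonic number falls closest to it, so the melody heard in the overtone changes. The fixed formant at 1760 Hz is a hypothetical value. The drone pitches follow the sequence in the Fig. 15 caption (A3, A#3, B3, C4, A3, G3), assuming equal temperament with A4 = 440 Hz.

```python
def tempered_freq(midi_note, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number (69 = A4)."""
    return a4_hz * 2 ** ((midi_note - 69) / 12)


def harmonic_nearest_formant(f0_hz, formant_hz):
    """Harmonic number of f0_hz lying closest to a fixed formant frequency."""
    return max(1, round(formant_hz / f0_hz))


if __name__ == "__main__":
    FORMANT_HZ = 1760.0                        # hypothetical fixed formant
    drone = {"A3": 57, "A#3": 58, "B3": 59, "C4": 60, "G3": 55}
    for name in ("A3", "A#3", "B3", "C4", "A3", "G3"):
        f0 = tempered_freq(drone[name])
        n = harmonic_nearest_formant(f0, FORMANT_HZ)
        print(f"drone {name:<3} {f0:6.1f} Hz -> harmonic {n:2d} at {n * f0:6.1f} Hz")
```

The harmonic heard in the overtone stays near the formant while the drone moves, and its harmonic number changes from 8 to 7 to 9 along the sequence.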
9. ACOUSTIC ANALYSIS
In the recent past, some work has been done on the analysis of Khomei, and more on Overtone Singing in general. This research has focused on discovering exactly how overtone melodies are produced. Hypotheses about the mechanics of Overtone Singing range from ideas about the stance and posture the singer needs during a performance to the actual shaping of the mouth cavity in producing the overtones.
Aksenov was the first to explain diphonia as the result of the filtering action of the vocal tract [25-27]. Some years later, Smith et al. carried out an acoustical analysis of Tibetan Chant [28]. In 1971, Leipp published an interesting report on Khomei [29]. Tran Quang Hai carried out extensive research on all the diphonic techniques [4-5][30]. The mechanism of diphonia was demonstrated in 1989 with two different methodologies. The first applied direct clinical-instrumental methods to study the vocal tract and vocal folds [31-32]: optical stroboscopy revealed the perfect regularity of the vocal fold vibration. The second used a simple linear prediction (LPC) model to analyse and synthesize the diphonic sound [33-34]. The good quality of the resynthesis demonstrated that diphonia is due exclusively to the spectral resonance envelope. The only difference between normal and diphonic sound lies in the unusually narrow bandwidth of the prominent formant.
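The LPC approach mentioned above models the vocal tract as an all-pole filter whose coefficients are obtained from the signal's autocorrelation. The sketch below shows the standard autocorrelation method with the Levinson-Durbin recursion and evaluates the resulting spectral envelope. It is a generic textbook illustration, not the specific analysis-synthesis system used in [33-34]. It also omits the windowing and pre-emphasis that a real analysis would apply.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    """Unnormalised autocorrelation r[0..max_lag] of a real-valued signal."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, error) where the prediction is x[n] ~ sum(a[k] * x[n - k - 1]).
    """
    a = []
    error = r[0]
    for m in range(order):
        acc = r[m + 1] - sum(a[k] * r[m - k] for k in range(m))
        reflection = acc / error
        # Update the previous coefficients, then append the new reflection coefficient.
        a = [a[k] - reflection * a[m - 1 - k] for k in range(m)] + [reflection]
        error *= 1.0 - reflection**2
    return a, error


def lpc_envelope_db(a, gain, freq_hz, fs_hz):
    """Level (dB) of the all-pole envelope gain / |1 - sum a_k z^-k| at freq_hz."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs_hz)
    denominator = 1 - sum(ak * z_inv ** (k + 1) for k, ak in enumerate(a))
    return 20 * math.log10(gain / abs(denominator))


if __name__ == "__main__":
    # Autocorrelation of a first-order process x[n] = 0.5 x[n-1] + noise.
    coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
    print(coeffs, err)
    print(lpc_envelope_db(coeffs, math.sqrt(err), 0.0, 8000.0))
```

For that autocorrelation sequence the recursion returns a = [0.5, 0.0], a first-order predictor, as expected. In a diphonic analysis, the narrow formant would appear as a pair of poles close to the unit circle. Such poles produce a sharp peak in the envelope returned by `lpc_envelope_db`.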
Several researchers seem to agree that the production of harmonics in Throat-Singing is essentially the same as the production of an ordinary vowel. Bloothooft et al. report a full investigation of Overtone Singing based on the similarity of this kind of phonation to vowel articulation [10].
Other authors, on the contrary, argue that the physical act of creating overtones may originate in vowel production, but that the end product, the overtones themselves, is far from vowel-like [35]. They state, in fact, that for both acoustic and perceptual reasons the production of an overtone melody cannot be described as vowel production.
Acoustically, a vowel is distinctive because of its formant structure. In Overtone Singing, the diphonic formant is reduced to one or a few harmonics, often with the surrounding harmonics attenuated as much as possible. Perceptually, Overtone Singing usually sounds nothing like an identifiable vowel. This is primarily because a major part of the overtone-sung tone no longer contributes to the timbre but instead provokes the sensation of melody. Such a distorted “vowel” can convey little phonetic information.
10. CONCLUDING REMARKS
All musical sounds contain overtones, tones that resonate in fixed relationships above a fundamental frequency. These overtones create tone color and help us differentiate the sounds of different musical instruments, or one voice from another.
Different cultures have their own unique musical traditions, but, interestingly, some of them share at least one aspect: the production of overtones in their vocal music styles. Each tradition attaches its own meanings to Overtone Singing, but these are often related to a common sphere of spirituality. Overtones in Tibetan and Gregorian Chant, for example, are linked with spirituality, and even with health and well-being. Overtones in Tuvan Khomei have at least three different meanings: shamanistic, animistic and aesthetic.

Figure 13. Mongolia: Ganbold sings a Kevliin Xöömi (ventral Xöömi, similar to Tuvan Sygyt). The pitch is G3# 208 Hz. The diphonic harmonics are the 6th (D#6 1248 Hz), 7th (F#6- 1456 Hz), 8th (G#6 1664 Hz), 9th (A#6+ 1872 Hz), 10th (C7- 2080 Hz) and 12th (D#7 2496 Hz). There is a strong 6 Hz vibrato.

Figure 14. Tibetan Gyuto Chant in the Yang style. The pitch is a weak A1 56 Hz. At the beginning, the singer chants a vowel /o/ that reinforces the 5th partial (and the 10th). In the choir part, the articulation of the prayers periodically brings out the whole scale of harmonics up to the 30th. There is also a fixed resonance at 2200 Hz.

Figure 15. David Hykes and the Harmonic Choir. In this 100 s passage from “Hearing the Solar Winds” [23], the pitch moves slowly from A3, A#3, B3, C4, A3, to the final G3. The diphonic harmonics change in the range 6th-12th.
11. ACKNOWLEDGMENTS
We would like to thank Sami Jansson [36] and Steve Sklar [15] for the useful information they made available to us via their respective web sites.
REFERENCES
[1] Feynman (http://www.feynmanonline.com/), website.
[2] Friends of Tuva (http://www.fotuva.org/), website.
[3] Dargie D., “Some Recent Discoveries and Recordings in Xhosa Music”, 5th Symposium on Ethnomusicology, University of Cape Town, International Library of African Music (ed.), Grahamstown, 1985, pp. 29-35.
[4] Tran Quang Hai, Musique Touva, 2000, (http://www.baotram.ovh.org/tuva.html), website.
[5] Tran Quang Hai, Zemp H., “Recherches expérimentales sur le Chant Diphonique”, Cahiers de Musiques Traditionnelles, Vol. 4, Genève, 1991, pp. 27-68.
[6] Levin Th., Edgerton M., The Throat Singers of Tuva, 1999, (http://www.sciam.com/1999/0999issue/0999levin.html), website.
[7] Walcott R., “The Chöömij of Mongolia – A spectral analysis of Overtone Singing”, Selected Reports in Ethnomusicology, UCLA, Los Angeles, 1974, 2 (1), pp. 55-59.
[8] Bregman A., Auditory scene analysis: the perceptual organization of sound, MIT Press, Cambridge, 1990.
[9] Sundberg J., The science of the singing voice, Northern Illinois University Press, De Kalb, Illinois, 1987.
[10] Bloothooft G., Bringmann E., van Capellen M., van Luipen J.B., Thomassen K.P., “Acoustics and Perception of Overtone Singing”, Journal of the Acoustical Society of America, JASA Vol. 92, No. 4, Part 1, 1992, pp. 1827-1836.
[11] Stevens K., Acoustic Phonetics, MIT Press, Cambridge, 1998.
[12] Fant G., Acoustic theory of speech production, Mouton, The Hague, 1960.
[13] Lortat-Jacob B., “En accord. Polyphonies de Sardaigne: 4 voix qui n’en font qu’une”, Cahiers de Musiques Traditionnelles, Genève, 1993, Vol. 6, pp. 69-86.
[14] Kavasch D., “An introduction to extended vocal techniques”, Report of CME, Univ. of California, San Diego, Vol. 1, n. 2, 1980, pp. 1-20.
[15] Sklar S., Khöömei Overtone Singing, (http://www.atech.org/khoomei), website.
[16] Ferrero F., Ricci Maccarini A., Tisato G., “I suoni multifonici nella voce umana”, Proceedings of XIX Convegno AIA, Napoli, 1991, pp. 415-422.
[17] Ferrero F., Croatto L., Accordi M., “Descrizione elettroacustica di alcuni tipi di vocalizzo di Demetrio Stratos”, Rivista Italiana di Acustica, Vol. IV, n. 3, 1980, pp. 229-258.
[18] Stratos D., Cantare la voce, Cramps Records CRSCD 119, 1978.
[19] Dmitriev L., Chernov B., Maslow V., “Functioning of the voice mechanism in double voice Touvinian singing”, Folia Phoniatrica, Vol. 35, 1983, pp. 193-197.
[20] Fuks L., Hammarberg B., Sundberg J., “A self-sustained vocal-ventricular phonation mode: acoustical, aerodynamic and glottographic evidences”, KTH TMH-QPSR, n.3, Stockholm, 1998, pp. 49-59.
[21] Tisato G., Ricci Maccarini A., Tran Quang Hai, “Caratteristiche fisiologiche e acustiche del Canto Difonico”, Proceedings of II Convegno Internazionale di Foniatria, Ravenna, 2001, (to be printed).
[22] Stockhausen K., Stimmung, Hyperion A66115, 1968.
[23] Hykes D., David Hykes and the Harmonic Choir, (http://harmonicworld.com), website.
[24] Rachele R., “Overtone Singing Study Guide”, Cryptic Voices Productions (ed), Amsterdam, 1996, pp. 1-127.
[25] Aksenov A.N., Tuvinskaja narodnaja muzyka, Moscow, 1964.
[26] Aksenov A.N., “Die stile der Tuvinischen zweistimmigen sologesanges”, Sowjetische Volkslied und Volksmusikforschung, Berlin, 1967, pp. 293-308.
[27] Aksenov A.N., “Tuvin folk music”, Journal of the Society for Asian Music, Vol. 4, n. 2, New York, 1973, pp. 7-18.
[28] Smith H., Stevens K.N., Tomlinson R.S., “On an unusual mode of singing of certain Tibetan Lamas”, Journal of the Acoustical Society of America, JASA 41 (5), USA, 1967, pp. 1262-1264.
[29] Leipp M., “Le problème acoustique du Chant Diphonique”, Bulletin Groupe d’Acoustique Musicale, Univ. de Paris VI, n. 58, 1971, pp. 1-10.
[30] Tran Quang Hai, “Réalisation du chant diphonique”, Le Chant diphonique, Institut de la Voix, Limoges, dossier n° 1, 1989, pp. 15-16.
[31] Pailler J.P., “Examen video du larynx et de la cavité buccale de Monsieur Trân Quang Hai”, Le Chant Diphonique, Institut de la Voix, Limoges, dossier n° 1, 1989, pp. 11-13.
[32] Sauvage J.P., “Observation clinique de Monsieur Trân Quang Hai”, Le Chant Diphonique, Institut de la Voix, Limoges, dossier n° 1, 1989, pp. 3-10.
[33] Tisato G., “Analisi e sintesi del Canto Difonico”, Proceedings VII Colloquio di Informatica Musicale (CIM), Cagliari, 1989, pp. 33-51.
[34] Tisato G., Ricci Maccarini A., “Analysis and synthesis of Diphonic Singing”, Bulletin d’Audiophonologie, Vol. 7, n. 5-6, Besançon, 1991, pp. 619-648.
[35] Finchum H., Tuvan Overtone Singing: Harmonics Out of Place, (http://www.indiana.edu/~folklore/savail/tuva.html), website.
[36] Jansson S., Khöömei Page (http://www.cc.jyu.fi/~sjansson/khoomei.htm), website.
[37] Leothaud G., “Considérations acoustiques et musicales sur le Chant Diphonique”, Le Chant Diphonique, Institut de la Voix, Limoges, dossier n° 1, 1989, pp. 17-43.
[38] Zarlino G., Istitutioni Harmoniche, Venice, 1558.
http://www.researchgate.net/profile/Piero_Cosi/publication/228780318_ON_THE_MAGIC_OF_OVERTONE_SINGING/links/09e4150a363d7236ff000000.pdf

TAKEDA, Shoichi and MURAOKA, Teruo: Analysis of Acoustical Features of Biphonic Singing Voices Male and Female Xöömij and Male Steppe Kargiraa

(1) Teikyo Heisei University, 2289-23 Uruido, Ichihara-shi, Chiba 290-0193, Japan; (2) Musashi Institute of Technology
E-mail: takeda@thu.ac.jp E-mail: muraoka@ee.ec.musashi-tech.ac.jp

ABSTRACT
This paper clarifies the spectral features of Mongolian and Tuvan biphonic singing styles such as Xöömij and Steppe Kargiraa. Spectra of five types of Xöömij sounds sung by male singers showed that a resonance with a high Q value is necessary if a listener is to perceive two pitches, and the spectra of all the sounds were found to have second-formant peaks corresponding to the higher-pitch voice. Similar second-formant peaks were observed in Xöömij sounds sung by a female singer. In Steppe Kargiraa /a/ sounds sung by a male singer, we found sharp peaks in the first formant instead.

INTRODUCTION
Traditional Asian biphonic singing styles, of which the Mongolian “Xöömij” [1] may be the best known, are produced by a single singer articulating two voices simultaneously: a “drone,” a bass voice of almost constant low pitch, and a high-pitched “melody tone.” Xöömij is most popular in western Mongolia [2], and its technique is thought to have spread to Europe and been used in epic chants such as the “two voices from a single mouth” of Yugoslavia [3]. Steppe Kargiraa is another example of biphonic singing, from Tuva, a region of Siberia at the centre of Asia.
The origin of Xöömij is still uncertain. It was once thought to have been a kind of incantation, but today it is most widely believed to have arisen from a vocal imitation of murmuring streams or of echoes in the Altai mountain range [3, 4]. It has also been suggested that Xöömij imitates the sound of the Morin Khuur [3] and was used to pacify female animals separated from their young, a use that persists today [5].
This paper investigates how Xöömij is generated, on the basis of spectral analysis. Taking into account the results of previous acoustical analyses [6, 7], we formulated the following three hypotheses:

1. There actually is, in addition to a glottal source, an independent sound source (such as a whistle). (Hypothesis of Independent Sound Sources) [8]
2. Some portion of the vocal tract vibrates at a high frequency, and the glottal source, modulated by that high-frequency vibration, is perceived as the melody tone. (Hypothesis of Modulation)
3. A sharp resonance formed by a peculiar vocal tract shape selectively enhances some harmonics of the glottal source, and this resonance is perceived as the melody tone. (Hypothesis of Resonance)
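The Hypothesis of Resonance can be illustrated with a deliberately simple source-filter sketch. It is our simplification, not the authors' model: it assumes a source whose harmonic amplitudes fall off as 1/n and a single second-order resonance, with centre frequency fc and quality factor Q, standing in for the vocal tract. `resonator_gain`, `harmonic_levels_db` and `prominence_db` are hypothetical helper names.

```python
import numpy as np

def resonator_gain(f, fc, q):
    """Magnitude of a second-order resonance centred on fc with quality factor q
    (unity gain at DC, gain q exactly at fc)."""
    ratio = f / fc
    return 1.0 / np.sqrt((1.0 - ratio**2) ** 2 + (ratio / q) ** 2)

def harmonic_levels_db(f0, fc, q, n_max=20):
    """Level in dB of harmonics 1..n_max of an assumed 1/n source after the resonance."""
    n = np.arange(1, n_max + 1)
    amp = (1.0 / n) * resonator_gain(n * f0, fc, q)
    return n, 20 * np.log10(amp)

def prominence_db(f0, fc, q, n_max=20):
    """How far (dB) the harmonic nearest fc stands above the louder of its two neighbours."""
    _, level = harmonic_levels_db(f0, fc, q, n_max)
    k = int(round(fc / f0)) - 1          # array index of the harmonic nearest fc
    return level[k] - max(level[k - 1], level[k + 1])

# Resonance tuned to the 8th harmonic of a 208 Hz drone, sharp versus broad.
print(prominence_db(208.0, 1664.0, 32.0))
print(prominence_db(208.0, 1664.0, 5.0))
```

Under these assumptions, with the resonance tuned to the 8th harmonic (1664 Hz) of a 208 Hz fundamental, the 8th harmonic stands roughly 16 dB above its neighbours when Q = 32 but only about 2 dB when Q = 5. A sharp resonance isolates one harmonic strongly enough to be heard as a separate tone; a broad one does not.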

Past sound-spectrographic analyses [1], [9] did not confirm any of these hypotheses because too little data was analyzed and the measurements were not accurate enough. We [6, 7], [17] first tested whether the “Hypothesis of Resonance” would be supported by a detailed spectral analysis of a typical example of Xöömij singing, and then repeated the analysis [18] on a Xöömij recording obtained under better conditions, using a state-of-the-art computer system. We then examined whether our results would hold for other types of Xöömij singing [11-13]. We investigated the mechanism of Xöömij generation using numerical speech signal analyses such as short-time FFT analysis, LPC analysis, and cepstrum analysis. Observing the harmonic structures of Xöömij sound waveforms and tracing the transitions of formant frequencies and the accompanying Q (quality factor of resonance) values, we obtained results consistent with the “Hypothesis of Resonance.”
Adachi & Yamada recently also used FFT and LPC as part of their research on vocal tract shapes during Xöömij singing [10], [16]. They used four Xöömij samples sung by one singer (the type of Xöömij is unknown), and their results also support the Hypothesis of Resonance.

NUMERICAL SIGNAL ANALYSES [18]
We investigated the three hypotheses using Xöömij material. After careful listening, we selected a recording of unaccompanied solo Xöömij singing entitled “Gooj Nanaa” (singer unknown) from the LP “Folk Songs [Asian version]” (JVC SKX25017 25018, Japan). For calculating formant frequencies, bandwidths, and Q values, the signal was digitized with 16-bit samples at a sampling rate of 22.05 kHz; for spectrum display, a sampling rate of 11.025 kHz was used. Short-time FFT was applied to 1024-sample frames, and LPC analyses were carried out with a 30-msec Hamming window weighting. The LPC order was 10 for sectional spectrum displays and 12 for 3D time-varying spectrum pattern displays; both orders were determined empirically by inspecting each spectrum.
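The following NumPy sketch shows the kind of LPC analysis described above: a Hamming-weighted frame, the autocorrelation method solved by the Levinson-Durbin recursion, and each formant's frequency F and -3 dB bandwidth B read off the angle and radius of an LPC pole, giving Q = F/B. The paper does not state which LPC algorithm it used, so the autocorrelation method here is an assumption, and `lpc_autocorr` and `formants` are hypothetical names. For reference, a 30-msec window at 22.05 kHz is about 661 samples, and a 1024-point FFT at 11.025 kHz spans about 93 ms with a bin spacing of about 10.8 Hz.

```python
import numpy as np

def lpc_autocorr(frame, order):
    """LPC polynomial a[0..order] (a[0] = 1) by the autocorrelation method.

    The frame is Hamming-weighted, its autocorrelation r[0..order] is computed,
    and the normal equations are solved with the Levinson-Durbin recursion.
    """
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    full = np.correlate(x, x, mode="full")
    r = full[len(x) - 1 : len(x) + order]          # autocorrelation at lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                                     # prediction error power
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                             # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]    # update earlier coefficients
        a[i] = k
        err *= 1.0 - k * k
    return a

def formants(a, fs):
    """Formant frequency F (Hz), -3 dB bandwidth B (Hz) and Q = F / B from LPC poles."""
    poles = np.roots(a)
    poles = poles[np.imag(poles) > 0]              # keep one pole per conjugate pair
    freq = np.angle(poles) * fs / (2 * np.pi)      # pole angle -> frequency
    bw = -np.log(np.abs(poles)) * fs / np.pi       # pole radius -> bandwidth
    idx = np.argsort(freq)
    freq, bw = freq[idx], bw[idx]
    return freq, bw, freq / bw
```

Following the parameters above, one frame would be analyzed as `formants(lpc_autocorr(signal[start:start + 661], 12), 22050)`, where `signal` and `start` are placeholders. The Q of the pole nearest the melody tone is the quantity the paper compares with the values found in normal speech. The autocorrelation method is a common choice because it always yields a stable all-pole model, so every pole radius stays below 1 and the bandwidth formula stays positive.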

Figure 1 is an expanded view of the middle part of a Xöömij waveform, where the waveform can be considered almost stationary. The melody pitches obtained by transcribing the singing into a musical score approximately coincided with the second formant frequency F2. This suggests that the movements of F2 are perceived as the melody in Xöömij singing. To trace the variation of F2, we calculated the successive spectrum envelopes shown in Fig. 2.

A distinctive feature of our analysis is that the LPC method reveals the formant that forms the melody tone. As shown in Fig. 2, this formant is extracted clearly and quantitatively. Notable findings are that the intensities of the second formants of Xöömij sound waveforms are quite different from those of normal speech, and that the Q values of F2 range from 6 to 98, with an average of 32.
According to the data in the literature [14, 15], the estimated Q of formants in normal speech is at most 30. The spectra of a Xöömij sound signal have a harmonic structure consistent with the Hypothesis of Resonance.

SPECTRAL FEATURES OF VARIOUS XÖÖMIJ ARTICULATIONS [11-13], [17, 18]
The detailed spectral investigation described in the previous section supports the Hypothesis of Resonance but was based on the analysis of only a single Chest Xöömij sample. A stronger conclusion could be drawn from the analysis of many samples of Xöömij with different articulations.
We further investigated samples of five types of Xöömij singing in order to find out whether there are spectral differences between the different types. The samples we analyzed were (1) Nasal Xöömij, (2) Oral-Nasal Xöömij, (3) Glottal Xöömij, (4) Chest Xöömij, and (5) Throat Xöömij.
This classification is based on where the singer believes the resonance point to be, and there is no proof that the resonance is actually at that place. These Xöömij samples were sung by male Mongolian singer Ganbold and were recorded on a CD entitled “Mongolian Songs” (KING RECORD, KICC-5133, Japan (1988)).
For sound pieces in which each of the present authors perceived two tones, sharp peaks could be observed in their spectra. These peaks correspond to the second formant frequencies F2, which thus are strikingly enhanced and are heard as the melody tone. This was commonly found for each type of Xöömij investigated in the present study, thus supporting the Hypothesis of Resonance.

FORMANT TRANSITIONS FROM NORMAL VOWELS TO XÖÖMIJ SOUNDS [18]
We also tried to clarify the spectral features of the transition from normal vowels to Xöömij sounds. It is widely recognized that the phonetic impression of Xöömij sounds resembles [i], [e], or [u], and that Xöömij initially sounds similar to [u] when the melody tone is not yet clearly heard. We asked a Japanese Xöömij singer to articulate the sequence (1) normal vowel, (2) Xöömij, (3) normal vowel in a single breath. The vowels used were the four Japanese vowels [i], [u], [e], and [o], and the singer was asked to pronounce them as normally as possible. It should be noted that our Japanese singer was not as well trained as expert Mongolian Xöömij singers, and that the singer's control of Xöömij articulation was correspondingly less precise. The results were summarized in an F1-F2 diagram.
As shown in the F1-F2 diagram in Fig. 3, the F1-F2 combinations always shifted toward the region of [i]. This suggests that the location of the stricture during Xöömij singing is almost the same as during the articulation of the vowel [i]. In the transitions from vowels to Xöömij, F1 shifted to about 250 Hz, while F2 shifted into the range 1.8-2.3 kHz and its Q increased remarkably. This F2 range is almost the same as the frequency range of the melody tone.

ACOUSTICAL FEATURES OF FEMALE XÖÖMIJ VOICES
This section describes the acoustical features of female Xöömij voices. Xöömij is known to be difficult for female singers.
Analysis was conducted using the voice of the Tuvan female singer Sainkho Namtchylak, recorded on a CD entitled “Lost Rivers” (FMP CD 42, Germany (1992)).
The signal was digitized (16-bit samples) at a sampling rate of 16 kHz for spectrum display. Short-time FFT and LPC analyses were carried out with a 30-msec Hamming window weighting.

Figure 4 (a) shows a short-time spectrum of a monophonic part of a female Xöömij sound waveform, and Fig. 4 (b) shows that of a biphonic part. A sharp peak can be observed in the spectrum in Fig. 4 (b), whose sound is perceived as two pitches. This peak corresponds to the second formant frequency F2, which is strikingly enhanced and is heard as the higher pitch. This was found in every sample of female Xöömij voices investigated in the present study, again supporting the Hypothesis of Resonance.
A conspicuous difference from male Xöömij voices is that the harmonic structure of the spectrum of a female Xöömij sound waveform is coarse compared with that of a male one; because the fundamental frequency is higher, the harmonics are more widely spaced. This coarse harmonic structure may be the reason why it is difficult for female singers to control melody tones.
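To make the coarse harmonic structure concrete: harmonics are spaced f0 apart, so a higher fundamental leaves fewer of them inside the F2 region that a sharp resonance can select. The sketch below counts the harmonics in the 1.8-2.3 kHz F2 range reported for Xöömij and measures the step between adjacent harmonics. The fundamentals of 200 Hz and 400 Hz are illustrative assumptions, not values measured in the paper, and `harmonics_in_band` and `step_semitones` are hypothetical helper names.

```python
import math

def harmonics_in_band(f0, lo=1800.0, hi=2300.0):
    """Harmonic numbers n whose frequency n * f0 lies within [lo, hi] Hz."""
    first = math.ceil(lo / f0)
    last = math.floor(hi / f0)
    return list(range(first, last + 1))

def step_semitones(n):
    """Musical interval between harmonics n and n + 1, in semitones."""
    return 12 * math.log2((n + 1) / n)

# Illustrative (assumed) fundamentals for a lower and a higher voice.
for f0 in (200.0, 400.0):
    band = harmonics_in_band(f0)
    print(f0, band, [round(step_semitones(n), 2) for n in band])
```

With these assumed values, a 200 Hz voice has harmonics 9, 10 and 11 in the band, separated by less than 2 semitones, whereas a 400 Hz voice has only harmonic 5, whose neighbours lie more than 3 semitones away. The higher voice therefore has fewer and more widely spaced melody notes to choose from.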
ACOUSTICAL FEATURES OF MALE STEPPE KARGIRAA VOICES
Another interesting form of biphonic singing is the Tuvan style called “Steppe Kargiraa,” which is characterized by an extremely low fundamental pitch. Its voice-production process has recently been explained by Imagawa, Sakakibara, Konishi, and Niimi using a glottal source model based on vocal fold and false vocal fold vibrations [19]. In this section we describe the results of spectral analysis of Steppe Kargiraa sound waveforms whose auditory impression is close to the vowel /a/.

Analysis was carried out using the voices of two male singers, Fedor Tau and Gundenbiliin Yavgaan. Tau’s voice was recorded on a CD entitled “TUVA Voices from the Center of Asia” (Smithsonian Folkways CD SF 40017, USA (1990)), and Yavgaan’s on a CD entitled “Mongolian Xöömij” (King KICW 1004, Japan (1999)). The signal was digitized (16-bit samples) at a sampling rate of 16 kHz for spectrum display. Short-time FFT and LPC analyses were carried out with a 30-msec Hamming window weighting.

As with Xöömij, the spectrum of a Steppe Kargiraa waveform in Fig. 5 (b) shows a prominent formant peak, whereas that of a normal vowel /a/ in Fig. 5 (a) does not. Interestingly, the peaks that yield the melody tones correspond not to the second formant frequencies F2 but to the first formant frequencies F1.

CONCLUSIONS

We have analyzed spectral features of two types of biphonic singing: Xöömij in Mongolia and Steppe Kargiraa in Tuva. Measuring time-varying formant frequencies and Q values for a typical sample of Xöömij singing, we obtained results suggesting that resonance with an extremely large Q value is required for Xöömij generation. This is consistent with the Hypothesis of Resonance.

To further test this hypothesis, we evaluated samples of five types of Xöömij singing, classified according to where the singer believes the resonance point to be. Sharp peaks were found in the spectra of all types of Xöömij. These results support the Hypothesis of Resonance, in which the glottal wave and a sharp resonance enhancing its higher harmonics are perceived as two simultaneous tones.

Another important finding in this work is that the first formant frequencies of Xöömij sound waveforms are constant. Investigating the transitions of formant frequencies from normal vowels to Xöömij sounds, we found that the F1-F2 combination always shifts toward the [i] region, with the first formant frequencies shifting to about 250 Hz.

The results of analyses of the spectral features of female Xöömij and male Steppe Kargiraa singing also showed sharp formant peaks in the spectra that yield the perception of melody tones. A conspicuous feature of the spectra of female Xöömij sound waveforms is that the harmonic structure is coarse compared with that of male Xöömij sound waveforms, which may make it difficult for female singers to control melody tones.

ACKNOWLEDGMENTS
The authors express their sincere appreciation to Professor Kiyoko Motegi of Joetsu Kyoiku University and Mr. Masamitsu Yamakawa, formerly a senior engineer at JVC Company, for giving them the opportunity to undertake this research. They also heartily thank former Professor Isao Nakamura of Teikyo Heisei University for his invaluable comments; Messrs. Kikuji Wagatsuma, Yoshiyuki Tsuchikane, and Masato Horiuchi, research engineers at JVC Company, for their cooperation in the analyses; Dr. Masashi Yamada of the Osaka University of Arts for providing useful literature for this research; Mr. Daisuke Naganuma, formerly of Teikyo Heisei University, for providing Xöömij sounds as a Xöömij singer; and Xöömij singer Mr. G. Yavgaan, folk music recording producer Mr. Kyoji Hoshikawa, Mr. Katsunobu Tokuda of KING RECORD Co., Ltd., and President Keiko Kawashima and Ms. Hiroko Ochiai of Plankton Co. for providing valuable information on Xöömij. Finally, the authors thank Messrs. Masashi Itoga, Katsuhisa Tadokoro, and Masashi Miyashita, former students at the Teikyo University of Technology (now Teikyo Heisei University), for their cooperation in the experiments.
This research was partly supported by a Grant-in-Aid from Teikyo Heisei University and by a Grant-in-Aid for Scientific Research on Priority Areas (2), “Diversity of Prosody and its Quantitative Description,” from the Ministry of Education, Culture, Sports, Science and Technology, Japan (No. 12132206).

REFERENCES
[1] Trân Q. H. and D. Guillou, “Original research and acoustical analysis in connection with the Xöömij style of biphonic singing,” Musical Voices of Asia, Individual research reports: Mongolia, pp.162-173 (1980).
[2] M. Yamada, “Mongolian biphonic singing Xöömij,” Journal of the Acoustical Society of Japan Vol. 54-9, pp.680-685 (1998).
[3] “A general survey of Mongolian music,” Asian Traditional Performing Arts 1978, The Japan Foundation, pp.5-9 (1978.11).
[4] Batzengel, “Urtin duu, Xöömij, and Morin xuur,” Musical Voices of Asia, Seminar information and documentation: Mongolia, pp.52-53 (1980).
[5] H. Hasumi, “Understanding Mongolian music,” Musical Voices of Asia, Seminar information and documentation: Mongolia, pp.142-148 (1980).
[6] T. Muraoka, K. Wagatsuma, and M. Horiuchi, “Acoustic Analysis of the Mongolian singing Xöömij,” Preprint of the Acoustical Society of Japan 2-3-9, pp.385-386 (1983.10).
[7] T. Muraoka, K. Wagatsuma, Y. Tsuchikane, and M. Horiuchi, “On a Consideration of Mongolian Singing Xöömij and its Specialities,” Preprint of the seminar on Musical acoustics, The Acoustical Society of Japan MA84-1, pp.1-6 (1984).
[8] B. Chernov and V. Maslov, “Larynx: double sound generator,” Proc. 11th Int’l. Cong. Phonetic Sci., pp.40-43 (Tallinn, Estonia, 1987).
[9] S. Gunji, “An acoustical consideration of Xöömij,” Musical Voices of Asia, Individual research reports: Mongolia, pp.135-141 (1980).
[10] S. Adachi, and M. Yamada, “An Acoustical Study of Sound Production in Biphonic Singing, Xöömij,” Proceedings of 1997 Japan – China Joint Meeting on Musical Acoustics, pp.21-26 (Tokyo, 1997).
[11] S. Takeda, M. Itoga, Y. Sato, and Y. Ueda, “Analysis of Acoustical Features of Mongolian Singing “Khöömij”,” Proc. Acoust. Soc. Jap. 2-7-15, pp.605-606 (Oct. 1992).
[12] S. Takeda, M. Itoga, “On the differences in Spectra in Accordance with the Phonemic and Tone-height Differences in Mongolian Singing “Khöömij”,” Proc. Acoust. Soc. Jap. 2-3-3, pp.499-500 (March, 1993).
[13] S. Takeda, M. Itoga, “Analysis of Acoustic Features of Mongolian Singing “Khöömij”,” Technical Report on Musical Information Sci.1-4, pp.1-4 (April, 1993).
[14] J. Ohizumi, and Y. Fujimura, Onsei kagaku (Science of Human Voices), Tokyo University Publishing (1972).
[15] K. Nakata, Onsei (Human voices), Acoustic Engineering Series by the Acoustical Society of Japan (Corona Publishing Co., Ltd., Tokyo, 1977).
[16] S. Adachi, and M. Yamada, “An Acoustical Study of Sound Production in Biphonic Singing, Xöömij,” Journal of the Acoustical Society of America, 105, pp.2920-2932 (May, 1999).
[17] T. Muraoka, S. Takeda, and M. Itoga, “Analysis of Acoustic Features of Mongolian Xöömij Singing,” Journal of the Acoustical Society of Japan Vol. 56-5, pp.308-317 (May, 2000).
[18] T. Muraoka, S. Takeda, and M. Itoga, “An Acoustical Analysis of Mongolian Xöömij Singing,” Journal of the Acoustical Society of America (in submission).
[19] H. Imagawa, K. Sakakibara, T. Konishi, and S. Niimi, “Glottal Source Model for Throat Singing Based on Vocal Fold and False Vocal Fold Vibrations,” Proc. Acoust. Soc. Jap. 1-6-14, pp.255-256 (March 2001).