Werner A. Deutsch & Franz Födermayr: Visualization of Multi – Part Music

Visualization of Multi – Part Music
(Acoustics and Perception)

Werner A. Deutsch (Austrian Academy of Sciences, Acoustics Research Laboratory) and
Franz Födermayr (Institute of Musicology, University of Vienna)

Introduction

Frequency analysis of musical sounds became a practical tool with the development of the Sound Spectrograph (Koenig, Dunn and Lacey, 1946). From the beginning, much care was taken to choose the frequency resolution and the time window properly in order to highlight important acoustical as well as perceptual features. Several studies (e.g. Potter, Kopp and Green, 1947) have demonstrated that the aural presentation of speech (and music) together with its simultaneous graphic representation gives significantly deeper insight into the generation of acoustical signals and into the ongoing perception than listening alone can provide.

Graf (1963) recognized the enormous potential of spectrographic analysis for applications in ethnomusicology. His theoretical concept assumes that the acoustical signal is the primary stimulus and that it is processed by the human psychophysiological system in very much the same way, even in different ethnic populations. The marked differences in interpretation, reception and perception that arise from very similar acoustical stimuli are due to the influence of the so-called socio-cultural context in which music plays an important role.

Production Models

The pertinent analysis of musical signals with acoustic laboratory methods (which today can be performed on a specially equipped laptop computer) yields essentially a complete set of acoustical parameters. These can be displayed as graphical images of the spectral content, i.e. the physics of the musical signal, either in real time or for performances recorded in advance. The analysis data can serve as input to comprehensive production models of the voice (Fant, 1970), of musical instruments and of musical ensembles. Sound source characteristics, tuning, musical scales, timbre, agogics, free-field and room acoustics etc. can be observed in the analysis parameters extracted directly from the musical signal. Musical scales, vibrato, pulsato and beats are detected and measured from the fundamental frequency analysis data and the related spectral components; timbre is largely determined by the spectral envelope of the signal; durations and rhythms are mainly derived from the energy contour.
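The last point can be illustrated with a short-time energy contour. The sketch below, assuming a mono signal stored in a NumPy array, computes the RMS level in dB for overlapping frames; the function name `energy_contour`, the 20 ms frame and the 10 ms hop are illustrative assumptions, not parameters of the authors' analysis system.

```python
import numpy as np

def energy_contour(x, fs, frame_ms=20.0, hop_ms=10.0, floor_db=-120.0):
    """Short-time RMS level in dB (re amplitude 1.0) of a mono signal.

    Hypothetical helper: frame and hop lengths are illustrative assumptions.
    Returns (frame centre times in s, levels in dB).
    """
    frame = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    n_frames = 1 + max(0, (len(x) - frame) // hop)
    times = np.empty(n_frames)
    levels = np.empty(n_frames)
    for i in range(n_frames):
        seg = x[i * hop : i * hop + frame]
        rms = np.sqrt(np.mean(seg ** 2))
        # Silent frames get the floor value instead of log10(0).
        if rms > 0:
            levels[i] = max(20.0 * np.log10(rms), floor_db)
        else:
            levels[i] = floor_db
        times[i] = (i * hop + frame / 2) / fs
    return times, levels
```

For a 440 Hz tone switched on at 0.5 s in an otherwise silent one-second signal sampled at 8 kHz, the first frame rising above -40 dB is centred at 0.5 s; reading such threshold crossings off the contour is one simple way to obtain onsets and durations.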

Perception Models

Whereas production models of the singing voice and of musical instruments describe only the acoustics of musical sound sources, perception models deal with the signal processing of the listener's auditory periphery, its associated central pathways and cortical functions. Admittedly, psychoacoustics started from an acoustical engineering approach aimed at collecting the basic technical data of the human auditory system, such as selectivity, absolute thresholds, difference limens for frequency, sound pressure level and signal duration, and many other psychophysical functions. Most of the early psychoacoustical research was launched by telephone laboratories (Fletcher, 1929, 1953), driven by the need to avoid noise and distortion on telephone lines or to compensate for listeners' hearing loss. Engineers, physiologists and neurologists have described the mechanics of the outer and middle ear, the hydromechanics of the inner ear (Bekesy, 1960), the hair cell system and the resulting neural response up to the brainstem nuclei, as well as acoustically evoked responses in the cortex. Because of technical and methodological limitations, this early research was in most cases done with sinusoids, which are of little musical relevance but could be controlled in experimental procedures with sufficient accuracy. Musicologists have frequently criticized this work for dealing with musically irrelevant aspects of sound and arbitrary functions of the auditory system instead of referring to the cognitive concepts of music.

Nevertheless, as work in psychoacoustics progressed, the basic data obtained on the human auditory system contributed to a comprehensive theory of hearing that today can account for highly relevant aspects of auditory localization and of speech and music perception. Today, psychoacoustical models explain complex perceptual functions such as the musical pitch of complex tones, melody contours, consonance and dissonance, simultaneous masking, forward and backward masking, figure-background discrimination and the Gestalt of musical rhythms.

Visualization of polyphony

FFTs and Spectrograms

Applying psychoacoustic knowledge to the spectrographic analysis of polyphony, the visualization of musical signals represents both the graphical output of psychoacoustic perception models and the physics of sound. The spectrum of an arbitrary acoustical signal at a given instant is obtained by its Fourier transform, which yields a pair of real-valued functions of frequency: the amplitude (or magnitude) spectrum and the phase spectrum. The amplitude spectrum serves as a first approximation of the (neuro-)physiological representation of the signal in the human auditory system; the phase spectrum can be neglected for spectrographic purposes.
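For reference, the quantities named above can be written out in standard notation for a signal x(t); this is the textbook definition, not a formula taken from the authors' figures:

$$ X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt = |X(f)|\, e^{j\varphi(f)} $$

Here |X(f)| is the amplitude spectrum and φ(f) the phase spectrum. A spectrogram displays only |X(f)|, usually as the level 20 log10 |X(f)| in dB, and discards φ(f).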

As the time-variant signal goes on, many overlapping, closely spaced windowed Fourier transforms have to be computed at short successive intervals (< 30 ms) in order to produce a pseudo-three-dimensional, continuous graphic display of the sound: the spectrogram. In general, narrow-band frequency components with slow variations in frequency appear as horizontal lines, whereas very fast changes or signal envelopes of a transient nature appear as vertical broad-band bars. Many musical instrument sounds (plucked strings, struck bars etc.) have a very short broad-band attack and a narrow-band, slowly decreasing decay. Thus the onset of a note is easily identified, whereas the end of its decay is not, especially in reverberant environments.
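The procedure just described, overlapping windowed Fourier transforms computed at short successive intervals, can be sketched with NumPy's real FFT. This is a minimal illustration, not the software used for the figures: the Hann window, the 1024-sample frame and the 256-sample hop (about 5.8 ms at a 44.1 kHz sampling rate, well below the 30 ms mentioned above) are assumptions made here.

```python
import numpy as np

def spectrogram(x, fs, n_fft=1024, hop=256):
    """Magnitude spectrogram in dB from overlapping Hann-windowed FFTs.

    Requires len(x) >= n_fft. Rows are frequency bins (0 .. fs/2),
    columns are time frames. The phase of each FFT is discarded.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))          # amplitude spectrum only
    mag_db = 20.0 * np.log10(np.maximum(mag, 1e-10))   # avoid log10(0)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + n_fft / 2) / fs
    return freqs, times, mag_db.T
```

Shortening `n_fft` sharpens the vertical broad-band onsets, while lengthening it separates closely spaced harmonics into distinct horizontal lines; this is the time-frequency trade-off behind the choice of window mentioned in the introduction.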

Beats. From left to right: simple tone 220 Hz; simple tone 227 Hz; two-tone complex 220 Hz + 227 Hz (beating); two-tone complex 220 Hz + 240 Hz (slight roughness); two-tone complex 220 Hz + 260 Hz (roughness); two-tone complex forming a musical fifth.

Interference, Beats and Roughness

Usually, direct and reflected waves from many simultaneously sounding sources (musical instruments, singing voices etc.) are superposed at the listener's ear, producing interference where components of equal frequency occur. Constructive interference takes place when the crests of two waves coincide; for two waves of equal amplitude the resulting amplitude is twice that of either wave. Destructive interference occurs when the crests of one wave fall on the troughs of the other, and cancellation results. When interfering components differ slightly in frequency, beats can be perceived. The beat frequency equals the difference between the frequencies sounding together; beats can be detected in the spectrogram as a periodic rise and fall in amplitude along a single (horizontal) frequency line. When the frequency difference exceeds about 20 Hz, beating is no longer heard and a sensation of roughness arises instead, which has its maximum between 40 and 70 Hz. Increasing the frequency difference further (beyond the critical bandwidth, see below) produces the perception of two separate tones.
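The beat rate follows from the sum-to-product identity for two components of equal amplitude at frequencies f1 and f2:

$$ \sin(2\pi f_1 t) + \sin(2\pi f_2 t) = 2\cos\!\left(2\pi\,\frac{f_1 - f_2}{2}\,t\right)\sin\!\left(2\pi\,\frac{f_1 + f_2}{2}\,t\right) $$

The sum is a tone at the mean frequency (f1 + f2)/2 whose envelope |2 cos(π(f1 - f2)t)| reaches a maximum |f1 - f2| times per second. For the 220 Hz + 227 Hz complex of the beats example, this gives a 223.5 Hz tone beating 7 times per second.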

Masking

One of the most difficult steps in the investigation of spectrograms is to decide whether, and to what extent, a spectral component that physically exists in a signal can be perceived by the auditory system. The phenomenon that spectral components of a complex tone are inaudible despite their considerable measured amplitude is described by the masking function of the human auditory system. Masking is (1) the process by which the threshold of audibility for one sound is raised by the presence of another (masking) sound and (2) the amount by which the threshold of audibility of a sound is raised by the presence of another (masking) sound. The unit customarily used is the decibel (ANSI S3.20-1973). Masking may be seen as a general loss of information or as an undesired decrease in the sensitivity of the auditory system, but on the contrary it is one of the most important auditory functions for the frequency analysis performed by the ear. Masking helps to organize the sound into perceptually relevant components belonging either to the same or to different sounds; it determines which components are resolved by the ear as audible harmonics with spectral pitch, and it fuses higher harmonics according to the auditory critical bandwidth.

Critical Bands

The critical band in hearing can roughly be described as the frequency band within which two spectral components influence one another. This influence can be expressed in terms of masking, loudness summation, roughness, consonance, dissonance etc. The bandwidth of the critical bands remains roughly constant at 100 Hz up to a frequency of 500 Hz and increases to about 17% of the centre frequency above 500 Hz. Consequently, the distribution of the spectral components of any acoustical signal along the basilar membrane of the inner ear is best approximated by the Bark scale (named after the acoustician Barkhausen, 1926), which corresponds to the frequency spacing of the critical bands. A formal expression for the computation of the Bark scale has been given by Zwicker and Terhardt (1980). The frequency f is expressed in kHz and the arctangent in radians:

$$ z_c/\mathrm{Bark} = 13\,\arctan\left(0.76\,f/\mathrm{kHz}\right) + 3.5\,\arctan\left[\left(f/7.5\,\mathrm{kHz}\right)^2\right] $$

The Bark transformation gives a much better frequency resolution in the low-frequency range up to 500 Hz, where the scale is approximately linear in frequency; the resolution is progressively reduced at higher frequencies. Spectrograms using the Bark scale represent the psychoacoustical frequency spacing of the inner ear and can be interpreted in terms of the perceptually relevant spectral distribution.
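The Bark formula above can be evaluated directly. The sketch also includes the critical-bandwidth expression from the same paper by Zwicker and Terhardt (1980), Δf_G = 25 + 75[1 + 1.4 (f/kHz)²]^0.69 Hz, which yields 100 Hz at low frequencies and about 162 Hz (roughly 16%) at 1 kHz, in line with the rule of thumb stated above. The function names are illustrative.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Critical-band rate z in Bark (Zwicker & Terhardt, 1980); f in Hz."""
    f_khz = np.asarray(f_hz, dtype=float) / 1000.0
    return 13.0 * np.arctan(0.76 * f_khz) + 3.5 * np.arctan((f_khz / 7.5) ** 2)

def critical_bandwidth(f_hz):
    """Critical bandwidth in Hz (Zwicker & Terhardt, 1980); f in Hz."""
    f_khz = np.asarray(f_hz, dtype=float) / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69
```

Plotting a spectrogram against `hz_to_bark(freqs)` instead of `freqs` gives the Bark-scaled frequency axis described above.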

Relevance-Spectrography

Transforming the frequency axis into the Bark scale and removing irrelevant spectral components from the signal creates a so-called Relevance-Spectrogram, which contains only those frequency components that evoke neurophysiological activity (SPL excess). It represents the signal associated with the neural excitation pattern in the auditory nerve, which contains the information relevant for processing at higher neural levels. The musical interpretation of spectrograms is thus greatly facilitated, as irrelevant signal parts cannot appear. Moreover, by applying a categorized intensity detection procedure (a concept of overmasking), the most prominent spectral peaks of the signal are extracted and figure-background discrimination can be obtained (Deutsch & Noll, 1993). In many cases this enables the listener to follow the leading voice without interference from the background signal.
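The separation of a spectrum into an audible part and a masked remainder can be sketched as follows. The masking model is a deliberately crude, hypothetical stand-in for the authors' irrelevance threshold: each component spreads a triangular threshold over the Bark scale, with an assumed 10 dB offset below the masker, an assumed lower slope of 27 dB/Bark and an assumed upper slope of 10 dB/Bark; the absolute threshold of hearing and the level dependence of the slopes are ignored. The function `relevance_split` is hypothetical. Components below the resulting threshold are set to zero, and the zeroed amplitudes form the difference spectrum.

```python
import numpy as np

def hz_to_bark(f_hz):
    f_khz = np.asarray(f_hz, dtype=float) / 1000.0
    return 13.0 * np.arctan(0.76 * f_khz) + 3.5 * np.arctan((f_khz / 7.5) ** 2)

def relevance_split(freqs_hz, mags, offset_db=10.0, lower_slope=27.0, upper_slope=10.0):
    """Split one spectral frame into (relevant, difference) amplitude arrays.

    Hypothetical simplified masking model (all parameters are assumptions):
    masker j raises the threshold at component i to
        L_j - offset_db - lower_slope * (z_j - z_i)   if z_i <  z_j
        L_j - offset_db - upper_slope * (z_i - z_j)   if z_i >= z_j
    and the masked threshold is the maximum over all maskers.
    """
    mags = np.asarray(mags, dtype=float)
    z = hz_to_bark(freqs_hz)
    level_db = 20.0 * np.log10(np.maximum(mags, 1e-12))
    dz = z[:, None] - z[None, :]          # dz[i, j] = z_i - z_j (target minus masker)
    spread = np.where(dz < 0, lower_slope * dz, -upper_slope * dz)
    threshold = np.max(level_db[None, :] - offset_db + spread, axis=1)
    relevant = np.where(level_db >= threshold, mags, 0.0)
    difference = mags - relevant          # the masked ("irrelevant") part
    return relevant, difference
```

Applied frame by frame to the magnitudes of a short-time Fourier analysis, the two outputs correspond to a relevance-corrected spectrogram and a difference spectrogram; because the split is exact, their sum always restores the original, as in the bagpipe example.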

Pitch

The perception of the pitch of complex tones has been discussed extensively in psychoacoustics since the well-known controversy between Hermann von Helmholtz and Georg Simon Ohm on one side and August Seebeck on the other. The problem, which is still an important question in theories of hearing, started from Seebeck's observation that the pitch of a complex tone with a missing fundamental still remains at the pitch level of the fundamental frequency. Ohm's acoustic law, following Fourier's theorem, stated on the contrary that only pitches of frequencies which exist objectively (as components of a complex tone) can be heard. Ohm's acoustic law strongly supported Helmholtz's theory of hearing, according to which the partials of a complex tone are distributed along the basilar membrane (place theory) and resonance is responsible for the mechanical stimulation of the hair cells. (Note: Helmholtz's experimental setup consisted mainly of resonators, which he invented, and his sound sources were tuning forks. Seebeck used an acoustic siren, blowing air against the holes of a rotating disk; by proper spacing of the holes, a complex tone without its fundamental frequency is produced.) Helmholtz explained Seebeck's missing-fundamental phenomenon by arguing that nonlinearities in the inner ear evoke the low pitch, creating an objective nonlinear distortion product (a difference tone or combination tone between the higher harmonics) at the place of the fundamental frequency.

Modern pitch theory is based on the work of Georg von Bekesy and J. F. Schouten, both of whom stimulated research on pitch perception for about 50 years. Bekesy's travelling-wave theory is strongly supported by physiological experiments (Bekesy, 1960), and Schouten's (1940) observations on the residue pitch made evident that the ear works in both domains simultaneously: in the frequency domain, by means of hydromechanics, with a result far from a perfect Fourier transform, and in the time domain, where any onset or even a slight change in the regular vibration of the basilar membrane is detected.
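The time-domain side of this argument can be illustrated with a missing-fundamental tone of the kind Seebeck produced with his siren. The sketch builds a complex tone from harmonics 2 to 5 of 200 Hz, so that no energy lies at 200 Hz, and estimates its repetition frequency from the autocorrelation function. Autocorrelation serves here only as a simple, widely used periodicity detector; it is neither Schouten's nor Terhardt's model, and the sampling rate, search range and the name `autocorr_pitch` are assumptions.

```python
import numpy as np

def autocorr_pitch(x, fs, f_min=50.0, f_max=1000.0):
    """Estimate the repetition frequency of x from its autocorrelation peak."""
    x = x - np.mean(x)
    n = len(x)
    r = np.correlate(x, x, mode="full")[n - 1:]   # r[k] for lags k = 0 .. n-1
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    best_lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / best_lag

fs = 16000
t = np.arange(int(0.1 * fs)) / fs
# Harmonics 2..5 of 200 Hz (400, 600, 800, 1000 Hz): the fundamental is absent.
tone = sum(np.sin(2 * np.pi * 200 * k * t) for k in range(2, 6))
print(autocorr_pitch(tone, fs))  # prints 200.0
```

The estimate lands on the period of the whole waveform (5 ms), i.e. on the missing fundamental, even though none of the partials lies there.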

Finally, pitch has been defined as that attribute of auditory sensation in terms of which sounds may be ordered on a scale extending from low to high; the mel was assigned as the unit of pitch (ANSI S3.20-1973). Pitch depends primarily upon the frequency of the sound stimulus, but it also depends upon the sound pressure and the waveform of the stimulus. The pitch of a sound may be described by the frequency or frequency level of that pure tone, at a specified sound pressure level, which subjects judge to have the same pitch.

The discussion on pitch perception came to a premature end when Terhardt (1974) published a model of pitch perception which includes both virtual pitch and spectral pitch. He applied the concept of Gestalt perception, which in musicology is frequently understood to describe only sequential melody contours, to the simultaneously sounding partials of a single complex tone. This mechanism enables the listener to perceive the complex tone as a whole even when prominent components are missing (e.g. the fundamental frequency) or when their amplitude is so low that they cannot contribute to pitch perception. Two general modes of pitch perception therefore have to be distinguished: the holistic mode, which integrates the partials of a complex tone into a good Gestalt and evokes virtual pitches, and the analytic mode, which focuses on the spectral components of the sound and isolates individual partials of the complex tone, as described by the concept of spectral pitch.

The following conclusions have to be drawn for present-day work in pitch perception and music transcription:

  • the pitch of a complex tone may well be ambiguous,
  • pitch matches should therefore be made with sinusoids only,
  • spectral pitch and virtual pitch may both occur in the same individual responding to the same sound, depending on subjective experience,
  • musical theories of melody and counterpoint introduce an interpretative framework which does not necessarily correspond with perception.

Example 1: Highland Bagpipe

In the case of drone polyphony, at least two psychoacoustical phenomena are generally relevant: masking and interference. The special characteristic of the drone sound is its relative stationarity in pitch and timbre throughout the whole musical piece or a part of it, which enables melody tones to interfere with related spectral components of the drone. The following example is taken from a pibroch played on a Piob Mhor (highland bagpipe, Vienna Phonogramm Archive, Tape 17979, J. Brune, 1973). The key of the pipe chanter is usually referred to as A. The two tenor drones are tuned an octave below the A of the chanter, and the bass drone sounds an octave lower still (MacNeill & Richardson, 1987). In our example the frequency of /A/ is 116 Hz. The drone pipes produce a harmonic amplitude spectrum up to 7 kHz. Some partials show slow beats, apparently due to the slight mistuning between the two tenor pipes. The ornamental sections of the sound sample are of equal overall duration (820 ms), whereas the sustained melody tones vary in duration from 1920 to 2830 ms. Interference occurs mainly between the 4th, 5th, 6th and 8th harmonics of the drone and the 1st harmonic of the sustained melody tones (/a3/, /c4 sharp/, /e4/, /a5/), depending on their amplitude relation.

    Spectrogram: Piob Mhor (highland bagpipe, Vienna Phonogramm Archive, Tape B17979, J. Brune, 1973). Spectrogram unprocessed.


Piob Mhor: signal processed according to the irrelevance threshold; all spectral components below the masked threshold have been removed. Approximately 67% of the weaker FFT amplitudes have been set to zero.

Piob Mhor: difference signal. The 67% weaker amplitudes represent the signal below the masked threshold (irrelevance threshold); after being extracted from the original signal, these components can be made audible again. Superimposing this spectrogram and the second one exactly reproduces the first, just as the difference signal plus the irrelevance-corrected signal equals the original.

Generally, the sustained, longer chanter (melody) tones interfere with higher harmonics of the drone (11s to 16s), alternating with notes that do not interfere with the drone (see 8s to 11s) and with short melody tones constituting the melismas (at 2s to 8s, 14s). The occurrence of beats at every 2nd harmonic of the drone spectrum indicates beating between the two tenor drone pipes, with a frequency difference of 0.85 Hz. The beating between the 2nd and the 4th harmonic of the drone, at a rate of approximately 1.7 Hz, is of minor perceptual importance and does not dominate the overall drone sound. Perceptually more relevant is the beating between partials of the drone and of the sustained melody tones, seen at 2.6s to 6s, 11s to 13s etc.
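The doubling of the beat rate from about 0.85 Hz to about 1.7 Hz follows from the harmonic structure: if two tones differ in fundamental frequency by Δf, their k-th harmonics differ by kΔf. The sketch below is one reading consistent with the values given, under stated assumptions: the 116 Hz /A/ is taken as the bass-drone fundamental, both tenor drones sound an octave higher (about 232 Hz), so that their partials fall on the even harmonics of the drone spectrum, and one tenor is mistuned by the 0.85 Hz given in the text. The function name is hypothetical.

```python
def tenor_beat_rates(bass_f0=116.0, tenor_mistuning=0.85, n_tenor_harmonics=4):
    """Beat rates at the drone harmonics carried by the two tenor pipes.

    Assumes both tenor drones sound one octave above the bass drone, one of
    them mistuned by `tenor_mistuning` Hz; their k-th partials then differ
    by k * tenor_mistuning Hz and coincide with drone harmonic 2k.
    Returns a list of (drone harmonic number, frequency in Hz, beat rate in Hz).
    """
    rows = []
    for k in range(1, n_tenor_harmonics + 1):
        drone_harmonic = 2 * k
        rows.append((drone_harmonic, drone_harmonic * bass_f0, k * tenor_mistuning))
    return rows

for harmonic, freq, beat in tenor_beat_rates():
    print(f"drone harmonic {harmonic}: {freq:.0f} Hz, beats at {beat:.2f} Hz")
```

Under these assumptions the 2nd drone harmonic beats at 0.85 Hz and the 4th at 1.7 Hz, and the higher even harmonics beat progressively faster.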

The interference of spectral components of both the drone and the melody tones can already be observed in the spectrogram (fig. 1). Its perceptual relevance, as indicated above, can be seen in the relevance spectrogram (fig. 2), from which the masked components of the signal have been removed. What is removed from the signal once the masked threshold has been computed is shown in the difference signal (fig. 3). Of the lower harmonics of the drone sound, a2 and a3 are not affected by masking, nor is the 6th harmonic (e5). This results in a continuous prominence of the fundamental and the fifth of the drone, the former corresponding to the basic tone of the melody, the latter to its dominant. This fact has already been mentioned by Collinson (1970:167), Brune (1981:48) and MacNeill & Richardson (1987:32), but they all explained it by a strong 3rd harmonic of the bass drone. In contrast, the example under investigation shows a very weak 3rd harmonic of the bass drone and a strong, almost unmasked 3rd harmonic of the tenor pipes.

Several harmonics of the chanter pipe are stronger than the drone and consequently mask the neighbouring partials of the drone. The first partial of the chanter's a4 masks e4 and c-sharp5 of the drone sound, and the first partial of the chanter's e5 masks c-sharp and g of the drone sound, whereas the sustained melody tones c-sharp5 and f-sharp5 are themselves partially masked by harmonics of the drone sound. Taken together, these observations provide psychoacoustical evidence (1) for the characteristic hierarchical structure given by the fifth a-e of the melody, which is strongly supported by the masking phenomenon, and (2) for the role of the continuously sounding drone, which extends the overall frequency range downward and anchors the melody in the tonal space.

Example 2: Bulgarian Multi-Part Song

The next example (figs. 4 to 6) shows the role of roughness and frequency fluctuations (tremolo) as characteristics of a diaphonic type of Bulgarian multi-part singing (Messner, 1980: passim; Brandl, 1992; Födermayr & Deutsch, 1992: 381-384). Masking has no effect in the region of the fundamental frequencies; even at the strongest partials (2 and 4) only weak masking can be observed, and it does not influence the constituent elements of the sounds. Thus the partials of the individual voices interact with their full, objectively existing amplitudes. Throughout the whole piece a characteristic interval is produced between the two voices, fairly constant at a width of three quarters of a whole tone (150 cents). The resulting frequency differences between the fundamental frequencies are in the range of 30 Hz, evoking the sensation of roughness. Even when strong tremolo appears in Tressene figures, the average interval remains close to 150 cents. Generally, the start and target points of exclamations fall on frequency values of the characteristic interval. The rate of the tremolo ranges between approximately 4 and 8 fluctuations per second, which is known to be close to the ear's maximum sensitivity to frequency modulation.
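The relation between the characteristic interval and the frequency difference can be checked with the cent measure, 1200·log2(f2/f1). In the sketch, the 300 Hz lower-voice fundamental is an assumed value chosen for illustration, not a measurement from the recording, and the function names are hypothetical.

```python
import math

def cents(f1, f2):
    """Interval from f1 to f2 in cents (1200 cents = one octave)."""
    return 1200.0 * math.log2(f2 / f1)

def upper_frequency(f1, interval_cents):
    """Frequency lying `interval_cents` above f1."""
    return f1 * 2.0 ** (interval_cents / 1200.0)

lower = 300.0                          # hypothetical lower-voice fundamental, Hz
upper = upper_frequency(lower, 150.0)  # three quarters of a whole tone above
print(round(upper, 1), round(upper - lower, 1), round(cents(lower, upper)))
```

With this assumed fundamental the upper voice lies near 327 Hz, about 27 Hz higher, which is above the roughly 20 Hz limit for beats and hence within the roughness range described earlier. Because cents are a ratio measure, the same 150-cent interval yields larger frequency differences for higher fundamentals.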

Long term spectrogram of Bulgarian multi-part song: Balkanton BHA 2067, II 6. The duration of the piece is 39s. The spectrogram shows the segmentation of the song in 3 x 3 parts of equal duration.

Segment No. 3 (8s – 13s) of Bulgarian multi-part song: Balkanton BHA 2067, II 6. The spectrogram shows the characteristic interval of 150 cents, several exclamations and two tremolos with fluctuation rates of 8 and 4 Hz.

Example 3: Epic Chant, Gujarat

The sound of the drone instrument (Tharisar; Födermayr, 1968) is characterized by a single-pitched (233 Hz) harmonic spectrum with decreasing amplitudes. Both the recitation and the sung parts follow the fundamental frequency of the drone sound with distinct variations. Short quasi-stationary tones of the recitation have an ambitus of up to several whole tones, centred on the fundamental frequency of the drone; those of the sung parts are asymmetric and closer to the drone frequency, with intervals of up to a semitone downwards and up to a third upwards. The drone has a tonal function as the finalis of the song. Roughness arises only during the sung parts, owing to the interference between the drone and sustained voiced tones.

Long-term spectrogram: Epic Chant of the Kunkana, Gujarat (PhA B 12125). The first 3s of the sound example show the drone in isolation, followed by drone and recitation (3s – 15.5s) and sung part segments (15.5s – 30s). This example demonstrates the special kind of voicing during the parlando in the first half of the displayed segment (up to 15s) and the song section, with melodic lines closely related to the drone tones. The drone is produced by a friction idiophone (Tharisar).

Epic Chant of the Kunkana, sung part segment, duration 3.5 s. The asymmetry of the sung part in relation to the drone frequency can easily be seen in the first and second harmonics.

Example 4: Lullaby in Yodel Technique, Bangombe Pygmies

The interdependence of pitch and timbre has already been pointed out in the section on pitch perception. The yodel technique of the Bangombe Pygmies elicits both modes of pitch perception: virtual pitch and spectral pitch. Two female voices exhibit the following variations:

  • tone-to-tone change of voice register: chest – falsetto
  • no isoparametric tone sequences with register change
  • unison with different registers: upper voice chest, lower voice falsetto
  • tone-to-tone change of vowel quality (effect of the first and second vowel formants): upper voice vowel /a/ chest, lower voice vowel /i/ falsetto, vowels /a/, /ae/ chest voice

The interaction between pitch, vowel quality and register change causes selective amplification of partials in the region of the vowel formant peak frequency, in the range of the first or 2nd partial of the female voices (633 Hz). The harmonics are spaced sufficiently far apart to be resolved by the ear, producing virtual as well as spectral pitches. Whenever the fundamental is significantly weaker than the 2nd harmonic, spectral pitch can be perceived by analytic listeners. At will, perception can be focused on the fundamental again, and a holistic type of listening occurs.

Lullaby of Bangombe pygmy women (PhA B10840, G. Kubik, 1965): the peak amplitude contour of the solo part shows the A-B-A pattern of fundamental /e5-flat/ – 2nd harmonic /b4-flat/ – fundamental /e5-flat/ and so on. Falsetto tones are marked with diamonds. The inherent pattern of the upper voice is indicated, starting at 114 s.

The perceptual pitch ambiguity can best be described on the basis of the spectrogram: the peak amplitude of the beginning solo part shows the A-B-A pattern of fundamental /e-flat/ – 2nd harmonic /b-flat/ – fundamental /e-flat/ etc. According to virtual pitch perception, /e5-flat/ /b4-flat/ /e5-flat/ should be perceived, whereas subjects following spectral pitch hear /e5-flat/ /b5-flat/ /e5-flat/. The spectrogram clearly shows the fundamental frequency contour. The phenomenon described has been addressed by a number of investigators, and in detail by Albrecht (1972). Further analysis of the spectrogram identifies a melo-rhythmic pattern in the upper voice (120s to 134s); it can already be seen as an inherent pattern at the beginning of the solo part, starting from the third phrase. The perception of the inherent pattern can be explained by the similarity of timbre of neighbouring tones: the falsetto /f/ and /e-flat/ of phrase 3 and the chest voice /c/ /b-flat/ as well as /b-flat/ /g/ of phrase 4. At approximately 115s (marked with an asterisk) /b4-flat/ is perceived instead of the objectively existing /b5-flat/. This octave error helps to maintain the continuity of the melody and thus supports the good Gestalt. Finally, even in passages where both voices are in unison, the individual voices can easily be distinguished owing to the predominant difference between chest and falsetto register.

In conclusion, and for further studies along these lines, the spectrogram has proved to be an indispensable basis for the evaluation of complex tonal patterns such as the one described.

Lullaby of Bangombe pygmy women: duet. Arrows pointing downward indicate spectral components associated with the upper voice; arrows pointing upward indicate those belonging to the lower voice.


Continuation of the previous spectrogram.

Example 5: Overtone Singing: Tran Quang Hai

Overtone singing of the kind practised by Mongolian and Turkic peoples (as well as in Tran Quang Hai's re-creations) is characterized by (1) a sustained fundamental frequency contour and (2) a melody composed of harmonic overtones of that fundamental frequency. The overtone phenomenon has been recognized as an acoustical consequence of a special setting of the resonances of the human vocal tract, and it is sufficiently explained by the acoustic theory of voice production (Fant, 1960). Moreover, this example shows the coincidence of a production model and the corresponding perception model.

Tran Quang Hai: overtone singing, spectrogram.

The acoustic model of speech production assumes the glottal spectrum to be the primary source for voiced sounds, with the vocal tract acting as a filter attached to it. The glottal spectrum consists of a series of harmonics produced by glottal air pulses, described by a model based on the widely accepted myoelastic theory of van den Berg (1957). The slope of the source spectrum depends on the shape of the individual closing and opening of the vocal folds during one fundamental period; a glottal waveform with more sudden closures produces stronger high-frequency harmonics and a sharper timbre or voice quality. The fundamental frequency of the voice is determined by the repetition rate of the glottal pulses, which is controlled (1) by the laryngeal musculature, affecting the tension and mass distribution of the vocal cords, and (2) by changes of subglottal pressure. Increased subglottal pressure, reduced vibrating mass of the vocal cords and increased tension raise the fundamental frequency.

The tube of the human vocal tract, approximately 17.5 cm long, is attached on top of the laryngeal section. Its cross-section can be widened or constricted by the walls of the pharynx, the tongue, the jaw opening and the lips. The formant frequencies of vowels are related to the length and shape of the tube; they represent the resonance frequencies of the vocal tract in non-nasalized sounds. When the nasal tract is coupled in by lowering the soft palate, the amplitude of the vowel formants decreases and a more complex resonance/antiresonance behaviour of the vocal tract can be observed. The special setting of overtone singing suppresses the formant frequencies of the normal voice and emphasizes a very small frequency range, so narrow that only one partial is amplified. The result is shown in the spectrograms (figs. 12 and 13): the fundamental frequency sounds continuously on one sustained low pitch, and the melody is controlled by appropriately changing the main resonance frequency. Thus overtone melodies can be played by picking out individual harmonics from the complex tone of the glottal pulses.
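The selection of single harmonics by a narrow vocal-tract resonance can be sketched with a simple source-filter calculation. The assumptions are hypothetical simplifications: a harmonic source whose amplitude falls as 1/k² (about -12 dB per octave), a single second-order resonance with centre frequency Fr and an assumed bandwidth B of 40 Hz, and a fundamental of 150 Hz. The function name `emphasized_harmonic` is illustrative. Moving Fr moves the emphasized overtone while the fundamental stays fixed.

```python
import numpy as np

def emphasized_harmonic(f0, res_freq, bandwidth=40.0, n_harmonics=40):
    """Number (>= 2) of the strongest overtone after filtering by one resonance.

    Source: harmonic k has amplitude 1/k**2 (assumed -12 dB/octave slope).
    Filter: magnitude of a second-order resonance,
        |H(f)| = Fr**2 / sqrt((Fr**2 - f**2)**2 + (B * f)**2).
    """
    k = np.arange(1, n_harmonics + 1)
    f = k * f0
    source = 1.0 / k ** 2
    gain = res_freq ** 2 / np.sqrt((res_freq ** 2 - f ** 2) ** 2 + (bandwidth * f) ** 2)
    output = source * gain
    # Skip the fundamental (k = 1), which keeps sounding as the drone-like base.
    return int(k[1:][np.argmax(output[1:])])

# Sweeping the resonance over 1200, 1350, 1500 and 1200 Hz above a fixed
# 150 Hz fundamental selects harmonics 8, 9, 10 and 8: an overtone melody.
melody = [emphasized_harmonic(150.0, fr) for fr in (1200.0, 1350.0, 1500.0, 1200.0)]
```

With a broad, speech-like bandwidth the resonance would lift several neighbouring harmonics about equally; only a narrow resonance singles out one partial, which is why the special vocal-tract setting matters.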

Tran Quang Hai: overtone singing. The output of the model of voice production (Linear Prediction Coding, 24 coefficients) extracts the first overtone of the fundamental frequency and the harmonics with the peak amplitude. The overtone melody is produced by setting the vocal tract main resonances accordingly.

The point to be emphasized is that in this case a coincidence of a (voice) production model and the associated perception model can be established. Nevertheless, it has to be examined from case to case which aspects of the production model can be considered significant for perception.
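The Linear Prediction Coding (LPC) analysis named in the caption of the overtone-singing figure fits an all-pole filter to a signal frame; the peaks of its spectral envelope mark the vocal-tract resonances that select the overtone. The following is a minimal textbook sketch of the autocorrelation method with the Levinson-Durbin recursion, not the authors' implementation; the Hamming window, the FFT length and the function names are assumptions, and the order would be set to 24 to match the caption.

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients a = [1, a1, ..., a_order] by the autocorrelation method.

    The prediction-error filter is A(z) = 1 + a1 z^-1 + ... + a_order z^-order.
    Returns (a, residual_energy).
    """
    xw = x * np.hamming(len(x))
    n = len(xw)
    r = np.array([np.dot(xw[: n - k], xw[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):              # Levinson-Durbin recursion
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                          # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lpc_envelope_db(a, err, n_fft=1024):
    """All-pole spectral envelope 20*log10(sqrt(err) / |A(e^jw)|) on rfft bins."""
    A = np.fft.rfft(a, n_fft)
    return 20.0 * np.log10(np.sqrt(err) / np.abs(A))
```

For the overtone-singing case one would call `lpc(frame, 24)` on a short voiced frame and inspect `lpc_envelope_db`; with a single dominant, narrow resonance, its highest peak should lie near the amplified overtone.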

Conclusion

Although these examples are only demonstrative, they are consistent with the general concept of introducing acoustics, physiology and psychoacoustics into the process of musical analysis. Only to keep this contribution within its limits have we excluded the very challenging approach of Analysis by Synthesis, as it has been applied in speech research since the beginning of vocoder techniques. Resynthesis of musical sounds can be extremely powerful when appropriate sound analysis data are available. As long as the physical parameters of musical sounds have not been evaluated for their psychoacoustical effects, the perceptual relevance of individual components of complex sounds can be determined only by trial and error. The introduction of perceptual concepts into the analysis of music typically yields results that are much better than those obtained from acoustics alone.

Acknowledgments

Our special thanks go to Prof. Dr. Kreysig for reading the English version of this paper and improving its style.

References

Albrecht, Erla M. (1972): Das akustische Residuum. Phil. Diss. Univ. Wien.

ANSI S3.20-1973: American National Standard; Psychoacoustical Terminology. New York.

Bekesy, Georg von (1960): Experiments in Hearing. New York: McGraw-Hill.

Berg, Jw. van den, J.T. Zantema, and P. Doorenbal, Jr. (1957): On the Air Resistance and the Bernoulli Effect of the Human Larynx. Journal of the Acoustical Society of America, Vol. 29, No. 5, pp. 626-631.

Brandl, Rudolf M. (1992): Die Schwebungsdiaphonie im Epiros und verwandte Stile im Lichte der Psychoakustik, in: Schumacher, R. (Hg.): Von der Vielfalt musikalischer Kultur. Anif 1992: 43-79.

Brune, John A. (1981): Piob Mhor und andere britisch-irische Sackpfeifen, in: Schriften zur Volksmusik (Wien, 1981) 41-58.

Collinson, Francis (1970): The traditional and national music of Scotland. London.

Deutsch, W.A. & Anton Noll (1993): Simulation auditorischer Signaltrennung in komplexen musikalischen Signalen durch Übermaskierung. DAGA, Fortschritte der Akustik.

Fant, Gunnar (1970): Acoustic theory of speech production. Mouton, The Hague; 2nd edition.

Fletcher, Harvey (1929): Speech and Hearing. D. van Nostrand Company, Inc. New York.

Fletcher, Harvey (1953): Speech and Hearing in Communication. D. van Nostrand Company, Inc. New York.

Födermayr, Franz (1968): Über ein indisches Reibidiophon und die Drone-Praxis, in: Mitteilungen der Anthropologischen Gesellschaft in Wien, 98:75-79.

Födermayr, Franz & Werner A. Deutsch (1992): Musik als geistes- und naturwissenschaftliches Problem, in: Gratzer, W. & A. Lindmayr (Hg.), De editione musices. Laaber, 377-389.

Graf, Walter (1963/64): Moderne Klanganalyse und wissenschaftliche Anwendung, in: Schriften des Vereins zur Verbreitung naturwissenschaftlicher Kenntnisse in Wien, 104:43-66. Neudruck in Graf (1980).

Graf, Walter (1980): Vergleichende Musikwissenschaft. Ausgewählte Aufsätze, hg. von F. Födermayr, Wien-Föhrenau.

Helmholtz, Hermann von L.F. (1863): Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik. Vieweg & Sohn, Braunschweig; 6. Aufl. 1913.

Koenig, Walter K., H.K. Dunn, L.Y. Lacey (1946): The Sound Spectrograph. Journal of the Acoustical Society of America, Vol. 18, p. 19-49.

MacNeill, Seumas and Frank Richardson (1987): Piobaireachd and its interpretation. Edinburgh; p. 32.

Messner, Gerald F. (1980): Die Schwebungsdiaphonie in Bistrica. Tutzing.

Ohm, Georg Simon (1843): Über die Definition des Tones, nebst daran geknüpfter Theorie der Sirene und ähnlicher tonbildender Vorrichtungen. Annalen der Physik und Chemie, 59, pp. 513-565.

Potter, Ralph K., George A. Kopp, Harriet C. Green (1947): Visible Speech. D. van Nostrand Company Inc. New York.

Schouten, J.F. (1940): The perception of subjective tones. Proc. Kon. Nederl. Akad. Wetensch. 41, 1086-1093.

Seebeck, A. (1841): Beobachtungen über einige Bedingungen zur Entstehung von Tönen. Annalen der Physik und Chemie, 53; 417-436.

Seebeck, A. (1843): Über die Sirene. Annalen der Physik und Chemie, 60; 449-487.

Terhardt, Ernst (1972): Zur Tonhöhenwahrnehmung von Klängen. II. Ein Funktionsschema. Acustica, Vol. 26/4, 187-199.

Zwicker, Eberhard and E. Terhardt (1980): Analytical expression for critical-band rate and critical bandwidth as a function of frequency. Journal of the Acoustical Society of America 68(5), Nov. 1980; 1523-1525.

http://www.kfs.oeaw.ac.at/research/psychoacoustics/musicology/vis_multipart_music/poly1.htm

PIERO COSI, GRAZIANO TISATO : ON THE MAGIC OF OVERTONE SINGING

Posted on January 6, 2015 by haidiphonie

ON THE MAGIC OF OVERTONE SINGING
Piero Cosi, Graziano Tisato
*ISTC-SFD – (ex IFD) CNR
Istituto di Scienze e Tecnologie della Cognizione – Sezione di Fonetica e Dialettologia
(ex Istituto di Fonetica e Dialettologia) – Consiglio Nazionale delle Ricerche
e-mail: cosi@csrf.pd.cnr.it tisato@tin.it
www: http://nts.csrf.pd.cnr.it/Ifd
I like to remember that Franco was the first person I met when I approached the “Centro di Studio per le Ricerche di Fonetica”, and I still have a very pleasant and happy memory of our first warm and unexpectedly informal talk. It may seem obvious, even rhetorical, to say that I will never forget a man like Franco, but it is true, and that is so, apart from his highly relevant scientific work, mostly because of his great heart and sincere friendship.
1. ABSTRACT
For “special people”, scientific interests sometimes coincide with personal “hobbies”. I remember Franco talking to me about the “magic atmosphere” created by the voices of Demetrio Stratos, David Hykes or Tuvan Khomei¹ singers, and I still clearly remember Franco’s attitude towards these “strange harmonic sounds”: it was more than a hobby, but also more than a scientific interest. I have to admit that Franco inspired my training in Overtone Singing², which has remained “almost hidden” from everyone except a few very close and “desperate” family members. This overview of this wonderful musical art does not aim to be a complete scientific work; it is meant as a small descriptive contribution to honor and remember Franco’s wonderful friendship.
2. THE THROAT-SINGING TRADITION
“Khomei” or “Throat-Singing” is the name used in Tuva and Mongolia for a large family of singing styles and techniques in which a single vocalist simultaneously produces two (or more) distinct tones. The lower one is the usual fundamental tone of the voice and sounds like a sustained drone, similar to a Scottish bagpipe. The second corresponds to one of the harmonic partials and sounds like a resonating whistle in a high or very high register. For convenience, we will call the second tone the “diphonic” sound and the phenomenon as a whole “diphonia”.
Throat-Singing was an almost entirely unknown art form until rumours about Tuva and its peculiar musical culture spread in the West, especially in North America, thanks to Richard Feynman [1]³, a distinguished American physicist who was an ardent devotee of Tuvan matters.
¹ We transcribe the Tuvan term in the simplest way, since the different authors do not agree on a spelling: Khomei, Khöömii, Ho-Mi, Hö-Mi, Chöömej, Chöömij, Xöömij.
² This is the term used in the musical context to indicate the diphonic vocal techniques.
This singing tradition is mainly practiced in the regions of Central Asia, including Bashkortostan or Bashkiria (near the Ural Mountains), Kazakhstan, Uzbekistan, Altai and Tuva (two autonomous republics of the Russian Federation), Khakassia and Mongolia (Fig. 1), but examples can be found worldwide: among Xhosa women in South Africa [3], in Tibetan Buddhist chant and in Rajasthan.
The Tuvan people developed numerous different styles. The most important are: Kargyraa (chant with very low fundamentals), Khomei (the name generally used for Throat-Singing as a whole, and also for one particular type of singing), Borbangnadyr (similar to Kargyraa, with higher fundamentals), Ezengileer (recognizable by quick rhythmical shifts between the diphonic harmonics) and Sygyt (like a whistle, with a weak fundamental) [4]. According to Tuvan tradition, all things have a soul or are inhabited by spiritual entities. Legends tell that the Tuvans learnt to sing Khomei in order to establish contact with these entities and assimilate their power through the imitation of natural sounds. Tuvan people in fact believe that sound is the way the spirits of nature prefer to reveal themselves and to communicate with other living beings.

Figure 1. Distribution of Throat-Singing in the regions of Central Asia.
In Mongolia most Throat-Singing styles take their name from the part of the body where the vibratory resonance is supposed to be felt: Xamryn Xöömi (nasal Xöömi), Bagalzuuryn Xöömi (throat Xöömi), Tseedznii Xöömi (chest Xöömi), Kevliin Xöömi (ventral Xöömi, see Fig. 13), Xarkiraa Xöömi (similar to the Tuvan Kargyraa) and Isgerex (a rarely used style that sounds like a flute). The singers themselves sometimes confuse the different styles [5]. Some very famous Mongolian artists (Sundui and Ganbold, for example) use a deep vibrato, which is not traditional, perhaps to imitate Western singers (Fig. 13).
The Khakash people practice three types of Throat-Singing (Kargirar, Kuveder or Kilenge, and Sigirtip), equivalent to the Tuvan styles Kargyraa, Ezengileer and Sygyt. The same styles are found again among the peoples of the Altai Mountains under the names Karkira, Kiomioi and Sibiski. The Bashkir musical tradition uses Throat-Singing (called Uzlau, similar to the Tuvan Ezengileer) to accompany epic chants. In Uzbekistan, Kazakhstan and Karakalpakstan we find forms of oral poetry with diphonic harmonics [6].
³ Today, partly because of Feynman’s influence, there exists a society called “Friends of Tuva” in California, which circulates news about Tuva in the West [2].
The Tibetan Gyuto monks also have a tradition of diphonic chant, related to their religious beliefs about the vibratory reality of the universe. They chant in a very low register, in a way that resembles the Tuvan Kargyraa method (the difference is explained below). The aim of this tradition is mystical and consists in isolating the 5th or the 10th harmonic partial of the vocal sound. In this way they produce intervals of a 3rd or a 5th (relative to the fundamental), which have a symbolic relation with the elements of fire and water (Fig. 14) [4].

Figure 2. Spectral section of a normal vocal sound (top) and a diphonic one (bottom).
3. SEPARATION OF THE AUDITORY IMAGE IN THROAT-SINGING
What is so wonderful about Throat-Singing? It is the emergence of one of the harmonic partials, which discloses the secret musical nature of each sound. When the voice splits into two different sounds in Throat-Singing, we experience the unusual sensation of a pure, discarnate sine wave emerging from the sound. It is the same astonishment we feel when we first see a rainbow emerging from white light, or a laser beam.
Natural sounds have a complex structure of harmonic or inharmonic sinusoidal partials, called “overtones” (Fig. 2). These overtones are not heard as distinct sounds, but their relative intensities shape our perception of all the parameters of sound (intensity, pitch, timbre, duration). The pitch corresponds to the common frequency spacing between the partials, while the timbre takes all the partials into account as a whole. The temporal evolution of these components is what makes the sound of each voice or instrument unique and identifiable.
In harmonic sounds such as the voice, the components are equally spaced in frequency: each frequency is an integer multiple of the fundamental (Fig. 2). If the fundamental frequency is 100 Hz, the 2nd harmonic is at 200 Hz, the 3rd at 300 Hz, and so on. The harmonic partials of a sound form a natural musical scale of unequal temperament, like those in use during the Renaissance [7]. If we consider only the harmonics that are easy to produce (and also to perceive), i.e. from the 5th to the 13th, and assume for convenience C3 (131 Hz) as the starting pitch, we obtain the following notes:
Harm. no.   Freq. (Hz)   Note   Interval with C3
5           655          E5     3rd
6           786          G5     5th
7           917          A5+    6th+
8           1048         C6     Octave
9           1179         D6     2nd
10          1310         E6     3rd
11          1441         F6+    4th+
12          1572         G6     5th
13          1703         A6-    6th-
The series of the 8th, 9th, 10th, 12th and 13th harmonics and the series from the 6th to the 10th are two possible pentatonic scales to play. Note that the frequency differences between these scales and the equal-tempered scale are on the order of 1/8 of a tone (about 1.5%).
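The table can be reproduced with a few lines of arithmetic: harmonic n lies 1200·log2(n) cents above the fundamental, and rounding to the nearest 100 cents gives the closest equal-tempered note. The sketch below uses hypothetical helper names and C3 = 131 Hz as in the text. It spells every note with sharps, whereas the table uses diatonic names with +/- marks, so the 7th harmonic appears as A#5 about 31 cents flat rather than "A5+", and the 13th as G#6 about 41 cents sharp rather than "A6-". The 5th and 10th harmonics come out about 14 cents flat and the 11th about 49 cents flat, which lets the reader check the "1/8 of a tone" (25 cents) estimate harmonic by harmonic.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def harmonic_row(n: int, f0_hz: float = 131.0, base_octave: int = 3) -> tuple[float, str, float]:
    """Return (frequency in Hz, nearest equal-tempered note, deviation in cents)
    for harmonic n, treating the fundamental as C in `base_octave`."""
    cents = 1200.0 * math.log2(n)       # interval above the fundamental
    steps = round(cents / 100.0)        # nearest equal-tempered semitone
    note = NOTE_NAMES[steps % 12] + str(base_octave + steps // 12)
    return n * f0_hz, note, cents - 100.0 * steps


if __name__ == "__main__":
    for n in range(5, 14):
        freq, note, deviation = harmonic_row(n)
        print(f"{n:>2}  {freq:5.0f} Hz  {note:<4}  {deviation:+6.1f} cents")
```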
Throat-Singing makes it possible to extract the notes of a natural melody from the body of the sound itself.
The spectral envelope of the overtones is essential for speech comprehension. The glottal sound is filtered by the articulation of the vocal tract, which shapes the partials of the voice with characteristic zones of resonance (called formants), where components are intensified, and zones of anti-resonance, where partials are attenuated (Fig. 2-3). The overtones thus allow us to tell the different vocal sounds apart. For example, the vowels /a/, /e/, /i/, /o/, etc., uttered or sung at the same pitch, nevertheless sound different to our ears because of the different energy distribution of their formants (Fig. 2).
The auditory mechanisms “fuse” the partials into one single “image”, which we identify as a voice, a musical instrument, a noise, etc. [8]. In the same way, visual processing tends to group different dots into simple shapes (circle, triangle, square, etc.). The creation of auditory images serves to single out the sound sources around us and to give them meaning.
The hearing mechanisms organize the stream of perceptual data belonging to the components of different sounds according to psychoacoustic and Gestalt principles. “Grouping by harmonicity”, for example, fuses into one sound the frequency partials that are multiples of a common fundamental. The “common fate” principle states that we integrate those components of a complex sound that show the same amplitude and frequency behaviour (e.g. similar modulation and microvariation, similar attack and decay, similar vibrato) [8]. If one of these partials follows a distinct evolution (e.g. it is mistuned, or does not share the frequency and amplitude modulation of the others), it will be heard as a separate sound. Throat-Singing is therefore a marvelous example for understanding the illusory nature of perception and the musical structure of sound.
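The two grouping cues just described can be caricatured in code. The toy sketch below, with hypothetical names and assumed thresholds (3 % mistuning, 3 dB of amplitude fluctuation), marks a partial as heard separately if it is mistuned relative to the nearest harmonic of the fundamental, or if its amplitude envelope moves while the other partials stay static. It illustrates the principles only and is not a model of auditory scene analysis.

```python
import math

# Toy illustration of two grouping cues: harmonicity and "common fate".
# The thresholds are assumed values chosen for illustration, not
# psychoacoustic constants.
MISTUNING_TOLERANCE = 0.03    # relative deviation from the nearest harmonic
MAX_FLUCTUATION_DB = 3.0      # peak-to-peak amplitude change still "static"


def fluctuation_db(envelope: list[float]) -> float:
    """Peak-to-peak level change of a (positive) amplitude envelope, in dB."""
    return 20.0 * math.log10(max(envelope) / min(envelope))


def group_partials(partials, f0_hz):
    """Split (frequency_hz, envelope) pairs into fused and separate partials."""
    fused, separate = [], []
    for freq, envelope in partials:
        n = max(1, round(freq / f0_hz))                  # nearest harmonic number
        mistuned = abs(freq - n * f0_hz) / (n * f0_hz) > MISTUNING_TOLERANCE
        fluctuating = fluctuation_db(envelope) > MAX_FLUCTUATION_DB
        (separate if mistuned or fluctuating else fused).append(freq)
    return fused, separate


if __name__ == "__main__":
    steady = [1.0, 1.0, 1.0]
    partials = [(150.0 * n, steady) for n in range(1, 13) if n != 10]
    partials.append((1500.0, [0.2, 1.0, 0.3]))   # 10th harmonic, amplitude moving
    fused, separate = group_partials(partials, 150.0)
    print("fused:", fused)
    print("heard separately:", separate)
```

In the example the 10th harmonic is perfectly in tune, yet it is singled out purely because its amplitude changes while the rest of the spectrum stays still, which is the situation the text describes for the diphonic harmonic.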

Figure 3. Resonance envelope for a uniform vocal tract (left). A constriction in the pharynx moves the formants so that the intensity of the partials in the 2500-3500 Hz region increases (right).
4. FUNDAMENTAL TECHNIQUES IN THROAT-SINGING
In Throat-Singing, the singer learns to articulate the vocal tract so that one of the formants (usually the first or the second) coincides with the desired harmonic, giving it a considerable amplitude boost (even more than 30 dB; see the 10th harmonic in Fig. 2) and making it perceptible. Unlike in normal speech, the diphonic harmonic can greatly exceed the intensity of the lower partials (Fig. 2). Soprano singers use a similar skill when they want to sing a high note: they control the position of the 1st formant, tuning it to the fundamental with the proper articulation (i.e. the proper mouth opening) [9].
There are many different methods of producing the diphonic sound [5-6], but following the proposal of Tran Quang Hai [4] we can group them into two categories, the “single cavity method” and the “two cavities method”, distinguished by whether or not the tongue is used.
4.1 SINGLE CAVITY METHOD
In this method the tongue does not move; it remains flat or slightly curved without touching the palate, so the vocal tract behaves like a continuous tube (Fig. 3). The diphonic harmonic is selected by the appropriate opening of the mouth and lips. As a result, the formant frequencies rise when the vocal tract is shortened (for example with an /i/) and fall when it is lengthened (for example with a /u/). With this technique the partials are selected by moving the 1st formant, and, as Fig. 4 shows, this cannot go beyond 1200 Hz. The diphonic harmonic is generally feeble and masked by the fundamental and the lower partials, so singers nasalize the sound to reduce their intensity [10-11].

Figure 4. Opening the mouth controls the position of the 1st formant. Movement of the tongue affects the 2nd formant and allows harmonic selection over a wide frequency range.
4.2 TWO CAVITIES METHODS
In this method the tongue is raised so as to divide the vocal tract into two main resonators, each tuned to a particular resonance. With appropriate control, two separate harmonics can be tuned, making not one but two (or more) pitches perceptible at the same time (Fig. 9-12).
There are three possible variants of this technique:
The first corresponds to the Khomei style: to select the desired harmonic, the tip and body of the tongue move forward (higher pitch) and backward (lower pitch) along the palate.
The second is characteristic of the Sygyt style: the tip of the tongue remains fixed behind the upper teeth while the tongue body rises to select the harmonics.
In the third variant, the movement of the tongue root selects the diphonic harmonic: shifting the base of the tongue toward the posterior wall of the throat yields the lower harmonics, whereas moving it forward brings out the higher harmonics [6].
Tran Quang Hai has proposed a different method for producing very high diphonic harmonics (though not for controlling the selection of the desired component). It consists in keeping the tongue pressed by the molars while singing the vowels /u/ and /i/, and maintaining a strong contraction of the abdominal and throat muscles [4].
The advantage of the two cavities techniques is that the 2nd formant can be used to reinforce harmonics in the zone of best audibility; in this case the diphonic harmonic can reach 2600 Hz (Fig. 4). Furthermore, the movement of the tongue displaces the 1st and 2nd formants in opposite directions. Separating them produces a strong anti-resonance in between (Fig. 2), which helps the perception of the diphonic harmonic.
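Taking the approximate limits quoted in this section at face value (about 1200 Hz for selection with the 1st formant, about 2600 Hz with the 2nd), one can estimate which harmonics are within reach for a given fundamental. In the hypothetical helper below, the lower bound of the 5th harmonic is borrowed from the "easy to produce" range mentioned in section 3 and is an assumption, as is treating the limits as sharp cut-offs. By this rough criterion, the Khomei example of Fig. 6 (fundamental 189 Hz, melody up to the 12th harmonic at 2268 Hz) lies beyond what the single cavity method allows and within the two cavities range.

```python
import math

# Approximate upper limits quoted in the text for the selecting formant.
LIMITS_HZ = {"single cavity (F1)": 1200.0, "two cavities (F2)": 2600.0}


def selectable_harmonics(f0_hz: float, limit_hz: float, lowest: int = 5) -> range:
    """Harmonic numbers from `lowest` (assumed) up to the highest at or below `limit_hz`."""
    highest = math.floor(limit_hz / f0_hz)
    return range(lowest, highest + 1)


if __name__ == "__main__":
    for f0 in (131.0, 189.0):  # C3 from section 3 and the Fig. 6 fundamental
        for method, limit in LIMITS_HZ.items():
            print(f"f0 = {f0:.0f} Hz, {method}: {list(selectable_harmonics(f0, limit))}")
```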
In all these methods, a slight, subtle movement of the lips is useful for adjusting the formant positions.
5. REINFORCING THE DIPHONIC SOUND
There are three main mechanisms for reinforcing the segregation of the diphonic sound:
• Appropriate movement of the lips, tongue, jaw, soft palate and throat, producing a fluctuation in the amplitude of the selected harmonic so that it stands out from the other partials, which remain static. The auditory mechanisms are tuned to capture the subtlest changes in the stream of auditory information, which are useful for discriminating between different sounds [8].
• Nasalization of the sound. This creates an anti-resonance at low frequency (<400 Hz) that attenuates the lower partials responsible for masking the higher components [10-11]. Nasalization also attenuates the third formant [12], which improves the perception of the diphonic harmonic (Fig. 2).
• Constriction of the pharyngeal region (false ventricular folds, arytenoids, root of the epiglottis), which increases the amplitude of the overtones in the 2000-4000 Hz region (Fig. 2). This is also what happens in the “singer’s formant”, the technique singers use to reinforce the partials in the zone of best audibility and to avoid being masked by the orchestra, which is generally very strong in the low frequency range [9]. For this reason, the Throat-Singing technique requires extremely precise and selective tuning, so as to avoid amplifying a whole group of harmonic partials, as happens in the “singer’s formant”.
6. VOICE MULTIPHONICS
In this paper we disregard polyphonic singing that can produce diphonic effects, for example the phenomenon of the quintina in Sardinian religious singing, where the coincidence of the harmonics of 4 real voices produces the perception of a 5th, virtual voice (Fig. 5) [13].
The literature contains many terms for the presence of several perceptible sounds in a single voice: Khomei, Throat-Singing, Overtone Singing, Diphonic Singing, Biphonic Singing, Overtoning, Harmonic Singing, Formantic Singing, Chant, Harmonic Chant, Multiphonic Singing, bitonality, diplophonia, vocal fry, etc.
Following the pioneering work on vocal sounds by the Extended Vocal Techniques Ensemble (EVTE) of San Diego University, and bearing in mind that there is little agreement on classification [4], [14-15], the best criterion for distinguishing diphonia seems to be the nature of the sound sources that produce the perception of the diphonic or multiphonic sound [16].
Following this principle, we can distinguish between Bitonality and Diphonia:
• Bitonality: In this case there are two distinct sound sources that produce two sounds. The pitches of the two sounds may or may not be in a harmonic relationship. This category includes diplophonia, bitonality and vocal fry.
• Diphonia: The reinforcement of one (or more) harmonic partial(s) produces the splitting of the voice in two (or more) sounds. This category includes: Khomei, Throat-Singing, Overtone Singing, Diphonic Singing, Biphonic Singing, Overtoning, Harmonic Singing, Chant, Harmonic Chant.

Fig. 5 Sardinian religious folk singing. The pitches of the 4 voices of the choir are F1 88 Hz, C2 131 Hz, F2 176 Hz, A#3 230 Hz. The 8th harmonic of F1, the 6th of C2, the 4th of F2 and the 3rd of A# coincide at 700 Hz and produce the perception of a 5th voice.
6.1 BITONALITY
Diplophonia: The vibration of the vocal folds is asymmetrical: after a normal oscillatory period, the amplitude of the following vibration is reduced. The voice does not split into two sounds; instead the pitch drops one octave and the timbre takes on a typical roughness. For example, with a fundamental pitch of C3 130.8 Hz, the resulting pitch will be C2 65.4 Hz. If the amplitude reduction occurs after every two regular vibrations, the actual period triples and the pitch drops one octave and a 5th. Diplophonic voice is a frequent laryngeal pathology (as in unilateral vocal cord paralysis), but it can also be produced deliberately for artistic effect (Demetrio Stratos was an expert in this technique) [16-18].
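The arithmetic behind these pitch drops is simple: if the waveform only repeats every k glottal cycles, the period is multiplied by k, the pitch is divided by k, and the drop is 1200·log2(k) cents. A minimal sketch with a hypothetical function name, using the C3 130.8 Hz example from the text: k = 2 gives 65.4 Hz, exactly one octave (1200 cents) lower, and k = 3 gives 43.6 Hz, about 1902 cents lower, i.e. an octave (1200 cents) plus a just fifth (about 702 cents), roughly F1.

```python
import math


def period_multiplied_pitch(f0_hz: float, k: int) -> tuple[float, float]:
    """Perceived pitch (Hz) and pitch drop (cents) when the waveform
    repeats only every k glottal cycles (k = 1 means no change)."""
    return f0_hz / k, 1200.0 * math.log2(k)


if __name__ == "__main__":
    for k in (2, 3):
        pitch, drop = period_multiplied_pitch(130.8, k)
        print(f"period x{k}: {pitch:.1f} Hz, {drop:.0f} cents lower")
```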
Bitonality: The two sound sources are due to the vibration of two different parts of the glottal cleft. This technique requires strong laryngeal tension [16-17]. In this case there is not necessarily a harmonic relationship between the fundamentals of the two sounds. In the Tuvan Kargyraa style, the second sound source is the vibration of the supraglottal structures (false folds, arytenoids, the aryepiglottic folds that connect the arytenoids and the epiglottis, and the root of the epiglottis). In this case the supraglottal structures generally (but not always) close at half the rate of the vocal folds, a 2:1 frequency ratio. As with diplophonia, the pitch drops one octave (or more) [19-21].
Vocal fry: In this case the second sound is due to the periodic repetition of a glottal pulsation of different frequency [14]. It sounds like a creaky door opening (another common name is “creaky voice”). The pulse rate of vocal fry can be controlled to produce anything from very slow single clicks to a stream of clicks so rapid that it is perceived as a discrete pitch. Vocal fry is therefore a special case of bitonality: the perception of a second sound depends on the rate of a pulse train and not on the spectral composition of the individual sound.
6.2 DIPHONIA
Diphonic and Biphonic refer to any singing that sounds like two (or more) simultaneous pitches, regardless of technique. These terms are used mostly in academic sources. In the scientific literature the preferred term for Throat-Singing is Diphonic Singing.
Multiphonic Singing indicates a complex cluster of non-harmonically related pitches that sounds like vocal fry or creaky voice [14]. The cluster may be produced while exhaling normally or while inhaling.
Throat Singing is any technique that involves manipulating the throat to produce a melody with the harmonics. Generally, this involves applying tension to the region surrounding the vocal cords and manipulating the various cavities of the throat, including the ventricular folds, the arytenoids and the pharynx.
Chant generally refers to religious singing in various traditions (Gregorian, Buddhist, Hindu chant, etc.). As regards diphonia, the low singing practiced by Tibetan Buddhist monks of the Gyuto sect deserves mention. As explained before, they reinforce the 5th or the 10th harmonic partial of the vocal sound for mystical and symbolic purposes (Fig. 14). This kind of real diphonia must be distinguished from resonance effects (enhancement of some uncontrolled overtones), which can be heard in Japanese Shomyo chant [4] and also in Gregorian chant.
Harmonic Singing is the term introduced by David Hykes to refer to any technique that reinforces a single harmonic or harmonic cluster. The sound may or may not split into two or more notes. It is used as a synonym of Overtone Singing, Overtoning, Harmonic Chant and also Throat-Singing.
Overtone Singing can be considered harmonic singing with an intentional emphasis on the harmonic melody of overtones. This is the name used by Western artists who use vowels, mouth shaping and upper-throat manipulations to produce melodies and textures. It is used as a synonym of Harmonic Singing, Overtoning, Harmonic Chant and also Throat-Singing.

Fig. 6 Tuvan Khomei Style. The fundamental is a weak F#3+ 189 Hz. The diphonic harmonics are the 6th (C#6+ 1134 Hz), 7th (E6 1323 Hz), 8th (F#6+ 1512 Hz), 9th (G#6+ 1701 Hz), 10th (A#6+ 1890 Hz) and 12th (C#7+ 2268 Hz).
7. KHOMEI STYLES
Although there is no widespread agreement, Khomei comprises three basic Throat-Singing methods, called Khomei, Kargyraa and Sygyt, two main sub-methods called Borbangnadyr and Ezengileer, and various other sub-styles.
Khomei means “throat” or “pharynx”. As underlined above, it is not only the generic name given to all Throat-Singing styles of Central Asia but also a particular style of singing. Khomei is the easiest technique to learn and the most practiced in the West. It produces clear, mild harmonics with a fundamental usually in the middle range of the singer’s voice (Fig. 6). In Khomei style two (or more) notes are clearly audible. Technically, the stomach remains relaxed and there is low-level tension on the larynx and ventricular folds, whereas the Sygyt style requires a very strong constriction of these organs (Fig. 7). The tongue either rests flat between the lower teeth, as in the single cavity technique, or rises and moves, as in the two cavities techniques. The desired harmonic is selected mainly through a combination of lip, tongue and throat movements.
Sygyt means “whistle” and actually sounds like a flute. This style is characterized by a strong, even piercing, harmonic and can be used to perform complex and very distinct melodies (Fig. 10). It has its roots in the Khomei method and has the same range for the fundamental. Sygyt is sung with a half-open mouth and the tip of the tongue placed behind the front teeth as if pronouncing the letter “L”. The tongue tip is kept in this position while the tongue body moves to select the harmonic. This is the same technique described above for the Khomei method; the difference lies in the timbre, which lacks energy in the low frequencies. To produce the crystal-clear, flute-like overtone characteristic of the Sygyt style, it is necessary to learn how to filter out the lower harmonic components that usually mask the overtone sensation.

Figure 7. Position of the arytenoids in Khomei (left) and Sygyt (right) style [21].
Crucial for achieving this goal is considerable pressure from the belly/diaphragm, which acts as a bellows forcing the air through the throat. Significant tension is also required in the throat, to bring the arytenoids near the root of the epiglottis (Fig. 7). This displaces the first 3 formants into the high frequency zone (Fig. 3). As a result, the fundamental and the lower harmonics are attenuated so much that they are barely audible (Fig. 10).
It is possible to sing Sygyt either directly through the center of the mouth or, by tilting the tongue, toward one side or the other. Many of the best Sygyt singers “sing to the side”: directing the sound along the hard surfaces of the teeth enhances the bright, focused quality of the sound.
Kargyraa style produces an extremely low sound that resembles the roaring of a lion, the howling of a wolf and the croaking of a frog, all mixed together (Fig. 9). Kargyraa means “hoarse voice”. Just as one hawks and clears the throat before speaking, Kargyraa is nothing other than a deep and continuous hawking. This hawking must rise from the deepest part of the windpipe, so that low tones start resonating in the chest. Overtones are amplified by varying the shape of the mouth cavity and the position of the tongue. Kargyraa is closely linked to vowel sounds: the selection of the diphonic harmonic corresponds to the articulation of a particular vowel (/u/, /o/, //, /a/, etc.), which the singer has learnt to associate with the desired note.
This technique is a mixture of Diphonia and Bitonality (see 6.1): the supraglottal structures vibrate together with the vocal folds, but at half the rate. The arytenoids can also vibrate against the root of the epiglottis, hiding the vocal folds and forming a second “glottic” source [21]. The perceived pitch will be one octave lower than normal (Fig. 9), or even one octave and a 5th lower [20]. In the case of Tran Quang Hai's voice, fibroendoscopy reveals vibration and strong constriction of the arytenoids, which completely hide the vocal folds (Fig. 8).
We must distinguish this technique from Tibetan Buddhist chant, which is produced with the vocal folds as relaxed as possible and without any supraglottal vibration. Tibetan chant is closer to the Tuvan Borbangnadyr style with low fundamentals.

Figure 8. Simulation of the Kargyraa style by Tran Quang Hai: the arytenoids move against the root of the epiglottis and hide the vocal folds [21].
Borbangnadyr is not really a style, as are Khomei, Sygyt and Kargyraa, but rather a combination of effects applied to one of the other styles. The name comes from the Tuvan word for “rolling”, because this style features highly acrobatic trills and warbles, reminiscent of birds, babbling brooks, etc. While the name Borbangnadyr is currently most often used to describe a warbling applied to Sygyt, it is also applied to some lower-pitched singing styles, especially in older texts. The Borbangnadyr style with low fundamentals sounds like the Tibetan Buddhist chant.
Rather than on the pitch movement of the melody, Borbangnadyr generally focuses attention on three different harmonics, the 8th, 9th and 10th, which periodically take turns in prominence (Fig. 11). In this style the singer can easily create a triphonic effect between the fundamental, a second sound corresponding to the 3rd harmonic (an interval of a 5th), and the tremolo effect on the higher harmonics.
Ezengileer comes from a word meaning “stirrup” and features rhythmic harmonic oscillations intended to mimic the sound of metal stirrups, clinking to the beat of a galloping horse (Fig. 12). Ezengileer is a variant of Sygyt style and differs considerably from singer to singer, the common element being the “horse-rhythm” of the harmonics.
8. OVERTONE SINGING IN THE WEST
In the West, the Overtone Singing technique has unexpectedly become very popular, starting in musical contexts and soon turning to mystical, spiritual and even therapeutic applications. The first to make use of a diphonic vocal technique in music was Karlheinz Stockhausen in Stimmung [22]. He was followed by numerous artists, among them: the EVTE (Extended Vocal Techniques Ensemble) group at the San Diego University in 1972, Laneri and his Prima Materia group in 1973, Tran Quang Hai in 1975, Demetrio Stratos in 1977 [17-18], Meredith Monk in 1980, David Hykes and his Harmonic Choir in 1983 [23], Joan La Barbara in 1985, Michael Vetter in 1985, Christian Bollmann in 1985, Noah Pikes in 1985, Michael Reimann in 1986, Tamia in 1987, Bodjo Pinek in 1987, Josephine Truman in 1987, Quatuor Nomad in 1989, Iegor Reznikoff in 1989, Valentin Clastrier in 1990, Rollin Rachele in 1990 [24], Thomas Clements in 1990, Sarah Hopkins in 1990, Les Voix Diphoniques in 1997.

Figure 9. Vasili Chazir sings “Artii-sayir” in the Tuvan Kargyraa style. The fundamental pitch is B1 61.2 Hz. The diphonic harmonics are the 6th (F#4- 367 Hz), 8th (B4 490 Hz), 9th (C#5 550 Hz), 10th (D#5- 612 Hz) and 12th (F#5- 734 Hz). The harmonics from the 12th to the 24th lie an octave above these and are also diphonic, although they are not perceptible. In the 2600-2700 Hz region, a steady formant amplifies the 43rd and 44th harmonics.
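The frequencies and note names in captions like this one follow from simple arithmetic. Harmonic n of a fundamental f0 lies at n·f0 Hz. Its nearest equal-tempered note follows from the MIDI-style number 69 + 12·log2(f/440). The Python sketch below assumes equal temperament with A4 = 440 Hz; the caption does not state its tuning reference. The sketch reports deviations in cents rather than reproducing the caption's +/- marks, because the threshold for those marks is not given. It also lists which harmonics fall inside a given band, such as the 2600-2700 Hz formant region.

```python
import math
from typing import List, Tuple

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
A4_HZ = 440.0  # assumed tuning reference (equal temperament, A4 = 440 Hz)


def nearest_note(freq_hz: float) -> Tuple[str, float]:
    """Nearest equal-tempered note (scientific pitch notation) and deviation in cents.

    A negative deviation means the frequency is flat of the named note.
    """
    midi = 69 + 12 * math.log2(freq_hz / A4_HZ)  # fractional MIDI note number
    nearest = round(midi)
    cents = 100.0 * (midi - nearest)
    name = NOTE_NAMES[nearest % 12] + str(nearest // 12 - 1)
    return name, cents


def harmonics_in_band(f0: float, lo_hz: float, hi_hz: float) -> List[int]:
    """Harmonic numbers n whose frequency n * f0 lies within [lo_hz, hi_hz]."""
    return list(range(math.ceil(lo_hz / f0), math.floor(hi_hz / f0) + 1))


f0 = 61.2  # fundamental of Fig. 9 (B1)
for n in (1, 6, 8, 9, 10, 12):
    name, cents = nearest_note(n * f0)
    print(f"harmonic {n:2d}: {n * f0:7.1f} Hz ~ {name} ({cents:+.0f} cents)")

print(harmonics_in_band(f0, 2600.0, 2700.0))  # harmonics inside the steady formant region
```

Under these assumptions the script reproduces the note names in the caption (B1, F#4, B4, C#5, D#5, F#5) and returns [43, 44] for the 2600-2700 Hz band.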

Figure 10. Tuvan Sygyt style. The fundamental is a weak E3+ 167 Hz. The melody uses the 8th (E6+ 1336 Hz), 9th (F#6+ 1503 Hz), 10th (G#6+ 1670 Hz) and 12th (B6+ 2004 Hz) harmonics. The melody shifts rhythmically between adjacent harmonics every 900 ms. A second resonance region is visible in the 3000-3200 Hz zone.

Figure 11. Tuvan Borbangnadyr style. The fundamental is a weak F#2 92 Hz. Harmonics 7-11 show the effect of a periodic formant shift at about 6 Hz.
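One simple way to picture this periodic formant shift is a narrow formant whose centre frequency oscillates sinusoidally at about 6 Hz around the 9th harmonic, swinging two harmonics either way. At each instant, the harmonic nearest the formant centre is the one heard most strongly. This is a hypothetical model, not the authors' analysis. The sinusoidal shape, the centre harmonic and the swing are assumptions, chosen only to match the 7-11 range given in the caption.

```python
import math

F0_HZ = 92.0    # fundamental of Fig. 11 (F#2)
RATE_HZ = 6.0   # approximate rate of the formant shift given in the caption
CENTRE = 9      # assumed centre of the swing, in harmonic numbers
SWING = 2       # assumed half-range of the swing, in harmonic numbers


def formant_centre_hz(t: float) -> float:
    """Hypothetical formant centre frequency (Hz) at time t (s), oscillating sinusoidally."""
    return F0_HZ * (CENTRE + SWING * math.sin(2 * math.pi * RATE_HZ * t))


def dominant_harmonic(t: float) -> int:
    """Harmonic number closest to the formant centre at time t."""
    return round(formant_centre_hz(t) / F0_HZ)


# Sample one period (1/6 s) at eight equally spaced instants
period = 1.0 / RATE_HZ
print([dominant_harmonic(k * period / 8) for k in range(8)])
```

The printed sequence, [9, 10, 11, 10, 9, 8, 7, 8], cycles through harmonics 7 to 11 six times per second under these assumptions.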

Figure 12. Tuvan Ezengileer style. The fundamental is A#2 117 Hz.
The most famous proponent of this type of singing in the West is David Hykes. He experimented with numerous innovations in many works produced with the Harmonic Choir of New York (Fig. 15) [23]. These include moving the fundamental (a movable drone) while keeping the diphonic formant fixed, introducing text, and using glissando effects.
9. ACOUSTIC ANALYSIS
In the recent past, some work has been done on the analysis of Khomei, and more on Overtone Singing in general. This research has focused on discovering exactly how overtone melodies are produced. Hypotheses about the mechanics of Overtone Singing cover a wide range. At one end are ideas about the stance and posture the singer needs during a performance; at the other is the actual shaping of the mouth cavity that produces the overtones.
Aksenov was the first to explain diphonia as the result of the filtering action of the vocal tract [25-27]. Some years later Smith et al. carried out an acoustical analysis of Tibetan chant [28]. In 1971, Leipp published an interesting report on Khomei [29]. Tran Quang Hai carried out extensive research on all the diphonic techniques [4-5][30]. In 1989 the mechanism of diphonia was demonstrated with two different methodologies. The first applied direct clinical-instrumental methods to study the vocal tract and vocal folds [31-32]; optical stroboscopy revealed perfectly regular vocal-fold vibration. The second used a simple linear prediction (LPC) model to analyse and resynthesize the diphonic sound [33-34]. The good quality of the resynthesis showed that diphonia is due exclusively to the spectral resonance envelope. The only difference between normal and diphonic sound lies in the unusually narrow bandwidth of the prominent formant.
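The source-filter reasoning behind this result can be sketched numerically. The Python example below is not the LPC analysis of [33-34]. Instead it uses a single two-pole digital resonator, a standard building block of formant synthesis. The resonator is centred on the 8th harmonic of the 167 Hz Sygyt fundamental of Fig. 10. The script measures how many decibels that harmonic stands above its louder neighbour as the formant bandwidth shrinks. The sampling rate and the bandwidth values are illustrative assumptions.

```python
import cmath
import math

FS = 16000.0  # sampling rate in Hz (illustrative assumption)


def resonator_gain_db(freq_hz: float, formant_hz: float, bandwidth_hz: float, fs: float = FS) -> float:
    """Magnitude response (dB) of H(z) = 1 / (1 - 2 r cos(theta) z^-1 + r^2 z^-2),
    with pole radius r = exp(-pi B / fs) and pole angle theta = 2 pi F / fs."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    theta = 2 * math.pi * formant_hz / fs
    z_inv = cmath.exp(-1j * 2 * math.pi * freq_hz / fs)  # z^-1 evaluated on the unit circle
    denom = 1 - 2 * r * math.cos(theta) * z_inv + r * r * z_inv * z_inv
    return -20 * math.log10(abs(denom))


def prominence_db(f0: float, harmonic: int, bandwidth_hz: float) -> float:
    """dB by which the target harmonic exceeds the louder of its two neighbours,
    with the formant centred exactly on the target harmonic."""
    formant = harmonic * f0
    target = resonator_gain_db(harmonic * f0, formant, bandwidth_hz)
    neighbours = max(resonator_gain_db((harmonic - 1) * f0, formant, bandwidth_hz),
                     resonator_gain_db((harmonic + 1) * f0, formant, bandwidth_hz))
    return target - neighbours


f0 = 167.0  # fundamental of the Sygyt example in Fig. 10
for bw in (150.0, 40.0, 15.0):
    print(f"bandwidth {bw:5.0f} Hz: 8th harmonic is {prominence_db(f0, 8, bw):5.1f} dB above its louder neighbour")
```

With these assumptions the prominence grows from under 10 dB at a 150 Hz bandwidth to more than 20 dB at 15 Hz. A single harmonic emerging this strongly is what a narrow-bandwidth formant is expected to produce.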
Several researchers seem to agree that the production of the harmonics in Throat-Singing is essentially the same as the production of an ordinary vowel. Bloothooft et al. report a full investigation of Overtone Singing based on the similarity of this kind of phonation to vowel articulation [10].
Other authors, on the contrary, argue that the physical act of creating overtones may originate in vowel production, but that the end product, the overtones themselves, is far from vowel-like [35]. They state that, for both acoustic and perceptual reasons, the production of an overtone melody cannot be described as vowel production.
Acoustically, a vowel is identified by its formant structure. In Overtone Singing, the diphonic formant is reduced to one or a few harmonics, often with the surrounding harmonics attenuated as much as possible. Perceptually, Overtone Singing usually sounds nothing like an identifiable vowel. This is primarily because a large part of the overtone-sung tone no longer contributes to the timbre but instead evokes a melody, and such a distorted “vowel” can convey little phonetic information.
10. CONCLUDING REMARKS
Virtually all musical sounds contain overtones, tones that resonate in fixed relationships above a fundamental frequency. These overtones create tone colour. They help us distinguish the sounds of different musical instruments, or one voice from another.
Different cultures have their own musical traditions, but, interestingly, some of them share at least one feature: the production of overtones in their vocal music. Each tradition attaches its own meanings and purposes to Overtone Singing, but these are often related to a common sphere of spirituality. Overtones in Tibetan and Gregorian chant, for example, are linked with spirituality, and even with health and well-being. Overtones in Tuvan Khomei have at least three different meanings: shamanistic, animistic and aesthetic.

Figure 13. Mongolia: Ganbold sings a Kevliin Xöömi (ventral Xöömi, similar to Tuvan Sygyt). The pitch is G#3 208 Hz. The diphonic harmonics are the 6th (D#6 1248 Hz), 7th (F#6- 1456 Hz), 8th (G#6 1664 Hz), 9th (A#6+ 1872 Hz), 10th (C7- 2080 Hz) and 12th (D#7 2496 Hz). There is a strong 6 Hz vibrato.
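Every harmonic is an integer multiple of the fundamental. A vibrato on the fundamental therefore has the same depth in cents on every harmonic, but its excursion in hertz grows in proportion to the harmonic number. This is why the vibrato is most visible on the high diphonic harmonics in a spectrogram. The Python sketch below illustrates this for the 208 Hz fundamental of Fig. 13. The caption gives only the 6 Hz rate, so the sinusoidal vibrato shape and the ±50 cent depth are assumptions.

```python
import math

F0_HZ = 208.0        # fundamental of Fig. 13 (G#3)
RATE_HZ = 6.0        # vibrato rate given in the caption
DEPTH_CENTS = 50.0   # assumed peak deviation; not stated in the caption


def harmonic_freq(n: int, t: float) -> float:
    """Instantaneous frequency (Hz) of harmonic n at time t under sinusoidal vibrato."""
    deviation_cents = DEPTH_CENTS * math.sin(2 * math.pi * RATE_HZ * t)
    return n * F0_HZ * 2 ** (deviation_cents / 1200)


def excursion_hz(n: int) -> float:
    """Peak-to-peak frequency swing (Hz) of harmonic n over one vibrato cycle."""
    return n * F0_HZ * (2 ** (DEPTH_CENTS / 1200) - 2 ** (-DEPTH_CENTS / 1200))


for n in (1, 6, 12):
    print(f"harmonic {n:2d}: centre {n * F0_HZ:6.0f} Hz, swing {excursion_hz(n):6.1f} Hz")
```

With these assumptions the 12th harmonic swings over roughly 144 Hz, while the fundamental swings over about 12 Hz. Both still move by the same ±50 cents.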

Figure 14. Tibetan Gyuto chant in the Yang style. The pitch is a weak A1 56 Hz. At the beginning, the singer chants the vowel /o/, which reinforces the 5th partial (and the 10th). In the choir part, the articulation of the prayers makes the whole series of harmonics up to the 30th emerge periodically. There is also a fixed resonance at 2200 Hz.

Figure 15. David Hykes and the Harmonic Choir. In this 100 s passage from “Hearing the Solar Winds” [23], the pitch moves slowly through A3, A#3, B3, C4 and A3 to the final G3. The diphonic harmonics vary within the 6th-12th range.
11. ACKNOWLEDGMENTS
We would like to thank Sami Jansson [36] and Steve Sklar [15] for the useful information they made available to us via their respective web sites.
REFERENCES
[1] Feynman (http://www.feynmanonline.com/), website.
[2] Friends of Tuva (http://www.fotuva.org/), website.
[3] Dargie D., “Some Recent Discoveries and Recordings in Xhosa Music”, 5th Symposium on Ethnomusicology, University of Cape Town, International Library of African Music (ed.), Grahamstown, 1985, pp. 29-35.
[4] Tran Quang Hai, Musique Touva, 2000, (http://www.baotram.ovh.org/tuva.html), website.
[5] Tran Quang Hai, Zemp H., “Recherches expérimentales sur le Chant Diphonique”, Cahiers de Musiques Traditionnelles, Vol. 4, Genève, 1991, pp. 27-68.
[6] Levin Th., Edgerton M., The Throat Singers of Tuva, 1999, (http://www.sciam.com/1999/0999issue/0999levin.html), website.
[7] Walcott R., “The Chöömij of Mongolia – A spectral analysis of Overtone Singing”, Selected Reports in Ethnomusicology, UCLA, Los Angeles, 1974, 2 (1), pp. 55-59.
[8] Bregman A., Auditory scene analysis: the perceptual organization of sound, MIT Press, Cambridge, 1990.
[9] Sundberg J., The science of the singing voice, Northern Illinois University Press, De Kalb, Illinois, 1987.
[10] Bloothooft G., Bringmann E., van Capellen M., van Luipen J.B., Thomassen K.P., “Acoustics and Perception of Overtone Singing”, Journal of the Acoustical Society of America, JASA Vol. 92, No. 4, Part 1, 1992, pp. 1827-1836.
[11] Stevens K., Acoustic Phonetics, MIT Press, Cambridge, 1998.
[12] Fant G., Acoustic theory of speech production, Mouton, The Hague, 1960.
[13] Lortat-Jacob B., “En accord. Polyphonies de Sardaigne: 4 voix qui n’en font qu’une”, Cahiers de Musiques Traditionnelles, Genève, 1993, Vol. 6, pp. 69-86.
[14] Kavasch D., “An introduction to extended vocal techniques”, Report of CME, Univ. of California, San Diego, Vol. 1, n. 2, 1980, pp. 1-20.
[15] Sklar S., Khöömei Overtone Singing, (http://www.atech.org/khoomei), website.
[16] Ferrero F., Ricci Maccarini A., Tisato G., “I suoni multifonici nella voce umana”, Proceedings of XIX Convegno AIA, Napoli, 1991, pp. 415-422.
[17] Ferrero F., Croatto L., Accordi M., “Descrizione elettroacustica di alcuni tipi di vocalizzo di Demetrio Stratos”, Rivista Italiana di Acustica, Vol. IV, n. 3, 1980, pp. 229-258.
[18] Stratos D., Cantare la voce, Cramps Records CRSCD 119, 1978.
[19] Dmitriev L., Chernov B., Maslow V., “Functioning of the voice mechanism in double voice Touvinian singing”, Folia Phoniatrica, Vol. 35, 1983, pp. 193-197.
[20] Fuks L., Hammarberg B., Sundberg J., “A self-sustained vocal-ventricular phonation mode: acoustical, aerodynamic and glottographic evidences”, KTH TMH-QPSR, n.3, Stockholm, 1998, pp. 49-59.
[21] Tisato G., Ricci Maccarini A., Tran Quang Hai, “Caratteristiche fisiologiche e acustiche del Canto Difonico”, Proceedings of II Convegno Internazionale di Foniatria, Ravenna, 2001, (to be printed).
[22] Stockhausen K., Stimmung, Hyperion A66115, 1968.
[23] Hykes D., David Hykes and the Harmonic Choir, (http://harmonicworld.com), website.
[24] Rachele R., “Overtone Singing Study Guide”, Cryptic Voices Productions (ed), Amsterdam, 1996, pp. 1-127.
[25] Aksenov A.N., Tuvinskaja narodnaja muzyka, Moscow, 1964.
[26] Aksenov A.N., “Die stile der Tuvinischen zweistimmigen sologesanges”, Sowjetische Volkslied und Volksmusikforschung, Berlin, 1967, pp. 293-308.
[27] Aksenov A.N., “Tuvin folk music”, Journal of the Society for Asian Music, Vol. 4, n. 2, New York, 1973, pp. 7-18.
[28] Smith H., Stevens K.N., Tomlinson R.S., “On an unusual mode of singing of certain Tibetan Lamas”, Journal of the Acoustical Society of America, JASA 41 (5), USA, 1967, pp. 1262-1264.
[29] Leipp M., “Le problème acoustique du Chant Diphonique”, Bulletin Groupe d’Acoustique Musicale, Univ. de Paris VI, n. 58, 1971, pp. 1-10.
[30] Tran Quang Hai, “Réalisation du chant diphonique”, Le Chant diphonique, Institut de la Voix, Limoges, dossier n° 1, 1989, pp. 15-16.
[31] Pailler J.P., “Examen video du larynx et de la cavité buccale de Monsieur Trân Quang Hai”, Le Chant Diphonique, Institut de la Voix, Limoges, dossier n° 1, 1989, pp. 11-13.
[32] Sauvage J.P., “Observation clinique de Monsieur Trân Quang Hai”, Le Chant Diphonique, Institut de la Voix, Limoges, dossier n° 1, 1989, pp. 3-10.
[33] Tisato G., “Analisi e sintesi del Canto Difonico”, Proceedings VII Colloquio di Informatica Musicale (CIM), Cagliari, 1989, pp. 33-51.
[34] Tisato G., Ricci Maccarini A., “Analysis and synthesis of Diphonic Singing”, Bulletin d’Audiophonologie, Vol. 7, n. 5-6, Besançon, 1991, pp. 619-648.
[35] Finchum H., Tuvan Overtone Singing: Harmonics Out of Place, (http://www.indiana.edu/~folklore/savail/tuva.html), website.
[36] Jansson S., Khöömei Page (http://www.cc.jyu.fi/~sjansson/khoomei.htm), website.
[37] Leothaud G., “Considérations acoustiques et musicales sur le Chant Diphonique”, Le Chant Diphonique, Institut de la Voix, Limoges, dossier n° 1, 1989, pp. 17-43.
[38] Zarlino G., Istitutioni Harmoniche, Venice, 1558.
http://www.researchgate.net/profile/Piero_Cosi/publication/228780318_ON_THE_MAGIC_OF_OVERTONE_SINGING/links/09e4150a363d7236ff000000.pdf

Spectrum of the Tuvin Melody “Artii Sayir” by Tran Quang Hai

Uploaded by QuangHai Tran, 14 February 2009.
Trân Quang Hai sings the Tuvin melody “Artii Sayir” with overtones. The performance is analyzed with the software “Overtone Analyzer”, developed by Bodo Maass and Wolfgang Saus from Germany.

Prof. TSAI Chen-Gia’s publications

Tsai, Chen-Gia

Associate professor, Graduate Institute of Musicology

Ctr. for Neurobiology and Cognitive Science

National Taiwan University, Taiwan

Research Interests

Biomusicology, neuroesthetics, arts and medicine, affective science, music acoustics, Xiqu

Books

Tsai, C.G.*, & Chen, R.S. (2017). Structures and Emotions in Chinese Sentimental Ballads: A Perspective of Cognitive Psychology (in Chinese). Taipei: Faces Publishing LTD.

Tsai, C.G. (2013). The Cognitive Psychology of Music (in Chinese). Taipei: NTU Press.

Tsai, C.G. (2011). Alternative Watching/Listening: Brain Diseases and Voice Disorders in Performing Arts (in Chinese). Taipei: NTU Press.

Journal Articles

Tsai, C.G. (2018). The psychology of musical creativity: the self, executive control, and generation of creative ideas (in Chinese). Journal of National Taiwan University of Arts, 27.

Tsai, C.G., Chou, T.L., & Li, C.W.* (2018). Roles of posterior parietal and dorsal premotor cortices in relative pitch processing: comparing musical intervals to lexical tones. Neuropsychologia, 119, 118-127. [SCI, IF=2.888]

Tsai, C.G., Du, W., & Chen, C.L.* (2017). Influence of literature music on the museum visitor experience: a case study of the Laiho Memorial Museum (in Chinese). Museology Quarterly, 31(3), 5-29.

Tsai, C.G., Li, C.W., Yeh, C.H., Chen, R.S., & Lin, Y.S.* (2017). Why do mandarin popular songs usually deal with break-ups? The therapeutic potential of sentimental ballads (in Chinese). Indigenous Psychological Research in Chinese Societies, 47, 371-420. [TSSCI]

Wu, M.T., & Tsai, C.G.* (2017). Emotional effects of Teresa Teng’s songs in Taiwanese healthy and disabled older adults (in Chinese). Journal of Humanities, Social Sciences and Medicine, 4, 119-138.

Tsai, C.G.*, & Hsia, L.T. (2017). Musical features and theatrical uses of Jin-La-Man-Chang rhythmic mode in Xiqu (in Chinese). Taipei Theatre Journal, 25, 105-128.

Wen, Y.C., & Tsai, C.G.* (2017). The effect of harmonization on cortical magnetic responses evoked by music of rapidly changing tonalities. Psychology of Music, 45(1), 22-35. [SSCI, IF=2.173]

Cheng, T.H., & Tsai, C.G.* (2016). Female listeners’ autonomic responses to dramatic shifts between loud and soft music/sound passages: a study of heavy metal songs. Frontiers in Psychology, 7, 182. [SSCI, IF=2.560]

Tsai, Y.H., & Tsai, C.G.* (2016). Emotional effects of the chorus scenes in musicals on audience: a study on Les Misérables and Chicago (in Chinese). Collected Papers on Arts Research, 25, 147-166.

Li, C.W., Chen, J.H., & Tsai, C.G.* (2015). Listening to music in a risk-reward context: the roles of the temporoparietal junction and the orbitofrontal/insular cortices in reward-anticipation, reward-gain, and reward-loss. Brain Research, 1629, 160-170. [SCI, IF=2.988]

Chen, C.L., & Tsai, C.G.* (2015). The influence of background music on the visitor museum experience: a case study of the Laiho Memorial Museum. Visitor Studies, 18(2), 183-195.

Tsai, C.G.*, & Chen, C.P. (2015). Musical tension over time: listeners’ physiological responses to the ‘retransition’ in classical sonata form. Journal of New Music Research, 44(3), 271-286. [SSCI, IF=0.771]

Tsai, C.G.*, Yang, C.M., Chen, C.C., Chen, I.P., & Liang, K.C. (2015). Relaxation and executive control processes in listeners: an exploratory study of music-induced transient suppression of skin conductance responses. Empirical Studies of the Arts, 33(2), 125-143. [SSCI, IF=0.370]

Chang, Y.H., Lee, Y.Y., Liang, K.C., Chen, I.P., Tsai, C.G.*, & Hsieh, S.* (2015). Experiencing affective music in eyes-closed and eyes-open states: an electroencephalography study. Frontiers in Psychology, 6, 1160. [SSCI, IF=2.560]

Tsai, C.G., Chen, C.C., Wen, Y.C., & Chou, T.L.* (2015). Neuromagnetic brain activities associated with perceptual categorization and sound-content incongruency: a comparison of music and speech. Frontiers in Human Neuroscience, 9, 455. [SCI, IF=3.626]

Tzeng, N.S., & Tsai, C.G.* (2015). Dutuo and salvation in Beijing Opera Peng-Bei (Tragic Monument in Yang’s Saga) and Nan-Tien-Men (South Heavenly Gate): a study from the perspectives of psychiatry and audience psychology (in Chinese). Taipei Theatre Journal, 22, 25-50.

Tsai, C.G., & Tzeng, N.S.* (2015). Music therapy for the elderly: perspectives from cognitive neuroscience (in Chinese). Journal of Humanities, Social Sciences and Medicine, 2, 87-106.

Tsai, C.G.*, Chen, R.S., & Yu, S.P. (2014). Analyzing the verse-chorus form: schema shifts and musical rewards in lyrical-slow songs (in Chinese). Research in Applied Psychology, 61, 239-286.

Tsai, C.G.*, Chen, R.S., & Tsai, T.S. (2014). The arousing and cathartic effects of popular heartbreak songs as revealed in the physiological responses of listeners. Musicae Scientiae, 18(4), 410-422. [SSCI, IF=1.537]

Tan, W.H., Tsai, C.G., Lin, C., & Lin, Y.K.* (2014). Urban canyon effect: storm drains enhance call characteristics of the Mientien tree frog. Journal of Zoology, 294(2), 77-84. [SCI, IF=1.545] [reports: Nature, bioforum.tw]

Yang, I.H., & Tsai, C.G.* (2014). Plucking positions on the guzheng strings: timbral analysis and performance practice (in Chinese). Yin Yue Yan Jiu, 19, 1-30.

Tsai, C.G.* (2014). The emotional expressions and structure in Beijing opera Pong-Yin: combining performance analysis with audience’s physiological measures (in Chinese). Journal of Traditional Chinese Theater, 11, 125-161.

Chen, I.P.*, Lin, Z.X., & Tsai, C.G. (2013). A felt-emotion-based corpora of music emotions (in Chinese). Chinese Journal of Psychology, 55(4), 571-599. [TSSCI]

Tsai, C.G.* (2013). Relationships between musical emotions and music cognition: dialogues between aesthetics and psychology (in Chinese). Journal of Xinghai Conservatory of Music, 2013.2, 120-127.

Tsai, C.G.*, & Chen, R.S. (2012). Desire, resolution, and reward system: listeners’ emotional responses to musical cadences (in Chinese). Journal of National Taiwan University of Arts, 90, 325-345.

Tsai, C.G., Fan, L.Y., Lee, S.H., Chen, J.H., & Chou, T.L.* (2012). Specialization of the posterior temporal lobes for audio-motor processing – evidence from a functional magnetic resonance imaging study of skilled drummers. European Journal of Neuroscience, 35(4), 634–643. [SCI, IF=3.658]

Yang, W.C., & Tsai, C.G.* (2011). Telling the red myth with western music: the function and practice of musical schema shifts in model Beijing operas (in Chinese). Taipei Theatre Journal, 13, 131-157.

Tsai, C.G.*, Chen, C.C., Chou, T.L., & Chen, J.H. (2010). Neural mechanisms involved in the oral representation of percussion music: an fMRI study. Brain and Cognition, 74(2), 123-131. [SCI & SSCI, IF=2.547]

Tsai, C.G.* (2010). The song forms in cultures of humpback whales and songbirds: interdisciplinary perspectives of biomusicology (in Chinese). Huangzhong-Journal of Wuhan Music Conservatory, 2010.4, 129-134.

Tsai, C.G., Chen, C.L.*, & Lee, J.W. (2010). Literature soundscape in the museum: on the roles and functions of sound elements in literature exhibitions (in Chinese). Museology Quarterly, 24(1), 93-115.

Tsai, C.G.*, Wang, L.C., Wang, S.F., Shau, Y.W., Hsiao, T.Y., & Auhagen, W. (2010). Aggressiveness of the growl-like timbre: acoustic characteristics, musical implications, and biomechanical mechanisms. Music Perception, 27(3), 209-221. [SSCI, IF=1.068]

Tsai, C.G.* (2009). The Taiwanese horned fiddle: an example of exaptation of musical instruments (in Chinese). Huangzhong-Journal of Wuhan Music Conservatory, 2009.4, 129-134.

Tsai, C.G., Chen, J.H., Shau, Y.W., & Hsiao, T.Y.* (2009). Dynamic B-mode ultrasound imaging of vocal fold vibration during phonation. Ultrasound in Medicine & Biology, 35(11), 1812-1818. [SCI, IF=2.395]

Tsai, C.G.* (2009). Impure musical sounds: auditory model and harmonic-to-noise ratio (in Chinese). Guandu Music Journal, 10, 113-125.

Tsai, C.G.* (2009). From propaganda to dramatic ornaments: arias and divertissements in modern Beijing operas in 1958-1976 (in Chinese). Taipei Theatre Journal, 10, 113-147.

Tsai, C.G.* (2008). String vibration with nonlinear boundary condition: an acoustical study of “blossoming tones” produced by the junhu (in Chinese). Huangzhong-Journal of Wuhan Music Conservatory, 2008.4, 168-173.

Tsai, C.G.* (2008). Madness by romantic identification: Brain diseases in Xiqu (in Chinese). Journal of Chinese Ritual, Theatre and Folklore, 161, 83-133. [TSSCI]

Tsai, C.G., Shau, Y.W., Liu, H.M., & Hsiao, T.Y.* (2008). Laryngeal mechanisms during human 4 kHz vocalization studied with CT, videostroboscopy, and color Doppler imaging. Journal of Voice, 22(3), 275-282. [SCI, IF=0.953]

Tsai, C.G.*, & Lin, Y.Y. (2008). Contributions of epilepsy research to the psychology of music (in Chinese). Journal of Xinghai Conservatory of Music, 2008.1, 31-37.

Tsai, C.G.* (2007). When Beijing Opera actors meet Beiguan Opera: an impartation project for Beiguan Opera by Xiao-Yiao Theater (in Chinese). Journal of Culture Resources, 3, 75-94.

Tsai, C.G.* (2006). Disease and composing: syphilis in Smetana, Wolf, and Schubert (in Chinese). Formosan Journal of Music Research, 3, 91-106.

Tsai, C.G.* (2006). Towards the cognitive psychology of Xiqu music: examples from Xi-Mei-Fong-Yun and Da-Tzei-Men (in Chinese). Performing Arts Journal, 12, 159-172.

Tsai, C.G.* (2005). Chaotic behavior of performers’ vocalizations: an interdisciplinary study of growl voices (in Chinese). Taipei Theatre Journal, 2, 39-62.

Tsai, C.G.* (2004). Absolute pitch: studies in cognitive psychology (in Chinese). Guandu Music Journal, 1, 77-92.

Tsai, C.G.* (2000). Fu-Lu Sheng-Qiang of Taiwanese Luan-Tan-Xi belongs to Luan-Tan-Qiang system: evidence from tunes and repertory (in Chinese). Journal of Chinese Ritual, Theatre and Folklore, 123, 43-88.

Tsai, C.G.* (1997). A comparison of Chinese Nan-Xi and opera comique: the structure of He-To and vaudeville final (in Chinese). Arts Review, 8, 163-185.

Tsai, C.G.* (1997). A preliminary study on music of Luan-Tan Xiao-Xi (in Chinese). Journal of Chinese Ritual, Theatre and Folklore, 106, 1-29.


Email: tsaichengia@ntu.edu.tw

biography of Prof. Dr. TSAI Chen-Gia

CHEN-GIA TSAI (蔡振家)

Graduate Institute of Musicology (音樂學研究所)
Associate Professor (副教授)
+886-2-3366-4691
tsaichengia@ntu.edu.tw

Profile (updated 2017-12-04)
Education: Ph.D., Systematic Musicology, Humboldt-Universität zu Berlin (Humboldt University of Berlin), Berlin, Germany, 2004
Current position:

2011-present, Associate Professor, Graduate Institute of Musicology, National Taiwan University, Taipei, Taiwan, ROC

Previous positions:

2006-2011, Assistant Professor, Graduate Institute of Musicology, National Taiwan University, Taipei, Taiwan, ROC
2006, Adjunct Assistant Professor, Graduate Institute of Musicology, National Taiwan University, Taipei, Taiwan, ROC

Research fields:

Chinese theater music
biomusicology
phoniatrics
music acoustics
psychoacoustics

http://ah.ntu.edu.tw/web/Teacher!one.action?tid=2530

Tsai, Chen-gia, biography

Tsai, Chen-gia

Assistant professor, Graduate Institute of Musicology, National Taiwan University

PhD (Musicology), Humboldt-Universität zu Berlin

Research Interests

Biomusicology, music cognition, vocal fold dynamics, music acoustics, Chinese opera

Courses Opened

Music of local Xiqu; Music acoustics; Music, evolution and the brain; Feeling and representations of love: linguistic and musicological perspectives

Journal Articles

Tsai, C.G. (2010). The song forms in cultures of humpback whales and songbirds: interdisciplinary perspectives of biomusicology (in Chinese). Journal of Xinghai Conservatory of Music (in press)

Tsai, C.G., Chen, C.C., Chou, T.L., Chen, J.H. (2010). Neural mechanisms involved in the oral representation of percussion music: an fMRI study. Brain and Cognition 74(2): 123-131. [SCI & SSCI, IF=2.547]

Tsai, C.G., Chen, C.L., and Lee, J.W. (2010). Literature soundscape in the museum: on the roles and functions of sound elements in literature exhibitions (in Chinese). Museology Quarterly 24(1):93-115. [THCI]

Tsai, C.G., Wang, L.C., Wang, S.F., Shau, Y.W., Hsiao, T.Y., and Auhagen, W. (2010). Aggressiveness of the growl-like timbre: acoustic characteristics, musical implications, and biomechanical mechanisms. Music Perception 27(3):209-221. [SSCI, IF=1.714]

Lu, Y.H., and Tsai, C.G. (2009). Importance of motor imagery for music performance: Evidence from neuroscience (in Chinese). Guandu Music Journal 11:75-90.

Tsai, C.G. (2009). The Taiwanese horned fiddle: An example of exaptation of musical instruments (in Chinese). Huangzhong-Journal of Wuhan Music Conservatory 2009.4:129-134. [CSSCI]

Tsai, C.G., Chen, J.H., Shau, Y.W., and Hsiao, T.Y. (2009). Dynamic B-mode ultrasound imaging of vocal fold vibration during phonation. Ultrasound in Medicine & Biology 35(11):1812-1818. [SCI, IF=2.395]

Tsai, C.G. (2009). Impure musical sounds: auditory model and harmonic-to-noise ratio (in Chinese). Guandu Music Journal 10:113-125.

Tsai, C.G. (2009). From propaganda to dramatic ornaments: arias and divertissements in modern Beijing operas in 1958-1976 (in Chinese). Taipei Theatre Journal 10:113-147. [THCI]

Tsai, C.G. (2008). String vibration with nonlinear boundary condition: an acoustical study of “blossoming tones” produced by the junhu (in Chinese). Huangzhong-Journal of Wuhan Music Conservatory 2008.4:168-173. [CSSCI]

Tsai, C.G. (2008). Madness by romantic identification: Brain diseases in Xiqu (in Chinese). Journal of Chinese Ritual, Theatre and Folklore 161:83-133. [TSSCI]

Tsai, C.G., Shau, Y.W., Liu, H.M., and Hsiao, T.Y. (2008). Laryngeal mechanisms during human 4 kHz vocalization studied with CT, videostroboscopy, and color Doppler imaging. Journal of Voice 22(3):275-282. [SCI, IF=1.143]

Tsai, C.G., and Lin, Y.Y. (2008). Contributions of epilepsy research to the psychology of music (in Chinese). Journal of Xinghai Conservatory of Music 2008.1:31-37.

Tsai, C.G. (2007). When Beijing Opera actors meet Beiguan Opera: An impartation project for Beiguan Opera by Xiao-Yiao Theater (in Chinese). Journal of Culture Resources 3:75-94.

Tsai, C.G. (2006). Disease and composing: Syphilis in Smetana, Wolf, and Schubert (in Chinese). Formosan Journal of Music Research 3:91-106.

Tsai, C.G. (2006). Towards the cognitive psychology of Xiqu music: Examples from Xi-Mei-Fong-Yun and Da-Tzei-Men (in Chinese). Performing Arts Journal 12:159-172.

Tsai, C.G. (2005). Chaotic behavior of performers’ vocalizations: an interdisciplinary study of growl voices (in Chinese). Taipei Theatre Journal 2:39-62.

Tsai, C.G. (2004). Absolute pitch: studies in cognitive psychology (in Chinese). Guandu Music Journal 1:77-92.

Tsai, C.G. (2000). Fu-Lu Sheng-Qiang of Taiwanese Luan-Tan-Xi belongs to Luan-Tan-Qiang system: evidence from tunes and repertory (in Chinese). Journal of Chinese Ritual, Theatre and Folklore 123:43-88.

Tsai, C.G. (1997). A comparison of Chinese Nan-Xi and opera comique: the structure of He-To and vaudeville final (in Chinese). Arts Review 8:163-185.

Tsai, C.G. (1997). A preliminary study on music of Luan-Tan Xiao-Xi (in Chinese). Journal of Chinese Ritual, Theatre and Folklore 106:1-29.

Conference Papers

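Tsai, C.G. (2010). Oral representations of Beijing opera percussion music and jazz drum music: fMRI studies (oral). Conference “Embracing Taiwanese Musicology in the 21st Century: Globalization and Interculturality”, 30 November-2 December, Taipei National University of the Arts, Taiwan.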

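Chen, C.L., and Tsai, C.G. (2010). Literary landscapes in the museum: a study of the development and exhibition content of the Taiwan literature museum (in Chinese) (oral). Conference “Landscapes of Museum Exhibition”, 18-19 November, Taipei National University of the Arts, Taiwan.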

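Chen, I.P., and Tsai, C.G. (2010). Emotional attributes of music (oral). 2010 project workshop “Basic Research on Standardized Emotional Stimuli and Response Norms”, 6 November, National Chung Cheng University, Taiwan.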

Wang, L.C., and Tsai, C.G. (2010). Beat Perception through body movements: a case study of Nanguan, Beiguan and western classic music (oral). The 3rd International Conference of Students of Systematic Musicology, September 13-15, 2010, Cambridge, UK.

Huang, P.L., and Tsai, C.G. (2010). Pitch glide in Chinese small gongs: effects of macrostructure and microstructure. International Symposium on Music Acoustics, 30-31 August, Sydney, Australia.

Tsai, C.G., Bai, M.R. (2010). An acoustical and historical study of the Taiwanese horned fiddle: Exaptation of musical instruments. International Symposium on Music Acoustics, 30-31 August, Sydney, Australia.

Cheng, J.Y., Tsai, C.G. and Lee, S.C. (2010). Bamboos as the material for saxophone reed. 20th International Congress on Acoustics, 23-27 August, Sydney, Australia.

Tsai, C.G., Auhagen, W., and Causse, R. (2009). The nonlinear membrane of Chinese flutes: its impacts on timbre and performance techniques (oral). 5th Conference on Interdisciplinary Musicology (CIM09). October 26-29, Paris, France.

Tsai, C.G. (2009). Possible impact of brain-imaging technology on the psychology of Asian music (oral). CUHK-NTU Music Forum 2009, 2-3 Jan 2009, Hong Kong, China.

Tsai, C.G. (2008). Emotional contents of the growl-like timbre: a study of biomechanics (oral). Taiwan Symposium on Musicology 2008, Tainan, Taiwan.

Tsai, C.G., Hsiao, T.Y., Shau, Y.W., and Wang, S.F. (2008). Aggressiveness of the growl-like timbre: acoustical features and biomechanical mechanisms (oral). 10th International Conference on Music Perception and Cognition, 25-29 August 2008, Sapporo, Japan.

Chen, J.H., Chang, M.D., Tsai, C.G., Hsiao, T.Y., and Shau, Y.W. (2008). On the application of PIV algorithms to the analysis of ultrasound images of vocal fold tissues during phonation. 13th International Symposium on Flow Visualization, Nice, France, July 1-4, 2008.

Tsai, C.G. (2008). Oral transmission of music: roles of the mirror neuron system in humans and humpback whales (oral). Mini-Symposium on Cultural Evolution & Human Ecology, 30 May, Taipei, Taiwan.

Tsai, C.G. (2007). Cognitive mechanisms revealed by some forms of animal song: chunking, working memory, and self-associative memory (oral). Taiwan Symposium on Musicology 2007, December 14-15, Taipei, Taiwan.

Tsai, C.G., Chen, J.H., Hsiao, T.Y., and Shau, Y.W. (2007). A seawater-seabed model of vocal fold vibration: in-vivo measurements of amplitude attenuation and phase lag (oral). International Symposium on Musical Acoustics, 9-12 September, Barcelona, Spain.

Tsai, C.G., Chen, C.C., Chen, D.Y., Chou, T.L., Chen, C.H., Lee, C.W. (2007). Musical memes and oral tradition: the role of an auditory mirror system in music transmission and cognition (oral). Music and Evolutionary Thought Conference, June 22-23, Durham, England.

Tsai, C.G. (2006). Inharmonic sounds of bowed strings in Western music and Beijing Opera (oral). 4th Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan, 28 November-2 December, Honolulu, Hawaii, USA.

Tsai, C.G., Shau, Y.W., and Hsiao, T.Y. (2006). Vocal fold wave velocity in the cover and body layers measured in vivo using dynamic sonography (oral). 7th International Conference on Advances in Quantitative Laryngology, Voice and Speech Research, October 6-7, 2006, Groningen, the Netherlands.

Tsai, C.G., Hsiao, T.Y., Shau, Y.W. and Chen, J.H. (2006). Towards an intermediate water wave model of vocal fold vibration: Evidence from vocal-fold dynamic sonography (oral). International Conference on Voice Physiology and Biomechanics, July 12-14 2006, Tokyo, Japan.

Tsai, C.G. (2005). Disease and composing: Syphilis in Smetana, Wolf, Schubert (oral). Taiwan Symposium on Musicology 2005, November 11-12, Taipei, Taiwan.

Tsai, C.G., Auhagen, W. (2005). Intonation, tone range and timbre of the Chinese flute (dizi): a Duffing oscillator model of the dizi membrane (oral). Symposium on Traditional Musical Instruments, September 10-11, 2005, Taipei, Taiwan.

Tsai, C.G. (2005). Multi-pitch effect on cognition of solo music: examples of the Chinese flute, Jew’s harp and overtone singing (oral). International Symposium on Body & Cognition, June 4-5, Taipei, Taiwan.

Tsai, C.G. (2004). The timbre space of the Chinese membrane flute (dizi): physical and psychoacoustical effects (invited). 148th Meeting of the Acoustical Society of America, November 15-19, San Diego.

Tsai, C.G., Shau, Y.W., and Hsiao, T.Y. (2004). False vocal fold surface waves during Sygyt singing: a hypothesis (oral). International Conference on Voice Physiology and Biomechanics, August 18-20, Marseille, France.

Chen, J.H., and Tsai, C.G. (2004). Experimental research of the flow field in a brass mouthpiece-like channel using Particle Image Velocimetry (poster). Proceedings of the International Symposium on Musical Acoustics, March 31-April 3, Nara, Japan.

Tsai, C.G. (2004). Auditory grouping in the perception of roughness induced by subharmonics: empirical findings and a qualitative model (oral). Proceedings of the International Symposium on Musical Acoustics, March 31-April 3, Nara, Japan.

Tsai, C.G. (2004). Helmholtz’s nasality revisited: physics and perception of sounds with predominance of upper odd-numbered harmonics (poster). Proceedings of the International Symposium on Musical Acoustics, March 31-April 3, Nara, Japan.

Tsai, C.G. (2003). Relating the harmonic-rich sound of the Chinese flute (dizi) to the cubic nonlinearity of its membrane (poster). Stockholm Music Acoustics Conference 2003, August 6-9, Stockholm, Sweden.

http://www.gim.ntu.edu.tw/gia/

TSAI Chen-Gia, Ph.D. Acoustics, Taiwan, selected publications

Chen-Gia Tsai
Assistant Professor, Graduate Institute of Musicology
National Taiwan University, Taipei, TAIWAN

Ph.D., Musicology (Musikwissenschaft)
Humboldt-University Berlin, Germany
Research Interests
Mechanics of the Chinese membrane flute

* Acoustic effects of the dizi membrane
* Linear effects of the membrane: impedance
* Nonlinear effects of the membrane I: jump phenomena and wrinkles in the membrane
* Nonlinear effects of the membrane II: spectral features

Perception of musical sounds

* Brightness and spatial effects
* Helmholtz’s hollowness and nasality
* Roughness induced by subharmonics

Vocal fold vibration and singing

* Ultrasonic imaging of vocal folds
* Vocal fold vibration as sea waves on a porous seabed
* Overtone singing & high-frequency vocalization
* Growl voice & spine stability

Biomusicology

* Absolute pitch
* Music & biological motor system
* Chinese opera music & memetics

Selected Publications
Journal papers

C.G. Tsai (2004) Absolute pitch: studies in cognitive psychology. Guandu Music Journal 1, 77-92.

C.G. Tsai (2005) Chaotic behavior of performer’s vocalizations: an interdisciplinary study of growl voices. Taipei Theatre Journal 2, 39-62.

C.G. Tsai (2006) Disease and Composing: Syphilis in Smetana, Wolf, and Schubert. Formosan Journal of Music Research 3, 91-106.

Chen-Gia Tsai, Yio-Wha Shau, Hon-Man Liu, and Tzu-Yu Hsiao. Laryngeal mechanisms during human 4 kHz vocalization studied with CT, videostroboscopy, and color Doppler imaging (accepted by Journal of Voice)
Conference papers

C.G. Tsai (2003) Relating the harmonic-rich sound of the Chinese flute (dizi) to the cubic nonlinearity of its membrane (poster). Stockholm Music Acoustics Conference 2003, August 6-9.

C.G. Tsai (2004) Helmholtz’s nasality revisited: physics and perception of sounds with predominance of upper odd-numbered harmonics (poster). Proceedings of the International Symposium on Musical Acoustics, March 31-April 3, Nara, Japan.

C.G. Tsai (2004) Auditory grouping in the perception of roughness induced by subharmonics: empirical findings and a qualitative model (oral). Proceedings of the International Symposium on Musical Acoustics, March 31-April 3, Nara, Japan.

J.H. Chen, and C.G. Tsai (2004) Experimental research of the flow field in a brass mouthpiece-like channel using Particle Image Velocimetry (poster). Proceedings of the International Symposium on Musical Acoustics, March 31-April 3, Nara, Japan.

C.G. Tsai, Y.W. Shau, and T.Y. Hsiao (2004) False vocal fold surface waves during Sygyt singing: a hypothesis (oral). International Conference on Voice Physiology and Biomechanics, August 18-20, Marseille, France.

C.G. Tsai (2004) The timbre space of the Chinese membrane flute (dizi): physical and psychoacoustical effects (invited). 148th Meeting of the Acoustical Society of America, November 15-19, San Diego.

C.G. Tsai (2005) Multi-pitch effect on cognition of solo music: examples of the Chinese flute, Jew’s harp and overtone singing (oral). International Symposium on Body & Cognition, June 4-5, Taipei, Taiwan.

C.G. Tsai, W. Auhagen (2005) Intonation, tone range and timbre of the Chinese flute (dizi): a Duffing oscillator model of the dizi membrane (oral). Conference on Traditional Music Instruments, September 10-11, Taipei, Taiwan.

C.G. Tsai (2005) Disease and composing: syphilis in Smetana, Wolf, and Schubert (oral). Taiwan Symposium on Musicology, November 11-12, Taipei, Taiwan.

C.G. Tsai, T.Y. Hsiao, Y.W. Shau, and J.H. Chen (2006) Towards an intermediate water wave model of vocal fold vibration: Evidence from vocal-fold dynamic sonography (oral). International Conference on Voice Physiology and Biomechanics, July 12-14, 2006, Tokyo, Japan.

C.G. Tsai, Y.W. Shau, and T.Y. Hsiao (2006) Vocal fold wave velocity in the cover and body layers measured in vivo using dynamic sonography (oral). 7th International Conference on Advances in Quantitative Laryngology, Voice and Speech Research, October 6-7, 2006, Groningen, the Netherlands.

C.G. Tsai (2006) Inharmonic sounds of bowed strings in Western music and Beijing opera (oral). 4th Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan, 28 November-2 December, Honolulu, Hawaii, USA.
Links

* Music Acoustics Laboratory at UNSW (impedance measurements of the dizi were performed there)
* Mitzi Meyerson’s homepage (my favorite harpsichordist)
* Introduction to the Qin
* Learn traditional Chinese painting
* Liu Fang’s pipa and guzheng music world

Latest update: 12/2006

http://homepage.ntu.edu.tw/~gim/gia/index.html

Chen-Gia Tsai : Perception of Overtone Singing

Pitch strength

Voices of overtone-singing differ from normal voices in having a sharp formant Fk (k denotes Khöömei), which elicits the melody pitch fk = nf0, where n is the number of the harmonic that Fk reinforces. For normal voices, the formant bandwidths are so large that the formants contribute only to the perception of timbre. For overtone-singing voices, the sharp formant Fk can also contribute to the perception of pitch.

A pitch model based on autocorrelation analysis predicts that the strength of the pitch fk increases as the bandwidth of Fk decreases. Fig. 1 compares the spectra and autocorrelation functions of three synthesized single-formant vowels with the same fundamental frequency f0 = 150 Hz and formant frequency 9f0 (1350 Hz). In the autocorrelation functions, the height of the peak at the lag 1/(9f0) (about 0.74 ms), which represents the pitch strength of 9f0, increases as the formant bandwidth decreases. Fig. 1 suggests that the pitch fk becomes audible once the strongest harmonic exceeds the adjacent harmonics by 10 dB.
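The autocorrelation argument can be reproduced numerically. The sketch below is not Tsai's model: it assumes a flat source spectrum filtered by a single second-order resonance placed on the 9th harmonic of f0 = 150 Hz, and it uses the long-term autocorrelation of a harmonic sum, r(τ) = Σ a_n² cos(2π n f0 τ) / Σ a_n², in which the phases cancel. The function names and the three bandwidths are illustrative assumptions, not the values used for Fig. 1.

```python
import math

def resonance_gain(f, formant, bandwidth):
    """Magnitude of a second-order resonance with unity gain at 0 Hz.

    `formant` is the centre frequency and `bandwidth` the approximate
    -3 dB bandwidth, both in Hz.
    """
    return formant**2 / math.hypot(formant**2 - f**2, bandwidth * f)

def harmonic_amplitudes(f0, formant, bandwidth, n_harmonics=40):
    """Amplitudes of harmonics 1..n_harmonics, assuming a flat source spectrum."""
    return [resonance_gain(n * f0, formant, bandwidth) for n in range(1, n_harmonics + 1)]

def normalized_autocorrelation(amps, f0, lag):
    """Long-term autocorrelation of sum_n a_n*cos(2*pi*n*f0*t + phi_n), with r(0) = 1.

    The phases phi_n cancel, so only the harmonic amplitudes matter.
    """
    num = sum(a * a * math.cos(2 * math.pi * n * f0 * lag) for n, a in enumerate(amps, start=1))
    den = sum(a * a for a in amps)
    return num / den

def prominence_db(amps, k):
    """Level of harmonic k (1-based) above the stronger of its two neighbours, in dB."""
    return 20 * math.log10(amps[k - 1] / max(amps[k - 2], amps[k]))

F0 = 150.0            # Hz, as in Fig. 1
K = 9                 # the formant sits on the 9th harmonic
FORMANT = K * F0      # 1350 Hz
LAG = 1.0 / (K * F0)  # about 0.74 ms

for bw in (400.0, 150.0, 50.0):  # hypothetical bandwidths, not those of Fig. 1
    amps = harmonic_amplitudes(F0, FORMANT, bw)
    r = normalized_autocorrelation(amps, F0, LAG)
    print(f"B = {bw:3.0f} Hz: prominence {prominence_db(amps, K):5.1f} dB, "
          f"r(1/(9 f0)) = {r:.3f}")
```

Under these assumptions the narrowest bandwidth should give both the highest peak at 1/(9f0) and the largest level difference between harmonic 9 and its neighbours, so the printed prominence can be read against the 10 dB criterion mentioned in the text.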

Figure 1: Spectra (left) and autocorrelation functions (right) of three single-formant vowels.

Stream segregation

Besides the bandwidth of Fk, the musical context also plays a role in the perception of fk. During a performance of overtone-singing, the low pitch f0 is always held constant. When fk moves up and down, the pitch sensation of f0 may be suppressed by the preceding, identical f0, and listeners become indifferent to it. Conversely, if f0 and fk change simultaneously, listeners tend to hear the pitch contour of f0, while the stream of fk may be more difficult to trace.

The multi-pitch effect in overtone-singing highlights a limitation of auditory scene analysis, according to which the components radiated by the same object should be grouped and perceived as a single entity. In the quasi-periodic voices of overtone-singing, stream segregation nevertheless occurs through the segregation/grouping mechanism based on pitch. This may explain why overtone-singing always sounds extraordinary when we first hear it.

Perception of rapid fluctuations

Tuvans employ a range of vocalizations to imitate natural sounds. Such singing voices (e.g., Ezengileer and Borbannadir) are characterized by rapid spectral fluctuations, evoking the sensation of rhythm, timbre vibrato or trill.

http://www.soundtransformations.co.uk/PerceptioofOvertoneSingingChenGiaTsai.htm

INGO R. TITZE : From Aerosmith to Pavarotti – How Humans Sing

How Does The Singer’s Voice Produce Those Amazing Sounds?

By Ingo R. Titze

Although the human vocal system is small, it manages to create sounds as varied and beautiful as those produced by a variety of musical instruments. The question is: How can singers produce all those remarkable sounds?
All instruments, including our singing voices, have a sound source, a resonator that reinforces (amplifies) the basic sound and a radiator that transmits the sound to listeners. In people, the source is vibrating vocal folds (vocal cords) of the larynx or voice box; the resonator is the sound-boosting airway above the larynx; and the radiator is the opening of the mouth.
The human voice can create an impressive array of sounds because it relies on nonlinear feedback, by which a small input can result in a disproportionately large output. One of the voice’s more effective nonlinear mechanisms is inertive reactance, whereby singers create special conditions in their vocal tract to amplify sounds generated by the vocal folds.
To better understand the complex phenomena behind the remarkable sounds that acclaimed vocalists demonstrate in the following sound clips and elsewhere, take a look at my article “The Human Instrument” in the January issue of Scientific American.
SOUND CLIPS

Steven Tyler
Steven Tyler, lead singer of the rock band Aerosmith, is celebrated for his ability to scream tunefully. Here, he produces several interesting vocal effects. Tyler first uses some inharmonic (noise-like) sounds to match the timbre of his voice to percussive instruments. He also demonstrates a “flip” into falsetto register, but later employs a bright vowel on the word “same” to carry his belt-like voice (as in “belt it out”) up to a high pitch.

Georgia Brown
Georgia Brown is a Brazilian pop singer noted for her wide vocal range (eight octaves); she is classified as a full dramatic coloratura soprano. In this example, she is likely using inertive reactance in her vocal tract to reinforce a very high-pitched whistle voice that she creates with her vocal folds. No vowels can be identified because the pitch lies above the first two vocal-tract resonances, which perceptually define a vowel.

Rollin Rachele
Rollin Rachele is one of the world’s foremost overtone singers, a technique in which a person vocalizes two notes simultaneously. Overtone singing and related techniques are most widely recognized in the Tuvan, Mongolian and Tibetan cultures. Rachele never uses the fundamental frequency to change pitch. Rather, he maintains the fundamental frequency as a constant drone, then applies varying vocal tract shapes to resonate a single harmonic of this drone at any one time. By skipping from harmonic to harmonic he can play a tune with these high frequencies, also known as overtones.
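Because the drone stays fixed, every melody note available in this technique is an integer multiple of the fundamental frequency. The sketch below, with a hypothetical 150 Hz drone and a hypothetical sequence of harmonic numbers, lists the resulting melody frequencies. As an added comparison that is not part of the source, it also reports how far each harmonic lies from the nearest equal-tempered pitch above the drone (the 7th harmonic, for example, is roughly 31 cents flat).

```python
import math

def overtone_melody(f0, harmonic_numbers):
    """Frequencies (Hz) of the selected harmonics of a constant drone f0."""
    return [n * f0 for n in harmonic_numbers]

def cents_above_drone(n):
    """Interval from the drone to its n-th harmonic, in cents."""
    return 1200.0 * math.log2(n)

def deviation_from_equal_temperament(n):
    """Cents by which harmonic n departs from the nearest 12-TET pitch above the drone."""
    c = cents_above_drone(n)
    return c - 100.0 * round(c / 100.0)

drone = 150.0                    # hypothetical drone frequency in Hz
tune = [8, 9, 10, 12, 10, 9, 8]  # hypothetical sequence of harmonic numbers
print(overtone_melody(drone, tune))
for n in range(6, 13):
    print(n, round(deviation_from_equal_temperament(n), 1))
```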

Joan Sutherland
Dame Joan Sutherland, the renowned Australian operatic soprano, knew instinctively that some vowels cannot be used when singing certain pitches. In this case, she uses a less open mouth shape in her middle pitch range than she does in her high pitch range. One vowel, for instance, sounds more like “oh” in the middle and “ah” at the top. Sutherland alternates between an inverted megaphone (horn-like) shape and a megaphone shape in these vowels to reinforce the sonic energy produced at the vocal folds.

Ethel Merman
On stage, Broadway musical star Ethel Merman belted out songs with precise enunciation and pitch so audiences could hear her even without amplification. Here, she uses bright vowels with high first-resonance frequency to make optimal use of inertive reactance. Pay particular attention to the vowels she uses in “everything,” “roses,” “for” and “me.” The vowels all suggest that she employs the horn-like megaphone vocal-tract shape. But unlike Joan Sutherland, Merman uses the megaphone shape in the middle of her pitch range to reinforce the second harmonic. Sutherland, in contrast, makes use of the megaphone shape only on very high notes to reinforce the first harmonic. Neither female vocalist sings true speech-like vowels.
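The contrast between Merman and Sutherland can be phrased as which harmonic of f0 lies closest to the first vocal-tract resonance F1, a practice often called resonance (or formant) tuning in voice research. The helpers below are a hypothetical illustration; the pitches and F1 values are assumed for the example, not measured from the clips.

```python
import math

def nearest_harmonic(f0, f1):
    """Number of the harmonic of f0 closest to the first resonance f1 (both in Hz)."""
    return max(1, round(f1 / f0))

def mistuning_cents(f0, f1):
    """Cents from that nearest harmonic to f1; positive means f1 lies above the harmonic."""
    return 1200.0 * math.log2(f1 / (nearest_harmonic(f0, f1) * f0))

# Hypothetical values for illustration, not measurements from the clips.
examples = {
    "belt, mid range (F1 near 2 f0)": (523.0, 1050.0),
    "soprano, high note (F1 near f0)": (1047.0, 1000.0),
}
for label, (f0, f1) in examples.items():
    print(label, nearest_harmonic(f0, f1), round(mistuning_cents(f0, f1), 1))
```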

Luciano Pavarotti
Luciano Pavarotti, the recently deceased Italian operatic tenor, is famed for the brilliance and beauty of his tone. In this example, he uses a vocal production in his high notes that is similar to that which Ethel Merman uses in her mid- to high-pitch range. The male high voice has a strong second harmonic as does the female belt voice. But Pavarotti widens his pharynx (the airway above the larynx) more, producing an additional ring in the voice, while downplaying the more typical twanging sound. As far as timbre is concerned, ring sounds match better with bowed string and woodwind instruments, whereas twanging sounds match better with brass and percussion instruments.

Audio file manifest
ETHEL MERMAN – Female belt voice
Clip 1: Minutes 1:35 to 1:51
APA style ref: Styne, J., Sondheim, S. (1959). Everything’s Coming Up Roses [Recorded by E. Merman, S. Black, London Festival Orchestra & Chorus]. On Merman Sings Merman [CD]. London, England: Decca (1972, reissued 2004)
Clip 2: Minutes 2:25 to 2:47
APA style ref: Styne, J., Sondheim, S. (1959). Everything’s Coming Up Roses [Recorded by E. Merman, S. Black, London Festival Orchestra & Chorus]. On Merman Sings Merman [CD]. London, England: Decca (1972, reissued 2004)

Clip 3: Minutes 1:50 to 2:22
APA style ref: Porter, C. (1934). I Get a Kick Out of You [Recorded by E. Merman, S. Black, London Festival Orchestra & Chorus]. On Merman Sings Merman [CD]. London, England: Decca (1972, reissued 2004)
JOAN SUTHERLAND – Female operatic voice
Clip 1: Minutes 5:30 to 5:50
APA style ref: Bellini, V. (1831). Casta diva [Recorded by J. Sutherland, R. Bonynge, London Symphony Orchestra & Chorus]. On Joan Sutherland: The Greatest Hits [CD]. London, England: Decca (1998)
Clip 2: Minutes 4:03 to 4:24
APA style ref: Gounod, C. (1859, Rev. 1869). O Dieu! Que de bijoux! …Ah! je ris de me voir si belle (Jewel song) [Recorded by J. Sutherland, F. Molinari-Pradelli, Orchestra of the Royal Opera House, Covent Garden]. On Joan Sutherland: The Greatest Hits [CD]. London, England: Decca (1998)
STEVEN TYLER – Male rock voice
Clip 1: Minutes 3:24 to 3:45
APA style ref: Aerosmith (1973). Dream On. On Aerosmith [CD]. New York: Columbia Records
Clip 2: Minutes 0:56 to 1:22
APA style ref: Aerosmith (1989). Janie’s Got a Gun. On Pump [CD]. New York: Geffen Records
LUCIANO PAVAROTTI – Male operatic voice
Clip 1: Minutes 1:45 to 2:06
APA style ref: Verdi, G. (1851). La donna è mobile [Recorded by L. Pavarotti, A. Toscanini, Symphonic Orchestra of Emilia Romagna]. On Luciano Pavarotti in concert [CD]. New York: CBS Records
Clip 2: Minutes 1:20 to 1:35
APA style ref: Verdi, G. (1851). Questa o quella [Recorded by L. Pavarotti, A. Toscanini, Symphonic Orchestra of Emilia Romagna]. On Luciano Pavarotti in concert [CD]. New York: CBS Records
ROLLIN RACHELE – (Male) Overtone singing (improperly referred to as “throat singing”)
Clip 1: 20 seconds long
APA style ref: Rachele, R. (1995). Track 18. Overtone Singing Study Guide [Book/CD]. Amsterdam, Netherlands: Cryptic Voices Productions
GEORGIA BROWN – (Female) Whistle voice (not a person whistling!)
Clip 1: Seconds 0:08 to 0:21
Ref: Recording of Georgia Brown: sound clip from: www.dutchdivas.net/nighC.html (link to http://escravosdegeo.sites.uol.com.br/index1.htm) last accessed 12/05/07.
Clip 2: Minutes 0:54 to 1:00
Ref: Recording of Georgia Brown: sound clip from: www.dutchdivas.net/nighC.html (link to http://escravosdegeo.sites.uol.com.br/index1.htm) last accessed 12/05/07.

http://www.scientificamerican.com/article.cfm?id=sound-clips-human-instrument&page=2

Malte KOB: Analysis and modelling of overtone singing in the sygyt style

Abstract

Overtone singing, also called biphonic singing, xöömij, or chant diphonique in French, is a special singing style that exhibits two or more separate sounds – one “drone” sound of relatively low pitch and one or more high-pitched melody sounds. The perceived pitches of the upper tones are integer multiples of the drone frequency, i.e. taken from its overtone series. Compared to voiced sounds of Western-style singers, the relative amplitude of the melody pitches is quite high, and the formant bandwidth of overtone sounds is small. This paper tries to answer the question of how these formant properties are achieved. Experimental investigations and numerical calculations prove the existence of two closely neighboured formants for the production of the melody sound in the sygyt style.
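One way to see why two closely neighboured formants favour a strong, narrow melody peak is a generic cascade (all-pole) formant model, in which the responses of the two resonances multiply. This sketch is an illustration under stated assumptions, not Kob's impedance measurements or numerical model. It compares the half-power width and peak gain of one hypothetical 100 Hz-wide resonance at 1350 Hz with two such resonances placed at 1335 Hz and 1365 Hz.

```python
import math

def resonance_gain(f, formant, bandwidth):
    """Second-order resonance magnitude with unity gain at 0 Hz (frequencies in Hz)."""
    return formant**2 / math.hypot(formant**2 - f**2, bandwidth * f)

def half_power_width(response, f_lo, f_hi, step=0.1):
    """Return (width, peak): the span in Hz where response(f) >= peak/sqrt(2), and the peak.

    Found by a grid scan; assumes the response has a single peak in [f_lo, f_hi].
    """
    n_steps = int(round((f_hi - f_lo) / step))
    freqs = [f_lo + i * step for i in range(n_steps + 1)]
    values = [response(f) for f in freqs]
    peak = max(values)
    inside = [f for f, v in zip(freqs, values) if v >= peak / math.sqrt(2)]
    return inside[-1] - inside[0], peak

BANDWIDTH = 100.0  # Hz, hypothetical bandwidth of each formant

def one_formant(f):
    return resonance_gain(f, 1350.0, BANDWIDTH)

def two_close_formants(f):
    # Cascade model: the two resonance responses multiply.
    return resonance_gain(f, 1335.0, BANDWIDTH) * resonance_gain(f, 1365.0, BANDWIDTH)

for name, resp in (("one formant", one_formant), ("two close formants", two_close_formants)):
    width, peak = half_power_width(resp, 1000.0, 1800.0)
    print(f"{name}: half-power width {width:.1f} Hz, peak gain {20 * math.log10(peak):.1f} dB")
```

The grid scan relies on the merged response having a single peak, which holds here because the two formants are much closer together than their bandwidths; widely separated formants would need a peak-by-peak search instead.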

Keywords

Overtone singing
Impedance measurement
Voice modelling

https://www.sciencedirect.com/science/article/abs/pii/S0003682X04001082