Voice scientists, along with a growing number of voice teachers and vocal pedagogy students, use a cheap and widely available tool called a spectrograph. A spectrograph is software that analyzes sound, splits it into its component harmonics, and displays this information visually in an image called a spectrogram. A spectrogram is, then, a visual representation of sound. Some people call them “voice prints,” which is nicely evocative. It is read much like printed music: time passes from left to right, and pitch is indicated from low (bottom) to high (top) (see figure 1). This article will teach you a little about what a spectrogram is and what we can learn from it.
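For readers who like to see the idea in code, here is a minimal sketch of what "splitting a sound into its component harmonics" can look like. It is not how VoceVista or any particular spectrograph works internally; it simply builds a test tone from three sine waves and runs a naive discrete Fourier transform on one short frame. The sample rate, frame length, amplitudes, and function names are all assumptions chosen so the arithmetic is easy to follow. A spectrogram is, roughly, many such frames laid side by side from left to right.

```python
import cmath
import math

def synthesize(f0, amplitudes, sample_rate, n_samples):
    """Sum of sine-wave harmonics of f0; amplitudes[k] scales harmonic k + 1."""
    return [
        sum(a * math.sin(2 * math.pi * f0 * (k + 1) * n / sample_rate)
            for k, a in enumerate(amplitudes))
        for n in range(n_samples)
    ]

def magnitude_spectrum(frame, sample_rate):
    """Naive DFT: (frequency_hz, magnitude) pairs for the bins below Nyquist."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        spectrum.append((k * sample_rate / n, 2 * abs(coeff) / n))
    return spectrum

def strongest_frequencies(spectrum, count):
    """Frequencies (rounded to whole Hz) of the `count` largest bins, low to high."""
    top = sorted(spectrum, key=lambda pair: pair[1], reverse=True)[:count]
    return sorted(round(freq) for freq, _ in top)

# Hypothetical test tone: 100 Hz fundamental plus two weaker harmonics.
SAMPLE_RATE = 2000   # samples per second (assumed)
FRAME_LENGTH = 200   # 0.1 s frame, so bins are spaced 10 Hz apart
frame = synthesize(100.0, [1.0, 0.5, 0.25], SAMPLE_RATE, FRAME_LENGTH)
peaks = strongest_frequencies(magnitude_spectrum(frame, SAMPLE_RATE), 3)
print(peaks)  # [100, 200, 300]
```

Real tools use a fast Fourier transform and overlapping, windowed frames; the naive version above trades speed for readability.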
What We See and What it Means
The thing to know about the human voice is that it produces harmonics at whole-number multiples of the fundamental frequency (the harmonic that we hear as the pitch). Each pitch has a measurable frequency (measured in Hertz, or vibrations per second, abbreviated “Hz”). If the fundamental is 100 Hz (a little sharp of G2), the harmonics will be 100 Hz × 2 = 200 Hz, 100 Hz × 3 = 300 Hz, 100 Hz × 4 = 400 Hz, &c. The relative strengths of the harmonics we hear color the sound (is it bright or dark, this or that vowel), but we always perceive the pitch based on the fundamental frequency, called H1 (short for harmonic 1). The second harmonic is H2, &c. If you’d like to learn more about the pitches of these harmonics (called the harmonic series), and how they reflect a fundamental law of the vibration of matter in the universe (which is amazing, I think), click here. What I want you to notice today is that as the pitch increases, the space between each harmonic also increases; the gaps of black silence get bigger (see figure 2). The gap between neighboring harmonics equals the fundamental frequency, so raising the pitch widens every gap. This is related to the logarithmic (rather than linear) relationship between pitch and frequency. You can read about this here if you like. Or, just trust me that the take-away is that low notes have more closely packed harmonics than high notes. Later, this will help to explain why sung low notes sound rich, sung high notes sound far less complex, and the distance of a half-step in your high range may feel larger than one in your low range.
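The arithmetic above is simple enough to sketch in a few lines of Python. The function names are my own, and the half-step calculation assumes equal temperament (each half-step multiplies the frequency by the twelfth root of 2); neither comes from a spectrograph's actual software.

```python
def harmonic_series(fundamental_hz, count):
    """Frequencies of H1..H<count>: whole-number multiples of the fundamental."""
    return [fundamental_hz * n for n in range(1, count + 1)]

def semitone_width_hz(frequency_hz):
    """Distance in Hz from a note up to the next equal-tempered half-step."""
    return frequency_hz * (2 ** (1 / 12) - 1)

low = harmonic_series(100.0, 4)    # [100.0, 200.0, 300.0, 400.0]
high = harmonic_series(400.0, 4)   # [400.0, 800.0, 1200.0, 1600.0]

# The gap between neighboring harmonics equals the fundamental itself.
low_gap = low[1] - low[0]          # 100.0 Hz
high_gap = high[1] - high[0]       # 400.0 Hz

# A half-step covers more Hz the higher you go.
print(semitone_width_hz(100.0))    # about 5.9 Hz
print(semitone_width_hz(400.0))    # about 23.8 Hz
```

Two octaves up, both the harmonic spacing and the width of a half-step (in Hz) are four times larger, which is the "low notes are closely packed" observation in numbers.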
You will notice in figures 1 & 2 that two things change as we move from left (past) to right (future). First, all the lines move up and then down. This is because the fundamental pitch is rising and then falling, and all the harmonics are generated by the same sound source – the vocal folds. The vocal folds vibrate as a whole (the fundamental pitch) and in parts (the other harmonics). As the vocal folds adjust to vibrate faster and raise the fundamental frequency, every harmonic increases proportionally in frequency (sounds at a higher pitch). Your brain groups these sounds together into a single concept – the fundamental frequency plus the color of the other harmonics. This is because of the Law of Common Fate, an aspect of Gestalt Psychology that “…dictates …[that the brain perceives]… that objects that move together are likely to be connected.”1 You will notice that there is a similar (common) motion of both pitch (the overall ascent and descent of the harmonics) and vibrato (the pervasive squiggly quality of each individual line). Add another singer with the exact same vibrato (lined up perfectly), and the voices would be indistinguishable from one another. Add someone singing straight tone, with a different rate of vibrato, or at a different pitch, and the brain separates the voices easily. To a certain extent, this helps to explain why straight tone is encouraged by so many choral directors seeking a “blended” sound. Second, different lines are different colors. The color indicates loudness (amplitude), and follows a “hotter colors are louder” convention. In order from quiet to loud, the color palette in these images is: black, dark blue, light blue, yellow, red.
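The "common motion" of the harmonics can be illustrated with a small sketch. It models vibrato as a gentle sine-shaped wobble of the fundamental and derives every harmonic as a multiple of it. The 220 Hz pitch, the 5.5 Hz vibrato rate, the 3% depth, and the function names are all illustrative assumptions, not measurements of any singer.

```python
import math

def harmonic_tracks(f0_hz, n_harmonics, vibrato_rate_hz, vibrato_depth, times):
    """Frequency of each harmonic at each time; harmonic n is always n * f0(t)."""
    tracks = []
    for n in range(1, n_harmonics + 1):
        tracks.append([
            n * f0_hz * (1 + vibrato_depth
                         * math.sin(2 * math.pi * vibrato_rate_hz * t))
            for t in times
        ])
    return tracks

def swing_hz(track):
    """Total up-and-down excursion of one harmonic, in Hz."""
    return max(track) - min(track)

times = [i / 100 for i in range(100)]            # one second, 100 points
tracks = harmonic_tracks(220.0, 3, 5.5, 0.03, times)

# Every harmonic wobbles in step with H1, but H3 covers three times the Hz.
print(swing_hz(tracks[0]), swing_hz(tracks[2]))
```

Because each harmonic is a fixed multiple of the same wobbling fundamental, the squiggles line up in time across all the lines, and the higher lines trace proportionally wider squiggles on a linear frequency display.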
Formants, Vocal Tract Resonances, and Hot and Cold Areas of Sound
The patterns of hot and cold (loud and quiet) harmonics imply hot zones of the voice. We call these vocal tract formants, or resonances. (The vocal tract is the resonating tube from your vocal folds to your lips, nostrils, or sometimes both.) Depending on who is speaking, formant theory is the most important discovery since the trill, far too complicated to think about as you teach or sing, or something in between. Admittedly, at least based on how the concept is currently explained, there is a high knowledge threshold required before formant theory becomes truly practical for most singers. A recent informal Facebook poll of my singing and voice teaching friends revealed a wide divergence in both the basic understanding and practical application of formant theory. I will write more about this in later articles. For now, in the context of learning to read a spectrogram, simply notice that not all harmonics are equally loud, and consider the reason. If you follow a specific harmonic from left to right, you will notice that it becomes louder when it is inside the white rectangles, and quieter when it rises above or falls below them (see figure 3).
You could think about it like a venetian blind with a few missing pieces (see figure 4).
In the venetian blind analogy (see figure 4), a formant (or vocal tract resonance) is a region of light (pitch) where the venetian blind panel is missing. That area will be brighter (louder), while areas covered by a blind panel will be dimmer (quieter). As pitch rises, and thus the pitch of all the harmonics rises, different harmonics will peek through the gaps in the blinds. When they do, they get louder (often a LOT louder). In elite singers, sometimes the upper harmonics are even louder than the fundamental pitch (H1). This is called “formant tuning,” and is a basic strategy of resonant singers (especially when singing in their highest range) whether they realize it or not.
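The venetian blind picture can be turned into a toy calculation. The sketch below treats one formant as a bell-shaped boost centered on a fixed frequency and asks which harmonic lands closest to the gap as the sung pitch changes. The bell shape, the 600 Hz center, the 80 Hz width, and the function names are simplifying assumptions of mine; a real vocal tract has several formants, and the sound source itself gets weaker in the higher harmonics, which this sketch ignores.

```python
import math

def formant_gain(frequency_hz, center_hz, bandwidth_hz):
    """Bell-shaped boost: 1.0 at the formant center, falling toward 0 away from it."""
    return math.exp(-0.5 * ((frequency_hz - center_hz) / bandwidth_hz) ** 2)

def loudest_harmonic(f0_hz, center_hz, bandwidth_hz, n_harmonics=10):
    """Harmonic number (1 = H1) that receives the largest boost from one formant."""
    gains = {n: formant_gain(n * f0_hz, center_hz, bandwidth_hz)
             for n in range(1, n_harmonics + 1)}
    return max(gains, key=gains.get)

FORMANT_HZ = 600.0     # hypothetical fixed formant (the gap in the blinds)
BANDWIDTH_HZ = 80.0    # hypothetical width of the gap

for f0 in (100.0, 150.0, 200.0, 300.0, 600.0):
    print(f0, "Hz -> H" + str(loudest_harmonic(f0, FORMANT_HZ, BANDWIDTH_HZ)))
# 100 Hz -> H6, 150 Hz -> H4, 200 Hz -> H3, 300 Hz -> H2, 600 Hz -> H1
```

The formant never moves, yet a different harmonic peeks through it at each pitch: H6 on the low note, and the fundamental itself once the singer reaches 600 Hz.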
Formants and Vowels
Formants (or specific resonances of the vocal tract – think the white rectangles in figure 3) have a profound impact on the way that we perceive the sound produced by the vocal folds. We call the result of a voiced sound filtered through specific formants “vowels.” How the shape of the vocal tract produces these formants is fascinating, but a little complicated for our purposes. Suffice it to say, they are here to stay. In fact, any time you speak, you intuitively move your first and second formants to create intelligible vowels. If you would like to get an instant sense of the filtering power of the vocal tract, check out this video. The buzzing sound is a good approximation of the raw sound generated by the vocal folds.
All of the above examples show one vowel sung on changing pitches (the formants stay in the same place even as the pitch goes up and down). The opposite is to sing the same pitch and move the formants around. Listen to, and look at (see figure 5) this example:
The pitch remains the same (the bottom line). As the vowels change, however, you can see the formants – marked from low to high F1 (short for “formant 1”) through F5 – move independently of one another (see figure 6).
Play the audio again and specifically listen for the whistling overtone following the rise and fall of F2. What sounds like a rising and then falling pitch is actually the second formant (F2) of the vocal tract amplifying progressively higher, then lower harmonics. This is easiest to hear on [e] and [i]. Since the pitch doesn’t change, the harmonics themselves never move; only the formants do.
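Here is the opposite case as a small sketch: the sung pitch stays put while F2 glides up and back down, and we ask which fixed harmonic sits nearest to it at each moment. The 200 Hz pitch and the F2 values are made-up numbers for illustration, not measurements from the recording, and `nearest_harmonic` is a hypothetical helper.

```python
def nearest_harmonic(formant_hz, f0_hz):
    """Number of the harmonic (1 = H1) closest in frequency to the formant."""
    return max(1, round(formant_hz / f0_hz))

F0_HZ = 200.0                                   # held pitch (assumed)
f2_path_hz = [1050.0, 1450.0, 1850.0, 2250.0,   # F2 rising...
              1850.0, 1450.0, 1050.0]           # ...then falling

boosted = [nearest_harmonic(f2, F0_HZ) for f2 in f2_path_hz]
print(boosted)                                  # [5, 7, 9, 11, 9, 7, 5]
print([n * F0_HZ for n in boosted])             # the frequencies being boosted
```

The harmonics stay at fixed multiples of 200 Hz the whole time. The "whistle" appears to climb and fall only because the formant highlights a different one of them at each step.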
This article has introduced some basic ideas about both the sound of the vocal folds (when filtered by the vocal tract), and what that sound looks like when analyzed by a spectrograph. This is just the beginning. While the implications of formant theory are profound, and allow one to both explain and predict the complex resonance and registration behavior of the voice, the field is not yet completely explored. It is tempting to say that all singers should use spectrograms to train themselves to be resonant singers. This can be done, though serious care must be taken to prevent the technology from interfering with the singer’s process in other ways. I believe the greatest potential of this technology lies in training the voice teacher to systematically recognize successful (and unsuccessful) resonance strategies as predictable timbres of his or her student’s voice. Study with a spectrograph, in effect, trains you to not need the spectrograph in your daily teaching practice. At the New England Conservatory of Music, we use spectrographs (and other computer-based tools) to clarify what we hear, compare singers objectively, and study the implications of repeatable technical choices. This is an exciting time to study vocal pedagogy, when such tools, and the means to understand how to use them, are available to all.
What to do Next
- If you want to read more about the acoustics of the singing voice, start with Practical Vocal Acoustics by Kenneth Bozeman. (I receive no commission or incentive if you buy this book.) He clearly describes formants and harmonics, and explains how to apply formant theory in the voice studio.
- If you want to learn more about the Vocal Pedagogy Program at The New England Conservatory of Music, and the classes taught by Ian Howell, click here.
Spectrograms were created using VoceVista v.3.3
1. Roger Shepard, “Cognitive Psychology and Music,” in Music, Cognition, and Computerized Sound, ed. Perry R. Cook (Cambridge: The MIT Press, 1999), 33.