Why the human voice is so versatile

January 19, 2017 by Noel Hanna, The Conversation
What is it that gives us a voice? Credit: Shutterstock/Sergey Nivens

Macaques and baboons – two distantly related primates – are able to produce a similar range of voice-like sounds to humans.

In fact, many animals convey basic information using their voice, but they don't display the full range of vocal abilities that lets humans use the voice for such a wide range of communication and entertainment.

This suggests that the uniqueness of the human voice lies less in the anatomical ability to produce the sounds and more in our ability to precisely coordinate the physical movements, and to process the sounds into meaningful language.

Just how versatile is the human voice?

To get an idea of how versatile our voice is, we can think about how many intelligible sounds we make use of in a language.

Since English spelling is such a mess, this is more clearly illustrated by looking at the pinyin romanisation of Mandarin Chinese.

Using pinyin, Mandarin words can start with one of the following 24 sounds:

b, p, m, f, d, t, n, l, g, k, h, j, q, x, zh, ch, sh, r, z, c, s, w, y or nothing

These can be combined with the following 35 final sounds:

a, ai, an, ang, ao

e, ei, en, eng, er

i, ia, iao, ian, iang, ie, in, ing, iong, iu

o, ong, ou

u, ua, uai, uan, uang, ui, un, uo

ü, üan, üe, ün

The anatomy of the voice. Credit: Noel Hanna, illustration Olivia Cox

This gives us 24×35 = 840 possible distinguishable sounds but each of these can have up to five tones (pitch patterns), which then gives us 840×5 = 4,200 unique words.

In practice, fewer than half of these words are actually used in the language.

But then you realise that most words in modern Mandarin are compounds of two of these words, so there are, say, 2,000×2,000 = 4 million possible unique words using this system of pronunciation, which are then strung together to make sentences.
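The arithmetic above can be checked with a short sketch. The sound lists are copied from the pinyin tables in this article; the variable names, and the round figure of 2,000 syllables in actual use, are assumptions taken from the rough estimate in the text rather than a linguistic count.

```python
# Initial sounds from the pinyin list above; "" stands for "nothing".
INITIALS = "b p m f d t n l g k h j q x zh ch sh r z c s w y".split() + [""]

# Final sounds from the pinyin list above.
FINALS = (
    "a ai an ang ao "
    "e ei en eng er "
    "i ia iao ian iang ie in ing iong iu "
    "o ong ou "
    "u ua uai uan uang ui un uo "
    "ü üan üe ün"
).split()

TONES = 5                # up to five pitch patterns per syllable
SYLLABLES_IN_USE = 2000  # assumed round figure ("fewer than half" of 4,200)

combinations = len(INITIALS) * len(FINALS)   # initial + final pairs
with_tones = combinations * TONES            # pairs with a tone added
two_syllable_words = SYLLABLES_IN_USE ** 2   # compounds of two syllables

print(len(INITIALS), len(FINALS))   # 24 35
print(combinations, with_tones)     # 840 4200
print(two_syllable_words)           # 4000000
```

This is an upper bound on combinations, not a count of real words, which is why the estimate in the text drops to about 2,000 before squaring.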

And that is just one language. Each language has its own set of different sounds, which may or may not overlap with other languages.

So how do humans actually produce this variety of sounds?

How the voice is produced

Voice production can be thought of as a source-filter model. The voice is a combination of a vibrating source that controls its amplitude and pitch (the five tones in the example above), and an acoustic filter that controls how it sounds, much like how you can shape the sound with a graphic equaliser on a sound system.

The source is the vibrating vocal folds situated in the larynx. The filter is the airway that runs from the vocal folds to the lips or nostrils, which we call the vocal tract.

In the above image, the larynx (voice box) extends from the epiglottis to the cricoid cartilage. The thyroid cartilage tends to protrude from the neck in men, where it is called the Adam's apple.

Source: controlling pitch and amplitude

The vocal folds are two flaps of flesh that vibrate around 100-300 times per second (Hz) in speech.

The widely used name "vocal cords" came about from French anatomist Antoine Ferrein's analogy that the air acted like a bow playing the strings (cordes in French) of the viola da gamba, or even a feather plucking the strings of a harpsichord.

While these analogies aren't very accurate, understanding the physics of vocal fold motion is still an active area of research, since experiments are so difficult. Observing the vocal folds is possible but not always practical. We can look at them but only from above – and even that isn't very comfortable.

In this example (video, above) the camera frame rate does not allow us to see the vibration of the vocal folds, but high speed video that shows the vibration is possible (video, below).

The vocal fold vibration isn't an on-off twitching of muscles; instead, it is caused by the air that passes over the vocal folds from the lungs. The frequency of vibration and its amplitude are controlled by a combination of pressure supplied by the lungs, the shape of the gap between the folds (the glottis), and the tension supplied by muscles in the larynx.

Learning to use all of these voice controls doesn't come easily – ask any teenage boy. Even singers take years to master the independent control of pitch and volume, which is put to the test by a technique called messa di voce.

Filter: controlling articulation

Speech sounds, such as vowels and consonants, are determined by the vocal tract, which changes shape by moving the articulators (tongue, lips, soft palate, etc.) to filter the sound produced by the vocal folds.

Magnetic resonance imaging can give us a more detailed picture of the range of movement happening in the vocal tract (video, above) but it is difficult to get three-dimensional information and it doesn't help us to see what the vocal folds are doing.

Although it is obviously more complicated, for a physicist, the vocal tract is something like a cylinder. It is a resonant system that is closed (or almost closed) at the vocal folds and open at the mouth.

A resonant system allows standing waves to form. In the vocal tract the standing waves, or resonances, occur when the pressure is high at the vocal folds and low at the mouth.

The sound produced by the vocal folds at frequencies close to these resonances will be more noticeable. These more noticeable frequencies are called formants and they distinguish different vowel sounds.

For a 17cm long cylinder (about the length of a man's vocal tract) the first two acoustic resonances occur around 500Hz and 1,500Hz, close to what you would recognise as the vowel in the word "heard".
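These figures follow from the textbook formula for a tube closed at one end and open at the other, where the resonances fall at odd multiples of a quarter wavelength: f_n = (2n − 1) × c / (4L). The sketch below is only an illustration of that formula. The function name is hypothetical, and the speed of sound (343 m/s, a common value for air at room temperature) is an assumption; the warmer, more humid air in a real vocal tract would shift the results slightly upward.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def closed_open_resonances(length_m, count=2, speed=SPEED_OF_SOUND):
    """Resonances (Hz) of an ideal cylinder closed at one end, open at the other.

    Uses the quarter-wavelength formula f_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * speed / (4 * length_m) for n in range(1, count + 1)]


first, second = closed_open_resonances(0.17)  # 17 cm tube
print(round(first), round(second))  # 504 1513
```

A shorter tube gives higher resonances, which is one reason the same simple model predicts higher formants for shorter vocal tracts.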

If you change the shape from a cylinder to more realistic geometries (by mimicking the effect of moving the articulators) then you change the position of the resonances, and therefore the vowel.
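To see how moving the resonances changes which parts of the source are emphasised, here is a rough sketch of the source-filter idea in the frequency domain. Everything in it is an illustrative assumption rather than a measurement: the source is a set of equal-strength harmonics of a 100 Hz pitch (a real voice source gets weaker at higher frequencies), each resonance is modelled as a simple two-pole digital resonator with a 100 Hz bandwidth at a 16 kHz sample rate, and the second pair of resonances (300 Hz and 2,300 Hz) is a typical textbook approximation for an "ee"-like vowel. The function names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 16_000  # Hz, assumed
BANDWIDTH = 100.0     # Hz, assumed width of each resonance


def resonator_gain(freq, centre, bandwidth=BANDWIDTH, fs=SAMPLE_RATE):
    """Gain at `freq` of a two-pole digital resonator centred near `centre` Hz."""
    r = math.exp(-math.pi * bandwidth / fs)       # pole radius sets the bandwidth
    theta = 2 * math.pi * centre / fs             # pole angle sets the centre
    z_inv = cmath.exp(-2j * math.pi * freq / fs)  # z^-1 on the unit circle
    denom = 1 - 2 * r * math.cos(theta) * z_inv + r * r * z_inv * z_inv
    return 1.0 / abs(denom)


def tract_gain(freq, resonances):
    """Combined gain of the filter: one resonator per resonance, in cascade."""
    gain = 1.0
    for centre in resonances:
        gain *= resonator_gain(freq, centre)
    return gain


def emphasised_harmonics(resonances, pitch=100, top=3000):
    """Harmonics of the source that stand out above both of their neighbours."""
    harmonics = list(range(pitch, top + 1, pitch))
    gains = [tract_gain(h, resonances) for h in harmonics]
    return [
        harmonics[i]
        for i in range(1, len(harmonics) - 1)
        if gains[i] > gains[i - 1] and gains[i] > gains[i + 1]
    ]


print(emphasised_harmonics((500, 1500)))  # cylinder-like tract
print(emphasised_harmonics((300, 2300)))  # assumed "ee"-like shape
```

The source is identical in both calls; only the filter changes, and with it the harmonics that stand out.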

It may seem like a gross simplification to think of the vocal tract as a cylinder but in terms of acoustics this simple model allows us to determine the energy loss in the vocal tract.

Simplifying the respiratory airway helps us to understand its resonant properties. Credit: Noel Hanna

It also gives us information about how rigid the walls of the tract are, as shown in a paper by me and colleagues, which is important for producing plosive sounds like "p" and "b".


So if all humans (and some primates) can produce such a wide range of sounds, why do we have accents when we learn foreign languages?

Surely, if I want to learn Mandarin, I just need to train myself to produce those 2,000 sounds mentioned earlier. It would be almost like a form of physical exercise. The problem is our brains tend to categorise similar sounds. This hinders us in producing and perceiving sounds that do not fit into these categories.

For example, the French words for "above" and "below" ("dessus" and "dessous") tend to sound the same to untrained English speakers. When we learn French, our brain must be taught to separate "u" and "ou" into two new categories, where previously there was only one.

So if our brains can't distinguish finely enough between the different sounds, could we use our understanding of voice production to improve language learning? Seeing the articulators inside our mouths in action is one idea that could help.

This example video shows a system that provides visual feedback on tongue and mouth movements to train pronunciation. Perhaps understanding voice production could help us to make those tricky new sounds when learning a language, and to further increase the versatility of our voice.
