Organic production of speech sounds in phonetics

This section of the English Phonetics Academy is designed to provide guidance and practice in English pronunciation through the use of phonetics. The exercises and drills included will help reduce accents and improve overall speech sound production. These lessons are particularly beneficial for non-native speakers looking to improve their English-speaking skills.

5. The production of speech

Before we can describe the way the vowels and consonants of GA are pronounced, we need to know what we actually do when we speak. How are speech sounds actually made?

5.1 The lungs and the larynx

Before we begin to speak, we first breathe in, taking in sufficient air to produce an utterance of reasonable length. Instead of simply letting go of the muscular tension and allowing our lungs to collapse, pushing the air from our lungs (which is what we would do if we were breathing normally), we slowly ease up on the muscular tension, thereby slowing down the exhalation phase. This artificially extended period of pressure from our lungs is used to produce speech. Because the airstream we are using is generated with the help of the lungs, speech sounds produced in this way are called pulmonic, and because it is the outgoing rather than the incoming air we use, these speech sounds are called egressive. All speech sounds of GA and AN are pulmonic egressive. The amount of speech produced with a single outgoing airstream is called a breath group.

The first important organ the airstream will meet on its path from the lungs is the larynx (Du: strottenhoofd). The outward part of this organ can be felt – and, especially in men, be seen – at the front of the neck (the Adam’s apple). The larynx is essentially a valve, which can be opened and closed by moving two thickish flaps that run from back to front apart or together (see Fig. 1). These flaps are primarily there to prevent food or saliva from entering the lungs, but because they also have a function in speech they are known as the vocal cords (or vocal folds).

There is an additional valve, called the epiglottis, positioned above the larynx where the root of the tongue begins. It normally points upward, but it flaps down to channel food and saliva into the esophagus – the tube behind the larynx leading to the stomach – when we swallow.

5.1.1 Breath group

The amount of speech produced with a single outgoing airstream is called a breath group.

Read the following nursery rhyme out loud. How many breath groups did you use, and where did they end?

Mary had a little lamb
Whose fleece was white as snow
And everywhere that Mary went
The lamb was sure to go.

In unexcited speech the escape of the air from the lungs is quite slow and steady to prevent the breath group from being too short. Some people tend to use up too much air, which forces them to break up their speech production rather frequently, and to stumble a lot when reading a text. If you find that you do this, practice maintaining the quality of a nasal like [m] or a vowel like [a] for some twenty seconds without straining yourself and without taking an abnormally deep breath before you start. Then try to read a fairly easy text on the same long breath, again without straining yourself.

5.2 The vocal cords

The aperture between the vocal cords is called the glottis: obviously, no air can pass through when the glottis is closed, and when it is open, air can flow through quite freely.

5.2.1 The open and vibrating glottis

There are many consonants that are produced with the glottis held open, as in ordinary breathing. Such sounds are called voiceless (Du: stemloos), and we hear them because other speech organs, usually the tongue or the lips, are involved in their production. Examples of voiceless sounds are [f] and [ʃ] in GA fish and [t] in AN eten.

When the vocal cords are allowed to vibrate, they produce voice. The closed glottis is subjected to gentle air pressure, sufficient to blow the vocal cords apart, but not strong enough to prevent them from falling together again; when they have fallen together, they are immediately blown apart again, resulting in vibration. Opening and closing cycles typically repeat themselves more than a hundred times per second for the larger and laxer vocal cords of men, and over two hundred times per second for the smaller vocal cords of women and children. You produce voice when saying [mː], a voiced sound; now say [sː], a voiceless sound, and feel the difference. Sonorants and vowels (except when they are whispered) are voiced (Du: stemhebbend).

The AN obstruents /b, d, (g); (v), z, (ʒ), (ɣ)/ are also voiced. The lenis obstruents of GA have voiced allophones. GA fortis obstruents are always voiceless.

Types of voice.

Instead of what is called modal voice, which is produced when no particular adjustments are made, it is possible to produce breathy voice. It is produced when part of the glottis is held open and part is allowed to vibrate. (The same effect is apparently produced when the closing phase of the vibration is not complete, so that air can flow through with friction during phonation.) Breathy voice is not uncommon in many languages and may be more typical of female than of male speech. In addition, note that AN /h/ is pronounced as a vowel with breathy voice. In English, breathy voice is sometimes jocularly used to create the effect of a sexy voice.

Another special phonation type is creaky voice. It is produced with tensed vocal cords, and sounds as if you can hear the opening actions of the vocal cords separately. (The effect may remind you of the sound produced when you run your fingernail along the teeth of a comb.) Creaky voice is also known as vocal fry, and has been popular for some time now among GA-speaking (young) women who break into creak at the ends of their utterances, when the pitch is very low.

5.2.2 The closed and narrowed glottis

It is possible during phonation to suddenly close the glottis, hold that closure briefly, and then, equally suddenly, allow it to vibrate again. The ’sound’ is known as the glottal stop and is symbolized [ʔ]. Many of us use it in very informal situations to indicate ’no’, going [mʔm], as we may do when addressing children. We all use it to separate the two vowels of AN beoog ([bəˈʔo.x]). Often voice occurs only before or only after the glottal stop.

It is also possible to bring the vocal cords together to form a narrowing which produces a whisper when air passes through it. Whisper the AN word hoen and notice how the difference between AN /h/ and /u/ disappears. (This is not surprising, because AN /h/ in this word is normally pronounced as an [u] with breathy voice and AN /u/ is an [u] with normal voice: when we substitute whisper for voice, no difference remains.) While AN /h/ is really a vowel said with breathy voice, GA /h/ is really a whispered vowel. GA /h/ is therefore called voiceless h, as opposed to AN /h/, which is called voiced h.

In order to feel how we normally start off with a glottal stop before a word-initial vowel, whisper the AN words hoen and oen, or hei and ei: note that the former in each pair begins gradually, while the latter is preceded by a glottal stop. The symbol [ ̥ ] is used to indicate that a sound is voiceless (open glottis) or whispered (narrowed glottis): the pairs could therefore be transcribed [u̥u̥n̥] and [ʔu̥n̥], [ɛ̥ɛ̥ɪ̥] and [ʔɛ̥ɪ̥].

Summarizing, there are four major states of the glottis:
1. open glottis: voiceless sounds are produced, for example [t] in GA tea, and [s] in GA say.
2. vibrating glottis: voiced sounds are produced (three different types), for example [m] and [a.] in AN maak, [eɪ] in GA say.
3. closed glottis: required for the production of [ʔ], as in AN beoog.
4. narrowed glottis: to produce whisper, and GA /h/.
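For readers who like to see such a summary as data, the four states can be laid out as a small lookup table. This is only an illustrative sketch: the names GlottisState, GLOTTIS_SUMMARY and is_voiced are invented here and do not come from the course material; the examples are copied from the list above.

```python
from enum import Enum


class GlottisState(Enum):
    """The four major states of the glottis (names are illustrative)."""
    OPEN = "open"
    VIBRATING = "vibrating"
    CLOSED = "closed"
    NARROWED = "narrowed"


# What each state produces, with the examples given in the summary list.
GLOTTIS_SUMMARY = {
    GlottisState.OPEN: ("voiceless sounds", ["[t] in GA tea", "[s] in GA say"]),
    GlottisState.VIBRATING: ("voiced sounds", ["[m] and [a.] in AN maak", "[eɪ] in GA say"]),
    GlottisState.CLOSED: ("glottal stop", ["[ʔ] in AN beoog"]),
    GlottisState.NARROWED: ("whisper", ["GA /h/"]),
}


def is_voiced(state: GlottisState) -> bool:
    """Only the vibrating glottis produces voice."""
    return state is GlottisState.VIBRATING


for state, (result, examples) in GLOTTIS_SUMMARY.items():
    print(f"{state.value} glottis -> {result}: {', '.join(examples)}")
```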

5.3 Pitch

The number of times the vocal cords open and close per second is called the frequency of vibration, and is expressed in cps (cycles per second). Variations in the frequency of vibration are heard by the listener as variations of pitch (Du: toonhoogte): the more frequently the vocal cords open and close, the higher the pitch.
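The relation between the frequency of vibration and the length of a single opening-and-closing cycle can be made concrete with a little arithmetic. The sketch below is an illustration only: the function name cycle_duration_ms is invented here, and the values of 100 and 200 cps are round figures chosen to echo the rough rates mentioned for men's and for women's and children's voices in section 5.2.1.

```python
def cycle_duration_ms(frequency_cps: float) -> float:
    """Return the duration of one vocal-cord cycle in milliseconds."""
    if frequency_cps <= 0:
        raise ValueError("frequency must be positive")
    # One second is 1000 ms, shared equally among the cycles in that second.
    return 1000.0 / frequency_cps


# Round illustrative values, not measurements.
for label, freq in [("lower-pitched voice", 100.0), ("higher-pitched voice", 200.0)]:
    print(f"{label}: {freq:.0f} cps -> {cycle_duration_ms(freq):.1f} ms per cycle")
```

The higher the frequency, the shorter each cycle: doubling the rate from 100 to 200 cps halves the cycle from 10 ms to 5 ms, and the listener hears the faster vibration as higher pitch.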

At the same time, pitch is used to signal the important stresses in an utterance: the position of the up or down movement of the pitch determines where we hear the stressed syllable. If you count óne-two-three-four, óne-two-three-four, etc., with the stress on one, then that syllable will be higher than the others; if you make two the high syllable, then you will hear two as the stressed syllable, etc. You can also hear the stress on a syllable because the pitch goes down for it: if you count with a question intonation (one-twó-three-four? one-twó-three-four?, etc.) you will find that you pronounce the stressed syllable (in this case two) with low pitch and that each of the ones that follow is higher than the one before, so three higher than two, four higher than three. Try it.

So, very simplistically, you might say that whether the pitch goes up or down is a matter of intonation, but that where it goes up or down is a matter of stress.

5.4 The speech tract

From the larynx the airstream enters the speech tract (Du: aanzetstuk), a tube extending all the way from the vocal cords to the lips and/or nostrils. It consists of the pharynx (Du: keel), the mouth, and – assuming that the soft palate is down – the nasal cavity. The soft palate (Du: zachte gehemelte) is a valve which closes off the entrance to the nasal cavity when it is pressed up, but opens the cavity when it is allowed to hang down, as in ordinary breathing. (When we have a cold, the entrance to the nasal cavity may be blocked by mucus, which forces us to breathe through the mouth.) It is in this speech tract that the airstream coming from the larynx is (further) modified so as to produce all the different shades of sound – in vowels as well as in consonants – that we can produce.

In essence, there are two factors that play a part in bringing about these differences: resonance and friction.

5.4.1 Resonance

The particular ’color’ or timbre that the glottal tone will have when it leaves the speaker depends on the shape of the speech tract. This shape can be altered quite a bit, mainly by moving the tongue and lower jaw into different positions and by rounding or spreading the lips. Since every shape will give its own characteristic resonance to the air contained in it, we will generally be able to produce as many different vowel sounds as we can give different shapes to the speech tract.

Try to ‘mouth’ the vowels without making any sound and notice how your tongue and lips move to different positions for them. Since our speech tract will always be with us during our speaking lives, resonance will always be a contributing factor to the quality of the sounds we produce, even of voiceless sounds. This is easy enough to demonstrate: say [ss] with spread lips and then round them, and notice how the sound produced becomes duller.

5.4.2 Friction

Friction only accompanies some sounds, usually obstruents. It can be heard when the air flows through a narrowed opening in the speech tract. When we blow out a candle or a match, we make such a narrowing between our lips.

Many speech sounds have friction. Clear examples of sounds that only consist of friction are the voiceless fricatives [s], [f], [x] etc. When we narrow the glottis, whisper (which is glottal friction) is produced, as we have seen in the previous section. (Whisper a few vowels softly. When you whisper them louder, you may also generate friction in the mouth itself, in particular for a vowel like [i]. Try it!)

Of course, friction can also accompany voiced sounds: imitate the buzzing of a bee or a gnat to produce [z] as in easy.

5.5 The mouth

The mouth is the most important part of the speech tract because it is here that the most important modifications of its shape are achieved and that the majority of the articulatory contacts are made. The roof of the mouth is formed by the soft palate, with the uvula (Du: huig) at the extreme end, which can easily be seen with the aid of a mirror. To the front of the soft palate lies the hard palate, or simply palate (Du: (harde) gehemelte): if you curl your tongue up, you can feel the hard palate arching back to where the soft palate begins. Immediately behind the front teeth is the alveolar ridge, which you touch with the tongue when you say dada; then there are the front teeth themselves and the upper lip. Below these parts there are the more active speech organs: the lower lip and the tongue.

The narrow zone immediately behind the tip of the tongue is called the blade. You use it when imitating the sharp, hissing sound of a snake. The part of the tongue opposite the hard palate is called the front, the part opposite the soft palate is called the back, and the part opposite the back wall of the pharynx is called the root.