The English phonetics and pronunciation roadmap

By Eriberto Do Nascimento on 01 Nov 2021

Pronunciation is all about Phonetics

When learning to pronounce a new language, it’s essential to get your priorities right. The most important sounds are the ones that can change the meaning of words. These are called phonemes.

If you say pin and it sounds like bin, people will misunderstand you. And if you say I hid them and it sounds like I hit them, there will also be a breakdown in communication. Furthermore, you should be aware that sounds may be pronounced differently in different contexts, e.g., pre-vocalically (before vowels), intervocalically (between vowels), or pre-consonantally (before consonants).

They may also be pronounced differently in different positions in the word – at the beginning (initial), in the middle (medial), and at the end (final). For instance, /p/ is more like a /b/ when it occurs after /s/, e.g., port vs. sport; /r/ sounds different in red and tread; the two /t/ sounds in tight are different; and the quality of /oʊ/ is different in goat and goal. Note that when we refer to the letters in a word – as opposed to the sounds – we show them in angle brackets, e.g., ⟨r⟩ or ⟨e⟩. Phonemes are shown in slant brackets, e.g., /r/ or /ɛ/.

The word spread would phonemically be shown as /sprɛd/. Even if people can understand what you are saying, an off-target pronunciation may still sound comical, irritating, or distracting to listeners. For instance, if you say English /r/ with a back articulation (in your throat) instead of a front articulation (with your tongue-tip), it may sound funny to people who aren’t used to it. If listeners are distracted because of a false pronunciation, they may stop concentrating on what you are trying to say.

Or if they need to invest a lot of effort in deciphering what you are saying, they may lose track of your message. Furthermore, judgments of your overall ability in English are likely to be based on the impression your pronunciation makes: if you sound like a beginner, you may be treated like a beginner, even if your level is advanced in terms of grammar, vocabulary, reading, and writing.

The best approach is to aim for a pronunciation that:

(1) can be understood without any difficulty and
(2) doesn’t irritate or distract your listeners.

Advanced aspects of phonetics

Note that there’s more to learning the pronunciation of a language than mastering the segments (vowels and consonants). You have to pay attention to several other points. For instance, correct use of weak forms helps to get the speech rhythm right. Contractions, e.g., don’t, it’s, we’ll, improve fluency.

To make your pronunciation more authentic, it’s important to have knowledge of assimilation (sounds that change under the influence of neighboring sounds, e.g., when becomes /wɛm/ in when my) and elision (disappearing sounds, e.g., /t/ is often lost in facts). Some sound differences matter a great deal, whereas others are of little significance. The ones that matter most are those that can change the meaning of otherwise identical words.

In English, the words bit, bet, boat are distinguished only by the vowels; in bit, sit, wit, only the initial consonant is different. In bit, bill, bin, it’s the final consonant that brings about the change in meaning. Sounds that can distinguish meaning are called phonemes (adjective: phonemic).

Phonemes and allophones

A pair of words distinguished by a single phoneme is called a minimal pair, e.g., bit – hit. The variety of English taught here has 24 consonant phonemes and 13 vowel phonemes. Not every sound difference can change the meaning of a word. Listen carefully to feet and feed. You can hear a distinct difference in the length of the two vowels. But the native English speaker interprets these vowels as two variants of the same phoneme /i/; the different vowel lengths are the result of the influence of the following consonants /t/ and /d/.
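The idea of a minimal pair can be made concrete with a small Python sketch. It assumes each word is given as a list of phoneme symbols (so that two-character symbols such as "eɪ" count as one unit) and only compares words of equal length. The function name `is_minimal_pair` and the hand-written transcriptions are illustrative assumptions, not part of the course material.

```python
def is_minimal_pair(first, second):
    """Return True if two phoneme sequences differ in exactly one position."""
    # Simplification: only sequences of equal length are compared.
    if len(first) != len(second):
        return False
    differences = sum(1 for a, b in zip(first, second) if a != b)
    return differences == 1


# Hand-written phonemic transcriptions, one list item per phoneme.
bit = ["b", "ɪ", "t"]
hit = ["h", "ɪ", "t"]
bin_ = ["b", "ɪ", "n"]
hat = ["h", "æ", "t"]

print(is_minimal_pair(bit, hit))   # True: only the initial consonant differs
print(is_minimal_pair(bit, bin_))  # True: only the final consonant differs
print(is_minimal_pair(bit, hat))   # False: two sounds differ
```

Working on phoneme lists rather than on spelling matters here, since the comparison is about sounds and English spelling is not a reliable guide to them.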

Similarly, the two /k/ sounds in keen and corn are different, the first being formed more forward and the second further back in the mouth, but English speakers hear both as variants of the phoneme /k/. When you say the /d/ in deal, your lips are unrounded during the consonant, but when you say /d/ in door, they are rounded. In deal, the vowel is unrounded, and in door, the vowel is rounded. When we say deal and door, our lips are getting ready for the vowel during the articulation of the consonant. So the lip-shape of the consonant is affected by the lip-shape of the following vowel. Each phoneme is composed of a number of such different variants.

These are termed allophones (adjective: allophonic). Allophones may occur in complementary distribution or in free variation. Our deal/door example is an instance of allophones in complementary distribution. This means that the different allophones complement each other; where one occurs, the other cannot occur. In other words, we can write a rule for the occurrence of the two allophones: /d/ with rounded lips occurs before lip-rounded sounds while /d/ with unrounded lips occurs before all other sounds. Vowels are shortened before voiceless consonants like /s/ while they retain full length before voiced consonants like /z/; for example, the vowel in face /feɪs/ is clearly shorter than that in phase /feɪz/.
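Because allophones in complementary distribution follow a rule, the two rules just described can be written as simple functions. The following Python sketch is only an illustration: the sets of rounded sounds and voiceless consonants are small, non-exhaustive assumptions, the function names are hypothetical, and the symbol [dʷ] for lip-rounded /d/ is notation chosen for this example.

```python
# Illustrative, non-exhaustive sets; their exact membership is an assumption.
ROUNDED_SOUNDS = {"u", "ʊ", "o", "oʊ", "w"}
VOICELESS_CONSONANTS = {"p", "t", "k", "f", "θ", "s", "ʃ", "tʃ", "h"}


def d_allophone(next_sound):
    """Pick the allophone of /d/ from the lip shape of the following sound."""
    return "dʷ" if next_sound in ROUNDED_SOUNDS else "d"


def vowel_length(next_consonant):
    """Vowels are shortened before voiceless consonants, full-length otherwise."""
    return "shortened" if next_consonant in VOICELESS_CONSONANTS else "full"


print(d_allophone("i"))   # d   (as in deal)
print(d_allophone("o"))   # dʷ  (as in door)
print(vowel_length("s"))  # shortened (as in face)
print(vowel_length("z"))  # full      (as in phase)
```

Each function always returns the same answer for the same context, which is exactly what it means for the variants to be predictable.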

Again, the allophones are in complementary distribution. If allophones are in free variation, their occurrence cannot be predicted from the phonetic context. An example of this would be the different possible pronunciations of /t/ in word-final position, as in hat. It’s possible to pronounce the /t/ with or without glottal reinforcement.

Many speakers vary between these two possibilities, and we cannot predict which of the two they are going to use. The glottally reinforced and non-glottally reinforced variants are therefore said to be in free variation. Unfortunately for the learner, languages generally don’t have the same phoneme system, and they certainly don’t have the same range of allophones. So the learner has to work out the phonemic inventory of the new language and all the phonetic variants. Your first task is to make sure you never lose a phoneme contrast.

This isn’t easy to do in practice. Even though two phonemes may sound very similar, or identical, to the learner, to the native speaker, they are completely different. This is something native speakers and learners are often not aware of. Native speakers are frequently surprised to hear that the vowels in the English words seat /sit/ and sit /sɪt/ sound identical to speakers of most other languages, who hear them as the same vowel because they count as allophones of the same phoneme in their languages.

Many learners find it difficult to separate the phonemes in Luke /luk/ and look /lʊk/. Others find it difficult to distinguish between cat /æ/, cut /ə/, and cot /ɑ/. Yet others can’t hear and/or make the difference between the initial consonants in three /θ/ and tree /t/, or three /θ/ and free /f/, or theme /θ/ and seem /s/. In this course, we have provided exercises for 29 consonant contrasts and 17 vowel contrasts.

You’ll find that some of these don’t pose a problem for speakers of your language while others will take a long time to master. If making a particular contrast isn’t difficult for you, you can still use the contrast section as extra material to help you get the two sounds just right. Note that a full command of the contrasts involves being able to say all the different allophones of a phoneme in their appropriate contexts.

Remember that allophones can never change the meaning of words. English /t/ can be said in many different ways (i.e., there are many different allophones or variants), but if we substitute one allophone for another, the meaning remains the same. It will merely sound a bit odd. However, if we replace /t/ in tight by /s/, /f/, or /k/, then it turns into sight, fight, kite, and the result is a new word with a different meaning; /t s f k/ are therefore examples of phonemes in English.

The American English phoneme system

The English phoneme system is shown in the “English Phonemic Transcription Key.”

Spelling and sound

English orthography (i.e., spelling) is notoriously unreliable. For instance, the vowel /i/ can be spelled in numerous ways. All of the following words contain /i/, spelled in a variety of ways: me, see, sea, believe, receive, pizza, people, key, quay, quiche, Portuguese. Most other phonemes can also be spelled in many different ways, especially vowels. So instead of relying on the orthography, phoneticians use transcription. There are two types:

(1) phonemic transcription, indicating phonemes only; this type, as we have seen, is normally placed inside slant brackets / /, e.g., part /pɑrt/. The sign – is used to show phoneme contrasts, e.g., let /lɛt/ – met /mɛt/;
(2) phonetic transcription, showing more detailed allophonic distinctions, enclosed by square brackets [ ], e.g., part [pʰɑrt]. To indicate the allophonic distinctions, we often make use of diacritics, i.e., marks added to symbols to provide extra information, e.g., [pʰ]. The rounded allophone of /t/ is shown as [tʷ]; as /t/ said with unrounded lips is the default, there’s no special symbol to denote it.

Sometimes words with different meanings are spelled completely differently but are pronounced in the same way, as in key and quay above. Such words are called homophones (same pronunciation, different meaning).

English has a great many of these. Other examples of homophones are wait/weight, know/no, sea/see, cite/sight/site. To confuse matters even more, the opposite also occurs. It’s possible for words that are spelled identically to be pronounced differently. The written word row can be said with the vowel in goat (when it means a “line”) or the vowel in mouth (when it means a “quarrel”), and it’s therefore impossible to tell from the spelling alone which meaning and pronunciation are intended. Words of this type are called homographs (same spelling, different pronunciation).

Phoneme symbols

Unfortunately, at present, there is no consensus among writers on the set of symbols used for transcribing General American (GA). Even those, like us, who use the symbols of the International Phonetic Association’s (IPA) International Phonetic Alphabet don’t necessarily use the same symbols in their transcriptions.

The main reason for this is that while the IPA provides symbols to represent the range of speech sounds found in language, it doesn’t dictate how the sound system of a language should be analyzed. A further reason is that writers have different approaches depending on whether they are writing for foreign learners, speech and language therapists, professional linguists, actors, dictionary users, and so on. In each case, there may be different traditions of transcription, differences in the linguistic knowledge of readers, different levels of tolerance for unfamiliar symbols, and different assumptions about what needs to be made explicit and what can be taken for granted in transcriptions.

We use a transcription system that is mostly phonemic but includes a small number of non-phonemic elements. We take a phonemic approach to the schwa /ə/ phoneme, using the same symbol for it in stressed and unstressed syllables (e.g., above /əˈbəv/) and the same symbol followed by /r/ when it is r-colored in stressed and unstressed syllables (e.g., murmur /ˈmərmər/). We take a non-phonemic approach to the sport [o] vowel (e.g., four /for/, sort /sort/, story /ˈstori/), t-tapping (e.g., city /ˈsɪt̬i/), and syllabic consonants (e.g., kitten /ˈkɪtn̩/, rattle /ˈrætl̩/).

In these three cases, we continue to use phonemic slant brackets in order to avoid the inconvenience of constantly switching between phonemic and phonetic bracketing.

The syllable

A syllable is a group of sounds that are pronounced together. Words can consist of a single syllable, i.e., a monosyllable (tight, time) or of two or more syllables (polysyllabic), e.g., waiting (two syllables – disyllabic), tomato (three syllables), participate (four syllables), university (five syllables), and so on.

A syllable nearly always contains a vowel (e.g., eye /aɪ/); this is called the syllable nucleus. The nucleus may be preceded or followed by one or more consonants (e.g., tea, tree, stream, at, cat, cats, stamps). The consonant or consonants preceding the nucleus are known as the syllable onset, and the consonants following the nucleus are called the coda.

A group of consonants in a syllable onset or coda is known as a cluster. The English syllable can consist of clusters of up to three consonants in the onset (e.g., strengths /strɛŋθs/), and as many as four in the coda (e.g., texts /tɛksts/). Note that we are here concerned with pronunciation, so even though the word time looks as if it consists of two syllables because it has two vowel letters in the orthography, the word consists of only one syllable, as the second vowel letter in the spelling doesn’t represent a vowel sound. A syllable that has a coda (i.e., one or more closing consonants) is called a closed syllable, while a syllable that ends with a vowel phoneme is called an open syllable. Occasionally, a syllable consists of a consonant only, most frequently /n/ or /l/, e.g., Britain /ˈbrɪtn̩/, hidden /ˈhɪdn̩/, mission /ˈmɪʃn̩/, middle /ˈmɪdl̩/, apple /ˈæpl̩/.

A consonant that forms a syllable without the aid of a vowel is called a syllabic consonant. Note that we show a syllabic consonant by means of a small vertical mark beneath the symbol (with descending symbols, a superscript mark is used, e.g., bacon /ˈbeɪkŋ̍/).

A word like apple /ˈæpl̩/ consists of two syllables, but only the first contains a vowel; the second contains a syllabic consonant.
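The division of a syllable into onset, nucleus, and coda can be sketched in Python. This is a minimal illustration under stated assumptions: the input is a single syllable given as a list of phoneme symbols, the vowel set is a small illustrative subset rather than a full inventory, syllabic consonants are recognized by their combining syllabicity mark, and the function names are hypothetical.

```python
# Illustrative subset of vowel symbols (an assumption, not a full inventory).
VOWELS = {"i", "ɪ", "ɛ", "æ", "ɑ", "ə", "ʊ", "u", "o",
          "eɪ", "aɪ", "oʊ", "aʊ", "ɔɪ"}
# Combining vertical line below / above, used to mark syllabic consonants.
SYLLABIC_MARKS = ("\u0329", "\u030d")


def is_nucleus(symbol):
    """A nucleus is a vowel or a consonant carrying a syllabicity mark."""
    return symbol in VOWELS or symbol.endswith(SYLLABIC_MARKS)


def split_syllable(phonemes):
    """Split one syllable into (onset, nucleus, coda)."""
    for index, symbol in enumerate(phonemes):
        if is_nucleus(symbol):
            return phonemes[:index], symbol, phonemes[index + 1:]
    raise ValueError("no nucleus found in syllable")


def is_open(phonemes):
    """An open syllable has no coda; a closed syllable has one."""
    _, _, coda = split_syllable(phonemes)
    return not coda


print(split_syllable(["s", "t", "r", "i", "m"]))       # stream
print(split_syllable(["t", "ɛ", "k", "s", "t", "s"]))  # texts
print(is_open(["t", "i"]))                             # tea: True
print(is_open(["k", "æ", "t"]))                        # cat: False
```

For stream, the sketch yields a three-consonant onset and a one-consonant coda; for texts, a one-consonant onset and a four-consonant coda, matching the cluster limits described above.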

Stress

Words consist of more than a set of segments (vowels and consonants) arranged in a certain order. Words of more than one syllable also have a distinctive rhythmic pattern depending on which syllables are pronounced with stress and which are not. Stressed syllables are pronounced with greater energy and effort than unstressed syllables, which results in greater prominence, i.e., they stand out more.

The first syllable in carpet is stressed and the second unstressed; the second syllable in contain is stressed and the first unstressed. Stress is indicated by means of a vertical mark placed before the stressed syllable, and unstressed syllables are left unmarked, e.g., /ˈkɑrpət/, /kənˈteɪn/. The position of stress in an English word is an important factor in word recognition, and there are even words that are distinguished by stress alone, e.g., the noun increase /ˈɪŋkris/ and the verb increase /ɪŋˈkris/. Some words have more than one stressed syllable.

In Alabama, the first and third syllables are stressed, while in impossibility, the second and fourth syllables are stressed. In these examples, as in all cases of multiple stresses, the last stress sounds more prominent than the earlier stress, and this is why the term primary stress has been used for the last, more prominent stress and secondary stress for any earlier, less prominent stresses. Primary stress is indicated with the usual stress mark and secondary stress with the same symbol at a lower level, e.g., /ˌæləˈbæmə/, /ɪmˌpɑsəˈbɪlət̬i/.
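The placement of stress marks described above can be illustrated with a short Python sketch. It assumes the word has already been divided into syllables by hand (the syllable divisions below are an assumption made for this example), and the function name `mark_stress` is hypothetical.

```python
PRIMARY = "\u02c8"    # ˈ primary stress mark
SECONDARY = "\u02cc"  # ˌ secondary stress mark


def mark_stress(syllables, primary, secondary=()):
    """Join syllables, placing a stress mark before each stressed syllable."""
    parts = []
    for index, syllable in enumerate(syllables):
        if index == primary:
            parts.append(PRIMARY + syllable)
        elif index in secondary:
            parts.append(SECONDARY + syllable)
        else:
            parts.append(syllable)  # unstressed syllables stay unmarked
    return "".join(parts)


print(mark_stress(["kɑr", "pət"], primary=0))   # ˈkɑrpət
print(mark_stress(["kən", "teɪn"], primary=1))  # kənˈteɪn
print(mark_stress(["æ", "lə", "bæ", "mə"], primary=2, secondary=(0,)))  # ˌæləˈbæmə
```

Syllable positions are counted from zero, so `primary=2` in the last call marks the third syllable of Alabama.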

Although the terminology and transcription seem to suggest that there are three different levels of stress – primary stress, secondary stress, and unstressed – this isn’t actually the case. There are only stressed and unstressed syllables, and the difference in prominence between the stresses in words such as Alabama and impossibility is due to pitch accent.

An accented /ˈæksɛntəd/ syllable is one that is accompanied by a change in the pitch of the voice. Pitch is related to the speed at which the vocal folds vibrate: faster vibration results in higher pitch and slower vibration lower pitch. When a word is pronounced in isolation, the syllable that takes primary stress is accented, i.e., accompanied by a pitch movement, usually a fall in pitch. When there’s a “secondary stress” earlier in the word, this is accompanied by a step up to a relatively high pitch before the pitch movement of the “primary stress.”

In terms of the English sound system, the pitch movement associated with the “primary stress” is more salient than the step up in pitch associated with the “secondary stress.” Thus, the distinction between primary and secondary stress is really a difference between different kinds of pitch accent rather than stress. In this course, when individual monosyllabic words are transcribed as examples, we don’t use a stress mark, which agrees with the approach taken in most dictionaries and works on English phonetics.

Every word must have at least one stressed syllable when pronounced in isolation, and therefore, it’s self-evident that the one and only syllable of a monosyllabic word is stressed. When we transcribe an individual polysyllabic word, we only indicate primary stress. When we transcribe utterances longer than a single word, we use the stress mark whenever a syllable is stressed, meaning that monosyllabic words can receive a stress mark but also that some stresses that appear when a word is said in isolation may disappear when the word is spoken in a phrase.

Pronunciation model and accent

Every language has a number of different accents. An accent is a pronunciation variety characteristic of a group of people. Accents can be regional or social. In the USA, we find many different regional accents; examples are Texas, Kentucky, New York, and Boston, spoken by most of the people who live in these areas. But unless you have reasons for specifically wishing to adopt one of these regional accents, it’s best for learners not to use these as a model for imitation.

The accent of American English we recommend is one heard from educated speakers throughout the USA (and also in Canada). We shall term this social accent General American (abbreviated to GA).

If you listen regularly to the American media, you’re probably already familiar with this accent, since it’s the variety used by the majority of American presenters. It’s sometimes even called “Network English.” It’s either completely non-localizable (i.e., it’s impossible to tell where speakers come from) or has very few regional traces.

Thus, GA can be taken as the common denominator of the speech of educated Americans. When people alter their pronunciation (consciously or unconsciously) to sound less regional, they change in the direction of GA. When there’s an accent spectrum within a location, those at the lower end of the social scale speak with the local accent while those toward the other end of the social scale speak with an accent progressively more like GA. The English we describe in this course is the speech of the average modern General American speaker. Old-fashioned usages have been excluded, as have any “trendy” pronunciations that are too recent to have gained widespread acceptance.


Eriberto Do Nascimento

Eriberto Do Nascimento has a Ph.D. in Speech Intelligibility and Artificial Intelligence and is the founder of English Phonetics Academy.