Phonetics is a branch of linguistics that studies how humans produce and perceive sounds or, in the case of sign languages, the equivalent aspects of sign. Phoneticians, linguists who specialize in phonetics, study the physical properties of speech. The field of phonetics is traditionally divided into three sub-disciplines based on the research questions involved: how humans plan and execute movements to produce speech (articulatory phonetics), how those movements affect the properties of the resulting sound (acoustic phonetics), and how humans convert sound waves into linguistic information (auditory phonetics). Traditionally, the minimal linguistic unit of phonetics is the phone, a speech sound in a language, which differs from the phonological unit of the phoneme; the phoneme is an abstract categorization of phones.

Phonetics is broadly concerned with two aspects of human speech: production (the ways humans make sounds) and perception (the way speech is understood). The communicative modality of a language describes the method by which a language is produced and perceived.

Languages with oral-aural modalities such as English produce speech orally (using the mouth) and perceive speech aurally (using the ears). Sign languages such as Auslan and ASL have a manual-visual modality, producing speech manually (using the hands) and perceiving speech visually (using the eyes).

ASL and some other sign languages also have a manual-manual dialect for use in tactile signing by deafblind speakers, in which signs are produced with the hands and perceived with the hands as well.

Language production consists of several interdependent processes that transform a non-linguistic message into a spoken or signed linguistic signal. After identifying a message to be linguistically encoded, a speaker must select the individual words, known as lexical items, to represent that message in a process called lexical selection. During phonological encoding, the mental representations of the words are assigned their phonological content as a sequence of phonemes to be produced.

The phonemes are specified for articulatory features that denote particular goals, such as closed lips or the tongue in a particular location. These phonemes are then coordinated into a sequence of muscle commands that can be sent to the muscles, and when these commands are executed properly, the intended sounds are produced.

These movements disrupt and modify an airstream, which results in a sound wave. The modification is done by the articulators, with different places and manners of articulation producing different acoustic results. For example, the words tackle and sack both begin with alveolar sounds in English, but they differ in how far the tongue is from the alveolar ridge.

This difference has large effects on the airstream and thus on the sound that is produced. Similarly, the direction and source of the airstream can affect the sound. The most common airstream mechanism is pulmonic (using the lungs), but the glottis and tongue can also be used to produce airstreams.

Language perception is the process by which a linguistic signal is decoded and understood by a listener. To perceive speech, for example, the continuous acoustic signal must be converted into discrete linguistic units such as phonemes, morphemes, and words.

To correctly identify and categorize sounds, listeners prioritize certain aspects of the signal that can reliably distinguish between linguistic categories. While certain cues are prioritized over others, many aspects of the signal can contribute to perception.

For example, although oral languages prioritize acoustic information, the McGurk effect shows that visual information is used to distinguish ambiguous information when the acoustic cues are unreliable.

There are three main branches of modern phonetics:

  • Articulatory phonetics, which studies the way sounds are made with the articulators
  • Acoustic phonetics, which studies the acoustic results of different articulations
  • Auditory phonetics, which studies the way listeners perceive and understand linguistic signals


The first known phonetic studies were carried out by Sanskrit grammarians as early as the 6th century BCE. The Hindu scholar Panini is among the best known of these early investigators. His four-part grammar, written around 350 BCE, is influential in modern linguistics and still represents "the most complete generative grammar of any language yet written."[3] His grammar formed the basis of modern linguistics and described several important phonetic principles, including voicing.

This early account describes resonance as being produced either by tone, when the vocal folds are closed, or by noise, when the vocal folds are open. The phonetic principles in the grammar are considered "primitives" in that they are the basis for his theoretical analysis rather than objects of theoretical analysis themselves, and the principles can be inferred from his system of phonology.


Progress in phonetics after Panini and his contemporaries was limited until the modern era, apart from some limited investigations by Greek and Roman grammarians. In the millennia between the Indic grammarians and modern phonetics, the focus shifted away from the difference between spoken and written language, which had been the driving force behind Panini's account, and turned to the physical properties of speech alone. Sustained interest in phonetics resumed around 1800 CE, with the term "phonetics" first being used in its present sense in 1841.

New developments in medicine and in audio and visual recording devices allowed phoneticians to use and review new and more detailed data. This early period of modern phonetics included the development, by Alexander Melville Bell, of an influential phonetic alphabet based on articulatory positions. Known as Visible Speech, it gained prominence as a tool in the oral education of deaf children.

Before the widespread availability of audio recording equipment, phoneticians relied heavily on a tradition of practical phonetics to ensure that transcriptions and findings were consistent across phoneticians. This training involved both ear training (the recognition of speech sounds) and production training (the ability to produce sounds).

Phoneticians were expected to learn to recognize by ear the various sounds on the International Phonetic Alphabet, and the IPA still tests and certifies speakers on their ability to accurately produce the phonetic patterns of English (though it has discontinued this practice for other languages).

As a revision of his Visible Speech method, Melville Bell developed a description of vowels by height and backness, resulting in 9 cardinal vowels. As part of their training in practical phonetics, phoneticians were expected to learn to produce these cardinal vowels to anchor their perception and transcription of these phones during fieldwork. This approach was critiqued by Peter Ladefoged in the 1960s based on experimental evidence: he found that cardinal vowels were auditory rather than articulatory targets, challenging the claim that they represented articulatory anchors by which phoneticians could judge other articulations.


Language production consists of several interdependent processes that transform a non-linguistic message into a spoken or signed linguistic signal. Linguists debate whether language production occurs in a series of stages (serial processing) or whether production processes occur in parallel. After identifying a message to be linguistically encoded, a speaker must select the individual words, known as lexical items, to represent that message in a process called lexical selection. Words are selected based on their meaning, which in linguistics is called semantic information.

Lexical selection activates the word's lemma, which contains both semantic and grammatical information about the word.

After an utterance has been planned, it goes through phonological encoding. At this stage of language production, the mental representations of the words are assigned their phonological content as a sequence of phonemes to be produced.

The phonemes are specified for articulatory features that denote particular goals, such as closed lips or the tongue in a particular location. These phonemes are then coordinated into a sequence of muscle commands that can be sent to the muscles, and when these commands are executed properly, the intended sounds are produced. Thus, the process of production from message to sound can be summarized as the following sequence:

  • Message planning
  • Lemma selection
  • Retrieval and assignment of phonological word forms
  • Articulatory specification
  • Muscle commands
  • Articulation
  • Speech sounds

Place of articulation

Sounds that are made by a full or partial constriction of the vocal tract are classified as consonants. Consonants are pronounced in the vocal tract, usually in the mouth, and the location of this constriction affects the resulting sound. Because of the close connection between the position of the tongue and the resulting sound, the place of articulation is an important concept in many subdisciplines of phonetics.

Sounds are partly categorized by the location of a constriction as well as by the part of the body doing the constricting. For example, in English the words fought and thought are a minimal pair differing only in the organ making the constriction rather than in the location of the constriction. The "f" in fought is a labiodental articulation made with the bottom lip against the teeth. The "th" in thought is a linguodental articulation made with the tongue against the teeth. Constrictions made by the lips are called labial, whereas constrictions made with the tongue are called lingual.

Constrictions made with the tongue can be made in several parts of the vocal tract, broadly classified into coronal, dorsal, and radical places of articulation. Coronal articulations are made with the front of the tongue, dorsal articulations are made with the back of the tongue, and radical articulations are made in the pharynx. These divisions are not sufficient for distinguishing and describing all speech sounds. For example, in English the sounds [s] and [ʃ] are both coronal, but they are produced in different places in the mouth. To account for this, more detailed places of articulation are needed, based on the area of the mouth in which the constriction occurs.

Articulations involving the lips can be made in three different ways: with both lips (bilabial), with one lip and the teeth (labiodental), and with the tongue and the upper lip (linguolabial). Depending on the definition used, some or all of these kinds of articulations may be classified as labial articulations. Bilabial consonants are made with both lips. In producing these sounds, the lower lip moves farthest to meet the upper lip, which also moves down slightly, though in some cases the force of air moving through the aperture (the opening between the lips) may cause the lips to separate faster than they can come together.

Unlike most other articulations, both articulators are made of soft tissue, so bilabial stops are more likely to be produced with incomplete closures than articulations involving hard surfaces such as the teeth or palate. Bilabial stops are also unusual in that an articulator in the upper part of the vocal tract actively moves downward, as the upper lip shows some active downward movement.

Linguolabial consonants are made with the blade of the tongue approaching or contacting the upper lip. As in bilabial articulations, the upper lip moves slightly towards the more active articulator. Articulations in this group do not have their own symbols in the International Phonetic Alphabet; instead, they are formed by combining an apical symbol with a diacritic, implicitly placing them in the coronal category. They exist in a number of languages indigenous to Vanuatu, such as Tangoa.

Labiodental consonants are made by the lower lip rising to the upper teeth. Labiodental consonants are most often fricatives, while labiodental nasals are also typologically common. There is debate as to whether true labiodental plosives occur in any natural language, though a number of languages are reported to have labiodental plosives, including Zulu, Tonga, and Shubi.


Coronal consonants are made with the tip or blade of the tongue and, because of the agility of the front of the tongue, show variety not only in place but also in the posture of the tongue. The coronal places of articulation represent the areas of the mouth where the tongue contacts or makes a constriction, and include dental, alveolar, and post-alveolar locations.

Tongue postures using the tip of the tongue can be apical if made with the top of the tongue tip, laminal if made with the blade of the tongue, or sub-apical if the tongue tip is curled back and the underside of the tongue is used. Coronals are unique as a group in that every manner of articulation is attested. Australian languages are well known for the large number of coronal contrasts exhibited within and across languages in the region.

Dental consonants are made with the tip or blade of the tongue and the upper teeth. They are divided into two groups based on the part of the tongue used to produce them: apical dental consonants are produced with the tongue tip touching the teeth; interdental consonants are produced with the blade of the tongue as the tip of the tongue sticks out in front of the teeth. No language is known to use both contrastively, though they may exist allophonically.

The alveolar consonants are formed by the tip or blade of the tongue on the alveolar ridge just behind the teeth and may likewise be apical or laminal. 

Crosslinguistically, dental and alveolar consonants are frequently contrasted, leading to a number of generalizations about crosslinguistic patterns. The different places of articulation also tend to be contrasted in the part of the tongue used to produce them: most languages with dental stops have laminal dentals, while languages with alveolar stops usually have apical stops. Languages rarely have two consonants at the same place with a contrast in laminality, though Taa (ǃXóõ) is a counterexample to this pattern.

If a language has only one dental stop or alveolar stop, it will usually be laminal if it is a dental stop and apical if it is an alveolar stop, though Temne and Bulgarian[28], for example, do not follow this pattern.

If a language has both an apical and a laminal stop, then the laminal stop is more likely to be affricated, as in Isoko. However, Dahalo shows the opposite pattern, with the alveolar stops being more affricated.

Retroflex consonants have several different definitions, depending on whether the position of the tongue or the position on the roof of the mouth is given prominence. In general, they represent a group of articulations in which the tip of the tongue is curled upward to some degree. In this way, retroflex articulations can occur at many different locations on the roof of the mouth, including the alveolar, post-alveolar, and palatal regions.

If the underside of the tongue tip contacts the roof of the mouth, the articulation is sub-apical, though apical post-alveolar sounds are also described as retroflex. Typical examples of sub-apical retroflex stops are commonly found in Dravidian languages, and in some languages indigenous to the southwestern United States, the contrastive difference between dental and alveolar stops is a slight retroflexion of the alveolar stop.[32] Acoustically, retroflexion tends to affect the higher formants.

Articulations taking place just behind the alveolar ridge, known as post-alveolar consonants, have been referred to using several different terms. Apical post-alveolar consonants are often called retroflex, while laminal articulations are sometimes called palato-alveolar; in the Australianist literature, these laminal stops are often described as "palatal," though they are produced further forward than the region usually described as palatal. Because of individual anatomical variation, the precise articulation of palato-alveolar stops (and coronals in general) can vary widely within a speech community.


Dorsal consonants are those consonants made using the tongue body rather than the tip or blade, and they are typically produced at the palate, velum, or uvula. Palatal consonants are made using the tongue body against the hard palate on the roof of the mouth.

They are frequently contrasted with velar or uvular consonants, though it is rare for a language to contrast all three simultaneously, with Jaqaru as a possible example of a three-way contrast. Velar consonants are made using the tongue body against the velum. They are very common cross-linguistically; almost all languages have a velar stop. Because both velars and vowels are made using the tongue body, they are highly affected by coarticulation with vowels and can be produced as far forward as the hard palate or as far back as the uvula.

These variations are typically divided into front, central, and back velars, in parallel with the vowel space.[36] They can be hard to distinguish phonetically from palatal consonants, though they are produced slightly behind the area of prototypical palatal consonants. Uvular consonants are made by the tongue body contacting or approaching the uvula. They are rare, occurring in an estimated 19 percent of languages, and large regions of the Americas and Africa have no languages with uvular consonants. In languages with uvular consonants, stops are the most frequent, followed by continuants (including nasals).

Pharynx and larynx

Consonants made by constrictions of the throat are pharyngeals, and those made by a constriction in the larynx are laryngeals. Laryngeals are made using the vocal folds, because the larynx is too far down the throat to reach with the tongue. Pharyngeals, however, are close enough to the mouth that parts of the tongue can reach them.

Radical consonants use either the root of the tongue or the epiglottis during production and are produced very far back in the vocal tract. Pharyngeal consonants are made by retracting the root of the tongue far enough to almost touch the wall of the pharynx.

Because of these production difficulties, only fricatives and approximants can be produced this way. Epiglottal consonants are made with the epiglottis and the back wall of the pharynx. Epiglottal stops have been recorded in Dahalo. Voiced epiglottal consonants are not deemed possible because the cavity between the glottis and the epiglottis is too small to permit voicing.

Glottal consonants are those produced using the vocal folds in the larynx. Because the vocal folds are the source of phonation and lie below the oro-nasal vocal tract, a number of glottal consonants, such as a voiced glottal stop, are impossible. Three glottal consonants are possible, a voiceless glottal stop and two glottal fricatives, and all are attested in natural languages. Glottal stops, produced by closing the vocal folds, are notably common in the world's languages.

While many languages use them to demarcate phrase boundaries, some languages, such as Huautla Mazatec, have them as contrastive phonemes. In addition, glottal stops can be realized as laryngealization of the following vowel in this language. Glottal stops, especially between vowels, do not usually form a complete closure. True glottal stops normally occur only when they are geminated.[45]


Figure: top-down view of the larynx.

The larynx, commonly known as the "voice box," is a cartilaginous structure in the trachea that is responsible for phonation. The vocal folds (cords) are held together so that they vibrate, or held apart so that they do not. The positions of the vocal folds are achieved by movement of the arytenoid cartilages.

The intrinsic laryngeal muscles are responsible for moving the arytenoid cartilages and modulating the tension of the vocal folds. If the vocal folds are not close or tense enough, they will either vibrate sporadically or not at all. If they vibrate sporadically, the result is either creaky or breathy voice, depending on the degree; if they do not vibrate at all, the result is voicelessness.

In addition to being correctly positioned, the vocal folds must have air flowing across them, or they will not vibrate. The difference in pressure across the glottis required for voicing is estimated at 1 to 2 cm H₂O (98.0665 to 196.133 pascals).

An increase in pressure above the glottis (supraglottal pressure) or a decrease in pressure below the glottis (subglottal pressure) can cause the pressure difference to drop below the levels required for phonation.

Subglottal pressure is maintained by the respiratory muscles. Supraglottal pressure, with no constrictions or articulations, is approximately equal to atmospheric pressure. However, because articulations, especially consonants, represent constrictions of the airflow, the pressure in the cavity behind those constrictions can increase, resulting in a higher supraglottal pressure.
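The pressure condition described above reduces to simple arithmetic, sketched below. The sketch assumes gauge pressures in cm H₂O (0 meaning atmospheric, the value for an unobstructed tract) and treats the 1 cm H₂O low end of the quoted range as a hard voicing threshold, which is a simplification; the function names and the example pressures are hypothetical.

```python
# Minimal sketch of the transglottal pressure condition for voicing.
# Assumptions: pressures are gauge values in cm H2O (0 = atmospheric), and the
# low end of the 1-2 cm H2O estimate is treated as a hard threshold.

PA_PER_CM_H2O = 98.0665        # 1 cm H2O expressed in pascals
MIN_VOICING_DROP_CM_H2O = 1.0  # assumed minimum pressure drop across the glottis


def cm_h2o_to_pa(pressure_cm_h2o: float) -> float:
    """Convert a pressure from cm H2O to pascals."""
    return pressure_cm_h2o * PA_PER_CM_H2O


def transglottal_drop(subglottal: float, supraglottal: float) -> float:
    """Pressure difference across the glottis (both inputs in the same unit)."""
    return subglottal - supraglottal


def can_voice(subglottal: float, supraglottal: float) -> bool:
    """True if the drop across the glottis (in cm H2O) reaches the assumed threshold."""
    return transglottal_drop(subglottal, supraglottal) >= MIN_VOICING_DROP_CM_H2O


if __name__ == "__main__":
    print(cm_h2o_to_pa(1.0), cm_h2o_to_pa(2.0))
    # Hypothetical pressures: an open vocal tract, then a stop closure behind
    # which air has built up until it nearly matches subglottal pressure.
    print(can_voice(subglottal=7.0, supraglottal=0.0))
    print(can_voice(subglottal=7.0, supraglottal=6.5))
```

In the second check, subglottal pressure has not changed at all; the rise in supraglottal pressure behind the closure alone shrinks the drop to 0.5 cm H₂O, below the assumed threshold, so voicing cannot be sustained.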

Lexical access

According to the lexical access model, two different stages of cognition are employed; thus, this concept is known as the two-stage theory of lexical access. The first stage, lexical selection, provides information about the lexical items required to construct the functional-level representation.

These items are retrieved according to their specific semantic and syntactic properties, but phonological forms are not yet made available at this stage. The second stage, retrieval of wordforms, provides the information required for building the positional-level representation.

Articulatory models

When producing speech, the articulators move through and contact particular locations in space, resulting in changes to the acoustic signal. Some models of speech production take this as the basis for modeling articulation in a coordinate system that may be internal to the body (intrinsic) or external to it (extrinsic). Intrinsic coordinate systems model the locations of the articulators as the positions and angles of joints in the body. For example, intrinsic coordinate models of the jaw often use two to three degrees of freedom representing translation and rotation.

These models face problems with the tongue, which, unlike the joints of the jaw and arms, is a muscular hydrostat (like an elephant's trunk) that lacks joints. Because of these different anatomical structures, the movement paths of the jaw are relatively straight lines during speech and chewing, while movements of the tongue follow curves.

Straight-line movements have been used to argue that articulations are planned in extrinsic rather than intrinsic space, though extrinsic coordinate systems also include acoustic coordinate spaces, not just physical coordinate spaces.[51] Models that assume movements are planned in extrinsic space run into an inverse problem: explaining the muscle and joint configurations that produce the observed path or acoustic signal. The arm, for example, has seven degrees of freedom and 22 muscles, so many different joint and muscle configurations can lead to the same final position.

For models that plan in extrinsic acoustic space, the same one-to-many mapping problem applies as well, with no unique mapping from physical or acoustic targets to the muscle movements required to achieve them. However, concerns about the inverse problem may be exaggerated, as speech is a highly learned skill using neurological structures that evolved for this purpose.
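The one-to-many mapping shows up even in a drastically simplified limb. The sketch below assumes a hypothetical planar arm with just two unit-length segments (far simpler than the seven-degree-of-freedom arm mentioned above), solves its inverse kinematics for a target point, and finds two distinct joint configurations that both reach it. The function names are illustrative, not taken from any speech-production model.

```python
import math

# Minimal sketch of the inverse problem for a hypothetical planar two-link arm
# (unit-length segments by default). Angles are in radians.


def endpoint(theta1: float, theta2: float,
             l1: float = 1.0, l2: float = 1.0) -> tuple[float, float]:
    """Forward kinematics: tip position for shoulder angle theta1 and elbow angle theta2."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def joint_solutions(x: float, y: float,
                    l1: float = 1.0, l2: float = 1.0) -> list[tuple[float, float]]:
    """Inverse kinematics: every (theta1, theta2) pair that puts the tip at (x, y)."""
    cos_t2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(cos_t2) > 1.0 + 1e-12:
        return []  # target is out of reach
    cos_t2 = max(-1.0, min(1.0, cos_t2))  # absorb rounding error at the reach limit
    solutions = []
    for sign in (1.0, -1.0):  # the two mirror-image elbow postures
        t2 = sign * math.acos(cos_t2)
        t1 = math.atan2(y, x) - math.atan2(l2 * math.sin(t2), l1 + l2 * math.cos(t2))
        solutions.append((t1, t2))
    return solutions


if __name__ == "__main__":
    for t1, t2 in joint_solutions(1.2, 0.8):
        tip = endpoint(t1, t2)
        print(f"shoulder={t1:+.3f} elbow={t2:+.3f} -> tip=({tip[0]:.3f}, {tip[1]:.3f})")
```

Adding joints or muscles only enlarges the set of solutions, which is why a model that plans in extrinsic space needs some additional principle to select one configuration.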

The equilibrium-point model proposes a resolution to the inverse problem by arguing that movement targets are represented as the positions of the muscle pairs acting on a joint.[D] Importantly, muscles are modeled as springs, and the target is the equilibrium point of the modeled spring-mass system. By using springs, the equilibrium-point model can easily account for compensation and response when movements are disrupted.

It is considered a coordinate model because it assumes that these muscle positions are represented as points in space, the equilibrium points, where the spring-like action of the muscles converges.
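A one-dimensional version of the spring-mass idea can be simulated directly. In this sketch (an illustrative assumption, not a published parameterization), an articulator of unit mass is pulled by two antagonist "muscle" springs with stiffnesses k1 and k2 and rest positions r1 and r2, plus viscous damping; the spring forces cancel at x* = (k1*r1 + k2*r2) / (k1 + k2). A sudden push partway through the movement shows the articulator still settling at the same equilibrium point without any replanning. All parameter values are hypothetical.

```python
# Minimal sketch of an equilibrium-point articulator: a mass between two
# antagonist springs with damping. All parameter values are hypothetical.


def equilibrium(k1: float, r1: float, k2: float, r2: float) -> float:
    """Position where the spring forces -k1*(x - r1) and -k2*(x - r2) cancel."""
    return (k1 * r1 + k2 * r2) / (k1 + k2)


def simulate(k1: float, r1: float, k2: float, r2: float, mass: float = 1.0,
             damping: float = 4.0, x0: float = 0.0, dt: float = 0.001,
             steps: int = 20_000, push_step=None, push_velocity: float = 0.0) -> float:
    """Integrate the damped two-spring system with semi-implicit Euler; return final x."""
    x, v = x0, 0.0
    for step in range(steps):
        if step == push_step:
            v += push_velocity  # hypothetical sudden perturbation of the articulator
        force = -k1 * (x - r1) - k2 * (x - r2) - damping * v
        v += force / mass * dt  # update velocity first...
        x += v * dt             # ...then position with the new velocity
    return x


if __name__ == "__main__":
    target = equilibrium(30.0, 1.0, 10.0, -1.0)
    free = simulate(30.0, 1.0, 10.0, -1.0)
    pushed = simulate(30.0, 1.0, 10.0, -1.0, push_step=5_000, push_velocity=3.0)
    print(target, round(free, 6), round(pushed, 6))
```

Changing the ratio of the two stiffnesses moves x*, while scaling both together (co-contraction) leaves x* in place and only makes the articulator stiffer; in this sketch the target is encoded purely as that balance of spring settings.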

Gestural approaches to speech production propose that articulations are represented as movement patterns rather than as particular coordinates to hit. The minimal unit is a gesture, which represents a group of "functionally equivalent articulatory movement patterns that are actively controlled with reference to a given speech-relevant goal (e.g., a bilabial closure)." These groups represent coordinative structures or "synergies," which view movements not as individual muscle movements but as task-dependent groupings of muscles that work together as a single unit.

This reduces the degrees of freedom in articulation planning, a problem especially in intrinsic coordinate models, and allows any movement that achieves the speech goal rather than encoding particular movements in the abstract representation. Coarticulation is well described by gestural models, as articulations at faster speech rates can be explained as composites of the independent gestures at slower speech rates.
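The idea that a gesture fixes a goal (here, a bilabial closure) without fixing the particular movements can be illustrated with a toy synergy. The sketch below assumes, hypothetically, that lip aperture shrinks by the sum of jaw, lower-lip and upper-lip displacements, and that each articulator takes a share of the required closure in proportion to a made-up weight. When one articulator is held fixed, the remaining ones absorb its share and the same goal is still reached.

```python
# Toy synergy for a bilabial-closure goal. The linear aperture model and the
# weights are hypothetical illustrations, not values from a published model.


def distribute_closure(required_mm: float, weights: dict[str, float],
                       blocked: frozenset[str] = frozenset()) -> dict[str, float]:
    """Split a required lip-aperture reduction (mm) across the free articulators.

    Each free articulator contributes in proportion to its weight; blocked
    articulators contribute nothing, so the others take over their share.
    """
    free = {name: w for name, w in weights.items() if name not in blocked}
    total = sum(free.values())
    if total <= 0:
        raise ValueError("no free articulator can contribute to the goal")
    shares = {name: 0.0 for name in weights}
    for name, w in free.items():
        shares[name] = required_mm * w / total
    return shares


WEIGHTS = {"jaw": 2.0, "lower_lip": 1.5, "upper_lip": 0.5}  # hypothetical weights

if __name__ == "__main__":
    print(distribute_closure(12.0, WEIGHTS))
    print(distribute_closure(12.0, WEIGHTS, blocked=frozenset({"jaw"})))
```

Both calls close the same 12 mm aperture; only the division of labor changes, which is the sense in which the goal, not the individual movements, is what the gesture encodes.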


Speech sounds are created by the modification of an airstream, which results in a sound wave. The modification is done by the articulators, with different places and manners of articulation producing different acoustic results.

Because the posture of the vocal tract, not just the position of the tongue, can affect the resulting sound, the manner of articulation is important for describing a speech sound. For example, the words tackle and sack both begin with alveolar sounds in English but differ in how far the tongue is from the alveolar ridge.

This difference has large effects on the airstream and thus on the sound that is produced. Similarly, the direction and source of the airstream can affect the sound. The most common airstream mechanism is pulmonic (using the lungs), but the glottis and tongue can also be used to produce airstreams.

Voice and phonation type

A major distinction between speech sounds is whether they are voiced. Sounds are voiced when the vocal folds begin to vibrate in the process of phonation. Many sounds can be produced with or without phonation, though physical constraints may make phonation difficult or impossible for some articulations.

When articulations are voiced, the main source of noise is the periodic vibration of the vocal folds. Articulations such as voiceless plosives have no acoustic source and are noticeable by their silence, but other voiceless sounds, such as fricatives, create their own acoustic source regardless of phonation.

Phonation is controlled by the muscles of the larynx, and languages make use of more acoustic detail than a binary voiced/voiceless distinction. During phonation, the vocal folds vibrate at a certain rate. This vibration results in a periodic acoustic wave consisting of a fundamental frequency and its harmonics.

The fundamental frequency of the acoustic wave can be controlled by adjusting the muscles of the larynx, and listeners perceive this fundamental frequency as pitch. Tonal languages use pitch manipulation to convey lexical information, and many languages use pitch to mark prosodic or pragmatic information.
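The link between a periodic wave and its fundamental frequency can be shown with a short sketch. It builds a hypothetical harmonic signal (a fundamental plus four harmonics with amplitudes 1, 1/2, 1/3, ...) and recovers F0 with plain autocorrelation, picking the lag at which the signal best matches a shifted copy of itself. Autocorrelation is a standard estimation technique chosen here for illustration; it measures the physical F0 that listeners hear as pitch, not perceived pitch itself, and practical pitch trackers add normalization and voicing checks that are omitted here.

```python
import math

# Minimal autocorrelation F0 estimator on a synthetic harmonic signal.
# The signal shape, sample rate and search range are illustrative assumptions.


def harmonic_wave(f0: float, sample_rate: int, n_samples: int,
                  n_harmonics: int = 5) -> list[float]:
    """Sum of sinusoids at f0, 2*f0, ... with amplitudes 1, 1/2, 1/3, ..."""
    return [
        sum(math.sin(2 * math.pi * k * f0 * n / sample_rate) / k
            for k in range(1, n_harmonics + 1))
        for n in range(n_samples)
    ]


def estimate_f0(signal: list[float], sample_rate: int,
                f_min: float = 50.0, f_max: float = 500.0) -> float:
    """Return sample_rate / lag for the lag with the largest autocorrelation."""
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(signal) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the signal and a copy of itself shifted by `lag`.
        score = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


if __name__ == "__main__":
    sr = 8000
    wave = harmonic_wave(200.0, sr, 800)  # a 200 Hz period spans 40 samples at 8 kHz
    print(estimate_f0(wave, sr))
```

Because the harmonics repeat with the same period as the fundamental, the best match falls at one full period, which is why the fundamental rather than any individual harmonic is recovered.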

For the vocal folds to vibrate, they must be in the proper position, and air must be flowing through the glottis. Phonation types are modeled on a continuum of glottal states from completely open (voiceless) to completely closed (glottal stop). The optimal position for vibration, and the phonation type most used in speech, modal voice, lies between these two extremes. If the glottis is slightly wider, breathy voice occurs, while bringing the vocal folds closer together results in creaky voice.

The normal phonation pattern used in typical speech is modal voice, in which the vocal folds are held close together with moderate tension. The vocal folds vibrate as a single unit, periodically and efficiently, with full glottal closure and no aspiration. If they are pulled farther apart, they do not vibrate and so produce voiceless phones. If they are held firmly together, they produce a glottal stop.

If the vocal folds are held slightly further apart than in modal voicing, they produce phonation types such as breathy voice (or murmur) and whispery voice. The tension across the vocal ligaments (vocal cords) is lower than in modal voicing, allowing air to flow more freely. Breathy voice and whispery voice both exist on a continuum, loosely characterized as going from the more periodic waveform of breathy voice to the noisier waveform of whispery voice. Acoustically, both tend to dampen the first formant, with whispery voice showing more extreme deviations.

Holding the vocal folds more tightly together results in creaky voice. The tension across the vocal folds is lower than in modal voice, but they are held tightly together, so that only the ligaments of the vocal folds vibrate.[e] The pulses are highly irregular, with low pitch and frequency amplitude.

Some languages do not maintain a voicing distinction for some consonants,[f] but all languages use voicing to some degree. For example, no language is known to have a phonemic voicing contrast for vowels, with all known vowels canonically voiced.[g] Other positions of the glottis, such as breathy and creaky voice, are used in a number of languages, such as Jalapa Mazatec, to contrast phonemes, while in other languages, such as English, they exist allophonically.

There are several ways to determine whether a segment is voiced, the simplest being to feel the larynx during speech and note when vibrations are felt. More precise measurements can be obtained through acoustic analysis of a spectrogram or of spectral slices.

In spectrographic analysis, voiced segments show a voicing bar, a region of high acoustic energy, in the low frequencies. In spectral-slice analysis, a model of the vowel being pronounced is used to reverse the filtering of the mouth on the acoustic spectrum at a given point in time, yielding the spectrum of the glottal source.

A computational model of the unfiltered glottal signal is then fitted to the inverse-filtered acoustic signal to determine the characteristics of the glottis. Visual analysis using specialized medical equipment such as ultrasound and endoscopy is also possible.[h]
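The voicing-bar criterion can be approximated numerically by asking what share of a short frame's spectral energy lies at low frequencies. The sketch below assumes an 8 kHz sample rate, a 500 Hz cutoff and a 0.5 decision threshold (all illustrative choices, not standard values), computes the spectrum with a naive discrete Fourier transform to stay dependency-free, and compares a toy periodic frame with a toy white-noise frame standing in for frication. It does not implement the inverse-filtering procedure described above.

```python
import math
import random

# Crude voicing-bar check: fraction of spectral energy below a cutoff.
# Cutoff, threshold and the toy frames are hypothetical illustration values.


def low_band_ratio(frame: list[float], sample_rate: int,
                   cutoff_hz: float = 500.0) -> float:
    """Fraction of energy in non-negative DFT bins below cutoff_hz (naive DFT)."""
    n = len(frame)
    low = total = 0.0
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        power = re * re + im * im
        total += power
        if k * sample_rate / n < cutoff_hz:  # bin k is centered at k * fs / n Hz
            low += power
    return low / total if total > 0 else 0.0


def has_voice_bar(frame: list[float], sample_rate: int, threshold: float = 0.5) -> bool:
    """Decide 'voiced' when most of the frame's energy sits at low frequencies."""
    return low_band_ratio(frame, sample_rate) >= threshold


def toy_voiced_frame(n: int = 256, sample_rate: int = 8000, f0: float = 125.0) -> list[float]:
    """Hypothetical voiced frame: a 125 Hz fundamental plus its second harmonic."""
    return [math.sin(2 * math.pi * f0 * i / sample_rate)
            + 0.5 * math.sin(2 * math.pi * 2 * f0 * i / sample_rate) for i in range(n)]


def toy_noise_frame(n: int = 256, seed: int = 0) -> list[float]:
    """Hypothetical voiceless frame: seeded white noise standing in for frication."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    sr = 8000
    for name, frame in (("voiced", toy_voiced_frame()), ("noise", toy_noise_frame())):
        print(name, round(low_band_ratio(frame, sr), 3), has_voice_bar(frame, sr))
```

Real fricatives often have little low-frequency energy, so a band-energy ratio of this kind separates them from voiced frames reasonably well in simple cases, though voiced fricatives and noisy recordings would need more careful treatment.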


Figure: IPA vowel chart. Where vowels appear in pairs beside a dot, the vowel on the left is unrounded and the vowel on the right is rounded.

Vowels are classified broadly by the region of the mouth in which they originate. However, because they are produced without a constriction in the vocal tract, their accurate description depends on measuring acoustic correlates of tongue position. This is because the location of the tongue during vowel production changes the frequencies at which the cavity resonates, and it is these resonances—known as formants—that are measured and used to characterize vowels.
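A common textbook idealization makes the link between cavity shape and formant frequency concrete: a neutral vowel is approximated as a uniform tube closed at the glottis and open at the lips, which resonates at odd multiples of c / 4L. The sketch below assumes a 17.5 cm vocal tract and a speed of sound of 35,000 cm/s (typical illustrative values, not taken from the text); it models no particular vowel described here.

```python
# Resonances of a uniform tube closed at one end (glottis) and open at the
# other (lips). Tube length and speed of sound are illustrative assumptions.


def tube_formants(length_cm: float = 17.5, speed_of_sound_cm_s: float = 35_000.0,
                  count: int = 3) -> list[float]:
    """Return the first `count` resonances in Hz: F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * speed_of_sound_cm_s / (4 * length_cm)
            for n in range(1, count + 1)]


if __name__ == "__main__":
    print(tube_formants())                # the assumed 17.5 cm tract
    print(tube_formants(length_cm=14.0))  # a shorter, hypothetical tract
```

A shorter tube raises every resonance proportionally. The uniform tube only gives a reference point: tongue positions that constrict the tube at different places shift individual formants away from these values, which is what vowel measurements track.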

Vowel height traditionally refers to the highest point of the tongue during articulation. The height parameter is divided into four primary levels: high (close), close-mid, open-mid, and low (open). Vowels whose height lies in the middle are called mid. Slightly opened close vowels and slightly closed open vowels are called near-close and near-open, respectively. The lowest vowels are articulated not only by lowering the tongue but also by lowering the jaw.

While the IPA implies that there are seven levels of vowel height, it is unlikely that any given language minimally contrasts all seven levels. Chomsky and Halle suggest that there are only three levels, although four levels of vowel height seem to be needed to describe Danish, and some languages may even require five.

Vowel backness is divided into three levels: front, central, and back. Languages usually do not minimally contrast more than two levels of vowel backness. Languages claimed to have a three-way backness distinction include Nimboran and Norwegian.

In most languages, the lips during vowel production can be classified as either rounded or unrounded (spread), although other types of lip positions, such as compression and protrusion, have been described. Lip position is correlated with height and backness: front and low vowels tend to be unrounded, whereas back and high vowels are usually rounded. Paired vowels on the IPA chart have the spread vowel on the left and the rounded vowel on the right.

In addition to the universal vowel features described above, some languages have further features such as nasality, length, and different types of phonation, such as voiceless or creaky. Sometimes more specialized tongue gestures such as rhoticity, advanced tongue root, pharyngealization, stridency, and frication are required to describe a specific vowel.

Manner of articulation

It is not enough to know the place of articulation to describe a consonant fully. How the constriction is made is equally important. Manners of articulation describe how exactly the active articulator modifies, narrows, or closes off the vocal tract.

Stops (also called plosives) are consonants where the airstream is completely obstructed.

During the closure, pressure builds up in the mouth and is then released as a short burst of sound when the articulators move apart. The velum is raised so that air cannot flow through the nasal cavity. If the velum is lowered and air is allowed to flow through the nose, the result is a nasal stop. However, phoneticians almost always refer to nasal stops simply as "nasals."

Affricates are a sequence of a stop followed by a fricative at the same place of articulation.

Fricatives are consonants in which the airflow is made turbulent by partially, but not completely, obstructing part of the vocal tract. Sibilants are a particular type of fricative in which the turbulent airstream is directed towards the teeth, producing a loud hissing sound.

Nasals (sometimes referred to as nasal stops) are consonants with a closure in the oral cavity and a lowered velum, allowing air to flow through the nose.

In an approximant, the articulators come close together, but not to an extent that produces a turbulent airstream.
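The manners described so far can be summarized as a decision over a simplified articulatory configuration. This Python sketch is an illustration only: the `Stricture` enum, the `manner` function, and its boolean parameters are hypothetical. Laterals, trills, taps, and the airstream mechanism are deliberately left out.

```python
from enum import Enum


class Stricture(Enum):
    """Degree of oral constriction (simplified, hypothetical categories)."""
    COMPLETE = "complete closure"
    CLOSE = "close approximation"  # narrow enough to make the airflow turbulent
    OPEN = "open approximation"    # articulators near, but airflow stays smooth


def manner(stricture: Stricture, velum_lowered: bool = False,
           fricative_release: bool = False) -> str:
    """Return a manner-of-articulation label for a simplified configuration."""
    if stricture is Stricture.COMPLETE:
        if velum_lowered:
            # Oral closure with air escaping through the nose.
            return "nasal"
        if fricative_release:
            # Stop closure released into a fricative at the same place.
            return "affricate"
        # Velum raised: pressure builds behind the closure, then a burst.
        return "stop"
    if stricture is Stricture.CLOSE:
        return "fricative"
    return "approximant"


print(manner(Stricture.COMPLETE))                          # stop
print(manner(Stricture.COMPLETE, velum_lowered=True))      # nasal
print(manner(Stricture.COMPLETE, fricative_release=True))  # affricate
print(manner(Stricture.CLOSE))                             # fricative
print(manner(Stricture.OPEN))                              # approximant
```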

Lateral consonants

Laterals are consonants in which the airstream is obstructed along the center of the vocal tract, allowing it to flow freely on one or both sides. Laterals have also been defined as consonants in which the tongue is contracted so that the airstream is greater around the sides than over the center of the tongue.

The first definition does not allow air to flow over the tongue.

Trills are consonants in which the tongue or lips are set in motion by the airstream. The stricture is formed so that the airstream causes a repeating pattern of opening and closing of the soft articulator(s). Apical trills typically consist of two or three periods of vibration.

Taps and flaps are single, rapid, usually apical gestures in which the tongue is thrown against the roof of the mouth, comparable to a very rapid stop. The terms are sometimes used interchangeably, but some phoneticians distinguish between them. In a tap, the tongue contacts the roof in a single motion, whereas in a flap, the tongue moves tangentially to the roof of the mouth, striking it in passing.

In the glottalic airstream mechanism, the glottis is closed, trapping a body of air. This allows the remaining air in the vocal tract to be moved separately. An upward movement of the closed glottis moves this air out, resulting in an ejective consonant. Alternatively, the glottis can lower, sucking more air into the mouth, which results in an implosive consonant.

Clicks are stops in which tongue movement causes air to be sucked into the mouth; this is referred to as a velaric airstream. [85] During the click, the air between the two articulatory closures becomes rarefied, producing a loud "click" sound when the anterior closure is released.

The release of the anterior closure is referred to as the click influx. The release of the posterior closure, which can be velar or uvular, is the click efflux. Clicks are used in several African language families, such as the Khoisan and Bantu languages.
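The non-pulmonic mechanisms above all work by changing the volume of a trapped body of air. Whether air then flows out or in on release follows from Boyle's law. The Python sketch below assumes an isothermal ideal gas. The function names, the dictionary, and the 10 and 20 percent volume changes are hypothetical illustrations and are not measurements.

```python
ATMOSPHERIC_KPA = 101.325  # standard atmospheric pressure, kPa


def trapped_pressure(p0_kpa: float, v0: float, v1: float) -> float:
    """Boyle's law for an isothermal ideal gas: p0 * v0 == p1 * v1."""
    return p0_kpa * v0 / v1


def flow_on_release(p1_kpa: float) -> str:
    """Direction of airflow when the trapped air is released to the atmosphere."""
    if p1_kpa > ATMOSPHERIC_KPA:
        return "egressive"
    if p1_kpa < ATMOSPHERIC_KPA:
        return "ingressive"
    return "none"


# (airstream initiator, direction) pairs described in the text.
CONSONANT_TYPE = {
    ("glottalic", "egressive"): "ejective",
    ("glottalic", "ingressive"): "implosive",
    ("velaric", "ingressive"): "click",
}

# Hypothetical volume changes of the trapped air (relative units).
raised_glottis = trapped_pressure(ATMOSPHERIC_KPA, v0=1.0, v1=0.9)
lowered_glottis = trapped_pressure(ATMOSPHERIC_KPA, v0=1.0, v1=1.1)
cavity_enlarged_by_tongue = trapped_pressure(ATMOSPHERIC_KPA, v0=1.0, v1=1.2)

print(CONSONANT_TYPE[("glottalic", flow_on_release(raised_glottis))])  # ejective
print(CONSONANT_TYPE[("glottalic", flow_on_release(lowered_glottis))])  # implosive
print(CONSONANT_TYPE[("velaric", flow_on_release(cavity_enlarged_by_tongue))])  # click
```

Shrinking the trapped volume raises its pressure above atmospheric, so air rushes out. Enlarging it lowers the pressure, so air rushes in.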

Pulmonary and subglottal system

The lungs drive nearly all speech production, and their importance in phonetics is due to their creation of pressure for pulmonic sounds. The most common kind of sound across languages is pulmonic egressive, where air is exhaled from the lungs. The opposite is possible, although no language is known to have pulmonic ingressive sounds as phonemes.

Many genetically and geographically diverse languages, such as Swedish, use pulmonic ingressive sounds for paralinguistic articulations such as affirmations. Both egressive and ingressive sounds rely on holding the vocal folds in a particular posture and using the lungs to move air across the vocal folds so that they either vibrate (voiced) or do not vibrate (voiceless).

Pulmonic articulations are restricted by the volume of air that can be exhaled in a given respiratory cycle, known as the vital capacity.

The lungs are used to maintain two kinds of pressure simultaneously in order to produce and modify phonation. To produce phonation at all, the lungs must maintain a pressure 3 to 5 cm H2O higher than the pressure above the glottis. However, small and rapid adjustments are made to the subglottal pressure to modify speech for suprasegmental features such as stress.
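To put the 3 to 5 cm H2O figure in SI units, the sketch below uses the conventional conversion factor of 1 cm H2O = 98.0665 Pa. That factor is a standard value and does not come from the text. The function name is hypothetical.

```python
PA_PER_CM_H2O = 98.0665      # conventional value (water at 4 degrees C, standard gravity)
ATMOSPHERIC_PA = 101_325.0   # standard atmospheric pressure


def cm_h2o_to_pa(p_cm_h2o: float) -> float:
    """Convert a pressure in centimetres of water to pascals."""
    return p_cm_h2o * PA_PER_CM_H2O


low, high = cm_h2o_to_pa(3), cm_h2o_to_pa(5)
print(f"{low:.0f} to {high:.0f} Pa")  # 294 to 490 Pa
print(f"{100 * low / ATMOSPHERIC_PA:.2f}% to {100 * high / ATMOSPHERIC_PA:.2f}% of atmospheric")
```

The differential needed for phonation is therefore well under one percent of atmospheric pressure.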

Several thoracic muscles are used to make these adjustments. Because the lungs and thorax stretch during inhalation, the elastic forces of the lungs alone can produce pressure differentials sufficient for phonation at lung volumes above 50 percent of vital capacity.

Above 50 percent of vital capacity, the respiratory muscles are used to "check" (counteract) the elastic forces of the thorax to maintain a stable pressure differential. Below that volume, they are used to increase the subglottal pressure by actively exhaling air.

During speech, the respiratory cycle is modified to accommodate both linguistic and biological needs. Exhalation, which usually takes up about 60 percent of the respiratory cycle at rest, increases to about 90 percent of the cycle. Because metabolic needs are relatively stable, the total volume of air moved in most cases of speech remains about the same as in quiet tidal breathing.

An increase in speech intensity of 18 dB (a loud conversation) has relatively little impact on the volume of air moved. Because their respiratory systems are not as developed as those of adults, children tend to use a larger proportion of their vital capacity than adults do, with deeper inhalations.
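The size of an 18 dB increase is easier to judge in linear terms. The sketch below applies the standard decibel definitions. The text does not specify whether "intensity" refers to acoustic power or to sound pressure, so both ratios are shown. The function names are hypothetical.

```python
def intensity_ratio(delta_db: float) -> float:
    """Intensity (power) ratio for a level change in decibels: 10 ** (dB / 10)."""
    return 10 ** (delta_db / 10)


def pressure_ratio(delta_db: float) -> float:
    """Sound-pressure (amplitude) ratio for the same change: 10 ** (dB / 20)."""
    return 10 ** (delta_db / 20)


print(round(intensity_ratio(18), 1))  # 63.1
print(round(pressure_ratio(18), 2))   # 7.94
```

A roughly 63-fold increase in acoustic intensity with little change in air volume suggests that loudness is controlled mainly by pressure and laryngeal adjustments, not by moving more air.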

The source-filter principle

The source-filter model of speech is a theory of speech production that explains the link between vocal tract posture and its acoustic consequences. Under this model, the vocal tract can be modeled as a sound source coupled to an acoustic filter. In many cases, the source is the larynx during voicing, although other sound sources can be modeled in the same way. The shape of the supraglottal vocal tract acts as the filter, and different configurations of the articulators result in different acoustic patterns.

These changes are predictable. The vocal tract can be modeled as a sequence of tubes of varying diameters, closed at one end, and the acoustic effect of an articulatory posture can be derived using equations for acoustic resonance.
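The simplest case of this tube model is a single uniform tube that is closed at the glottis and open at the lips. It resonates at odd multiples of a quarter wavelength: F_k = (2k - 1) c / (4L). The sketch below assumes the commonly quoted textbook values c = 350 m/s and L = 17.5 cm for an adult vocal tract. These values are assumptions and do not come from the text. The result approximates a neutral, schwa-like vowel. Modeling other postures requires several tubes of different diameters.

```python
def uniform_tube_formants(length_m: float, n: int = 3, c: float = 350.0) -> list:
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    F_k = (2k - 1) * c / (4 * L); c is the assumed speed of sound in m/s.
    """
    return [round((2 * k - 1) * c / (4 * length_m), 1) for k in range(1, n + 1)]


print(uniform_tube_formants(0.175))  # [500.0, 1500.0, 2500.0]
print(uniform_tube_formants(0.14))   # a shorter tract raises every resonance
```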

The process of inverse filtering uses this principle to analyze the source spectrum produced by the vocal folds during voicing. By taking the inverse of a predicted filter, the acoustic effect of the supraglottal vocal tract can be undone, leaving the acoustic spectrum produced by the vocal folds. [95] This allows quantitative study of the various phonation types.
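A minimal sketch of the principle uses linear prediction (LPC) as the "predicted filter". The vocal-tract filter is estimated as an all-pole model 1/A(z), and the speech is then passed through A(z) to undo it. Everything here is hypothetical and synthetic: the 100 Hz impulse-train "source", the two formants at 500 and 1500 Hz, and all function names. Practical glottal inverse filtering adds refinements that are omitted here.

```python
import numpy as np


def lpc(x, order):
    """Linear-prediction coefficients by the autocorrelation method
    (Levinson-Durbin recursion). Returns a = [1, a1, ..., a_order] so that
    the prediction error is e[n] = sum_k a[k] * x[n - k]."""
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                       # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]  # update earlier coefficients
        a[i] = k
        err *= 1.0 - k * k                   # remaining prediction error
    return a


def all_pole(source, a):
    """Synthesis filter 1/A(z): y[n] = source[n] - sum_{k>=1} a[k] * y[n - k]."""
    y = np.zeros(len(source))
    for n in range(len(source)):
        acc = source[n]
        for k in range(1, min(len(a), n + 1)):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y


def resonator(freq_hz, bw_hz, fs):
    """Second-order A(z) factor with a pole pair at freq_hz and bandwidth bw_hz."""
    r = np.exp(-np.pi * bw_hz / fs)
    theta = 2 * np.pi * freq_hz / fs
    return np.array([1.0, -2 * r * np.cos(theta), r * r])


fs = 8000
# Hypothetical "vocal tract": formants at 500 Hz and 1500 Hz.
a_true = np.convolve(resonator(500, 100, fs), resonator(1500, 100, fs))
# Hypothetical "glottal source": a 100 Hz impulse train lasting 0.1 s.
source = np.zeros(800)
source[::80] = 1.0
speech = all_pole(source, a_true)

# Inverse filtering: estimate A(z) from the speech alone, then apply it.
a_est = lpc(speech, order=4)
residual = np.convolve(speech, a_est)[: len(speech)]
print(np.round(a_true, 3))
print(np.round(a_est, 3))
```

The residual approximates the source signal, with most of the formant structure removed.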

Speech perception

Speech perception is the process by which a linguistic signal is decoded and understood by a listener. [i] To perceive speech, the listener must convert the continuous acoustic signal into discrete linguistic units such as phonemes, morphemes, and words. To identify and categorize sounds correctly, listeners prioritize certain aspects of the signal that can reliably distinguish between linguistic categories.

While certain cues are prioritized over others, many aspects of the signal can contribute to perception. For example, although oral languages prioritize acoustic information, the McGurk effect shows that visual information is used to distinguish ambiguous information when the acoustic cues are unreliable.

While listeners can use a variety of information to segment the speech signal, the relationship between acoustic cues and category perception is not a perfect mapping. Because of coarticulation, noisy environments, and individual differences, there is a high degree of acoustic variability within categories.

Listeners are nonetheless able to perceive categories reliably despite this variability in acoustic realization. Explaining how they do so is known as the problem of perceptual invariance. To do this, listeners rapidly accommodate to new speakers and shift their boundaries between categories to match the acoustic distinctions their conversational partner is making.

How sounds make their way from source to the brain

Hearing, the process of perceiving sound, is the first step in understanding speech. Articulators cause systematic changes in air pressure that travel as sound waves to the listener's ear. The sound waves then strike the listener's eardrum, causing it to vibrate. The eardrum's vibrations are transmitted by the ossicles (the three small bones of the middle ear) to the cochlea.

The cochlea is a spiral-shaped, fluid-filled tube divided lengthwise by the organ of Corti, which contains the basilar membrane. The basilar membrane increases in thickness along the length of the cochlea, causing different frequencies to resonate at different locations. This tonotopic design allows the ear to analyze sound in a manner similar to a Fourier transform.
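The Fourier-transform analogy can be illustrated with NumPy's FFT, which breaks a compound tone into its component frequencies. This is an analogy only, because the cochlea does not literally compute an FFT. The sampling rate and the 200 Hz and 1200 Hz components are hypothetical choices.

```python
import numpy as np

fs = 8000                      # sampling rate in Hz (assumed)
t = np.arange(fs) / fs         # one second of sample times
# A compound tone: 200 Hz plus a weaker 1200 Hz component (hypothetical).
signal = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

# Amplitude spectrum: each bin gives the strength of one frequency component,
# loosely analogous to one place along the basilar membrane.
spectrum = np.abs(np.fft.rfft(signal)) / (len(signal) / 2)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

for f, amp in zip(freqs, spectrum):
    if amp > 0.1:
        print(f"{f:.0f} Hz: amplitude {amp:.2f}")
```

Only the bins at 200 Hz and 1200 Hz carry appreciable energy, in the same way that only two regions of the membrane would respond strongly to this tone.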

Differential vibration of the basilar membrane causes the hair cells within the organ of Corti to move. This depolarizes the hair cells and ultimately converts the acoustic signal into a neuronal signal. [104] Although the hair cells do not produce action potentials themselves, they release neurotransmitter at synapses with the fibers of the auditory nerve, which do produce action potentials. In this way, the patterns of oscillation on the basilar membrane are converted into spatiotemporal patterns of firing that transmit information about the sound to the brainstem.


Prosody

In addition to consonants and vowels, phonetics also describes properties of speech that are not localized to individual segments but belong to larger units of speech, such as syllables and phrases. Prosody includes auditory characteristics such as pitch, speech rate, duration, and loudness.

Languages use these properties to varying degrees to implement stress, pitch accent, and intonation. For example, stress in English and Spanish is correlated with changes in pitch and duration, whereas stress in Welsh is more consistently correlated with pitch than with duration, and stress in Thai is correlated only with duration.

Theories of speech perception

Early theories of speech perception, such as motor theory, attempted to solve the problem of perceptual invariance by arguing that speech perception and production were closely linked. In its strongest form, motor theory argues that speech perception requires the listener to access the articulatory representation of sounds: to categorize a sound properly, a listener reverse-engineers the articulation that would produce that sound and, by identifying these gestures, retrieves the intended linguistic category.

While findings such as the McGurk effect and case studies of patients with neurological injuries have provided support for motor theory, further experiments have not supported its strong form. There is, however, some support for weaker forms of motor theory, which claim a non-deterministic relationship between production and perception.

Successor theories of speech perception focus on the acoustic cues to sound categories and can be grouped into two broad categories: abstractionist theories and episodic theories. In abstractionist theories, speech perception involves identifying an idealized lexical object from a signal that has been reduced to its necessary components and normalized to counteract speaker variability.
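One concrete meaning of "normalizing the signal" in phonetic practice is Lobanov normalization. This method converts each speaker's formant values to z-scores, so that differences in overall scale between speakers disappear. It is offered here only as an example of normalization in computational form. It is not presented as the mechanism that abstractionist theories propose. The formant values and the 20 percent scaling are hypothetical.

```python
import numpy as np


def lobanov(formants_hz):
    """Z-score each formant (column) within a single speaker's tokens."""
    f = np.asarray(formants_hz, dtype=float)
    return (f - f.mean(axis=0)) / f.std(axis=0)


# Hypothetical (F1, F2) values in Hz for three vowel tokens from one speaker.
speaker_a = [[300, 2300], [700, 1200], [320, 800]]
# A second hypothetical speaker whose formants are all 20 percent higher,
# as might result from a shorter vocal tract.
speaker_b = [[1.2 * f1, 1.2 * f2] for f1, f2 in speaker_a]

print(np.round(lobanov(speaker_a), 2))
print(np.round(lobanov(speaker_b), 2))  # identical to the line above
```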

Episodic theories, such as the exemplar model, argue that speech perception involves accessing detailed memories (i.e., episodic memories) of previously heard tokens. Episodic theories explain the problem of perceptual invariance as an issue of familiarity: normalization is a byproduct of exposure to more variable distributions rather than a discrete process, as abstractionist theories claim.
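A toy exemplar classifier makes the episodic idea concrete. Every stored token is kept, and a new token receives the label whose stored tokens it most resembles in total. This is a generic sketch and does not reproduce any specific published model. The stored (F1, F2) tokens, the exponential similarity function, and the `sensitivity` value are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical stored exemplars: (F1 Hz, F2 Hz, category label).
MEMORY = [
    (300, 2300, "i"), (320, 2200, "i"), (280, 2400, "i"),
    (310, 850, "u"), (330, 900, "u"), (290, 800, "u"),
]


def categorize(f1, f2, memory=MEMORY, sensitivity=0.01):
    """Sum the similarity exp(-sensitivity * distance) to every stored token per
    label, and return the label with the greatest total activation."""
    activation = defaultdict(float)
    for m1, m2, label in memory:
        d = math.hypot(f1 - m1, f2 - m2)
        activation[label] += math.exp(-sensitivity * d)
    return max(activation, key=activation.get)


print(categorize(305, 2250))  # i
print(categorize(300, 950))   # u
```

Because every decision consults all stored tokens, adding a new speaker's tokens to `MEMORY` changes later decisions without a separate normalization step. This reflects the intuition behind the episodic account.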

Acoustic phonetics

Acoustic phonetics deals with the acoustic properties of speech sounds. The sensation of sound is caused by pressure fluctuations that cause the eardrum to move. The ear transforms this movement into neural signals that the brain registers as sound. Acoustic waveforms are records that measure these pressure fluctuations. [112]

Articulatory phonetics

Articulatory phonetics deals with how speech sounds are made.

Auditory phonetics

Auditory phonetics studies how humans perceive speech sounds. Because anatomical features of the auditory system distort the speech signal, humans do not experience speech sounds as perfect acoustic records. For example, the auditory impression of volume, measured in decibels (dB), does not match the difference in sound pressure linearly.
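The logarithmic decibel scale alone already makes equal steps in pressure unequal on the dB scale. The sketch below uses the standard sound pressure level definition, dB SPL = 20 log10(p / 20 µPa). Perceived loudness depends on additional factors that are not modeled here. The pressure values are hypothetical.

```python
import math

P_REF = 20e-6  # reference pressure for dB SPL in air: 20 micropascals


def db_spl(pressure_pa: float) -> float:
    """Sound pressure level in decibels relative to 20 micropascals."""
    return 20 * math.log10(pressure_pa / P_REF)


for p in (0.02, 0.04, 0.06, 0.08):
    print(f"{p:.2f} Pa -> {db_spl(p):.1f} dB SPL")
```

Each 0.02 Pa step adds about 6.0, then 3.5, then 2.5 dB. Equal changes in pressure therefore do not produce equal changes in level.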

The mismatch between acoustic analysis and what the listener hears is particularly noticeable in speech sounds with high-frequency energy, such as some fricatives. To reconcile this mismatch, functional models of the auditory system have been developed.
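One ingredient of such models is a frequency scale that compresses high frequencies in the way the ear does. The sketch below uses Traunmüller's (1990) approximation of the Bark critical-band scale, z = 26.81 f / (1960 + f) - 0.53. It is chosen here as one example among several auditory scales. The text does not name a particular model.

```python
def hz_to_bark(f_hz: float) -> float:
    """Traunmueller (1990) approximation of the Bark critical-band scale."""
    return 26.81 * f_hz / (1960 + f_hz) - 0.53


# Equal 1000 Hz steps in frequency shrink to ever smaller steps in Bark.
for lo, hi in [(1000, 2000), (4000, 5000), (7000, 8000)]:
    print(f"{lo}-{hi} Hz: {hz_to_bark(hi) - hz_to_bark(lo):.2f} Bark")
```

The high-frequency region that carries much of a fricative's energy occupies relatively few Bark units. This is one reason why a plain acoustic spectrum overstates the differences that listeners hear there.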

Describing sounds

Human languages use many different sounds, and to compare them, linguists need to describe sounds in a language-independent way. Speech sounds can be described in several ways. Most commonly, they are referred to by the mouth movements needed to produce them. Consonants and vowels are two broad categories that phoneticians define by the movements in a speech sound.

More fine-grained descriptors include parameters such as place of articulation. Place of articulation, manner of articulation, and voicing are used to describe consonants and form the main divisions of the International Phonetic Alphabet consonant chart. Vowels are described by their height, backness, and rounding. Sign languages are described using a similar but distinct set of parameters: location, movement, handshape, palm orientation, and non-manual features.

In addition to articulatory descriptions, the sounds used in oral languages can be described by their acoustics. Because the acoustics are a consequence of the articulation, both methods of description are sufficient to distinguish sounds. The choice between the two systems depends on the phonetic feature being investigated.

Consonants are speech sounds articulated with a complete or partial closure of the vocal tract. They are generally produced by modifying an airstream exhaled from the lungs. The respiratory organs used to create and modify airflow are divided into three regions: the vocal tract (supralaryngeal), the larynx, and the subglottal system. The airstream can be either egressive (out of the vocal tract) or ingressive (into the vocal tract).

In pulmonic sounds, the airstream is produced by the lungs in the subglottal system and passes through the larynx and vocal tract. Glottalic sounds use an airstream created by movements of the larynx, without airflow from the lungs. Click consonants, by contrast, are articulated by using the tongue to rarefy the air, followed by the release of the tongue's forward closure.

Vowels are syllabic speech sounds pronounced without any obstruction in the vocal tract. Unlike consonants, which usually have definite places of articulation, vowels are defined in relation to a set of reference vowels called cardinal vowels. Three properties are needed to define vowels: tongue height, tongue backness, and lip rounding. Vowels articulated with a stable quality are called monophthongs; a combination of two separate vowels in the same syllable is called a diphthong.

In the IPA, vowels are plotted on a trapezoid that represents the human mouth. The vertical axis represents the mouth from floor to roof (vowel height), and the horizontal axis represents the front-back dimension (backness).

Phonetic transcription is a system for transcribing the phones that occur in a language, whether oral or signed. The most widely known system of phonetic transcription, the International Phonetic Alphabet (IPA), provides a standardized set of symbols for oral phones. Because the IPA is standardized, its users can transcribe the phones of different languages, dialects, and idiolects accurately and consistently. The IPA is a useful tool not only for the study of phonetics but also for language teaching, professional acting, and speech pathology.

While no sign language has a standardized writing system, linguists have developed their own notation systems that describe handshape, location, and movement. The Hamburg Notation System (HamNoSys) is similar to the IPA in that it allows for varying levels of detail. Some notation systems, such as KOMVA and the Stokoe system, were designed for use in dictionaries. These systems also use letters of the local language's alphabet for handshapes, whereas HamNoSys represents the handshape directly. SignWriting aims to be an easy-to-learn writing system for sign languages, although no deaf community has officially adopted it yet.

Sign language

Unlike words in spoken languages, words in sign languages are perceived with the eyes rather than the ears. Signs are articulated with the hands, upper body, and head. The main articulators are the hands and arms. Relative parts of the arm are described with the terms proximal and distal: a proximal part is closer to the torso, whereas a distal part is farther from it.

For example, a movement of the wrist is distal compared with a movement of the elbow. Because they require less energy, distal movements are generally easier to produce. Various factors, such as muscle flexibility or being considered taboo, restrict what can be considered a sign. Native signers do not look at their conversation partner's hands; instead, their gaze is fixed on the face. Because peripheral vision is not as focused as the center of the visual field, signs articulated near the face allow more subtle differences in finger movement and location to be perceived.

Unlike spoken languages, sign languages have two identical articulators: the hands. Signers can use whichever hand they prefer without any disruption in communication. Due to universal neurological limitations, two-handed signs generally have the same kind of articulation in both hands. This is known as the Symmetry Condition.

The second universal constraint is the Dominance Condition. It holds that when two different handshapes are involved, one hand will remain stationary and have a more limited set of handshapes than the dominant, moving hand.
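The two conditions can be read as a well-formedness check on a two-handed sign. The Python sketch below is a simplified, hypothetical encoding. The `Hand` record, the handshape labels, and the `weak_inventory` argument are all assumptions. The restricted set of handshapes allowed on the non-dominant hand is language-specific, so the caller supplies it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hand:
    handshape: str  # hypothetical handshape label, e.g. "B"
    moving: bool


def well_formed(dominant: Hand, weak: Hand, weak_inventory: frozenset) -> bool:
    """Check a two-handed sign against simplified Symmetry and Dominance conditions."""
    if dominant.moving and weak.moving:
        # Symmetry Condition: if both hands move, they share a handshape.
        return dominant.handshape == weak.handshape
    if dominant.handshape != weak.handshape:
        # Dominance Condition: with different handshapes, the weak hand is
        # stationary and uses a handshape from the restricted inventory.
        return not weak.moving and weak.handshape in weak_inventory
    return True


INVENTORY = frozenset({"B", "5"})  # hypothetical restricted set
print(well_formed(Hand("1", True), Hand("B", False), INVENTORY))  # True
print(well_formed(Hand("1", True), Hand("V", True), INVENTORY))   # False
```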

Additionally, one hand of a two-handed sign is commonly dropped during informal conversation, a process referred to as weak drop. Just like words in spoken languages, signs can influence each other's form through coarticulation.

Examples include the handshapes of neighboring signs becoming more similar to each other (assimilation) and weak drop (an instance of deletion).



1. Linguists debate whether these stages can interact or whether they occur serially (compare Dell & Reich (1981) and Motley, Camden & Baars (1982)). For ease of description, the language production process is described as a series of independent stages, although recent evidence suggests that this is inaccurate.