Soda Bottles & Submarines: Essential Speech Acoustics


Teaching speech acoustics to parents has become an important and exciting part of my auditory-verbal work with families. Making the subject understandable follows some key principles. In the analogy I use, the child’s hearing levels are the water line or shoreline, and speech is a submarine which must be submerged below the water. Both of these elements can be affected by a variety of factors, ranging from fluid in the ear, to a modified ear mold, to a hearing aid in need of repair, to a cochlear implant which needs remapping.

Many parents do not know the fundamentals of an audiogram. Sounds are represented from low to high pitch, from left to right, similar to a piano keyboard. We begin at 250 and keep doubling the number to get each frequency in Hertz (Hz). These are written across the top of the audiogram. Sounds are then indicated from quiet to loud in decibels (dB) as one goes down the side of the audiogram. Now consider the submarine. The submarine is submerged on the audiogram and its depth represents the decibels: as depth increases, so do the decibels. Parents need to understand that threshold is defined as the quietest sound which we hear 50% of the time, and that it can be difficult to assess in children who are hearing impaired because it requires concentration on the part of the listener. One can use a pipe cleaner as a water line, representing the shoreline at the beach, to show how hearing at threshold is like dipping your toes in and out of the water. That is how fine the line is for threshold.
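
For readers who like to see the doubling rule written out, here is a minimal sketch in Python. It only illustrates the arithmetic described above; the function name `audiogram_frequencies` and the 8000 Hz stopping point are assumptions made for this example, since the text states only the 250 Hz starting point.

```python
def audiogram_frequencies(start_hz=250, stop_hz=8000):
    """Return the frequencies read left to right across the top of an audiogram.

    start_hz comes from the text; stop_hz is an assumed upper limit.
    """
    freqs = []
    f = start_hz
    while f <= stop_hz:
        freqs.append(f)
        f *= 2  # each step to the right doubles the frequency
    return freqs


print(audiogram_frequencies())  # [250, 500, 1000, 2000, 4000, 8000]
```

Each doubling corresponds to one octave, which is why the piano keyboard comparison works well for parents.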

Six Sound Submarine

Submarines belong below the water, and so the 6 sounds belong below the thresholds. One can conceptualize sinking the submarine as increasing the intensity of the speech signal. The analogy continues in that anything under water (below the thresholds) can be heard, while sounds which are above water can’t be heard. There are many ways to sink our sub. Two fundamental methods are: 1) the addition of amplification and 2) the use of distance to increase or decrease the intensity of the signal. Signal intensity increases by 6 dB with every halving of the distance to the speaker, and decreases by 6 dB every time we double the distance.
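
The 6 dB rule can be checked with a short calculation. The sketch below assumes the idealized inverse-square relationship (a single talker in an open space with no echo), in which the level change is 20 times the base-10 logarithm of the distance ratio. The function name `level_change_db` is made up for this example, and real rooms with reverberation will not follow the rule exactly.

```python
import math


def level_change_db(old_distance, new_distance):
    """Change in sound level (dB) when the listener moves from old_distance
    to new_distance from the talker.

    Assumes the idealized inverse-square law: 20 * log10(old / new).
    Any distance unit works as long as both values use the same one.
    """
    if old_distance <= 0 or new_distance <= 0:
        raise ValueError("distances must be positive")
    return 20 * math.log10(old_distance / new_distance)


# Starting 1 unit away from the talker, then moving closer or farther.
for new_distance in (0.5, 1, 2, 4):
    change = level_change_db(1, new_distance)
    print(f"distance {new_distance}: {change:+.1f} dB")
```

Halving the distance gives a change of about +6 dB and doubling it gives about -6 dB, so moving from four units away to one unit away sinks the submarine by roughly 12 dB.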

Commonly, when the speech of a child who is hearing impaired is analyzed, professionals follow the model proposed by Dr. Daniel Ling (1976), which includes: nonsegmentals; vowels and diphthongs; consonants by manner of production; consonants by place of production; consonants by voice/voiceless contrast; and blends.


The nonsegmentals include the features of duration, intensity and pitch. These features are in fact what make English sound like English, and Chinese sound like Chinese and dramatically different from English. As the underlying foundations of speech intelligibility, nonsegmentals are superimposed on our speech, and this information is essentially carried by vowels. These features actually carry meaning. For example, stress and pitch patterns designate an utterance as a question or a statement.

Formant Chart
Figure 1 *Adapted from Ling and Ling

Acoustic Structure of Vowels – Soda Bottles

As the vocal tract filters and resonates sound, certain frequencies are enhanced, and these are discernible as formants. The visual representation of a vowel (on a spectrogram) allows us to see these characteristic acoustic features.

Essentially, we find chunks of energy clustered in certain frequency areas. Formant is the technical term for these chunks, one often used to intimidate graduate students or parents. In the case of vowels, two formants combine to create a characteristic vowel and are the most important for intelligibility. The lower-frequency energy chunk is referred to as the first formant (F1) and the higher-frequency energy chunk is referred to as the second formant (F2). These formants are not unlike the familiar sounds of air resonating within soda bottles. Different sized bottles produce different pitches because the frequency at which the air resonates depends on the size of the cavity. The submarine shows 1 soda bottle for each of the two formants of the vowels which are included in the "Six Sounds". Anatomically, as the air passes up through the system responsible for the production of speech, it resonates in the large throat cavity, creating F1, and also in the cavity of the mouth, where F2 is produced. When we examine the vowel formant chart (figure 1) and see that some vowels share the same first formant, we can understand that it is the second formant which makes those vowels sound different. The problem arises for children who are hearing impaired who don't detect F2 and therefore have difficulty discriminating vowels which differ only in the second formant, e.g. [u] and [i].
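
The [u] and [i] problem can be made concrete with a small sketch. The formant values below are rounded textbook averages for adult male voices, included purely for illustration; they are assumptions and are not taken from figure 1. The rule used here (two vowels are flagged when their F1 values are close and at least one F2 lies above the highest frequency the child can hear) is a deliberate simplification, and the names `FORMANTS` and `likely_confused` are made up for this example.

```python
# Approximate (F1, F2) values in Hz. Illustrative assumptions only,
# not values read from the article's formant chart.
FORMANTS = {
    "i": (270, 2290),  # "ee"
    "u": (300, 870),   # "oo"
    "a": (730, 1090),  # "ah"
}


def likely_confused(vowel_a, vowel_b, upper_limit_hz, tolerance_hz=100):
    """Crude check for whether two vowels may be confused.

    Flags the pair when the first formants are within tolerance_hz of each
    other and the second formant of at least one vowel lies above
    upper_limit_hz, the highest frequency assumed to be audible.
    """
    f1_a, f2_a = FORMANTS[vowel_a]
    f1_b, f2_b = FORMANTS[vowel_b]
    same_f1 = abs(f1_a - f1_b) <= tolerance_hz
    f2_contrast_audible = f2_a <= upper_limit_hz and f2_b <= upper_limit_hz
    return same_f1 and not f2_contrast_audible


print(likely_confused("u", "i", upper_limit_hz=1000))  # True: F2 of "i" is out of reach
print(likely_confused("u", "i", upper_limit_hz=3000))  # False: both F2 values are in reach
print(likely_confused("u", "a", upper_limit_hz=1000))  # False: the F1 values already differ
```

A real listener's hearing is not a single cut-off frequency, so this is only a way of showing parents why a shared F1 plus a missing F2 leads to confusion.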

Diphthong is another commonly used technical term, meaning two vowels which are glided together at a normal rate of speech. Examples of these include: [aɪ] - pie, [aʊ] - cow, [ɔɪ] - toy, [eɪ] - day.

The vowels and diphthongs in ongoing speech and language carry the nonsegmental information, which is the prosody, rhythm, rate and stress. They also carry accents and dialects.

Consonants Classified by Manner of Production, Place of Production and Voicing

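Manner         | Voicing  | Bilabial | Labiodental | Linguadental/Interdental | Alveolar | Palatal | Velar | Glottal
Plosives/Stops | Voiced   | b, b     |             |                          | d, d     |         | g, g  |
Plosives/Stops | Unvoiced | p, p     |             |                          | t, t     |         | k, k  |
Fricative      | Voiced   |          | v           | ð                        | z        | ʒ       |       |
Fricative      | Unvoiced |          | f           | θ                        | s        | ʃ       |       | h
Nasals         | Voiced   | m        |             |                          | n        |         | ŋ     |
Affricates     | Voiced   |          |             |                          |          |         |       |
Semi-vowels    | Voiced   | w        |             |                          |          | j       |       |
Liquids        | Voiced   |          |             |                          | l, r     |         |       |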
Figure 2 *Adapted from Ling (1996) Acoustic, Audition and Speech Reception (AVI)

The term transition is often used when discussing speech acoustics and perception. Transition means how one sound gets connected or "transitioned" to another sound. Very often this is simply how we produce consonant-vowel (C-V) or consonant-vowel-consonant (C-V-C) strings of phonemes.

When we begin to examine the consonants, we consider the three different parameters for defining a given phoneme – manner, place and voicing (figures 2 and 3).

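The manner of a consonant refers to "how" the sound is produced, in other words the "MANNER" in which it is produced. The manners we need to know are those listed in figure 2: plosives (stops), fricatives, nasals, affricates, semi-vowels and liquids.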

International Phonetic Alphabet Symbol Key

[b] bat
[b] cab
[d] do
[d] nod
[f] fill
[g] go
[g] dog
[h] how
[j] yes
[k] key
[k] lick
[l] low
[m] me
[n] no
[ŋ] sing
[p] pat
[p] up
[r] red
[s] so
[t] to
[t] hat
[v] very
[w] we
[z] zoo
[θ] thin
[ð] that
[tʃ] child
[dʒ] jam
[ʃ] shoe
[ʒ] casual
Figure 3

The place of a consonant is the place where the sound is made. The terms are somewhat confusing for parents, but basic definitions of the anatomical places will help (see figure 4).

Figure 4
  1. Bilabial - both lips
  2. Labiodental - lips and teeth - (lower lip and upper front teeth)
  3. Linguadental - tongue tip and teeth
  4. Alveolar - tongue tip and alveolar ridge
  5. Retroflex
  6. Palato-Alveolar
  7. Palatal - front of tongue (tongue blade) and hard palate
  8. Velar - back of tongue (tongue dorsum) and soft palate
  9. Glottal - originating at the vocal cords
Place        | Manner       | Voiced | Unvoiced | Example
bilabial     | plosive/stop | [b]    | [p]      | bay - pay
labiodental  | fricative    | [v]    | [f]      | vat - fat
linguadental | fricative    | [ð]    | [θ]      | thy - thigh
alveolar     | stop         | [d]    | [t]      | doe - toe
Figure 5

Voicing is the feature which refers to whether the vocal cords are actually vibrating when the sound is produced. Parents typically assume that whenever you are using your voice, the vocal cords are vibrating. Having the parents place their hand on the front of their throat while producing "ah" allows them to feel the vibrations of the vocal cords. With a hand still on the throat, they produce an "sss" and instantly understand that you can speak without "voicing". Whispering an entire sentence makes this point clearly as well. Figure 5 illustrates some of the information which we have discussed thus far.
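
The three parameters (manner, place and voicing) can also be treated as a simple lookup table. The sketch below encodes a subset of the consonants from Figure 2 and finds each sound's voicing partner, the pairing shown in Figure 5. The names `CONSONANTS` and `voicing_partner` are made up for this example, and only some of the consonants are included to keep it short.

```python
# (manner, place, voiced) for a subset of the consonants in Figure 2.
CONSONANTS = {
    "b": ("plosive", "bilabial", True),
    "p": ("plosive", "bilabial", False),
    "d": ("plosive", "alveolar", True),
    "t": ("plosive", "alveolar", False),
    "g": ("plosive", "velar", True),
    "k": ("plosive", "velar", False),
    "v": ("fricative", "labiodental", True),
    "f": ("fricative", "labiodental", False),
    "ð": ("fricative", "linguadental", True),
    "θ": ("fricative", "linguadental", False),
    "z": ("fricative", "alveolar", True),
    "s": ("fricative", "alveolar", False),
    "m": ("nasal", "bilabial", True),
    "n": ("nasal", "alveolar", True),
}


def voicing_partner(symbol):
    """Return the consonant with the same manner and place but the opposite
    voicing, or None if the table holds no such consonant."""
    manner, place, voiced = CONSONANTS[symbol]
    for other, features in CONSONANTS.items():
        if features == (manner, place, not voiced):
            return other
    return None


for symbol in ("b", "v", "ð", "d", "m"):
    print(symbol, "->", voicing_partner(symbol))
```

The nasals come back with no partner, which matches Figure 2, where nasals appear only as voiced sounds.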

A useful exercise with parents is to complete a plot of the acoustic properties as areas on an audiogram. We draw vertical lines in different colors for the different features we have discussed. The acoustic properties typically included are listed in Figure 6. There are multiple sources for more detailed information, including Ling (1989), Foundations of Spoken Language for Hearing-Impaired Children (page 69, figure 3-7).

Acoustic Cues for Speech Features

6 Sounds

  • hearing within the speech range - ability to hear/detect all parts of speech

Nonsegmentals

  • minimum of up to 750 Hz
  • mainly low frequency information
  • information is carried by the vowels
  • can learn to discriminate if have hearing up to 1000 Hz

Vowels & Diphthongs

  • up to 1000 Hz to detect
  • 3000 Hz to discriminate

Manners of production

  • 750 Hz - nasals vs. plosives - nasal murmur at about 300 Hz
  • 1500 Hz - laterals/liquids - r, l
  • 2000 Hz - unvoiced plosives - p, t, k
  • up to 4000 Hz - fricatives and affricates - no information below 1500 Hz

Place of production

  • 2nd formant transition and burst frequency - at 1500-4000 Hz

Voicing

  • principally low frequency information - 500 Hz and below - and duration and intensity differences
  • 750 Hz for vocal cord vibration
  • 3000 Hz for unvoiced
  • voiced - at least 1 formant present at or below 700 Hz; voiceless - no energy below 1000 Hz
Figure 6
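
Some of the frequency figures in Figure 6 can be turned into a rough lookup. The sketch below is a simplification under stated assumptions: it treats a child's hearing as having one upper frequency limit, and it reduces each cue to the single highest frequency listed for it in Figure 6. The names `CUE_UPPER_LIMIT_HZ` and `accessible_cues` are made up for this example.

```python
# Highest frequency (Hz) listed in Figure 6 for each cue.
CUE_UPPER_LIMIT_HZ = {
    "vowel detection": 1000,
    "vowel discrimination": 3000,
    "manner: nasals vs. plosives": 750,
    "manner: liquids": 1500,
    "manner: unvoiced plosives": 2000,
    "manner: fricatives and affricates": 4000,
    "place of production": 4000,
}


def accessible_cues(hearing_upper_limit_hz):
    """List the cues whose Figure 6 frequency is at or below the assumed
    upper limit of the child's usable hearing."""
    return [
        cue
        for cue, limit_hz in CUE_UPPER_LIMIT_HZ.items()
        if limit_hz <= hearing_upper_limit_hz
    ]


print(accessible_cues(1500))
```

With hearing up to 1500 Hz, this returns vowel detection, the nasal versus plosive contrast and the liquids. Place cues span 1500-4000 Hz, so in practice some partial place information may be available before the 4000 Hz figure used here is reached.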

Essential Principles

It is important to realize that low frequency information is typically easier and more accessible to children who are hearing impaired, whether they are without technology or are using hearing aids. Cochlear implants typically provide good access to the high frequencies.

The information in the low frequency range would include: nonsegmentals; F1 of vowels and some F2 information; most consonant manner information; nasal murmur; and consonant voicing cues. In contrast, high frequency information is typically more difficult and less accessible. The information in the high frequency range would include: remaining F2 of vowels; consonant place cues; and consonants of the fricative manner of production.

We can look at an audiogram as more than merely detection. An audiogram contains some secrets of prediction potential and we can postulate a "functional audiogram". Sometimes this can be done based on a detection audiogram, and sometimes the reverse process allows part of the audiogram to be predicted based on skills. A child’s performance, although not necessarily a comprehensive picture, can help us to make some reasonable assumptions about an audiogram.

Following are some profiles of abilities and difficulties. I ask the parent to show me the general predictive configuration of the hearing loss. They bend their water line for the submarine (blue pipe cleaner) and place it on an audiogram for consideration.

Example 1

  • nonsegmentals - OK
  • vowels – F1 OK, F2 differences not able
  • consonants – manner – nasals and plosive others
  • consonants – voicing – inconsistent
  • consonants – place

Example 2

  • nonsegmentals
  • vowels - good variety some [u] [i] confusion
  • manners – good except for fricatives
  • voicing
  • place – problems especially with fricatives, plosives
  • manners – unable to discriminate fricatives

Example 3

  • low frequency vowels OK
  • fricatives - good development
  • mid frequency vowels - not as much variety
  • nasals - poor and some manner - very difficult [m, l, b]

Example 4

  • good articulation of high frequency sounds [ʃ] [s] [ tʃ ] [k]
  • very poor control of nonsegmentals
  • no nasals developing
  • elongations
  • poor voice/voiceless discrimination and development

One way to get information from an audiogram, and vice versa to get a functional audiogram from acoustic feature information, is to look at the acoustic features of specific types of speech sounds and the frequency areas with which they are associated. Another way is to look at the language which is dependent upon specific speech sounds which we have already identified and explored within the context of the audiogram. Given that some grammatical morphemes are dependent on specific phonemes, we can look at the error patterns and difficulties as related to language. Based on the areas of difficulty, we can examine the critical elements and associated frequencies on an audiogram to help determine acoustic features of speech and their location on an audiogram. Through this we can determine the underlying issue for poor development of specific linguistic structures.

Example: a child is not producing plural nouns and is potentially having difficulty with the detection or discrimination of [s] in running speech. That same child would likely have difficulties with the third person singular present tense verb form (he talks, she sleeps). Example: a child is not developing irregular past tense verb forms or connecting the past tense with the present tense form (drink - drank, eat - ate, throw - threw, catch - caught, drive - drove, write - wrote, read - read, sit - sat, stand - stood). One needs to look at the critical phonemes and examine the formants of the vowels up to 3000 Hz.


Knowing some of the mysteries of speech acoustics and demystifying them for families can clearly add to their understanding. Knowledge of the child’s auditory potential provides significant information about speech and spoken language development. This knowledge can help to unlock some of the mysteries of the audiogram when testing has not been completed or when it has been less than completely successful for the child.

A solid understanding of a child’s auditory potential derived from a good audiogram gives parents a feeling of control and empowerment.


Erber, N.P. (1982). Auditory Training. Washington, D.C.: The Alexander Graham Bell Association for the Deaf.

Ladefoged, P. (1982). A Course in Phonetics. New York: Harcourt Brace Jovanovich, Inc.

Ling, D. (1976). Speech and the Hearing-Impaired Child: Theory and Practice. Washington, D.C.: The Alexander Graham Bell Association for the Deaf.

Ling, D. (1989). Foundations of Spoken Language for Hearing-Impaired Children. Washington, D.C.: The Alexander Graham Bell Association for the Deaf.

Ling, D. & Ling, A. (1978). Aural Habilitation - The Foundations of Verbal Learning in Hearing-Impaired Children. Washington, D.C.: The Alexander Graham Bell Association for the Deaf.

Mischook, M. & Cole, E. (1986). Auditory Learning and Teaching of Hearing-Impaired Children. In E. Cole and H. Gregory (Eds.), Auditory Learning. Volta Review, 88(5), 67-81.

Pickett, J.M. (1980). The Sounds of Speech Communication: A Primer of Acoustic Phonetics and Speech Perception. Baltimore: University Park Press.

Rotfleisch, S. (2000). Soda Bottles & Submarines: Essential Speech Acoustics. The Listener, Summer 2000, 51-56.

The author acknowledges Mary Eager Koch for initial collaboration on Soda Bottles & Submarines.