Levis (2018) begins his analysis of research into
intelligibility with this statement: “Intelligibility is widely agreed
to be the most important goal for spoken language development in a
second language – both for listening and speaking – no matter the
context of communication” (p. 15). For the purposes of this article, let
us assume that you agree with this goal for teaching pronunciation.
Furthermore, you are in the market for a textbook developed to improve
your students’ intelligibility. What would such a textbook look like? I
submit that it would closely follow a framework that describes what a
listener must hear in a speaker’s phrases to judge them as intelligible.
Such a framework organizes the text and informs every
lesson.
The framework proposed here is a higher order construct. It is
nevertheless grounded in the production and perception of speech as
complementary roles based on a shared understanding of how to encode and
decode speech for intelligibility. The central tenet of teaching for
intelligibility is the Golden Rule for Communication: Speak to listeners
as listeners would wish to be spoken to. Teaching learners to speak this
way makes the listener’s perspective a priority. Five tests, or
indicators, can confirm that our instruction does so.
Indicator 1: Choice of Speech Style as the Target
Spontaneous speech is the currency of everyday verbal exchanges
and the basis for all other oral language. This choice of pronunciation
target helps to identify key characteristics of language that speakers
and listeners find most comfortable. Among its defining features is a
phrase length that is constrained by the practical capabilities of
listeners to catch and interpret a stream of airborne sound waves.
Fieldwork has shown that most spontaneous phrases are three to four
words long. Furthermore, the frequency of phrases longer than seven
words decreases sharply (Pickering, 2018).
From the start, instruction should teach learners about the
seven-word phrase limit, its importance to listeners, and how to segment
speech accordingly. Furthermore, if learners are to become users of
listener-centered language, all phrases in their practice materials
should model the one-to-seven-word range.
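One way to audit practice materials against this range is with a short script. The sketch below is illustrative only and is not drawn from the research cited above. It assumes that phrases are separated by a vertical bar, as in the speech examples in this article, and that words are separated by spaces, so a contraction counts as one word. The function names are hypothetical.

```python
MAX_WORDS = 7  # upper bound of the one-to-seven-word range (assumed hard limit)


def split_phrases(utterance: str) -> list[str]:
    """Split an utterance into phrases at each vertical bar."""
    return [p.strip() for p in utterance.split("|") if p.strip()]


def word_count(phrase: str) -> int:
    """Count whitespace-separated words; a contraction counts as one word."""
    return len(phrase.split())


def overlong_phrases(utterance: str, limit: int = MAX_WORDS) -> list[str]:
    """Return the phrases that contain more words than the limit."""
    return [p for p in split_phrases(utterance) if word_count(p) > limit]


example = "Well, our first trip's usually in March | to the Blue Bonnet State."
for phrase in split_phrases(example):
    print(word_count(phrase), phrase)
```

A materials writer could run overlong_phrases over each line of a dialogue and revise any phrase it returns.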
Indicator 2: Bipolar Signals of Importance
English speakers use phonetic tools to draw
listeners’ attention to important parts of a phrase. In this excerpt
from a conversation between two colleagues, Speaker B wants Speaker A to
notice the phrase, the Blue Bonnet State
(Texas). So Speaker B adds phonetic length to, and
changes the pitch of, the [uw] of Blue, making it a
pitch accent. Speaker B also minimizes the length and drops the pitch
and volume of the vowels surrounding [uw]—to the and Bonnet State.
Speech Example A
A: So where do you vacátion these days?
B: Well, our first trip’s usually in Márch | to the Blúe Bonnet State.
How does modulating the suprasegmentals of [uw] and nearby
vowels signal importance? The answer is, indirectly. On hearing the [uw]
of Blue, Speaker A correctly interprets it
pragmatically, not literally, and follows pragmatic rules that say (a)
“Here is a cue of importance” and (b) “Don’t look for meaning in [uw],
nor necessarily in Blue, nor necessarily in Blue Bonnet, but in the largest construction that
[uw] is a part of, namely, in the compound noun Blue Bonnet
State.”
Speakers A and B unconsciously engage pragmatic rules that
native English speakers and listeners play by. Learners of English,
unaware of English norms, play by different rules and listen for
different cues. Consequently, they neither notice these suprasegmental
contrasts nor recognize their significance to communication. This gap in
their knowledge and skill makes them poor communicators in
English.
Producing, noticing, and interpreting contrasts in vowel
suprasegmentals as signals of importance are foundational skills that
must be taught explicitly and early. Furthermore, because English
requires a larger contrast between a pitch accent and surrounding
vowels than most languages do, it is not enough to maximize vowel cues.
To meet listeners’ expectations of what a prominence sounds like,
instruction should always emphasize both maximizing and
minimizing vowel cues.
If you have taught pronunciation for a while, you are likely to
recognize that I have been describing what many pronunciation textbooks
call the focus, a pragmatic term associated with the
primary pitch accent. It is the product of applying pragmatic rules to
interpret the phonetic cues of prominence in a phrase.
Pronunciation textbook writers use the concept of focus to
answer the key question we began with: What must English listeners hear
in speakers’ phrases to judge them as intelligible? They explain that
speakers’ primary pitch accent directs listeners’ attention to the most
important meaning of a phrase.
Indicator 3: The Expanded Focus Within the Two-Peak Profile
The definition of focus as derived from the primary pitch
accent was uncontested until spontaneous-speech researchers discovered a
second meaningful pitch accent in many phrases (Wells, 2006, pp. 8,
209ff). A circumflex identifies this accent:
Speech Example B
A: So └whêre do you vacátion┘ these days?
B: Well, └our fîrst trip’s usually in Márch┘, | └to the Blúe Bonnet State┘.
Departing from the classic TESOL hypothesis that every content
word in a phrase carries an accent, we now know that another
characteristic of nearly all unrehearsed phrases is that they have only
one or two pitch accents. That is, some content words are unaccented,
like trip and usually in Well, our fîrst trip’s usually in Márch.
We call this model of rhythm the two-peak
profile (see Figure 1). The metaphor of a mountain range in
profile led us to call the first pitch accent the anchor
peak and the second the primary peak. All
nonpeak words are in the valleys of this model
(Dickerson & Hahn, in press).

Figure 1. The two-peak profile.
This more accurate conception of English rhythm led to the
conclusion that, because pitch accents signal important pragmatic
meaning, the cue for the focus actually begins with the first pitch
accent (the anchor peak, if there is one) and ends with the last pitch
accent (the primary peak). Furthermore, to make sense of these two peaks
as a single thought, the expanded focus must include any unstressed
syllables before and after these peaks and all words in between. A
phrase with only a primary peak includes any unstressed syllables before
and after it. Syntax provides the necessary cohesion (Pickering, 2018).
The expanded focus is bracketed (└ ┘) in Speech Example B. Compared to a
focus based on the primary peak alone in Speech Example A, the expanded
focus makes the core meaning of each phrase much clearer. We are now
closer to defining what listeners must hear in speakers’ phrases to
judge them as intelligible.
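The notation of Speech Example B can also be read mechanically. The following sketch is an illustration, not part of the framework itself. It assumes a transcript marked exactly as in that example: └ ┘ around the expanded focus, a circumflex on the anchor peak, and an acute accent on the primary peak. The function names are hypothetical. The code does not decide where the expanded focus begins or ends; that judgment depends on syntax and is simply read from the brackets.

```python
import re
import unicodedata

CIRCUMFLEX = "\u0302"  # combining circumflex: marks the anchor peak
ACUTE = "\u0301"       # combining acute accent: marks the primary peak


def has_mark(word: str, mark: str) -> bool:
    """Check whether a word carries the given accent mark."""
    return mark in unicodedata.normalize("NFD", word)


def strip_marks(word: str) -> str:
    """Remove accent marks and edge punctuation from a word."""
    decomposed = unicodedata.normalize("NFD", word)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    return plain.strip(".,?!")


def analyze(transcript: str) -> list[dict]:
    """Report each bracketed expanded focus with its anchor and primary peaks."""
    results = []
    for span in re.findall(r"└(.*?)┘", transcript):
        words = span.split()
        anchor = [strip_marks(w) for w in words if has_mark(w, CIRCUMFLEX)]
        primary = [strip_marks(w) for w in words if has_mark(w, ACUTE)]
        results.append({
            "focus": " ".join(strip_marks(w) for w in words),
            "anchor": anchor[0] if anchor else None,    # None: no anchor peak
            "primary": primary[0] if primary else None,
        })
    return results


line_a = "So └whêre do you vacátion┘ these days?"
for item in analyze(line_a):
    print(item)
```

For Speaker A's line, the sketch reports where as the anchor peak and vacation as the primary peak. A phrase such as └to the Blúe Bonnet State┘ has a primary peak but no anchor, so its anchor is reported as None.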
In this framework, the presentation of the two-peak profile
starts near the beginning of instruction so learners have the maximum
exposure to and practice with the rhythm of spontaneous speech. We start
with the anchor-placement rule. To give learners practice placing the
anchor, we mark the primary peak in their materials. Then, when they
practice phrases with the anchor they have predicted, they produce the
complete two-peak profile.
Because many of the phrases they use have a valley between the
peaks, instruction on how to handle valley syllables comes next.
After that, we show learners how to predict the primary peak to
signal the presence of new information, contrasts, and emphasis. With
this addition, learners can predict the entire two-peak profile on their
own.
Indicator 4: Valley Compression for Listeners
Valleys deserve special attention because they play a critical
role in intelligibility. Valleys are characterized by extreme
compression, another feature of spontaneous speech. For many years, we
believed that we rush through valleys to keep peaks coming at a regular
pace. Then research showed definitively that English is not stress-timed
(Cauldwell, 2002). So why hurry? Of the three potential valleys in the
two-peak profile, only the one between the peaks is consistently part of
the expanded focus. It seems likely that speakers compress its
syllables to help listeners understand the focus.
Semantically, the expanded focus is a single thought. If a
listener is to grasp it as a meaningful unit, the speaker must deliver
it quickly and in an unbroken string. Compression of valley syllables
speeds up this string so the listener can catch it in one go. Failing to
compress syllables, introducing hesitations, and adding more peaks
lengthen this valley and strain the listener’s memory. Any delay in
understanding one phrase can mean that the listener will miss all or
part of the next phrase, damaging intelligibility.
We compress valleys in a variety of ways—through assimilation,
vowel reduction, trimming consonant and vowel sounds, and linking
consonant and vowel sounds to each other. If our goal is to meet
listeners’ needs, then none of these compression techniques is optional
for speakers. They require instruction as soon as learners begin
creating interpeak valleys.
Indicator 5: Intonation to Interpret the Expanded Focus
Besides the words of the expanded focus, intonation makes its
own contribution to intelligibility. The intonation pattern associated
with the primary peak is especially important. The primary peak
announces the start of the final intonation pattern, alerting listeners
to significant upcoming pitch information. By tuning in to the complete
intonation pattern, listeners learn how to interpret the expanded focus:
Is the speaker concluding or preparing to say more? Is the speaker
making a statement or a query? Intonation is part of what listeners must
hear in order to understand a speaker’s phrase.
Conclusion
The framework for intelligibility-based pronunciation
instruction presented here is a higher order pragmatic structure. It
encompasses the words of the expanded focus and its complete intonation
pattern. It describes what listeners are listening for in a speaker’s
phrases and helps teachers and students keep their eyes on the goal of
all their pronunciation efforts. It also motivates all work to control
lower level building blocks of the framework, such as consonant and
vowel articulations, consonant clusters, key allomorphs, and the stress
of words and constructions. They, too, are critical for the
intelligibility of a phrase.
References
Cauldwell, R. (2002). The functional irrhythmicality of
spontaneous speech: A discourse view of speech rhythms. Apples,
2, 1–24.
Dickerson, W., & Hahn, L. (in press). Speechcraft: Discourse pronunciation for academic
communication (2nd ed.). Ann Arbor, MI: University of Michigan
Press.
Levis, J. (2018). Intelligibility, oral communication,
and the teaching of pronunciation. Cambridge, England:
Cambridge University Press.
Pickering, L. (2018). Discourse intonation: A
discourse-pragmatic approach to teaching the pronunciation of
English. Ann Arbor, MI: University of Michigan
Press.
Wells, J. (2006). English intonation: An
introduction. Cambridge, England: Cambridge University
Press.
Wayne Dickerson is professor emeritus in the
Department of Linguistics at the University of Illinois
(Urbana-Champaign), where he directed the MA program in TESOL and taught
courses in English phonology and ESL pronunciation. He researches and
writes on pedagogical applications of phonetics, pronunciation pedagogy,
and the value of orthography for learners.