ITAIS Newsletter - February 2015

ARTICLES

VOICE RECOGNITION SOFTWARE: SAVING TIME AND STREAMLINING SPEECH ANALYSIS

Lara Wallace, Ohio University, Athens, Ohio, USA

Background

Audio recordings and transcriptions are recommended for English language learners (ELLs) working to improve their spoken English intelligibility (Gorsuch, Meyers, Pickering, & Griffee, 2013). By analyzing these transcriptions, students can see what they are not communicating clearly and how they need to improve. For students who think that their speech is already sufficiently comprehensible, international teaching assistants (ITAs) in particular, these transcriptions may serve as a wake-up call that they have room for improvement (Wallace, 2013).

While this activity provides insight into a student’s speech, transcribing the speech is not itself a pedagogical goal, and because it is time-consuming, many students do not do it well; some even skip this crucial step. With Google’s voice-to-text software, however, many students are now able to save time and receive feedback instantly.

In my TESOL 2015 presentation, the audience will learn how to record and have speech transcribed simultaneously, how to correct the transcript, and how to mark the transcription for features of discourse intonation. The audience will then analyze parts of a transcription to learn what pronunciation issues the student may face. Finally, suggestions will be given on how this assignment may be graded.

How to Record and Have Speech Transcribed Simultaneously

At present, recording and transcribing still require two separate applications. Because this step takes some coordination to complete successfully, it is recommended that the activity be done during class so the instructor can offer guidance and train the learners in the effective use of the technology (Hubbard, 2013). Furthermore, the audio should be recorded through headsets, because a built-in microphone picks up ambient noise that lowers the quality of the transcription.

Steps

1. Open the Google Web Speech API Demonstration. When you speak, Google Web Speech will transcribe what you say. Do a test run of this first by clicking the microphone icon (you may need to use the Chrome web browser; you may need to “allow” the microphone). It is working when there is a pulsing red dot behind the microphone icon. Your words will appear in the box as you speak.

2. Open an audio recording application to record your voice—Audacity or QuickTime are reliable—and make a test recording to ensure that the quality is clear. If you use Audacity, make sure that you are able to export the file as an MP3, because computers that have not installed Audacity cannot read .AUP (Audacity-specific) files. If you cannot export the file or save it as an MP3, you will need to download LAMElib first. This will enable you to export the AUP file as an MP3, which can be played in other programs.

3. Resize your windows so that you can see both your browser and audio recorder. You will speak for 1.5–2 minutes on the topic of choice (sample topics below). Once you are ready to begin, click the microphone icon in the browser, then click the record button on the audio recorder. Click “stop” when you are finished recording, then click “copy and paste” in your web browser (below the text box). Here is a sample introductory topic:

Give your full name and your name preference (“My name is ___, but you can call me ___”).

Say where you are from and which languages you speak.

State your major and area of study.

Discuss a hobby (something you like to do).

(If you still have time) Describe your happiest day, one of the most interesting things you have ever done, something that really has surprised you about the United States, or something you would like to do some day and why.

4. Export or save this audio recording as “YourNameBaseRec.mp3”. If you cannot save it as an MP3, MOV or MP4 files work as well. Copy and paste Google Web Speech’s transcription of your voice into a document, and name it “YourNameTranscription”.

Correcting the Transcription and Marking It for Discourse Intonation

Comparing Google Web Speech’s transcription to the corrected transcription can be both interesting and useful. For that reason, students should copy and paste the transcription twice: once for revision, and once for comparison.

Correcting the Transcription

Even with native speakers, Google Web Speech makes the occasional mistake (especially with names of people and places), so it is important to listen to the recording in order to make any corrections. This corrected transcription needs to mirror exactly what was said in the recording, and that includes the following:

Fillers (um, uh, eeh)

False starts or recasts (th-th-the, I run-I ran)

Brief pauses (,) and silences / hesitation (…)

Sentence boundaries (? . !)

Be aware that if a speaker has a very strong accent, Google’s accuracy will be low, and it may be easier for the person to transcribe his or her speech directly from the recording rather than to correct Google’s transcription.
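For instructors or students comfortable with a little scripting, the marking conventions listed above can be tallied automatically once the transcript has been corrected. The following is a minimal sketch only: the function name count_disfluencies is hypothetical, the filler inventory is limited to the three examples given above, and it assumes that every comma marks a brief pause and every ellipsis marks a hesitation.

```python
import re
from collections import Counter

# Assumption: the filler inventory is just the three examples from the list
# above; extend it to suit your learners.
FILLERS = {"um", "uh", "eeh"}


def count_disfluencies(transcript):
    """Tally fillers, brief pauses, and hesitations in a corrected transcript."""
    # Treat runs of letters (and straight apostrophes) as words, case-insensitively.
    words = re.findall(r"[a-z']+", transcript.lower())
    fillers = Counter(w for w in words if w in FILLERS)
    # Assumption: hesitations are typed as an ellipsis character or three periods.
    hesitations = len(re.findall(r"…|\.\.\.", transcript))
    # Assumption: every comma in the corrected transcript marks a brief pause.
    brief_pauses = transcript.count(",")
    return {
        "fillers": dict(fillers),
        "brief_pauses": brief_pauses,
        "hesitations": hesitations,
    }


sample = "Um, my name is, uh, Li … I study, um, physics."
print(count_disfluencies(sample))
```

The counts are only as reliable as the corrected transcript itself, so they are best read as a starting point for discussion rather than as a score.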

Marking Discourse Intonation

The transcription should not only show the reader what was said, but how it was said. Depending on the students’ knowledge and the goals of the class, the following discourse intonation features can also be marked:

Prominence (write the STRESSED words in uppercase letters)

Pitch movement at the end of a statement ( )

Tone choice or key ()

Please see Gorsuch et al. (2013) for an explanation of these features and suggestions on how to transcribe them.
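If prominence is marked in uppercase as described above, the stressed words can be pulled out of a marked transcript for quick review. This is an illustrative sketch, not part of the published method: the function name prominent_words is hypothetical, and it assumes that any word of two or more letters written entirely in uppercase was marked as prominent. Single-letter words such as "I" are skipped because they are ambiguous, and acronyms would be picked up as false positives.

```python
import re


def prominent_words(marked_transcript):
    """Return the words written entirely in uppercase (two or more letters)."""
    # Assumption: uppercase marks prominence; one-letter words like "I" are
    # ambiguous and therefore ignored.
    return re.findall(r"\b[A-Z]{2,}\b", marked_transcript)


marked = "I STUDY physics at OHIO university"
print(prominent_words(marked))
```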

Analyzing the Transcription

Students analyze their recordings and transcriptions to determine what they need to improve to be better understood when speaking. Even if they are unable to pinpoint specific segmental or suprasegmental errors, students usually notice speech rate (too fast or too slow), fluency, fillers, whether there are clear sentence boundaries, grammatical and lexical mistakes, and which words might be mispronounced. Some students find through the act of correcting Google’s transcription that even they have difficulty understanding some of what was said.
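Speech rate, one of the features students usually notice, can be estimated from the corrected transcript and the length of the recording. The sketch below assumes the duration is read off the audio recorder by hand; the function name words_per_minute is hypothetical, and fillers are counted as words unless they are removed first. What counts as too fast or too slow is left to the instructor.

```python
import re


def words_per_minute(transcript, duration_seconds):
    """Estimate speech rate from a transcript and the recording length."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Count runs of letters, digits, and straight apostrophes as words.
    words = re.findall(r"[A-Za-z0-9']+", transcript)
    return len(words) * 60.0 / duration_seconds


# Four words spoken in two seconds is a rate of 120 words per minute.
print(words_per_minute("one two three four", 2))
```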

Taking it a step further by contrasting their transcription against Google’s interpretation can illuminate other areas of students’ potential miscommunication. For example, if Google Web Speech transcribed “the person page” but the student actually said “the percentage,” we could guess that the student likely did not reduce the last syllable and might have even misplaced the word stress. This comparison is often useful as long as the students’ accents are not too strong, but it might not be useful to every learner. Furthermore, instructors should be warned that Google Web Speech interprets whatever it hears and does not take into consideration the context of the topic or the setting. To illustrate, a student once said “you will find that in 1983,” but Google transcribed it as “you will find sh[*]t on a 1983.” Although the result contains a potentially offensive curse word, the discrepancy suggests that the student did not pronounce the “th” or the subsequent vowel clearly and did not link “in-1983.” The instructor can determine whether or not it is useful to compare the students’ revised transcripts with Google’s after looking at the two versions.
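The side-by-side comparison described above can also be done mechanically. The following sketch uses Python's standard difflib module to list the stretches where Google's transcription and the student's corrected transcription disagree. The function name word_mismatches is hypothetical, and the sketch assumes both texts are compared word by word, ignoring case; punctuation stays attached to words, so it may be worth stripping the pause and boundary marks first.

```python
import difflib


def word_mismatches(google_text, corrected_text):
    """List (google, corrected) word stretches where the two transcripts differ."""
    google_words = google_text.lower().split()
    corrected_words = corrected_text.lower().split()
    matcher = difflib.SequenceMatcher(
        a=google_words, b=corrected_words, autojunk=False
    )
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only replaced, deleted, or inserted stretches.
        if tag != "equal":
            mismatches.append(
                (" ".join(google_words[i1:i2]), " ".join(corrected_words[j1:j2]))
            )
    return mismatches


pairs = word_mismatches(
    "look at the person page here",
    "look at the percentage here",
)
print(pairs)
```

Each pair shows what Google heard next to what the student actually said, which gives the student and instructor a short list of places to listen to again.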

Once students complete their analysis, they practice improved delivery based on the analysis. For this reason, a second recording is made on the same topic, but without looking at the transcription. Afterward, students comment on how they practiced for the second recording, what they feel they improved in the second recording, and what they feel they still need to improve. This way, the instructor can gain insight into students’ practice strategies as well as their understanding of pronunciation and discourse intonation features.

Grading

Although I have assigned this activity for students to do on their own once they are familiar with the process, the quality of the work is often better when it is done as an in-class activity. This saves time for busy students, and through the instructor’s guidance, students can deepen their analysis by focusing on the topic at hand (intonation variation, phrasal stress, reduction). That said, if it is done as an in-class activity, grading is at the instructor’s discretion (letter grade, complete/incomplete, comments only). If it is an assignment, instructors can use a rubric and give feedback, generalized or specific. A grading rubric will be shared at the TESOL convention during my session.

Conclusion

Because self-monitoring while speaking can be difficult, this activity allows learners to slow down and really hear and see what they said and how. Analyzing their speech for what they did well and what they need to improve helps students to better understand where they have improved and what to work on next. Furthermore, if the students’ speech is sufficiently clear, Google Web Speech can save them much time, as there is no need to transcribe from scratch.

References

Gorsuch, G., Meyers, C., Pickering, L., & Griffee, D. (2013). English communication for international teaching assistants (2nd ed.). Long Grove, IL: Waveland Press.

Hubbard, P. (2013). Making a case for learner training in technology enhanced language learning environments. CALICO Journal, 30(2). Retrieved from http://journals.sfu.ca/CALICO/index.php/calico/article/view/945

Wallace, L. (2013). Taking the first step: International teaching assistants’ motivation to improve their spoken English intelligibility. ITAIS Newsletter. Retrieved from http://newsmanager.commpartners.com/tesolitais/issues/2013-08-19/1.html

Dr. Lara Wallace is a lecturer and the English Language Improvement Program pronunciation lab coordinator in Ohio University’s Department of Linguistics. She will present at TESOL on 26 March 2015 at 1 pm in room 205B at the Toronto Convention Centre.