ALIS Newsletter - March 2014 (Plain Text Version)
STANDARDIZED ASSESSMENTS AND ITA CERTIFICATION PROGRAMS
This brief article discusses several oral assessments used for ITA screening. Although they have some similarities, they each differ in important respects. Table 1 gives some information on the assessments mentioned in this article.

Table 1: Some Oral Assessments Used for ITA Certification
In order to investigate the current state of practice regarding TOEFL iBT Speaking use in ITA programs, Farnsworth (2012) conducted an online survey upon which this article is based. Coordinators of ITA assessment programs were asked about the makeup of their institutions and the size of their ITA population. They were then asked to describe their ITA certification policies, and specifically their policies regarding the TOEFL Speaking test. Seventeen participants responded to the survey. Respondents came overwhelmingly from research-oriented institutions, with only one participant reporting his or her institution as more teaching oriented; most were from large research institutions.

The institutions very often used the TOEFL Speaking score as a prescreening measure to exempt high-scoring students from an in-house performance test. Nine of the seventeen institutions implemented the TOEFL Speaking test in this way, with cut scores ranging from 23 to 28 points. A typical example is Purdue University, which accepts scores of 27 or higher on TOEFL Speaking as evidence of ITA language competence; students with lower scores must take an in-house teaching performance exam to be certified. Oklahoma State University, another large public research university, reports the same policy (a high TOEFL Speaking score exempts students from the local performance test) but with a cut-off score of 26 instead of 27. Both participants reported the cut-off score “working well” as an initial measure. Cornell University has an identical policy but requires a 28, based on a perception that lower scores are “all over the place” but that a very high TOEFL score is a reliable indicator of ITA proficiency.

Only five survey participants reported SPEAK test use, and only two of these relied exclusively on SPEAK to make these decisions; the other three SPEAK users also accepted TOEFL Speaking scores. Since the SPEAK test has been the primary tool used for ITA assessment over the past two decades, this may represent a fairly major change. One institution reported interest in moving from SPEAK to the Versant English Test, a fully automated, computer-scored oral assessment that has been the subject of much debate in the testing literature during the past decade.

Respondents to the survey reported mixed impressions of TOEFL Speaking use in practice. Some respondents who used the scores reported that “it seems to work” or “as an initial measure, it works well,” whereas others reported that the TOEFL did not measure the appropriate skills. For example, one participant reported:

    The iBT cannot replace our test since the iBT does not look at teaching skills, awareness of U.S. classroom, ability to use their language skills to successfully convey information to learners. However, based on our own analysis of past ITA tests, we now allow students with iBT ≥ 26 to be tested by just one rater rather than 4.

Other respondents reported varying degrees of satisfaction with and confidence in TOEFL Speaking scores, saying that scores in the middle range are less useful as predictors of ITA communicative success. For example, one participant said, “We see that there are correlations at the higher levels of the iBT with oral proficiency tools such as the OPI. Anything below a 24 is all over the place.” Overall, the survey results indicate that TOEFL Speaking is in fact widely used to make these decisions.
Of course, the practical advantages of using TOEFL iBT Speaking scores for ITA certification will be obvious to any ESL or testing program coordinator: TOEFL scores will in most cases already be available from the institution’s admissions department, and eliminating in-house performance testing could save substantial resources. One major advantage is that ITA decisions could be made for incoming students and their departments in advance of student intake. Clearly, though, not enough is known about how using these scores instead of an in-house measure may affect programs, students, and departments. All but one (Lim et al., 2012) of the TOEFL-specific studies described in this article rely on experimental testing, and operational TOEFL Speaking scores may well have properties quite different from those derived from experimental studies, due to practice effects or other issues.

In terms of practical recommendations to ITA program coordinators, the following may be tentatively concluded. The TOEFL Speaking test seems to measure the language needed by ITAs to a certain extent, enough so that very high iBT scores may be useful for ITA certification and very low scores might be sufficient evidence to prevent candidates from teaching. There is no definitive answer as to the ideal cut score, however, and different institutions may decide on more lenient or more stringent cut scores depending on demand for ITAs, local resources, and other factors. Available research, in addition to the anecdotal evidence gathered from the survey, indicates that cut scores of 26 or higher seem to minimize the danger of false positive classifications (candidates who pass but are not truly qualified). A cut score of 28 would probably result in very few false positives (Xi, 2007), but relatively few candidates are likely to reach this high bar. The limited available research and anecdotal evidence suggest that scores in the range between 22 and 25 do not predict ITA success with sufficient accuracy (Lim et al., 2012), and local performance measures should be used.
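To make the tiered approach concrete, the short Python sketch below shows one way a program coordinator might encode such a screening rule. It is only an illustration: the function name, the default cut scores (27 to exempt, 22 as a lower bound), and the outcome labels are assumptions for the example, not any institution’s actual policy; the surveyed institutions reported cut scores anywhere from 23 to 28.

def ita_screening_decision(toefl_speaking: int,
                           exempt_cut: int = 27,
                           floor_cut: int = 22) -> str:
    """Return an illustrative screening outcome for a prospective ITA.

    toefl_speaking : TOEFL iBT Speaking score (0-30 scale).
    exempt_cut     : score at or above which the candidate is certified
                     without a local performance test (survey range: 23-28).
    floor_cut      : score below which certification on the score alone is
                     unlikely and local testing plus support is indicated.
    """
    if toefl_speaking >= exempt_cut:
        return "certified: exempt from in-house performance test"
    if toefl_speaking >= floor_cut:
        # Mid-range scores (roughly 22-25) predict ITA success least
        # reliably, so the local teaching performance test carries the decision.
        return "refer to in-house teaching performance test"
    return "not certified on score alone: local testing and support recommended"


if __name__ == "__main__":
    for score in (28, 24, 20):
        print(score, "->", ita_screening_decision(score))

Running the sketch with scores of 28, 24, and 20 produces the three outcomes in turn, mirroring the exempt, mid-range, and low bands discussed above.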
References