Reading Log # 1: Bachman, L. F. and A. S. Palmer (1996) Language Testing in Practice (pp. 3 - 15)
Oxford: Oxford University Press
This chapter begins by discussing some common misconceptions regarding language tests.
The most common is that there is an 'ideal' test that is applicable in all situations,
or even in one specific situation. This view regards language proficiency as "a set of
finite components - grammar, vocabulary, pronunciation, spelling" (p. 4), which can be
tested in a similar way for all testees.
It cannot be assumed that a testee's results on one test provide a universally
valid indicator of language knowledge or skill. There are many contexts for language
learning and language use, each of which draws on different language knowledge and skills.
Language learners also have many different reasons for studying and using English.
For a test to be valid, especially in terms of predicting future performance, it needs to be
designed in terms of the language learner's specific future language needs.
Even a single test designed for a specific group is usually unable to provide a sufficiently
detailed assessment of general ability. The example given in this chapter (p. 6) is that of a test for
university teachers. Its results grouped the testees mainly in terms of reading ability,
regardless of variations in speaking and listening abilities. In the same example, a dictation test was added in an attempt
to assess listening skills. It was unsuccessful in terms of face validity
(test takers and users complained that it was artificial),
as well as operational validity (test performance was not representative of interactive, conversational listening).
The chapter introduces these issues through an interesting anecdotal example,
which stresses the main point: that language testing is not an abstract science, but a practical skill
requiring informed judgements. The chapter goes on to briefly list ways in which language testing
contributes to course management at various stages. These are:
- clarifying and evaluating instructional objectives
- providing information on students' strengths and weaknesses, and determining suitable materials and activities
- determining student readiness for a further stage of instruction
- assigning grades based on achievement
- providing feedback on a teaching program's effectiveness
The remainder of the chapter focuses on one fundamental principle of language testing, that is, the
correspondence between language test performance and language use outside the test situation.
A framework is presented that identifies two sets of characteristics common to both language test performance
and real-life language use. The first concerns the specific characteristics of the language tasks to be tested.
The second concerns the characteristics of the individual language users and test takers, including their topical
knowledge, affective schemata and language ability. When using this framework to develop or select language tests,
it is important to "demonstrate the correspondences between both the characteristics of the language use situation
and those of the test situation and tasks." (p. 11)
Comments:
This chapter is mainly concerned with issues of test validity: how to ensure a test measures what it claims to measure.
More specifically, it provides a useful framework for approaching the issue of operational validity, or the extent
to which test performance represents language use in the target situation. This framework counters an approach that
sees language as an abstract set of universal skills. It gives the characteristics of individuals equal weight alongside
those of the task, covering not only language ability but also feelings and emotions ("affective schemata"). In other
words, the conditions of language use in the target situation (e.g. business negotiation, university study) need to
be accounted for in the test situation. The individual characteristics of the test takers are also important with
regard to 'bias for best' in test design - making sure that tests "facilitate, rather than impede, test takers'
performance." (p. 12)
Reading Log # 2: Swain, M. (1984) "Teaching and testing communicatively." TESL Talk, pp. 7 - 18.
This article was published seventeen years ago, at a time when "communicative" was still a buzz-word in language
teaching circles. It points out a predominance of non-communicative tests, and stresses the need for tests that
will complement the communicative approach to language teaching.
Three general principles of communicative testing are proposed.
The first is to "start from somewhere", that is, from a theoretical base. This base is provided by communicative
competence, a model composed of four knowledges/skills: grammatical, sociolinguistic, discourse, and strategic.
The second principle is to "concentrate on content". The four main characteristics of test
content are that it should be motivating, substantive, integrated and interactive. Each of these is a key aspect of
communicative teaching methodology, and shows how closely teaching and testing are connected in Swain's model.
The third principle is to "bias for best", that is, to elicit a testee's best performance. In the communicative test
developed by Swain, this involves providing the following:
- more than adequate time to complete a task
- opportunity to review and change work
- access to reference materials
- checking that testees are on task
- clear instructions including what is being tested
- useful suggestions about how to do the task.
The scoring procedures that Swain developed for her test included both "objective counts and subjective judgements".
The scoring criteria were based on the four components of communicative competence:
- Grammatical competence: producing structured, comprehensible utterances (including grammar, vocabulary,
  pronunciation and spelling).
- Sociocultural competence: using socially determined cultural codes in meaningful ways, often termed 'appropriacy'
  (e.g. formal or informal ways of greeting).
- Discourse competence: shaping language and communicating purposefully in different genres (text types), using
  cohesion (structural linking) and coherence (meaningful relationship).
- Strategic competence: enhancing the effectiveness of communication (e.g. deliberate speech), and compensating
  for breakdowns in communication (e.g. comprehension checks, paraphrase, conversation fillers).
Swain states that her model of testing is directly inspired by communicative teaching in the classroom setting. The only
significant difference between the two is that in a testing situation the teacher will "step back as a participant" in
order to assess the students' performance.
Comments:
Swain's model is similar in many ways to the way VCE (Victorian Certificate of Education) Outcomes are assessed. The
student works on a piece of writing over time, drafting and rewriting. A drawback of the previous CATs (Common Assessment
Tasks) was that students were able to hand in work done outside the classroom setting. This removed the stress of the
examination setting, but made it impossible to monitor how much assistance the student obtained from outside sources.
One of the conditions of Swain's communicative testing model is in fact to give testees suggestions about how to
do the task. Surely, though, the actual amount and kind of help needs to be decided in advance, and provided to all
testees in a similar way. If students are assisted in different ways, in different contexts, then such a test would
have limited validity as a comparative measure of achievement. This is quite important, given that one of the main
reasons for testing is selection of candidates for courses, jobs, advancement or awards.
Another factor that needs to be taken into account is the unpredictability of communicative tasks that involve
exchange of information and negotiation of meaning. On the other hand, the more the responses available in the
communicative task are controlled by the testing material and situation, the closer such a test comes to a traditional
non-communicative test. A balance is needed to ensure that all candidates provide comparable responses that can be assessed
using the same assessment criteria.
Testing communicatively in this way aims at reproducing real-life conditions of communication in the test situation.
The major drawback of this approach is in the area of practicality. To ensure bias for best, it requires sufficient
time and testee support at all stages. As Swain makes clear at the end of the article, this is best done in the classroom
itself, as an extension of a communicative course of instruction.
Reading Log # 3: Weir, C. J. (1993) Understanding and Developing Language Tests, New York: Prentice Hall. (Chapter 2, "Testing spoken
interaction", pp. 30 - 63)
To test speaking, we first need to ask: what are the features of spoken language? Weir presents a
three-part framework consisting of operations, conditions, and output quality.
The operations of speaking are categorised as either routine (standard ways of presenting information and of
interacting) or improvisational (negotiation of meaning as well as interaction management such as turn-taking and
agenda management). These categories are borrowed from the work of Bygate, and provide a dual approach to the testing
of speaking. In my own opinion, it would be desirable to design tests with different ratios of routine to improvisation:
an emphasis on structured routines for lower-level testees, and greater opportunity for improvisation for higher-level testees.
The conditions of speaking are next on the list of features considered by Weir. There would seem to be an infinite
range of conditions under which speaking takes place - time constraints, noise interference, and of course social context
including the purpose of communication and the relative status of the interlocutors. As Weir points out, there is a balance
to be struck between authenticity and practicality here. Ideally, a test would consist of hidden cameras and microphones in
the style of a 'Big Brother' reality show, and in fact, with technology advancing constantly, this may well be an option
sometime this century.
Despite this imaginary future, we still have to struggle with the enduring concept of "test conditions" that is at the
heart of our educational system. An important part of this concept is that conditions must be equal for all candidates
to ensure that test results are reliably consistent between candidates and over time. Without consistency of test conditions,
the very nature of the test as a measurement of ability and progress becomes meaningless. And without such measurement,
assessment cannot contribute to the system of awards and advancement that the education system serves.
It is in this context that the communicative approach to testing must tread very softly. Weir states that "every attempt
should be made to simulate reality as closely as possible." (p. 38) The important word to be stressed here is "simulate".
An assessment activity must necessarily be different from reality because the conditions of real life are not consistent,
nor are they biased to encourage best performance.
The third feature of spoken interaction is output quality. This requires us to ask, what criteria do we use to
judge spoken performance? As Weir points out, real speech is characterised by "self-correction, false starts, repetition,
rephrasing and circumlocution." (p. 40) No one is a 'perfect' speaker. In Weir's opinion such "compensation features"
should not be included in assessment "to the detriment of candidates." Weir's suggestions for assessment criteria instead
require us to focus solely on the positive features of speech: fluency, appropriateness, organisation, management, and
range of language.
In my own opinion, it is probably the extent of the compensation features mentioned above that indicates communicative
language ability. The absence of these features would imply that the test conditions are not at all communicative, but
simply mechanical. Accordingly, I would like to see Weir include these features in his marking scheme in a positive
sense, as signs of increased flexibility and ease with language.
The section of this chapter examining test formats (pp. 47 - 63) is extremely well set out and provides a wealth of
observations on the various characteristics of different formats. My own feeling is that many of these differences
are simply academic. Every format is an artificial construct. As such, it comes with a set of rules that must be made
clear to the testee. The testee's sole motivation will always be to play by the rules of the particular format in order
to succeed in the test. Viewed in this way, all formats of spoken test are equal. They are all constructs that isolate
and simulate a particular aspect of communicative language use.
Testing spoken language is ultimately a kind of game - it needs clear boundaries in which to play, and clear criteria by which to
judge success. And if the philosopher Ludwig Wittgenstein was right, language assessment is simply one game within the
larger game of language itself.