Assessment for Learning in a Chinese University Context: A Mixed Methods Case Study on English as a Foreign Language Speaking Ability

by Yang Song

Department of Integrated Studies in Education

McGill University,

Montreal

2011

A thesis submitted to McGill University in partial fulfilment of the requirements of the degree of Master

© Yang Song, 2011


Abstract

This study investigates the effectiveness of Assessment for Learning (AFL) in improving oral English skills and explores students' and teachers' perceptions of AFL.

The study took place at a university in China and involved both students and teachers of English at the institution. Chinese university-level students were reported to be facing difficulties in their oral skills learning and were not satisfied with the oral English instruction they were receiving because it was tied too closely to the large-scale tests administered in China (He, 1999; Liao & Qin, 2000; Wen, 2001).

Classroom-based assessment, known as an alternative assessment approach, has attracted increasing interest from researchers since the end of the last century (Genesee & Upshur, 1996; Gipps, 1999; Shepard, 2000; Turner, in press). One approach to classroom-based assessment, Assessment for Learning (AFL), has been shown to have a significant influence on language performance by encouraging learners' participation, identifying learners' weaknesses, providing instructors with useful feedback for learners' further development, and helping learners become autonomous (Black & Wiliam, 1998a, 1998b; Black, Harrison, Lee, Marshall, & Wiliam, 2003; Winne & Butler, 1994; Topping, 2009). In this study, a mixed methods design incorporating both quantitative and qualitative methods (Creswell & Plano Clark, 2009) is used to examine the effectiveness of AFL and to explore teachers' (n = 9) and students' (n = 74) perceptions of AFL. There are three phases in this study: the preparation phase, and Phases One and Two. In the preparation phase, second year students' and their


teachers' classroom interactions were observed to aid in the selection of participants for this study. In Phase One, teacher questionnaires, pre- and post-study student questionnaires, and three AFL tasks were employed, and the data were collected and analyzed quantitatively using descriptive statistics to determine the effectiveness of AFL. In Phase Two, teachers and students were interviewed to express their opinions about AFL. The interviews were translated from Chinese to

English, transcribed, and then analyzed using content analysis. The results from the three phases were integrated to interpret the findings of the research. The results indicate that AFL can effectively improve the oral English skills of intermediate and high level students. Additionally, the results of the study demonstrate that both teachers and students showed positive attitudes towards AFL.


Résumé

Cette étude a pour but d'évaluer l'efficacité de « l'évaluation pour l'apprentissage » (Assessment for Learning, AFL) à améliorer les habiletés en anglais oral, ainsi que la perception qu'ont les étudiants et les professeurs de l'AFL. Cette étude prend place dans une université de Chine et concerne des étudiants et des professeurs d'anglais de cette institution. Il a été rapporté que les étudiants des universités chinoises éprouvaient des difficultés dans leur apprentissage de l'anglais oral et n'étaient pas satisfaits de l'enseignement qu'ils recevaient, puisqu'il était trop orienté vers les tests à grande échelle administrés en Chine (He, 1999; Liao & Qin, 2000; Wen, 2001). L'évaluation en salle de classe (classroom-based assessment), connue comme une approche d'évaluation alternative, a, depuis la fin du siècle dernier, attiré de plus en plus l'intérêt des chercheurs (Genesee & Upshur, 1996; Gipps, 1999; Shepard, 2000; Turner, in press). Une approche de l'évaluation en salle de classe, l'évaluation pour l'apprentissage (AFL), s'est avérée avoir une influence significative sur la performance linguistique : elle augmente la participation des étudiants, identifie leurs faiblesses, fournit à l'instructeur de l'information utile pour le développement futur des étudiants et aide ceux-ci à devenir des apprenants autonomes (Black & Wiliam, 1998a, 1998b; Black, Harrison, Lee, Marshall, & Wiliam, 2003; Winne & Butler, 1994; Topping, 2009). Dans cette étude, un devis méthodologique mixte incorporant des méthodes quantitatives et qualitatives (Creswell & Plano Clark, 2009) est utilisé pour étudier l'efficacité de l'AFL et explorer la perception qu'en ont les enseignants (n = 9) et les étudiants (n = 74). Il y a trois phases dans cette étude : la phase de


préparation, puis les phases un et deux. Dans la phase de préparation, les interactions en classe entre les étudiants de deuxième année et leurs enseignants ont été observées afin d'aider à la sélection des participants à l'étude. Dans la phase un, des questionnaires aux enseignants, des questionnaires aux étudiants avant et après l'étude et trois tâches d'AFL ont été employés; les données ont été collectées et analysées quantitativement au moyen d'une analyse statistique descriptive de façon à déterminer l'efficacité de l'AFL.

Dans la phase deux, les enseignants et les étudiants ont été interviewés de façon à recueillir leurs opinions sur l'AFL. Les interviews ont été traduites du chinois à l'anglais, transcrites, puis leur contenu a été analysé. Les résultats des trois phases ont été intégrés de façon à pouvoir interpréter les résultats de l'étude. Les résultats indiquent que l'AFL peut effectivement améliorer les capacités en anglais oral des étudiants de niveau intermédiaire et avancé. De plus, les résultats montrent que les enseignants et les étudiants réagissent positivement à l'AFL.


Acknowledgements

Foremost, I thank my supervisor, Professor Carolyn E. Turner. Working under her supervision has been a most enriching experience throughout the Master's program. All chapters of my thesis have benefited from her suggestions. I thank her for her advice, patience, and encouragement and, most of all, for putting her faith in my ability to undertake these challenges.

I am also particularly grateful to Professor Mela Sarkar and Professor Caroline

Riches, who read my thesis as the second reader and the external reader respectively. Their perspectives on my topic improved my thesis significantly on numerous occasions.

I would like to express my gratitude as well to Beverley Baker, Linda Hacket,

Heike Neumann and May Tan, who provided me with valuable feedback on my thesis drafts. Their suggestions enriched my understanding and brought me to a deeper appreciation of this topic.

Ultimately, I couldn't have done this, any of this, without my family and close friends. Thank you for being by my side for the last two years. I would like to especially thank Norie Moriyoshi, Rika Tsushima and Swan Kennedy for being such steadfast companions and study partners. Thank you also to Nicolas St-Amand for his generous work on the French translation of the abstract. Lastly, I would like to thank my parents in China who supported my study financially and emotionally. To each of you, I am eternally grateful; you made it all worthwhile.


Table of Contents

Abstract ...... i

Résumé ...... iii

Acknowledgements ...... v

Table of Contents ...... vi

List of Tables and Figures ...... xii

CHAPTER ONE: INTRODUCTION ...... 1

Background ...... 1

The Study ...... 2

The Organization of the Thesis ...... 3

CHAPTER TWO: REVIEW OF THE LITERATURE ...... 5

Introduction ...... 5

University Level English Language Education in China ...... 5

Background Information ...... 5

Large-Scale Testing of English at Chinese Universities ...... 6

College English Test (CET)...... 6

Test for English Majors (TEM)...... 8

General Introduction to Large-Scale Testing ...... 9

Strengths of Large-Scale Testing ...... 10

Issues Related to Large-Scale Testing ...... 11

For Non-English Majors...... 11

For English Majors...... 12

Classroom Assessment ...... 13

General Introduction ...... 13

Formative Assessment ...... 15


Benefits of formative assessment...... 17

1. Supporting learning...... 17

2. Providing feedback...... 17

3. Enhancing learners' motivation...... 18

Self Assessment ...... 19

Peer Assessment ...... 19

Reconceptualizing Validity and Reliability in a Classroom-Based Assessment Context ...... 21

Validity...... 21

Reliability...... 22

Summary ...... 24

CHAPTER THREE: METHODOLOGY ...... 25

Introduction ...... 25

Rationale of This Study ...... 25

Formative Assessment Model Used in This Study ...... 25

Research Questions ...... 26

Mixed Methods Research Design ...... 26

Procedures of This Study ...... 27

Context ...... 28

Preparation Phase ...... 29

Classroom Observation ...... 29

Introduction to the TEM 4 Oral Test Criteria ...... 31

Phase One...... 31

Participants ...... 31

Teacher participants...... 31


Student participants...... 33

Instruments ...... 34

Teacher questionnaire...... 34

Student questionnaire...... 34

Three AFL tasks...... 35

The design of three AFL tasks...... 35

AFL task one: retelling the story...... 36

AFL task two: word reading...... 37

AFL task three: role playing...... 38

Data Collection Procedures ...... 39

Teachers‘ data collection...... 39

Students‘ data collection...... 39

1. Pre-study student questionnaire...... 39

2. Three AFL activities...... 40

3. Post-study student questionnaire...... 40

Data Analysis Procedures ...... 40

Phase Two ...... 41

Participants ...... 41

Instruments ...... 41

The interview...... 41

Data Collection Procedures ...... 42

Teachers‘ data collection...... 42

Students‘ data collection...... 42

Data Analysis Procedures ...... 43

Triangulation of Data from Phases One and Two ...... 44


Summary ...... 44

CHAPTER FOUR: PRESENTATION OF RESULTS ...... 45

Introduction ...... 45

Presentation of Results - Phase One ...... 45

Results of Teacher's Questionnaire ...... 45

Part one: teachers' attitudes towards learners and assessment (Q1 to 14)...... 46

Part two: teachers' attitudes towards learners and assessment (Q15 to 29)...... 47

Result of Pre-study Student Questionnaire...... 52

Results of Three AFL Tasks...... 55

1. Task one: retelling the story...... 55

2. Task two: word reading...... 56

3. Task three: role playing...... 57

Results of Post-Study Student Questionnaire ...... 62

Presentation of Results - Phase Two ...... 66

The Students' Interviews ...... 66

Students' preferences for AFL tasks...... 66

Students' attitudes towards AFL feedback with no grading...... 68

The Teachers' Interviews ...... 69

Teachers' positive attitudes about AFL...... 69

Teachers' concerns about AFL...... 71

Summary ...... 71

CHAPTER FIVE: DISCUSSION OF RESULTS ...... 73

Introduction ...... 73

Answers to Research Question One ...... 73

Answers to Research Question Two ...... 74


Answers to Research Question Three ...... 75

Consistency and Inconsistency of Results and Literature...... 76

Consistency of Results and Literature ...... 76

Consistency between questionnaire results and literature review...... 76

Consistency between interview results and literature review...... 77

Inconsistency of Results and Literature ...... 77

Summary ...... 78

CHAPTER SIX: CONCLUSIONS ...... 80

Introduction ...... 80

Summary of Findings ...... 80

Main Findings ...... 80

Related Findings ...... 81

Implications...... 82

Limitations ...... 83

Future Research Directions ...... 84

Contribution of This Study ...... 86

Contribution to AFL ...... 86

Contribution to Assessment Approaches in the Chinese Context ...... 86

References ...... 88

Appendices ...... 98

Appendix A: Teacher Participation Consent Form ...... 98

Appendix B: Teacher Interview Participation Consent Form ...... 98

Appendix C: Student Participation Consent Form...... 99

Appendix D: Student Interview Consent Form ...... 100

Appendix E: TEM 4 Oral English Exam Criteria ...... 102


Appendix F: Teacher Questionnaire ...... 103

Appendix G: Pre-Study Student Questionnaire ...... 105

Appendix H: Post-Study Student Questionnaire...... 106

Appendix J: AFL Task Two – Word Reading ...... 111

Appendix K: AFL Task Three – Role Playing ...... 115

Appendix L: Student Interview Protocol ...... 117

Appendix M: Teacher Interview Protocol ...... 118


List of Tables and Figures

Table 1 Teachers' Bio-Data from the Teacher Questionnaire ...... 31

Table 2 Student Participants' Situation ...... 33

Table 3 Phase One: Procedures of Data Analysis ...... 41

Table 4 Teacher Questionnaire Responses ...... 47

Table 5 Descriptive Statistics of Teacher Questionnaire...... 51

Table 6 Q2. As a second year English-major student, do you think you can achieve the goals/objectives listed in the curriculum? ...... 52

Table 7 Q3. How often do you speak English in the classroom? ...... 53

Table 8 Q4. How would you describe your speaking proficiency in English? ...... 53

Table 9 Q5. What is (are) the challenge(s) you face when speaking English? ...... 55

Table 10 Results of Retelling the Story ...... 56

Table 11 Results of Word Reading ...... 57

Table 12 Q1. Does this group speak audibly and clearly? ...... 58

Table 13 Q2. Does this group make the role play understandable, reasonable? ...... 58

Table 14 Q3. Does this group use Chinese during the role play? ...... 59

Table 15 Q4. Does this group pause a lot during the role play? ...... 59

Table 16 Intermediate Level Students' Comments and Suggestions ...... 60

Table 17 High Level Students' Comments and Suggestions ...... 60

Table 18 Q1. Have you done these kinds of tasks before? ...... 62

Table 19 Q2 (a). Do you think AFL tasks help you identify your own difficulty in oral English learning? ...... 63

Table 20 Q2 (b): If yes, what has improved? ...... 63

Table 21 Q3. Have you improved in being able to identify strengths and weaknesses in your classmates' spoken/oral English? ...... 64


Table 22 Q4 (a): What is your view about the AFL exercises? ...... 64

Table 23 Q4 (b): What is your view about the AFL exercises? ...... 64

Table 24 Q4 (c): What is your view about the AFL exercises? ...... 65

Table 25 Q4 (d): What is your view about AFL exercises? ...... 65

Figure 1 Mixed Methods Model Applied in This Study: Explanatory Sequential Design ...... 27

Figure 2 Procedures of This Mixed Methods Study ...... 28

CHAPTER ONE: INTRODUCTION

Background

Due to the open door policy1 and economic development in China, more and more Chinese people realize the importance of learning at least one foreign language (Li, 2005). Foreign language instruction has become an important link between the East and the West. English, a required subject in China, has been taught from junior high school through to university for more than 30 years. English language learning helps Chinese students not only to master the language itself but also to enhance and develop their knowledge and understanding of Western culture.

In China, English language learning is strongly affected by tests, especially various national large-scale tests, and such large-scale testing plays a dominant role in modern-day English language education in the country.

Large-scale testing is used to evaluate learners' proficiency in English and to encourage their English learning. However, scholars have observed that the results of large-scale testing are often overemphasized. Specifically, students' test performances have a strong impact on whether or not they can graduate and on their employment opportunities. In other words, students who fail national large-scale English tests cannot receive their degrees or diplomas and struggle to find ideal jobs (Dong, 1998; Liao & Wolff, 2009; Zhao, 2007).

Because most large-scale English tests mainly assess grammar and vocabulary knowledge, teachers tend to emphasize grammar and vocabulary instruction and to neglect speaking and listening instruction. Even though reforms in new educational curricula and examinations emphasize English listening and speaking,

1 The open door policy was implemented in China in 1978 by the Chinese government to promote foreign trade and economic investment.

many Chinese university students still report that it is difficult to express themselves orally in English, even after years of studying it (Dong, 1998; He, 1999; Liao & Qin, 2000; Liao & Wolff, 2009; Liu & Carless, 2006; Wang, 2004; Wen, 2001; Zhao, 2007). One possible reason may be that oral English courses are suspended or even cancelled by university departments in order to accommodate the national test schedule and to allow teachers to focus only on the grammar and vocabulary practice that prepares their students for the test. What makes the situation worse is that most universities approve these changes to the teaching schedule (Li, 2002). To change the current situation and help students improve their oral English skills, there is a need to explore an approach to assessment other than large-scale testing. The researcher, first as a student learning English and later as a teacher of English, became interested in exploring how classroom assessment can benefit EFL students in the Chinese university-level context.

The Study

Classroom assessment, as an on-going assessment approach, is gaining a great deal of attention because of its potential to promote students' learning in a classroom context better than large-scale tests can (Black & Wiliam, 1998a, 1998b; Genesee & Upshur, 1996; Gipps, 1999; Shepard, 2000; Turner, in press). Appropriate forms of classroom assessment that may improve students' oral English language skills are discussed in detail in the literature review of this study. One form of classroom assessment, for example, is Assessment for Learning (AFL). This is an interactive and learning-focused pedagogy, which has shown its effectiveness in encouraging learners' participation, identifying learners' weaknesses, providing instructors with useful feedback for the learners' further development, and helping learners become more autonomous (Black & Wiliam, 1998b; Harlen & Winter, 2004; Rea-Dickins &

Gardner, 2000). However, there is little, if any, research concerning AFL with Chinese students learning English as a foreign language (EFL) at the university level in China, despite the fact that AFL has received increasing attention in Great Britain and North America since the 1990s (Black & Wiliam, 1998a, 1998b; Colby-Kelly & Turner, 2007). In order to fill this gap, the researcher conducted a study which implemented AFL in the Chinese classroom. Specifically, this study focused on how AFL promoted the acquisition of oral skills by Chinese students majoring in English.

The researcher applied a mixed methods design in this study. Both quantitative and qualitative methods were used to collect the data. There were three phases: the preparation phase, and Phases One and Two. The preparation phase helped the researcher to select the participants. Phase One assessed the effectiveness of AFL in improving students' oral English skills through questionnaires and the performance of three tasks.

Phase Two explored teacher and student perceptions of AFL through interviews.

Three research questions are explored in this study:

1. Does AFL help students of varying proficiency levels improve their oral English skills?

2. What evidence can be found that AFL benefits learning?

3. What are teacher and student perceptions of AFL?

The Organization of the Thesis

This thesis comprises six chapters, including this introductory chapter. Chapter two discusses the literature on university-level English language education, large-scale testing, and classroom-based assessment, which provides the foundations of this study. The methodology and detailed research design are described in chapter three, which includes the context, participants, research questions, the instruments used for data collection, data collection procedures, and the methods of data analysis applied across Phase One and Phase Two of the study. Chapter four presents the results from the quantitative Phase One and the qualitative Phase Two. In Phase One, the teacher questionnaire, pre- and post-study student questionnaires, and three AFL tasks are analyzed. In Phase Two, teacher and student interviews are analyzed. Chapter five discusses the findings to answer the three research questions in order to explore teacher and student attitudes towards AFL and to examine the effects of AFL in a

Chinese university context where English is taught as a foreign language. In chapter six, the researcher concludes the study by discussing the implications and limitations of the study and offering suggestions for future research.


CHAPTER TWO: REVIEW OF THE LITERATURE

Introduction

In this chapter, the researcher first discusses university-level English language education in China, focusing especially on two large-scale English tests: the College

English Test (CET) and the Test for English Majors (TEM). This leads to a background description of, and general introduction to, large-scale testing. In this section, the advantages and disadvantages of, and issues related to, large-scale testing are discussed. Following that, approaches to some of the problems of large-scale testing are described. Classroom-based assessment, recently recognized as a paradigm separate from large-scale testing (i.e., separate from the measurement paradigm), is introduced as a complementary approach to assessment within the classroom. In this section, a description of classroom-based assessment and the reconceptualization of validity and reliability in a classroom context are discussed. In addition, because it provides information as feedback to modify teaching and learning activities, formative assessment, as the most appropriate assessment approach in a university level English education classroom context in China, is discussed in detail.

University Level English Language Education in China

Background Information

When discussing university-level English language education in China, the influence of the country's long history of tests must be considered. The first Chinese national standardized tests were the imperial examinations, introduced in the Sui Dynasty about 1,400 years ago. These imperial examinations have had a long, far-reaching, and multifaceted impact on the Chinese education and testing system. Cheng

(2009) commented that "examinations and tests in modern-day China … are heavily used for selection purposes and are accepted by the general public as a fair selection procedure" (p. 22). In other words, due to the long history of imperial examinations, the English education system in China, especially at the university level, lays principal emphasis on national examinations.

In China today, various large-scale tests play a vital role in English education.

There are several influential standardized English tests at the university level which include the Public English Test System (PETS), the National Accreditation

Examinations for Translators and Interpreters (NAETI), the College English Test

(CET), and the Test for English Majors (TEM). Two of these tests, the CET and the

TEM, are discussed here in detail because they are compulsory large-scale national examinations administered to evaluate the English proficiency of all undergraduate university students, both English Major2 and Non-English Major3 students.

Large-Scale Testing of English at Chinese Universities

College English Test (CET).

The College English Test (CET) consists of both a written and an oral test. The written test has two levels: College English Test-Band 4 (CET-4 大学英语四级考试) and College English Test-Band 6 (CET-6 大学英语六级考试).

The written tests were first launched in 1987 to evaluate Non-English major students' general English proficiency at different levels in listening comprehension, reading comprehension, writing and translation, and vocabulary and grammatical

2 English majors are university students who are specializing in English language.
3 Non-English major students are university students who are specializing in fields other than English language.


structure. It is obligatory for students to take the CET-4 written test to get their degrees. It is not obligatory for them to take the CET-6, which is similar to the CET-4 but has a higher item difficulty.

The oral test is the College English Test-Spoken English Test (CET-SET). The

CET-SET is designed to measure the oral English communication ability of college and university students in Chinese universities (CET-SET Syllabus; National College

English Testing Committee, 1999).

The CET-SET is an optional, not compulsory, section of the CET. Students who aim to find jobs after graduation in foreign-invested enterprises (joint ventures), where Chinese staff are expected to speak English, usually take the CET-SET. The

Spoken English Test was first implemented in 1999 after the launch of the written test of CET in 1987. It aimed to evaluate learners' spoken English through different types of tasks. The format of the test is a face-to-face interview consisting of three parts: conversation, monologue presentation and group discussion and dialogue. In the first

(conversation) part, two students and one CET-SET authorized examiner have a conversation. They greet each other and make small talk. This section lasts for 5 minutes. Following that, two activities are included in part two. First of all, each of the two students is asked to give a presentation about him/herself. Then the examiner holds a group discussion with the two students on given topics. This section lasts for

10 minutes. In the final section, the examiner further checks the candidates' oral ability by asking questions related to the topics discussed. This section lasts for 5 minutes.

Linguistic accuracy, discourse management, flexibility in communication, and appropriateness in spoken English are assessed and converted to a graded score

(National College English Testing Committee, 1999).


Test for English Majors (TEM).

The Test for English Majors (TEM) is a mandatory national standardized English test for English major university students. There are two levels of the test: Test for

English Majors 4 (TEM-4) and Test for English Majors 8 (TEM-8) (大学英语专业四级,专业八级). TEM-4 is designed to measure the level of proficiency of intermediate level English learners by the end of their second year of study. TEM-8 is designed to measure that of advanced level English learners by the end of their fourth year of study.

The Oral Test for English Majors is designed to assess learners' spoken English in different situations and on a wide variety of topics (translated from TEM Oral Test

Syllabus; National Test for English Major Testing Committee, 2000). The oral test is an optional, not compulsory, section of the TEM. Only those students who pass the written section with a score of 80 are qualified to take the oral test. The TEM oral test uses a tape-recording format and consists of three parts: 1) retelling a story, 2) talking on a given topic, and 3) role playing. Story retelling asks the students to listen twice to a story approximately 300 words long and retell it immediately after they have heard it. The retelling should last no less than three minutes. In talking on a given topic, students talk about a topic related to the general theme they heard previously during the story retelling. Students have three minutes to prepare and then give a three-minute talk. In the third and final part of the test, the role play, both students are involved. Each student gets a sheet that contains details about a situation and the specific role they are expected to play. The situation is the same but the two students have different roles. The students are given three minutes to prepare together for a four-minute conversation/role-play. Linguistic accuracy, discourse management,


flexibility in communication, and appropriateness in spoken English are assessed and converted to a graded score (National Test for English Majors Committee, 2000).

The CET and the TEM can provide information on students' achievement over a certain period of time, and the value of these two national large-scale tests is not questioned by Chinese teachers and students (Cheng, 2009). An increasing number of students are taking these large-scale tests: for example, 77,386 students took the TEM in 2004, and 120,877 took it in 2007 (Hu, 2011), an increase of 56% in only three years. To better understand the function of these two English tests at the university level, it is necessary to discuss large-scale testing in more detail. The following section explores the literature on large-scale testing.

General Introduction to Large-Scale Testing

Large-scale testing is often used for policy purposes. It can be defined as

"assessments … used to evaluate programs and/or to set expectations for individual student learning" (Pellegrino, Chudowsky, & Glaser, 2001, p. 241). Large-scale tests are frequently considered "external" tests, that is, tests administered outside the classroom and not developed by the classroom teacher, which are used for "making decisions about individuals and programs regarding, for instance, certificates, diplomas, acceptance, rejection and placement" (Shohamy, 1994, p. 133). As a scientific measuring instrument, large-scale testing can produce reliable data on students' achievement, abilities and skills and can be administered to a large population (Madaus, 1985). As a result, large-scale testing has taken the dominant position in educational assessment in the past few decades (Black & Wiliam, 1998b; Gibbs & Simpson, 2005; Yorke, 2003) and is used all over the world (Firestone, Mayrowetz, & Fairman, 1998). Many countries have used large-scale tests to gather information on student performance since the last


half of the 20th century (Olson, Bond, & Andrew, 1999; Thurlow, Ysseldyke, Gutman,

& Geenan, 1998).

Strengths of Large-Scale Testing

The purpose of large-scale testing is to report learning achieved at a certain point in time. Studies have explored the role of large-scale testing in both student achievement and teaching. Evidence suggests that it has had a positive pedagogical impact on both students and teachers (Abu-Alhija, 2007; Brookhart, 2004; Crooks, 1988; Gipps, 1994; Murphy & Broadfoot, 1995; Shepard, 1991; Stecher, 2000; Stiggins, 1995).

First of all, large-scale testing can help students learn more effectively. Through tests, students can become more aware of which part(s) of their current language study is/are below expected academic achievement levels. This, in turn, can give them a clearer direction for what to study in the next stages of their program of study and can motivate them to work harder (Abu-Alhija, 2007; McNamara & Shohamy, 2008;

Tuckman, 2003).

Second, large-scale testing can help teachers to improve their teaching pedagogy.

The results of the test(s) can give them information about their students' capacity to learn. Teachers can then use the results to diagnose individual students' needs and select a matching curriculum. Moreover, the information gathered from large-scale testing helps teachers refine their instruction to improve students' performances

(Elliot & Branden, 2000; Herman & Golan, 1993; Nagy, 2000).

However, large-scale tests are effective in improving students' learning and teachers' pedagogy only under certain circumstances. When large-scale tests are conducted during a class or course, they can help learners more effectively through appropriate feedback. When the tests come at the end of the semester and serve as summative tests, however, the results can only help the teachers plan new curricula for incoming students; they cannot help the students who have finished the class or course and left. The weaknesses of this type of test and its negative consequences are explored in the Chinese context in the following section.

Issues Related to Large-Scale Testing

For Non-English Majors.

Zhao and Cheng (2002) pointed out that most students who were not English majors preferred to spend their time on reading comprehension and grammar exercises, since those areas are weighted more heavily in examinations than speaking. What made the situation worse was that many students gave up learning oral English because the CET-SET is optional (Liao & Wolff, 2009). Because of this emphasis on written rather than oral tests, Chinese scholars have reported what they call the "Mute English" phenomenon. Mute English refers broadly to a long-neglected but fundamental oral learning difficulty among non-English-major students, although definitions vary across researchers. Liu and Carless (2006) suggested that the Mute English phenomenon refers to students who can read and write but cannot express themselves orally in English. Zhao (2007) restricted the definition of Mute English to the speech of college students who, after years of learning and even passing College English Tests, cannot speak English fluently. Liao and Wolff (2009) defined Mute English as "a unique Chinese phenomenon ignored by linguistic scholars but [practiced] by Chinese students. It (English) is a communicative language taught as if it were a dead language, like […]" (p. 1).

Many Chinese scholars have studied the Mute English phenomenon. Wang (2004) reported that even though Chinese students usually begin to learn English in the fourth grade of primary school, after an average of nine years of learning most of them can neither understand English conversations nor express their ideas correctly in English.

Wang (2005) found that although some students continue learning English up to and including graduate school, and some have even passed tests like the TOEFL and GRE, they are still limited English speakers. Dong (1998) conducted a survey on student satisfaction with English teaching at the college level in China, with over 1,000 respondents. The results indicated that an overwhelming 75% of the participants were dissatisfied with the state of English language teaching in colleges and universities. Dong (1998) further concluded that although English is one of the subjects that Chinese students take for the longest period of time (i.e., from middle school to college and through to graduate school), only a few of them end up proficient in speaking and listening.

For English Majors.

Due to the great emphasis on written performance in large-scale testing, English majors' oral test performances suffered. After analyzing oral test data collected from the TEM 4 from 1997 to 1998, He (1999) concluded that students majoring in English could not achieve the required level of oral proficiency. His analysis demonstrated that students not only failed to achieve accuracy in pronunciation but also had problems with grammar. Wen (2001) also reported low oral English proficiency amongst English majors. She identified five problems in her research on student performance on the National Spoken English Test for English Majors from 1995 to 2000. First, students had a low level of accuracy in grammatical structure when speaking English. According to Wen, most students had difficulty forming sentences orally with verbs in the simple past tense with -ed and the third person singular -s. Second, students encountered difficulties with fluency: she found many unnecessary pauses and repetitions during student conversations. Third, students' conversations were generally similar to each other, lacking personal creativity. Fourth, students required improvement in their interpersonal communication skills. Finally, Wen reported that some students lacked turn-taking skills and often interrupted their partners.

As mentioned, students had poor communicative skills because so much of the in-class instruction they received was focused on developing skills for the written proficiency test. Liao and Qin (2000) conducted a survey to explore the attitudes of English majors towards English instruction. The results showed that 57% were not satisfied with the oral English instruction they were receiving. Over 60% of classroom teaching was based on exam preparation exercises, whereas the use of English for oral communication was ignored. As a result, 94% of the participants found that the compulsory oral English courses at school were not beneficial for their future careers.

Most English majors claimed that they learned only Mute English in universities, just as did the non-English majors. In other words, they said their spoken English was no better than that of students not majoring in English.

The literature appears to indicate that Chinese university teachers and students are familiar only with large-scale tests and that they value test performance, especially on written tests, more than genuine learning. Finding another assessment approach, one that promotes learning as well as measures it and that serves the classroom context well, is therefore urgent. In recent years, classroom assessment, with its potential both to encourage learning and to capture information not easily assessed by large-scale testing, has gained scholars' attention (Black & Wiliam, 1998a, 1998b; Brown, 1998; Genesee & Upshur, 1996; Pellegrino et al., 2001; Turner, in press).

Classroom Assessment

General Introduction

Some second language tests, like large-scale tests, that focus on learners' achievement are categorized as "external" by Shohamy (1994). This means they are administered outside of the classroom and not developed by the classroom teacher. Besides external language tests, there are also "internal" tests. These are tests administered inside the classroom and normally developed by the classroom teacher. Shohamy (1994) claimed the classroom context for language testing is "where tests are used as part of the teaching and learning process" (p. 133). For the last few decades, classroom-based assessment (CBA) was considered a derivative of traditional large-scale testing. However, this situation has changed, and a rising awareness of the importance of CBA emerged by the end of the last century (Black & Wiliam, 1998a, 1998b; Genesee & Upshur, 1996; Gipps, 1999; Shepard, 2000; Turner, in press). One indication of its importance is the growing recognition in the assessment literature that CBA is a paradigm separate from the large-scale testing measurement paradigm (Turner, in press).

Beginning in the 1990s, assessment textbooks began to discuss CBA in addition to large-scale tests (sometimes referred to as standardized testing formats).

Generally speaking, classroom assessment includes "all assessment that happens within the classroom regardless of its purpose" (NCEE, 2011, p. 1). In Genesee and Upshur (1996), CBA was considered a paradigm apart from the large-scale testing measurement paradigm. Turner (in press) describes the characteristics of CBA as follows:

[CBA] involves strategies by teachers to plan and carry out the collection of multiple types of information concerning student language use, to analyze and interpret it, to provide feedback and to use this information to help make decisions to enhance teaching and learning. Observable evidence of learning (or lack of learning) is collected through a variety of methods (e.g., observation, portfolios, conferencing, journals, questionnaires, interviews, projects, task sheets, quizzes/tests) and most often embedded in regular instructional activities. (p. 3)

This definition of CBA not only describes the characteristics of CBA but also embraces its diverse approaches. Besides tests and quizzes, there are other methods, some of which are assessed only qualitatively. In the following section, the researcher will focus on one approach to CBA, formative assessment, which broadly includes assessment that happens during the learning process for the purpose of improving teaching and learning in the classroom.

Formative Assessment

The term "formative assessment" can be traced back to Scriven's (1967) term "formative evaluation" in a monograph of the American Educational Research Association (AERA). Later, Bloom et al. (1971) defined formative evaluation as "the use of systematic evaluation in curriculum construction, teaching and learning for the purpose of improving any of these three processes" (p. 118). Bloom et al. (1971) also suggested teachers should use formative assessment for the feedback it provides to students to remediate and improve their study. The concept of formative assessment has since been expanded in diverse ways by other scholars. In the most cited work, Black and Wiliam (1998b) defined formative assessment as "encompassing all those activities undertaken by teachers, and/or by their students, which provide information to be used as feedback to modify the teaching and learning activities in which they are engaged" (pp. 7-8). In other words, formative assessment, if well planned and effectively implemented, can help teachers check their students' understanding and can help students become more autonomous learners in order to improve their levels of achievement. Atkin, Black, and Coffey (2001) confirmed that even though fruitful development of formative assessment can be identified, "there is no single, simple recipe that teachers could adopt [formative assessment] and follow" (p. 13). Based on views on learning, they proposed a framework for formative assessment with three guiding questions: "1. Where are you trying to go? (identify and communicate the learning and performance goals); 2. Where are you now? (assess, or help the student to self-assess, current levels of understanding); 3. How can you get there? (help the student with strategies and skills to reach the goal)" (p. 14).

Various methods of formative assessment can be used in classrooms to inform teachers of what students know and what they do not, in order to make adjustments in teaching. Common examples are observation, questioning, discussion, peer/self assessment, and presentations. Black and Wiliam (1998b) encouraged teachers to use questions and classroom observation to increase students' knowledge and improve their understanding. They recommended that teachers have students discuss their thinking in pairs, share their thoughts with larger groups, and vote after discussion to select an answer on the topic. Such methods help teachers assess students' understanding.

In some studies, assessments that are used to support learning are described under the broad heading "Assessment for Learning" (AFL). The term is relatively recent: whereas "formative assessment" was coined in 1967, "Assessment for Learning" dates from the mid-1990s (Stobart, 2008). The Assessment Reform Group4 (2002) defined AFL as "the process of seeking and interpreting evidence for use by learners and their teachers to decide where the learners are in their learning, where they need to go and how best to get there" (pp. 2-3). Within the context of classroom assessment, many scholars have used the two terms formative assessment and AFL interchangeably to express the same idea (Black et al., 2003; Colby-Kelly & Turner, 2007). In this study, the researcher will likewise use the two terms to express the same meaning5.

4 The Assessment Reform Group (ARG) is a group of scholars who ensure that assessment policy and practice at all levels takes into account relevant research evidence in Great Britain.

5 Some scholars believe that formative assessment and Assessment for Learning are not the same. For example, Black, Harrison, Lee, Marshall and Wiliam (2002, p. i) made a distinction between the two.

Benefits of formative assessment.

Numerous studies have demonstrated that formative assessment is effective in three areas: 1) supporting learning, 2) providing feedback, and 3) enhancing learners‘ motivation.

1. Supporting learning.

Black and Wiliam (1998a) reviewed 250 journal articles and book chapters to examine whether formative assessment can raise academic standards in the classroom. They claimed that it is effective in improving students' performance and their involvement in learning. In their quantitative research, Black and Wiliam (1998b) reported that formative assessment has a statistically significant effect in helping students' learning, especially for low-achieving learners. Black and Wiliam (1998b) also emphasized the significant role feedback plays in providing information to correct teaching and learning activities.

2. Providing feedback.

Feedback6 originally meant that teachers evaluated students' work and communicated the results of the evaluation to students. Gradually, researchers expanded the scope of feedback. Winne and Butler (1994) believed that "feedback is information with which a learner can confirm, add to, overwrite, tune, or restructure information in memory, whether that information is domain knowledge, meta-cognitive knowledge, beliefs about self and tasks, or cognitive tactics and strategies" (p. 5740).

Numerous research studies have been conducted on the timing of feedback, both immediate and delayed (Bangert-Drowns, Kulik, Kulik, & Morgan, 1991; Brackbill, Blobitt, Davlin, & Wagner, 1963; Schroth & Lund, 1993; Sturges, 1972, 1978; Swindell & Walls, 1993). Kulik and Kulik (1988) claimed that, as different processes, both immediate and delayed feedback can be beneficial. In their research, immediate feedback was beneficial at the process level whereas delayed feedback was beneficial at the task level. Clariana, Wagner, and Roher Murphy (2000) found that the effectiveness of delayed and immediate feedback depended on test item difficulty: difficult items that need more time to be processed are compatible with delayed feedback, whereas easier items that need less time are compatible with immediate feedback.

6 Feedback in this study is different from feedback in second language acquisition research, such as corrective feedback, in which more emphasis is put on a correcting function (Kulhavy, 1977; Lyster & Mori, 2006; Lyster & Ranta, 1997).

Kluger and DeNisi (1996) reviewed 131 studies on the effect of feedback interventions on performance. They found an average effect size of .4 related to feedback. Hattie (1999) synthesized 500 meta-analyses on influences on learning, including feedback, covering over 450,000 effect sizes from 180,000 studies; the outcomes reported high effect sizes for feedback in improving learning. Hattie and Timperley (2007) studied formative feedback and stressed its importance, with an average effect size of .79. After comparing this with the effect sizes of direct instruction, reciprocal teaching, and students' prior cognitive ability, they reported that "feedback is among the most critical influences on student learning" (p. 102). These results indicate that learning is more likely to be encouraged and promoted when feedback is given, especially when task requirements and learning goals are emphasized.
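For readers unfamiliar with the effect size metric cited above, values such as .4 and .79 are standardized mean differences. One common estimator is Cohen's d; the cited meta-analyses use various estimators, so the formulation below is illustrative only:

```latex
d = \frac{\bar{X}_{\text{treatment}} - \bar{X}_{\text{control}}}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
```

On this reading, an average effect size of .79 means that groups receiving feedback outperformed comparison groups by roughly eight tenths of a pooled standard deviation.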

3. Enhancing learners’ motivation.

Feedback can be positive or negative. Positive feedback can inform students which aspects of their work are satisfactory, whereas negative feedback can provide information about the gap between the level their work is at and the level it should reach (Topping, 2009). Both positive and negative feedback can act as triggers to motivate students' learning (Heritage, 2007; Miller & Lavin, 2007; Stiggins, 2002). Fishbach, Eyal, and Finkelstein (2010) claimed that positive feedback can motivate students' achievement when there is a need for goal pursuit, and that negative assessment can also motivate students by identifying weaknesses that need to be improved.

Miller and Lavin's (2007) exploratory study involving 370 students demonstrated that formative assessment helped increase the self-esteem of students aged ten to twelve. Their study showed that students became more aware of learning, and of how to achieve learning goals, under formative assessment practices. Irons (2007) also reported that formative assessment feedback can encourage motivation and self-confidence in higher education students. Both self assessment and peer assessment are now increasingly used in higher education; the following sections introduce each in turn.

Self Assessment

Self assessment, as one important component of formative assessment, means "students assessing their own work" (Hanrahan & Isaacs, 2001, p. 53). In their study, when students applied set evaluation criteria to their own work, their understanding of the criteria deepened. Moreover, the process of self assessment made the students more responsible for their work and led them to work more collaboratively with their teachers and peers. Klenowski (1995) found that learners involved in self-evaluated learning showed more interest in the evaluation criteria. In White and Frederiksen (2000), students not only used inquiry criteria to evaluate their own work but also used the criteria to provide feedback when, for example, their peers gave oral presentations in class. Therefore, involving and encouraging self assessment in instruction can improve students' learning when the marks are derived from formal assessment procedures.

Peer Assessment

Peer assessment can play an important role in formative assessment. It has also been reported to be effective in strengthening learning by providing more democratic spaces for students to work individually and collaboratively (Butler & Winne, 1995; Cole, 1991; Falchikov, 1995, 2001; Fenwick & Parsons, 1999; Hanrahan & Isaacs, 2001; Olson, 1990; O'Donnell & Topping, 1998; Race, 1998; Salend, Whittaker, & Ross, 1993; Topping, 2005; Topping & Ehly, 1998; Weaver, 1995). Empirical studies of peer assessment demonstrated that it enabled students to see the distance between their actual level of development and their potential level. From an analysis of 233 students who experienced peer assessment, Hanrahan and Isaacs (2001) found that the students benefited from the intervention. Cheng and Warren (2005) commented that peer assessment "enable[s] learners to develop abilities and skills [that] a teacher alone assesses" (p. 2). Peer assessment can help students develop their abilities and skills by themselves, inform them of the required academic standards, and sharpen their critiquing techniques. Topping (2009) explored the reliability, validity, and effects of peer assessment in schools, and described effective approaches to putting it into practice. Topping (2009) also mentioned that peer assessment "builds on students' strengths and mobilizes them as active participants in the learning process.... in helping and cooperation, listening and communication. Peer assessment encourages personal and social development" (p. 71).

Since higher education globally has faced tremendous growth in student enrolment, maintaining the quality of instruction in large classes has become a priority of teaching and learning. Pond, Ul-Haq and Wade (1995) claimed that one way to address the large class issue is through peer assessment. Ballantyne, Hughes and Mylonas (2002) explored this idea of peer assessment in large classes. They reported that even though peer assessment could encounter administrative and staff commitment difficulties, by encouraging students' involvement in the assessment process its benefits outweighed the disadvantages.

Classroom-based (internal) assessment has proved effective in supporting learning. However, some educators and researchers still doubt whether it is as valid and reliable as large-scale (external) assessment. The following section discusses validity and reliability in classroom-based assessment.

Reconceptualizing Validity and Reliability in a Classroom-Based Assessment Context

For many years, it was taken for granted that the two fundamental measurement qualities, validity and reliability, were measured by psychometric criteria and applied as such in the classroom context. Recently, however, researchers have become increasingly aware that validity and reliability as defined and used in large-scale testing contexts may not be appropriate for application in the classroom, owing to the different contexts and purposes of assessment (Brookhart, 2003; Fulcher & Davidson, 2007; McMillan, 2003; Moss, 2003; Rea-Dickins, 2006; Smith, 2003; Turner, in press). Fulcher and Davidson (2007) concluded that the main difference between large-scale testing and classroom-based assessment (CBA) is the "context of the classroom" (p. 24). To be more specific, validity and reliability in large-scale testing serve to evaluate school accountability in a test context, whereas the purpose of CBA is to support students' learning in a classroom. Therefore, although these efforts have only just begun, researchers have started to reconceptualize validity and reliability in classroom contexts (Turner, in press).

Validity.

In a large-scale testing context, validity refers to "the degree to which evidence and theory support the interpretations of test scores entailed in the use of tests" (AERA, APA, & NCME, 1999, p. 9). Since there is no direct way to assess human qualities or abilities, researchers need to apply indirect methods—usually tests—to collect valid information (Genesee & Upshur, 1996). In a large-scale testing context, testing data are gathered to help educators make decisions about students, instruction and schools. If the data collected from tests enable educators to achieve decision-making goals, such as whether or not a student graduates, then the tests administered are valid.

For certain large-scale testing programs, test results are tied to the promotion, graduation, certification, licensure, selection, or college admission of test-takers (Haladyna, 2002). The validity of such a test therefore becomes "the most important consideration in evaluating the quality of the uses and interpretations of the results of tests and assessments" (Linn, 2002, p. 28). In other words, the higher the stakes, the more demanding the requirements for validity.

In a classroom context, information is gathered to inform teachers, to better understand students' learning, and to guide further instruction. Moss (2003) suggested that classroom validity is concerned with consequences, interpretation of assessments, and the capacity to promote students' competence in learning. She believed classroom assessment is valid as long as interpretations of assessment and instructional decisions are compatible with each other. More important, Moss urged researchers to "develop and maintain a rich repertoire of cases: not just those that illustrate how our guiding principles can be thoughtfully applied but, equally important, those that have not already been shaped by our principles so that we can learn about their limitations" (p. 22).

Reliability.

Within large-scale assessment, reliability refers to "the consistencies and inconsistencies in outcomes on a test" (Way, Dolan, & Nichols, 2009, p. 308). Reliability is another core issue in educational measurement because "an assessment instrument or procedure (such as a test) can be only as valid as it is reliable" (Genesee & Upshur, 1996, p. 63). As with validity, reliability cannot be assessed directly but can only be estimated from test scores. Compared to validity, reliability is more concerned with the scores themselves, so measurement and statistical analysis are often used in reliability research. According to true score theory, every measurement has two components: a true score (reflecting true ability) and error. The theory indicates that the larger the error component, the lower the reliability of the measurement; in other words, the error of measurement determines the degree of reliability (Koretz, 2008; Ryan & Shepard, 2008). To enhance reliability, possible sources of unreliability should also be noted, such as (1) student-specific factors, (2) test-specific factors, (3) scoring-specific factors, and (4) situational factors (Ryan & Shepard, 2008). In a large-scale testing context, the outcomes of testing can also drive high-stakes decisions such as student graduation; therefore, large-scale testing must be highly reliable (Shepard, 2003).

In a classroom context, reliability of assessment does not need to meet the same standard of stability as in large-scale testing, because assessment involves ongoing interaction between teachers and students. Smith (2003) claimed that the most appropriate way to evaluate reliability in the classroom is to ask whether sufficient information is provided for teachers to make reasonable decisions. More importantly, varied data should be collected to ensure the accuracy of classroom decisions. Moreover, formative assessment provides non-evaluative feedback and elicits individual students' strengths and weaknesses to promote learning, whereas large-scale testing neglects this (Cizek, 2009). Based on the review of the literature, formative assessment has proved effective in both European and North American contexts (Black & Wiliam, 1998a, 1998b; Broadfoot, 1999; Stiggins, 2002). Carless (2011) claimed that "it does seem fair to say that on both sides of the Atlantic… there is a consensus that formative assessment is a 'good thing'" (p. 3). The researcher's interest in this study focuses on how to implement the formative assessment approach in Chinese classrooms and on whether the implementation can effectively promote oral English learning by Chinese students at the university level.

Summary

In this chapter, the researcher first introduced two Chinese national large-scale English language tests at the university level: the CET and the TEM. Following that, a detailed description of large-scale testing and the issues related to it was given. This led to a discussion of classroom-based assessment as a means of resolving issues such as the Mute English phenomenon, the lack of oral English communicative skills, and dissatisfaction with language instruction that lays so much emphasis on developing test-taking skills. In addition, formative assessment, as one approach to classroom-based assessment, was discussed because of its effectiveness in promoting learning. In the following chapter, the researcher will state the research questions and explain the research design, data collection, and data analysis procedures.


CHAPTER THREE: METHODOLOGY

Introduction

This chapter begins by addressing the rationale of this study to explore Chinese university-level English learners‘ difficulties in oral skills, AFL effectiveness in improving their oral skills, and teachers‘ and students‘ perceptions of AFL. This is followed by three research questions, the procedure of this study, and an explanation of the research design. In the research design section, the selection of participants, design of instruments, collection of data, and procedures of data analysis are described, followed by a summary in the final section.

Rationale of This Study

The rationale for this study was generated by looking at the gaps discussed in the previous chapters. On the one hand, the literature has reported Chinese university-level English learners' difficulties in mastering oral English, and research is needed to find ways of improving this situation. On the other hand, even though formative assessment has proved effective in furthering learning in European and North American contexts (Black & Wiliam, 1998a, 1998b), little work has examined its effectiveness in the Chinese context. As a result, AFL's effectiveness in improving oral skills, and Chinese teachers' and students' attitudes towards AFL, are still unknown. This research is therefore timely in its examination of these issues.

Formative Assessment Model Used in This Study

As mentioned in chapter two, formative assessment can promote learning, the result of which is students' higher achievement (Black & Wiliam, 1998a, 1998b). However, greater learning gains rely on proper implementation of formative assessment; in other words, an effective formative assessment model, properly implemented, can produce effective learning.

In this study, the researcher used Atkin, Black, and Coffey's (2001) three-question model of formative assessment. Different tasks in the study correspond to the three key questions posed in this model. First, the pre-study questionnaire that students completed informed the researcher as to "where students are now" in their learning. Second, the TEM 4 criteria pointed out "where students are trying to go". Third, the three AFL tasks were designed in terms of the students' learning difficulties, in order to help them "get there [achieve the instructional goals]".

Research Questions

Three research questions were investigated in this study:

1. Does AFL help students of varying proficiency levels improve their oral English?

2. What evidence can be found that AFL benefits learning?

3. What are teacher and student perceptions of AFL?

These research questions led to the design of this study: a three-phase mixed methods design consisting of classroom observation in the preparation phase, teachers' and students' questionnaires and three AFL activities in Phase One, and teachers' and students' interviews in Phase Two.

Mixed Methods Research Design

The researcher used a mixed methods approach to collect and analyze data. Given that a classroom is a complex environment, understanding the interactions between teachers and students in the processes of teaching, learning, and assessment requires an approach able to capture data that reflect these multiple aspects. The researcher believed that a mixed methods approach was best suited to this task. Greene (2007) stated that "[t]he primary purpose of a study conducted with a mixed methods way of thinking is to better understand the complexity of the social phenomena being studied" (p. 20). In other words, the mixed methods approach enables a better understanding and a more thorough interpretation of research findings. Creswell and Plano Clark (2009) claimed that combining quantitative and qualitative data can bring greater insight to a problem than analyzing either type of data separately.

The researcher used an explanatory sequential mixed methods design to collect and analyze the data (Figure 1): quantitative methods were used first, and their results informed the subsequent collection of qualitative data. Although the data were collected in sequence, they had equal weight in this study. Quantitative data collection and analysis enabled the researcher to determine whether AFL helps students of varying proficiency levels improve their oral English and what evidence demonstrates the improvement. Qualitative data collection and analysis allowed the researcher to understand teachers' and students' perceptions of AFL. The researcher then corroborated the research findings by combining the quantitative and qualitative data.
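As a minimal sketch of the kind of descriptive statistical analysis applied to the Phase One quantitative data (the scores, group size, and scale below are hypothetical illustrations, not the study's data), pre- and post-study means and mean gains might be computed as follows:

```python
from statistics import mean, stdev

# Hypothetical pre- and post-study oral task scores for six students.
pre_scores = [52, 60, 47, 65, 58, 49]
post_scores = [61, 66, 55, 70, 64, 57]

def describe(scores):
    """Basic descriptive statistics: sample size, mean, standard deviation."""
    return {"n": len(scores),
            "mean": round(mean(scores), 2),
            "sd": round(stdev(scores), 2)}

pre = describe(pre_scores)
post = describe(post_scores)

# Per-student gain from pre- to post-study, and the mean gain.
gains = [b - a for a, b in zip(pre_scores, post_scores)]
mean_gain = round(mean(gains), 2)
```

A positive mean gain alone does not establish effectiveness; in the design described here, such descriptive results are read alongside the qualitative interview data from Phase Two.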

Figure 1 Mixed Methods Model Applied in This Study: Explanatory Sequential Design


Procedures of This Study

Figure 2 presents the procedures of this mixed methods research study, which are discussed in detail in the following sections.

Figure 2 Procedures of This Mixed Methods Study

Context

The research was conducted in the English department of a normal university (i.e., a teachers' university) in Northeast China. Since the implementation of the university expansion policy7, there have been four programs in the English department: English Education, English Translation, Business English, and Technology English. All four are four-year full-time programs, and they share the same core English course: Integrated English8. Among the four programs, the English Education major is education-related: students who enroll in it are trained to become English teachers in high schools or colleges. The other three programs, English Translation, Business English, and Technology English, were established according to the positions most in demand in the job market since the reform; students who enroll in them are expected to work in trade- and business-related fields. Since the implementation of the expanded enrolment policy in China in 1998, more students with varying levels of proficiency in English have been entering the English department. Currently, there are approximately twenty classes in each year with thirty to forty students per class on average, which means 600 to 800 students study in these four programs annually. In this study, the participants were selected after the researcher's classroom observation: the beginner and intermediate students were English Education majors, and the advanced students were English Translation majors.

7 The university expansion policy is one of the five sectors of China's Education Reform; it was proposed in 1998 and administered in 1999. The policy created new programs that allowed more high school graduates the chance to receive a university education.

Preparation Phase

The main purposes of the preparation phase were (1) to select participants through classroom observation and (2) to introduce the TEM 4 oral test criteria to selected participants.

Classroom Observation

In this study, the researcher used classroom observation as the main method to select participants. Classroom observation, one of the most commonly used qualitative approaches, allows researchers to observe, analyse, and interpret teachers' teaching (Wajnryb, 1992) and to understand directly what students do in the classroom (Gillham, 2000). The researcher observed 20 classes of undergraduate English majors in Integrated English courses, the core course for all four programs, and kept notes on lesson content and classroom interaction. During classroom observation, the researcher focused mainly on the teachers' displayed knowledge and teaching effectiveness and on students' engagement in activities and oral English proficiency in order to evaluate and select teacher and student participants. Besides observation, the Dean of the Department assisted the researcher with participant selection. Participant selection criteria included students' attitudes towards learning, students' GPAs, and teachers' educational and instructional backgrounds. After a three-week observation period, three classes (one beginner, one intermediate, and one high level, for a total of 74 students) and nine teachers9, including the three teachers in charge of the beginner, intermediate, and high level classes, were selected for participation at this stage of the study. Before the study was conducted, all participants received a brief introduction to it. Following that, the researcher asked all the participants to sign consent forms and informed them that they could drop out of the study at any time (Appendices A, B, C, D). To help the participants understand the concept of consent forms for research purposes as used in Western countries, the researcher chose to use English instead of their mother tongue10 for the consent forms.

8 Integrated English: a course in which the four English skills (speaking, listening, reading, and writing) are taught. The textbook is "Integrated Skills of English", which includes three sections in each unit: 1) Listening and Speaking Activities, 2) Reading Comprehension and Language Activities, and 3) Extended Activities.

9 The researcher asked all nine teacher participants to answer the teacher questionnaire. Following that, the researcher intended three levels of students (beginner, intermediate, and high) and their three teachers to participate in Phases One and Two of the study. However, after completing the pre-study questionnaires, the beginner level students decided not to continue participating in the rest of the study.

10 The introduction of this study to teachers and students was in Chinese, but the written forms were in English.

Introduction to the TEM 4 Oral Test Criteria

After participants were selected, the researcher introduced them to the criteria for the TEM 4 oral English exam (Appendix E). The criteria require second year students to:

a) talk to English speakers in general everyday social occasions;
b) talk about topics in daily life;
c) express themselves with correct pronunciation and intonation and without significant grammatical errors.

After the introduction of the criteria, student participants were asked to self-evaluate their oral English proficiency and complete the pre-study student questionnaire. In the following parts of this section, the researcher describes the two phases of the study, including participants, instruments, data collection procedures, and data analysis.

Phase One

Participants

Teacher participants.

Nine teacher participants completed the teacher questionnaire (Appendix F) in Phase One of this study. They taught the core Integrated English course to second year students in the English department. The questionnaire consisted of two sections. The first section included five multiple choice bio-data questions eliciting information about the teachers' sex, age, teaching experience, number of teaching hours per week, and educational background. To help describe the participants, the first section of the questionnaire is summarized in Table 1.


Table 1 Teachers' Bio-Data from the Teacher Questionnaire (Percentages in Parentheses)

Teacher Participants' Background (N = 9)

Sex:                    Male 0 (0%); Female 9 (100%)
Age:                    20-29: 2 (22.2%); 30-39: 5 (55.6%); 40-49: 2 (22.2%)
Teaching Experience:    3-6 years: 5 (55.6%); 7-10 years: 2 (22.2%); 11 years or more: 2 (22.2%)
Hours per Week:         0-4 hours: 1 (11.1%); 4-10 hours: 5 (55.6%); 11 hours or more: 3 (33.3%)
Educational Background: Bachelor plus certificate: 1 (11.1%); Master: 6 (66.7%); PhD: 2 (22.2%)

All the teachers who participated in this study were female, with ages ranging from 29 to 46. All nine had at least 3 years of teaching experience, and 2 of them had taught English for more than 11 years. Five teachers taught 4 to 10 hours every week, and three taught 11 or more hours every week. The two teachers in charge of the high and intermediate level students continued to the next phase of this study.


Student participants.

Seventy-four undergraduate students (68 female and six male) began Phase One of this study. However, only 41 students completed the entire study. Because participation varied, the researcher analyzed the data according to the number of participants for each task and questionnaire. A detailed description of students' participation is summarized in Table 2.

Table 2 Student Participants’ Situation

                    Pre-study        Task 1   Task 2   Task 3   Post-study
                    Questionnaire                               Questionnaire
Female students     68               51       37       37       37
Male students       6                4        4        4        4
Total               74               55       41       41       41

All the students were in their second year of study. They were selected from three of the twenty classes in the same grade. Besides the researcher's classroom observation, the students' entrance examination scores and teachers' evaluations were also taken into consideration. The entrance examination scores gave the researcher background information on the students' previous English language learning through high school, while teachers' evaluations and comments enabled the researcher to better understand the students' current English learning situation. Two classes of students were from the English Language Education program, and the third was from the English Translation program. All the students were native Chinese speakers who had begun learning English in secondary school. The majority spoke Mandarin as their first-learned fluent variety of Chinese, whereas others, from South China, spoke another variety learned at home as a first language and spoke Mandarin as a second dialect.

Instruments

Teacher questionnaire.

The researcher used a questionnaire to elicit teachers' perceptions of formative assessment prior to the study. The questionnaire had two sections (see Appendix F). The first section used five multiple choice questions to gather teachers' bio-data, presented above in Table 1. The second section comprised 29 four-point Likert-scale statements, 28 of which were adapted from Colby-Kelly and Turner's (2007) teacher questionnaire. In their original 54 questions, teachers' perceptions of formative classroom-based assessment were elicited through statements in four sections focused on: learners and assessment, teachers and assessment, learning and assessment, and course assessment needs. In this researcher's study, the 28 statements selected came from two of those sections: learners and assessment, and learning and assessment. One question was added by the researcher, for a total of 29 statements. Teachers indicated the extent to which they agreed or disagreed with the 29 statements, from "strongly agree" to "strongly disagree".

Student questionnaire.

Students completed pre-study (Appendix G) and post-study (Appendix H) questionnaires in Phase One. The two questionnaires were adapted from Cheng and Warren (1997) and Patri (2002) to identify students' difficulties in oral English skills and to check the effectiveness of the three AFL tasks.

There were five multiple choice questions in the pre-study student questionnaire, which can be categorized as bio-data and self-evaluation questions. Students self-evaluated their oral English proficiency and reported their own difficulties in learning oral English. For example, when students were asked about the challenge(s) they faced in their learning, they needed to specify whether they lacked confidence or had difficulties with speaking, pronunciation, or fluency. They could add other problems concerning spoken English not listed on the questionnaire.

The post-study student questionnaire consisted of four questions: two were multiple choice, and the other two were multiple choice questions with open-ended follow-ups. The post-study questionnaire aimed to check the effectiveness of, and learners' attitudes towards, AFL. For example, students were asked to express their views on the AFL tasks; they needed to select options and then explain the reason for their choice. If their choice was not on the list, they could write down their own opinions and explain them.

Three AFL tasks.

The design of three AFL tasks.

The researcher designed three AFL tasks to improve students' oral English language skills.

As mentioned in the previous chapter, AFL has proved its effectiveness in accelerating learning in the classroom (Buhagiar, 2007; Carless, 2011). To ensure the learning effects of formative assessment in the classroom, it is necessary to link formative assessment with test formats from the beginning. In this study, the researcher integrated the TEM 4 oral English test formats with AFL to improve students' oral skills and, at the same time, to prepare students for the TEM 4 test. Two of the AFL tasks, retelling a story and role-playing, were adapted from the test. For the third AFL task, students were asked to read aloud words selected on the basis of classroom observation and discussion with the participating teachers. Since students were preparing for the upcoming TEM 4 test and read new words in every lesson, no training or practice was required to familiarize them with these three AFL activities.

AFL task one: retelling the story.

AFL task one, retelling the story (Appendix I), was an activity adapted from the TEM 4 oral test. In this activity, students in groups of four listened to four short funny English stories from Hill and Mallet (1968) and had to retell the stories in their own words. The four stories, labeled A, B, C, and D, were randomly assigned to students, and each story was played twice. During this process, students were allowed to take notes. If there was a word that few students were likely to know, the teacher could give an explanation. After listening to the stories, students were assigned the responsibility of evaluating their peers' story retelling using an evaluation form (Appendix I). Written narrative copies of the four stories were given to the students as a reference for the evaluation. Students accomplished AFL task one following the procedures below:

1. Organizing the students into groups of four. In the case of an additional student, a group of five was formed; within this five-person group, three students and two students worked together. When there were two surplus students, a group of six was formed with two three-student groups working together. When there was a surplus of three students, a group of seven was formed in which three students and four students worked together.
2. Asking students to listen carefully to all four stories because the stories would be randomly assigned.
3. After listening and note-taking, assigning the stories to the students and making sure each member in the group got a different story.
4. Giving students time (5 to 7 minutes) to prepare their own stories.
5. Retelling the stories in the order A→B→C→D. When one student retold the story, the other group members listened carefully and evaluated the retelling using a checklist consisting of eight questions. The eight questions focused on the five Ws (who, what, where, when, why), organization, details, and summary of the stories. Students chose "Needs improvement", "Average", or "Good" in response to the eight questions to evaluate their peers' performances, referencing the written narrative copies of the four stories.
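The grouping rule used across the three tasks (groups of four, with any remainder of one to three students folded into a single larger group of five, six, or seven) can be expressed compactly. The following sketch is purely illustrative and is not part of the thesis procedures; the function name and list-based representation are assumptions of this illustration.

```python
def form_groups(students):
    """Split a class into groups of four, folding any remainder of
    1-3 students into one larger group of 5, 6, or 7, as described
    in the task procedures. Illustrative sketch only."""
    n = len(students)
    r = n % 4
    # With a remainder, the last group absorbs it, so one fewer
    # four-person group is formed.
    num_fours = n // 4 if r == 0 else n // 4 - 1
    groups = [students[i * 4:(i + 1) * 4] for i in range(num_fours)]
    if r:
        groups.append(students[num_fours * 4:])  # group of 4 + r
    return groups

# A class of 9 yields one group of four and one group of five.
sizes = [len(g) for g in form_groups(list(range(9)))]
# sizes == [4, 5]
```

The same rule produces a final group of six for two surplus students and seven for three, matching the written procedure.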

AFL task two: word reading.

AFL task two, word reading (Appendix J), was an activity designed to practice and test the accuracy of students' pronunciation. Pronunciation is an important component of a learner's ability in English: inaccurate pronunciation causes confusion, whereas accurate pronunciation enables better communication (Brown, 1991). In this activity, students in groups of four were given handouts with English words selected from their Integrated English textbook and had to read them out loud. Initially, students were not allowed to check their textbooks or any dictionaries. Following that, four word-checking sheets, labeled A, B, C, and D, were randomly assigned to group members. Group members took turns saying the corresponding English words while other members gave the meanings of the words in Chinese. A pronunciation dictionary helped students check the accuracy of their peers' pronunciation and stress: the electronic Longman Pronunciation Dictionary, with full coverage of 225,000 British and American pronunciations, functioned as a "self-study lab" to improve students' pronunciation. It can record students' pronunciation, and the recording can be played back to check the difference between their pronunciation and the model ones (Goodwin, 2001). Students accomplished the tasks following the procedures below:

1. Organizing the students into groups of four. In the case of an additional student, a group of five was formed; within this five-person group, three students and two students worked together. When there were two surplus students, a group of six was formed with two three-student groups working together. When there was a surplus of three students, a group of seven was formed in which three students and four students worked together.
2. Asking students to read the word sheets carefully.
3. Giving students time (5 to 7 minutes) to prepare.
4. After preparation, group members were randomly selected to answer one of four word-checking sheets (A, B, C, or D). During the checking process, one group member said the corresponding Chinese meaning and the selected student responded with the English word. The other two members checked his or her pronunciation and stress accuracy against the checking sheets, with assistance from the Longman Pronunciation Dictionary. Students marked their peers' pronunciations as "Right", "Wrong", or "Word Forgotten11" for the ten English words on every word sheet.
5. Students took turns A→B→C→D to finish the activity.

AFL task three: role playing.

AFL task three, role playing (Appendix K), was the final activity adapted from the TEM 4 test. In this task, students in groups of four played different roles in three role-playing tasks randomly assigned to them. Every group performed in front of the other groups, which simultaneously evaluated, using the evaluation form they were given, their peers' abilities to construct conversations in real-life situations.

Students finished the tasks following the procedures below:

1. Organizing the students into groups of four. In the case of an additional student, a group of five was formed; within this five-person group, three students and two students worked together. When there were two surplus students, a group of six was formed with two three-student groups working together. When there was a surplus of three students, a group of seven was formed in which three students and four students worked together.
2. Instructing the students to prepare three role-playing tasks and to perform them in front of the entire class.
3. Giving the students time (7 to 10 minutes) to prepare their role-playing.
4. Every group performed in front of the class.
5. Every group evaluated the other groups' performances and wrote down methods for improvement on the evaluation form.

11 Word Forgotten indicates that students forgot either the English pronunciation or the matching Chinese meaning.

Data Collection Procedures

Teachers’ data collection.

The teachers' data collection in Phase One was through the teacher questionnaire, which was administered to elicit teachers' initial perceptions of and beliefs about formative assessment. The nine teachers of second year students answered the questionnaire separately. Some finished it in class and handed it back immediately, whereas others took the questionnaire home and returned it to the researcher later12.

Students’ data collection.

The students' data collection in Phase One was through: 1) the pre-study student questionnaire; 2) the three AFL tasks; and 3) the post-study student questionnaire.

1. Pre-study student questionnaire.

The pre-study student questionnaire was administered to 74 students to identify their learning difficulties in oral skills. All 74 students answered the pre-study student questionnaire together in class. Students self-evaluated their oral skills proficiency after the researcher introduced and explained the TEM 4 oral English criteria (Appendix E).

12 The inconsistency whereby teachers answered the questionnaire at different times was caused by the teachers' different schedules.

2. Three AFL activities.

Three AFL activities were implemented in the classroom to improve students' oral English language skills. Two teachers provided their class time for this purpose. Intermediate and high level students completed the activities: fifty-five students from these levels participated in task one, and forty-one participated in tasks two and three. The retelling the story checklist, word reading checklist, and role playing checklist were used to collect data. The role playing checklist included two parts: the first required students' evaluations of their peers' performances, and the second required students' suggestions for their peers' further improvement.

The researcher video-recorded the third AFL task, the role play, because videotaped recordings enable teachers and students to analyze what happened during the task carefully, extensively, and repeatedly afterwards (Pinnegar & Hamilton, 2009).

3. Post-study student questionnaire.

The post-study student questionnaire was administered in order to understand students' attitudes towards AFL. All 41 intermediate and high level students answered the post-study student questionnaire together in class, reporting on the effectiveness of formative assessment in promoting oral speaking skills.

Data Analysis Procedures

The raw data of Phase One included results from the teacher questionnaire, the pre- and post-study student questionnaires, and the three AFL tasks. The data were analyzed with SPSS 17.0, using descriptive statistics to explore teachers' general understanding of formative assessment and frequency counts to explore students' individual difficulties in learning oral English and their general perceptions of AFL and its effectiveness. The following table provides a detailed description of how the various data were analyzed.

Table 3 Phase One: Procedures of Data Analysis

Data Resources                   Ways of Analysis

Teacher questionnaires           quantitative data analyzed with SPSS 17.0
Student questionnaires           quantitative data analyzed with SPSS 17.0
Task One: retelling the story    quantitative data analyzed with SPSS 17.0
Task Two: word reading           quantitative data analyzed with SPSS 17.0
Task Three: role playing         quantitative data analyzed with SPSS 17.0;
                                 qualitative data analyzed with NVivo 8.0
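The quantitative analysis consisted of descriptive statistics and frequency counts computed in SPSS 17.0. Purely as an illustration (not the author's actual analysis), the kind of frequency-and-percentage summary reported in Table 1 could be computed as follows; the function name and data layout are hypothetical.

```python
from collections import Counter

def frequency_table(responses):
    """Frequency counts and percentages for one categorical
    questionnaire item, in the style of Table 1. Illustrative only:
    the study itself performed this analysis in SPSS 17.0."""
    counts = Counter(responses)
    n = len(responses)
    return {category: (count, round(100 * count / n, 1))
            for category, count in counts.items()}

# Example: the nine teachers' age bands as reported in Table 1.
ages = ["20-29"] * 2 + ["30-39"] * 5 + ["40-49"] * 2
summary = frequency_table(ages)
# {'20-29': (2, 22.2), '30-39': (5, 55.6), '40-49': (2, 22.2)}
```

Each entry pairs a raw count with its percentage of N, matching the "count (percentage)" presentation used in Table 1.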

Phase Two

Participants

Two teachers participated in Phase Two of this study: those in charge of English language teaching to the intermediate and high level students. Twenty intermediate and 21 high level students participated in Phase Two. Four students from the intermediate level class and 19 students from the high level class volunteered to be interviewed13.

Instruments

The interview.

One purpose of this study was to explore teachers' and students' perceptions of assessment for learning. In order to understand their thoughts about AFL, a qualitative interviewing method was used. Scholars have reported that interviews are helpful for understanding participants' perceptions of the research (Benson, 2005; Block, 1998; Gao, Li, & Li, 2002; Patton, 2002). Patton (2002) claimed that "qualitative interviewing begins with the assumption that the perspective of others is meaningful, knowable and able to be made explicit … aim[s] to capture the perspectives of program participants, staff, and others associated with the program" (p. 341). This indicates that the interview is an effective approach and, in this case, can elicit participants' thoughts on AFL.

13 Since beginner level students did not participate in the three tasks in Phase One, they did not do the interviews.

The researcher took a standardized open-ended interview approach in this study. A standardized open-ended interview is "a set of questions carefully worded and arranged with the intention of taking each respondent through the same sequence and asking each respondent the same questions with essentially the same words" (Patton, 2002, p. 342). This approach was chosen for its consistency across interviewers and because it facilitates comparison of interviewees' responses. To obtain high-quality information from knowledgeable informants, the researcher combined the standardized interview with a conversational strategy that allowed the interviewer flexibility to explore certain questions in depth. Teachers and students were interviewed separately in accordance with interview protocols (Appendices L and M). The interviews began with questions about teachers' and students' opinions of oral English language teaching/learning and then linked these to the topic of AFL. The researcher interviewed teachers and students in Chinese.

Data Collection Procedures

Teachers’ data collection.

The collection of data from the teachers in Phase Two was through interviews. The researcher interviewed both the intermediate and the high level teachers who completed the whole study. Their interviews followed the interview protocol and were videotaped.

Students’ data collection.

The collection of data from the students in Phase Two was also through interviews, as well as through the AFL task three evaluation forms. The researcher interviewed students who had participated in the whole study and who volunteered to express their opinions on it. Their interviews followed the interview protocol and were videotaped.

Data Analysis Procedures

All the interviews were videotaped for the purpose of further analysis. The main advantage of video/digital recording is the continuous sequential record of action as it occurred in real time (Erickson & Wilson, 1982; Erickson, 1992; Goodman & McGrath, 2002). The videotaped interviews allowed the researcher to watch them any number of times while analyzing teachers' and students' perceptions of AFL. Therefore, in this study, the interviews were videotaped from beginning to end.

The researcher translated and transcribed the interview data before data analysis.

There are two modes of data transcription: naturalism and denaturalism (Oliver, Serovich, & Mason, 2005). Naturalism means "every utterance is transcribed in as much detail as possible", and denaturalism means "idiosyncratic elements of speech (e.g., stutters, pauses, nonverbals, involuntary vocalizations) are removed" (para. 2). The researcher chose the denaturalized mode of transcription, which focuses on the meanings in the data, rather than naturalized transcription, which aims to represent the real-world speech situation in full detail.

Following that, the researcher used a content analysis approach to analyze the interview data. Content analysis refers to "any qualitative data reduction and sense-making effort that takes a volume of qualitative material and attempts to identify core consistencies and meanings" (Patton, 2002). It enables researchers to analyze interview responses effectively (Babbie, 1979; Burnard, 1991; Couchman & Dawson, 1990; Fox, 1986). One major step in content analysis is to code all data into meaningful categories; Babbie (2001) described coding as "the process of transforming raw data into a standardized form" (p. 309). The final step in content analysis is to elaborate on and expand the data in each category once coding is finished.

In this study, the researcher followed three procedures to analyze the interview data. First, the researcher used NVivo 8 to transcribe the teachers' and students' interviews. Second, the researcher coded the interviews in terms of the literature discussed in the previous chapter. Finally, once the data were coded, the researcher selected and compared the data in the analytic categories of interest across the interviewees.
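The coding step assigns excerpts to analytic categories. As a toy illustration of this idea only: the study's actual coding was done interpretively in NVivo 8, and the categories, keywords, and responses below are hypothetical, loosely echoing the comment types reported in the results chapter.

```python
def code_responses(responses, codebook):
    """Assign interview excerpts to analytic categories by keyword
    matching, a simplified stand-in for the manual coding step in
    content analysis. Illustrative only; not the study's method."""
    coded = {category: [] for category in codebook}
    for response in responses:
        text = response.lower()
        for category, keywords in codebook.items():
            if any(kw in text for kw in keywords):
                coded[category].append(response)
    return coded

# Hypothetical codebook and excerpts.
codebook = {
    "AFL effectiveness": ["improve", "helpful", "progress"],
    "AFL without grading": ["grade", "score", "mark"],
}
responses = [
    "Peer feedback helped me improve my pronunciation.",
    "I like that the tasks were not about a grade.",
]
coded = code_responses(responses, codebook)
```

Real qualitative coding is interpretive rather than mechanical; this sketch only conveys the reduction of raw excerpts into category bins before comparison.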

Triangulation of Data from Phases One and Two

The researcher triangulated the data collected in Phases One and Two for analysis.

The principal reason for this combination is that combining the qualitative and quantitative data collected can bring greater insight to the problem than applying each type of data separately (Creswell & Plano Clark, 2009; Johnson & Onwuegbuzie, 2004).

Summary

In this chapter, the rationale of this study (i.e., applying AFL in a mixed methods framework to improve Chinese students' oral English language skills) was presented. Then, the three research questions were stated. Following that, there was a detailed description of the research design, including the participants, research instruments, data collection, and data analysis of the three phases. In the following chapter, the data collected in the two main phases will be presented and discussed.


CHAPTER FOUR: PRESENTATION OF RESULTS

Introduction

In this chapter, the results of the data collected in Phases One and Two of the research are presented. In Phase One, the results of the teacher questionnaire Likert items are reported using descriptive statistics, and the results of the student questionnaires and the three AFL tasks are reported as frequency counts. In Phase Two, teachers' and students' perceptions of AFL are categorized through content analysis. Teachers' comments fall into two types: positive attitudes towards AFL and concerns about it. Students' comments fall into two types: AFL effectiveness and AFL without grading. Following that, student participants' comments on and suggestions about their peers' role-play performances are presented. A summary ends this chapter.

Presentation of Results - Phase One

Results of Teacher’s Questionnaire

In this section, the results of the teacher questionnaires are presented and summarized in separate tables (Tables 4 and 5). Tables 4 and 5 describe responses to the teacher questionnaire, which attempted to elicit teachers' perceptions of formative assessment. Table 4 shows the distribution of teachers' responses to each question, and Table 5 describes teachers' answers using descriptive statistics. The means calculated provide the researcher with the general tendency of teachers' responses across all the questions. The description is divided into two parts. Part one (Q1 to 14) was designed to elicit teachers' perceptions of the relationship between learners and assessment. Part two (Q15 to 29) aimed to elicit their perceptions of the relationship between assessment and learning. Nine teacher participants completed the questionnaires (Appendix F). The general tendency is that teachers believed students preferred various assessment methods (7 out of 9 teachers) and that their feedback can effectively promote students' learning (also 7 out of 9). However, teachers were divided widely in their responses to several of the questions. A detailed description is given in the following sections.

Part one: teachers’ attitudes towards learners and assessment (Q1 to 14).

In this part, answers demonstrating teachers' attitudes towards learners and assessment are presented. Teachers agreed or strongly agreed when answering certain questions in this section, such as self-evaluation fostering learning (Q1), peer feedback being useful for learning (Q2), students' active involvement (Q4), students' input on how work is assessed (Q5), various assessment methods being applied (Q8), and students' brainstorming of what successful tasks are (Q12). However, they were divided when answering other questions.

Teachers were slightly divided between "agree" and "disagree" on the following questions: students providing input into how work should be assessed (Q6), students believing that assessment benefits learning (Q10), students believing in grades (Q13), and students knowing the values of activities towards their final grades (Q14). In response to Question 7, asking whether most students prefer to be assessed by various methods, two teachers disagreed and seven agreed.

Teachers showed no consensus in their responses to two questions. In response to Question 3, 'Most students value peer feedback in learning', four teachers disagreed but five agreed. For Question 11, 'Using one primary assessment method allows students to perfect their performances', five teachers agreed and one strongly agreed, but three disagreed.


Part two: teachers’ attitudes towards assessment and learning (Q15 to 29).

In this part, answers demonstrating teachers' attitudes towards assessment and learning are presented. Teachers agreed or strongly agreed with six questions in this section: students' need for positive feedback (Q16), the importance of teachers' comments (Q18), teachers' awareness of students' development (Q20), teachers' attention to their students' development (Q21), the helpful effects of evaluation forms (Q22), and the impact of assessments on learning (Q26).

Teachers' responses showed no single tendency for the following questions: teachers' feedback promoting learning (Q15), teachers' and students' understanding of assessment goals (Q19), evaluation form usage (Q23), applying various assessment methods continually (Q24), assessment contributing to learning (Q27), and students' success in achieving the requirements (i.e., the TEM 4 oral English criteria) (Q28).

In particular, teachers were greatly divided on three questions in this section. For question 17, when teachers were asked whether 'students need to receive negative feedback in order to progress', five teachers disagreed whereas three teachers agreed and one strongly agreed. On question 25, when asked to respond to 'one primary method of assessment should be used continually', four teachers disagreed and five agreed. On question 29, three teachers disagreed that 'students are aware of their oral English proficiency', whereas five agreed and one strongly agreed.

No teacher selected 'Strongly disagree' for any of the questions on the teacher questionnaire.


Table 4 Teacher Questionnaire Responses (n = 9)

(Response counts; columns: Strongly Disagree / Disagree / Agree / Strongly Agree; --- = no responses)

1. Students' self-evaluation fosters learning. --- --- 5 4

2. Students' peer review feedback is useful for learning. --- --- 5 4

3. Most students value peer feedback in learning. --- 4 5 ---

4. Students should be actively involved in their assessment. --- --- 3 6

5. It is important for students to have input on how their work is assessed. --- --- 4 5

6. It is important for students to provide input on how their work is assessed. --- 1 6 2

7. Most students prefer to be assessed by various methods. --- 2 7 ---

8. Varied assessment methods give more students a chance to do well. --- --- 4 5

9. Most students prefer assessment by one primary method. --- 6 2 1

10. Most students believe assessment contributes to learning. --- 1 6 2

11. Using one primary assessment method allows students to perfect their performances. --- 3 5 1

12. It's good for students to brainstorm what successful tasks should 'look like'. --- --- 7 2

13. Most students believe grades more than anything else. --- 1 5 3

14. It is helpful for students to know activities' worth towards final grades. --- 1 7 1

15. Teacher feedback is effective in promoting student learning. --- 1 1 7

16. Students need to receive positive feedback in order to progress. --- --- 6 3

17. Students need to receive negative feedback in order to progress. --- 5 3 1

18. Teachers' comments to students are important in student learning. --- --- 3 6

19. Teachers and students should share an understanding of assessment goals. --- 1 4 4

20. Effective teachers need to be aware of student development. --- --- 3 6

21. Assessment focusing directly on student development is best. --- --- 7 2

22. Evaluation forms aid in communicating specific evaluation criteria to students. --- --- 9 ---

23. Using evaluation forms aids in recording student evaluation. --- 1 7 1

24. Varied assessment methods should be used continually. --- 1 4 4

25. One primary method of assessment should be used continually. --- 4 5 ---

26. Assessment may have an impact on the course of student learning. --- --- 8 1

27. Assessment can contribute to student learning. --- 1 7 1

28. Students can achieve the requirements of the teaching program. --- 1 7 1

29. Students are aware of their oral English proficiency. --- 3 5 1


Table 5 Descriptive Statistics of Teacher Questionnaire

Question       n    Minimum    Maximum    Mean    Std. Deviation
Question 1     9    3          4          3.44    .527
Question 2     9    3          4          3.44    .527
Question 3     9    2          3          2.56    .527
Question 4     9    3          4          3.67    .500
Question 5     9    3          4          3.56    .527
Question 6     9    2          4          3.11    .601
Question 7     9    2          3          2.78    .441
Question 8     9    3          4          3.56    .527
Question 9     9    2          4          2.44    .726
Question 10    9    2          4          3.11    .601
Question 11    9    2          4          2.78    .667
Question 12    9    3          4          3.22    .441
Question 13    9    2          4          3.22    .667
Question 14    9    2          4          3.00    .500
Question 15    9    2          4          3.67    .707
Question 16    9    3          4          3.33    .500
Question 17    9    2          4          2.56    .726
Question 18    9    3          4          3.67    .500
Question 19    9    2          4          3.33    .707
Question 20    9    3          4          3.67    .500
Question 21    9    3          4          3.22    .441
Question 22    9    3          3          3.00    .000
Question 23    9    2          4          3.00    .500
Question 24    9    2          4          3.33    .707
Question 25    9    2          3          2.56    .527
Question 26    9    3          4          3.11    .333
Question 27    9    2          4          3.00    .500
Question 28    9    2          4          3.00    .500
Question 29    9    2          4          2.78    .667
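As a sanity check, the means and standard deviations reported in Table 5 can be reproduced from the response counts in Table 4. The sketch below is illustrative only; it assumes the conventional 1–4 Likert coding (1 = Strongly Disagree … 4 = Strongly Agree) and the sample (n − 1) standard deviation, both of which are consistent with the reported statistics:

```python
import math

# Response counts from Table 4 for a few illustrative items,
# given as (count of 1s, 2s, 3s, 4s).
counts = {
    "Q1":  (0, 0, 5, 4),
    "Q9":  (0, 6, 2, 1),
    "Q17": (0, 5, 3, 1),
}

def describe(freq):
    """Mean and sample standard deviation from Likert frequency counts."""
    n = sum(freq)
    values = [code for code, f in zip((1, 2, 3, 4), freq) for _ in range(f)]
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return round(mean, 2), round(sd, 3)

for q, freq in counts.items():
    print(q, describe(freq))
```

For example, Q1 (five Agree, four Strongly Agree) yields a mean of 3.44 and a standard deviation of .527, matching Table 5.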


Results of Pre-Study Student Questionnaire

In this section, the researcher presents the results of the pre-study student questionnaires by summarizing them in separate tables. Question one established the number of female and male participants (see chapter 3, table 2). Sixty-eight female and six male students participated, a total of 74 student participants.

Table 6 presents students' self-evaluations of whether they could achieve the objectives of the curriculum. Of the students, 71.6% reported that they could not. The highest failure rate, 72.7%, was reported by intermediate level students, and the lowest, 70.6%, by high level students.

Table 6 Q2. As a second year English-major student, do you think you can achieve the goals /objectives listed in the curriculum? (In percentage, n = 74)

Achieving the Goals/Objectives of the Curriculum?    Yes      No

Beginner Level    27.8%    72.2%

Intermediate Level 27.3% 72.7%

High Level 29.4% 70.6%

Total 28.4% 71.6%
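The percentages in Table 6 can be reconstructed arithmetically. The group sizes are not restated in this chapter; the sketch below assumes 18 beginner, 22 intermediate, and 34 high-level students (totaling 74), denominators inferred here only because they reproduce the reported figures (e.g., 5/18 = 27.8% and 13/18 = 72.2%):

```python
# "Yes" counts per level, with group sizes (18, 22, 34) inferred from the
# reported percentages rather than stated in this chapter.
groups = {"Beginner": (5, 18), "Intermediate": (6, 22), "High": (10, 34)}

def pct(part, whole):
    """Percentage rounded to one decimal place, as in the table."""
    return round(100 * part / whole, 1)

for level, (yes, n) in groups.items():
    print(level, pct(yes, n), pct(n - yes, n))

# Overall row: 21 of 74 students answered "Yes".
total_yes = sum(yes for yes, _ in groups.values())
total_n = sum(n for _, n in groups.values())
print("Total", pct(total_yes, total_n), pct(total_n - total_yes, total_n))
```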

Table 7 summarizes the frequency of students speaking English in the classroom among the three levels. Overall, 37.8% of the students claimed that they often spoke English in the classroom whereas 59.5% reported that they seldom did. A great difference in frequency was found among the three levels of students, especially when comparing beginner-level to high-level students. Only 22.2% of beginner-level students claimed that they often spoke English in the classroom whereas 44% of high-level students reported that they frequently did. Of the beginner-level students, 66.7% claimed that they seldom spoke English in the classroom whereas 56% of the high-level students claimed they rarely did. Only beginner-level students reported that they spoke English in the classroom "Most of the time" or that they "Never" did. Neither intermediate nor high level students claimed to speak English in the classroom "Most of the time" or "Never".

Table 7 Q3. How often do you speak English in the classroom? (In percentage, n = 74)

Frequency of Speaking English    Never    Seldom    Often    Most of the time

Beginner Level 5.5% 66.7% 22.2% 5.5%

Intermediate Level --- 59% 41% ---

High Level --- 56% 44% ---

Total 1.4% 59.5% 37.8% 1.4%

Table 8 presents how students described their oral English proficiency. Among all participants, 64.9% self-evaluated their oral English proficiency as 'Average' and 33.8% claimed that their oral English was 'Not good'. A notable difference was found between beginner-level and intermediate students: 44.4% of beginner-level students claimed that their oral English proficiency was 'Average', whereas the majority of intermediate students (72.7%) did so. 'Not good' was claimed by 50% of beginner-level and 27.3% of intermediate students. Only beginner-level students reported 'Very good' (5.6%) when self-assessing their oral proficiency, and no student claimed 'Excellent'.


Table 8 Q4. How would you describe your speaking proficiency in English? (In percentage, n = 74)

Describe Your Spoken English    Not Well    Average    Very Good    Excellent

Beginner Level    50.0%    44.4%    5.6%    ---

Intermediate Level    27.3%    72.7%    ---    ---

High Level    29.4%    70.6%    ---    ---

Total    33.8%    64.9%    1.4%    ---

Table 9 summarizes students' difficulties in speaking English, elicited through five multiple-choice options including one open-ended option.14 Most students reported that they were facing multiple challenges in their oral English learning. Over 25% of the students claimed that 'Confidence in talking', 'Fluency', and 'Pronunciation' were their biggest challenges. Students also added 'Other' challenges: over 90% of the students claimed that vocabulary was their biggest challenge, and grammar was also added. Beginner-level students identified only one item as a chief challenge, with 38.9% of them naming 'Fluency' as the most difficult. Intermediate and high level students reported multiple items, 'Confidence in talking', 'Fluency', and 'Pronunciation', as their difficulties in spoken English.

14 The open-ended option allowed students to add their own difficulties in oral English learning. If their difficulties were not "Confidence in talking", "Fluency", or "Pronunciation", they could write "Other".

Table 9 Q5. What is (are) the challenge(s) you face when speaking English?

(Percentages in parentheses, n = 74)

Student Participants    Chief Challenge(s)    Other Challenge(s)

Beginner Level    Fluency (38.9%)    Vocabulary (97.3%)

Intermediate Level    Confidence in talking, Fluency, and Pronunciation (27.3%)    Vocabulary (91.2%)

High Level    Confidence in talking, Fluency, and Pronunciation (35.5%)    Vocabulary (93.5%)

Total    Confidence in talking, Fluency, and Pronunciation (25.7%)    Vocabulary (90.4%)

Results of Three AFL Tasks

In this section, the researcher presents the results of the three AFL tasks in the following order: 1) retelling the story, 2) word reading, and 3) role playing.

Intermediate and high level students participated in all three AFL tasks. Fifty-five students completed task one, and 41 students completed tasks two and three (see chapter three, table 2).

1. Task one: retelling the story.

Table 10 summarizes the results of task one, retelling the story, done by 55 students: 21 intermediate and 34 high level students. Every student had to answer all eight questions on the checklist. Therefore, a total of 168 assessments were collected from intermediate students and a total of 272 from high level students.

At the intermediate level, 70% of the participants marked their peers' story retell as 'Good', 26% marked 'Average', and only 2% thought that their peers' performances 'Need improvement'.

In contrast, only 54% of high level students evaluated other students' performances as 'Good', much lower than the 70% at the intermediate level. Twelve percent of the students at the high level thought that their peers' performances still 'Need improvement' while only 2% at the intermediate level thought so. Clearly, high level students were more cautious when marking their peers' performances.

Table 10 Results of Retelling the Story (Percentages in parentheses, n = 55)

Student Participants    Total Assessment Number    Good    Average    Need Improvement

Intermediate Level (n = 21)    168    118 (70%)    43 (26%)    4 (2%)

High Level (n = 34)    272    147 (54%)    93 (34%)    33 (12%)

Total    440    265 (60%)    136 (31%)    37 (9%)

2. Task two: word reading.

Table 11 describes the results for the 20 intermediate and 21 high level students who did the second task, the word reading activity. Every student had to read aloud all 10 words on the word sheet. Therefore, a total of 200 assessments were collected from the intermediate level students and a total of 210 from the high level students.

In this task, 66% of the intermediate students marked their peers' pronunciations as 'Right', 18% thought their peers' pronunciations 'Wrong', and 16% reported 'Word Forgotten' for words their peers could not say. Among high level students, 54% marked pronunciation as 'Right' and 30% of the students could not remember the words during the assessment. Thus, the 'Word Forgotten' rate of high level students was higher than that of intermediate students.

Table 11 Results of Word Reading (Percentages in parentheses, n = 41)

Student Participants    Total Assessment Number    Right    Wrong    Word Forgotten

Intermediate Level (n = 20)    200    132 (66%)    36 (18%)    32 (16%)

High Level (n = 21)    210    113 (54%)    36 (16%)    61 (30%)

Total    410    245 (60%)    72 (17%)    93 (23%)

3. Task three: role playing.

There are two parts in the role playing checklist. The first part, summarized in Tables 12 to 15 inclusive, presents the results for the 20 intermediate and 21 high level students who finished the third task, the role playing activity. The second part, peer feedback, includes students' comments and suggestions for their peers' further improvement in oral English.

Table 12 summarizes whether students' role play performances were clearly audible to all listeners in the classroom. Both intermediate and high level student groups could produce clear and understandable performances: 83% of the intermediate level and 88% of the high level students claimed that their peers' performances were clearly audible.


Table 12 Q1. Does this group speak audibly and clearly? (Percentages in parentheses, n = 41)

Student Participants Yes No

Intermediate Level (n = 20) 17 (83%) 3 (13%)

High Level (n = 21) 18 (88%) 3 (13%)

Total 35 (85%) 6 (15%)

Table 13 presents percentages of students who performed in an intelligible manner.

Role playing requires students to tell a story logically in their presentation. From the table, it can be seen that both intermediate and high level students performed the role playing tasks in understandable and reasonable ways at 93% and 95% respectively.

Table 13 Q2. Does this group make the role play understandable, reasonable? (Percentages in parentheses, n = 41)

Student Participants Yes No

Intermediate Level (n = 20) 19 (93%) 1 (3%)

High Level (n = 21) 20 (95%) 1 (5%)

Total 39 (95%) 2 (5%)

Tables 14 and 15 present percentages of students who used Chinese or paused during their performances. Of high level students, 17% reported that their peers used Chinese during their performances whereas only 3% of intermediate students claimed so. When students were required to evaluate their peers' pauses, both intermediate and high level students reported that 17% of their peers paused on one or more occasions during the role playing.


Table 14 Q3. Does this group use Chinese during the role play? (Percentages in parentheses, n = 41)

Student Participants Yes No

Intermediate Level (n = 20) 1 (3%) 19 (93%)

High Level (n = 21) 4 (17%) 17 (83%)

Total 5 (12%) 36 (88%)

Table 15 Q4. Does this group pause a lot during the role play? (Percentages in parentheses, n = 41)

Student Participants Yes No

Intermediate Level (n = 20) 3 (17%) 17 (80%)

High Level (n = 21) 4 (17%) 17 (83%)

Total 7 (17%) 34 (83%)

In the role playing checklist, students were also required to assess their classmates' performances and to write down suggestions for improvement. Their comments and suggestions were collected, analyzed, and sent to two teachers.15 First, intermediate level students, divided into six groups, commented on their classmates' performances (see Table 16). Table 16 indicates that most intermediate students offered simple, short suggestions to their classmates. For example, group 4 was given a two-word suggestion: "speak loudly".

15 The researcher sent teachers the students' comments and suggestions on the role plays as delayed peer feedback one month before students took the TEM 4 Oral Test.

Table 16 Intermediate Level Students’ Comments and Suggestions

Student Participants Comments for Groups and Individuals

Group 1    "make the dialogue longer"

Group 2    "good, don't pause a lot in the conversation"

Group 3    "They'd better not speak Chinese and should organize language more logically."

Group 4    "speak loudly"

Group 5    "better slow down"

Group 6    "speak louder and speak more"

Second, high level students, divided into seven groups, commented on their classmates' performances (see Table 17). Table 17 indicates that most high level students could offer more detailed suggestions to their peers. For example, group 1 was given suggestions such as "should slow down" and "should have a conclusion". In addition, students even suggested that the 'stepfather' in group 1 "speak a little slowly" in order to "understand him better".

60

Table 17 High Level Students’ Comments and Suggestions

Student Participants Comments for Groups and Individuals

Group 1    "speak too quickly, should slow down; should have a conclusion to solve the problem; they can slow down to make them understand better; and we hope the 'stepfather' [one male student] can speak a little slowly, so that we can understand him better."

Group 2    "should be easy, short, brief and clear; and we hope that the two students can speak a little louder, and make their dialogue more vivid."

Group 3    "there are many 'er…' (pauses) in it and we hope they can speak more fluently."

Group 4    "the conversation is disordered and can't be understood; speak louder and some of the speakers should improve their pronunciation."

Group 5    "pay attention to the pronunciation of some words, such as 'perfect'."

Group 6    "speak loudly and the arrangement of speaking should be divided equally and every partner should be active."

Group 7    "job [job description task] should be distributed properly."

It is therefore not difficult to observe that, compared to intermediate students, high level students were more capable of pointing out their peers' mistakes and could give more detailed suggestions for improvement. The following section presents results from the post-study student questionnaire.

Results of Post-Study Student Questionnaire

In this section, the researcher will present the results of the post-study student questionnaire by summarizing them in separate tables.

Table 18 presents percentages of students who had or had not previously done tasks similar to the AFL tasks. Overall, 64.1% of the students reported that they had not done these kinds of tasks before.

Table 18 Q1. Have you done these kinds of tasks before? (Percentages in parentheses, n = 41)

Student Participants Yes No

Intermediate Level (n = 20) 9 (47.4%) 11 (52.6%)

High Level (n = 21) 5 (25%) 16 (75%)

Total 14 (35.9%) 27 (64.1%)

Table 19 summarizes whether students thought the AFL tasks helped them to identify their own difficulties in oral English learning. Most intermediate students (89.5%) and all (100%) high level students acknowledged that their difficulties in oral English learning had been identified by the AFL tasks.


Table 19 Q2 (a). Do you think AFL tasks help you identify your own difficulty in oral English learning? (Percentages in parentheses, n = 41)

Student Participants Yes No

Intermediate Level (n = 20) 18 (89.5%) 2 (10.5%)

High Level (n = 21) 21 (100%) 0 (0%)

Total 39 (97.8%) 2 (2.6%)

Table 20 summarizes how students' learning improved by performing the three AFL tasks. Of the intermediate students, 27.7% reported that their oral English fluency improved. Of the high level students, 33.3% claimed that their confidence in talking improved. Students could add 'Other' for improvements not listed in the open-ended question. Both intermediate and high level students reported that their perceptions of vocabulary study had improved by completing the AFL tasks.

Table 20 Q2 (b): If yes, what has improved? (Percentages in parentheses, n = 41)

Student Participants Confidence Fluency Pronunciation Other

Intermediate Level (n = 20) 1 (5.5%) 6 (27.7%) 1 (15.5%) 12 (61.3%)

High Level (n = 21) 7 (33%) 3 (14.8%) 2 (11.1%) 9 (41.1%)

Total 8 (22.2%) 9 (20%) 3 (8.8%) 21 (49%)


Table 21 Q3. Have you improved in being able to identify strengths and weaknesses in your classmates’ spoken/oral English? (Percentages in parentheses, n = 41)

Student Participants Yes No

Intermediate Level (n = 20) 18 (89.5%) 2 (5.3%)

High Level (n = 21) 19 (90%) 2 (10%)

Total 37 (90.9%) 4 (5.1%)

Tables 22 to 25 summarize students' views on the AFL tasks. A large majority of students claimed that the AFL exercises were useful, interesting, and motivating (97.4%, 89.7%, and 84.6%, respectively). Generally speaking, students held positive attitudes about the AFL exercises.

Table 22 Q4 (a): What is your view about the AFL exercises? (Percentages in parentheses, n = 41)

Student Participants Useful Not Useful

Intermediate Level (n = 20) 20 (100%) 0 (0%)

High Level (n = 21) 20 (95%) 1 (5%)

Total 40 (97.4%) 1 (2.6%)


Table 23 Q4 (b): What is your view about the AFL exercises? (Percentages in parentheses, n = 41)

Student Participants Interesting Not Interesting

Intermediate Level (n = 20) 16 (78.9%) 4 (21.1%)

High Level (n = 21) 20 (95%) 1 (5%)

Total 36 (89.7%) 5 (10.3%)

Table 24 Q4 (c): What is your view about the AFL exercises? (Percentages in parentheses, n = 41)

Student Participants Motivating Not Motivating

Intermediate Level (n = 20) 15 (73.7%) 5 (26.3%)

High Level (n = 21) 19 (90%) 2 (10%)

Total 34 (84.6%) 7 (15.4%)

Table 25 Q4 (d): What is your view about AFL exercises? (Percentages in parentheses, n = 41)

Student Participants    Other    No Other Perception

Intermediate Level (n = 20) 6 (31.6%) 14 (68.4%)

High Level (n = 21) 5 (25%) 16 (75%)

Total 11 (28.2%) 30 (71.8%)


Presentation of Results - Phase Two

In this section, the researcher presents the results of the student and teacher interviews conducted in Phase Two. Twenty-five students and two teachers were interviewed in order to further explore students' and teachers' attitudes towards AFL and its effectiveness. The interviews took place in Chinese; the researcher transcribed the answers and translated them into English. A content analysis was then conducted to reveal the participants' opinions of AFL.

The Students’ Interviews

Of the 41 students who completed all three of the AFL tasks, 25 were interviewed: 4 at the intermediate level and 21 at the high level. They answered three questions from the interview protocol: (1) Which AFL task do you like best? Why? (2) Do you like the idea of assessment without grades? Why? and (3) Do you think that doing more similar activities in your future study will improve your oral proficiency?

In analyzing the content of the interviews, the researcher looked at two points. First, students were asked to express their preferences among the AFL tasks. This is important because their opinions demonstrated how effective AFL tasks can be in helping students improve in certain areas of difficulty in oral English. Second, students were asked to express their attitudes towards AFL feedback, which involved no grading.

Students’ preferences for AFL tasks.

In the interviews conducted after students participated in the three AFL tasks, certain task preferences were found. Two students preferred task one, story retelling, because, as one respondent said: "I can practice listening and speaking at the same time". One high level student made a similar comment and explained how task one helped her identify her difficulty in learning:

"I think task one is very helpful. It made me realize that I cannot even understand what people are saying in the recording! For me, it is not only language input but also language output. It is a very good learning strategy to listen first and repeat in your own words."

(第一个)发现自己连磁带都听不懂。我觉得它就即是一种信息的输入,也是一种信息的输出。听了然后再复述,这样就是听了原声的,然后再用自己的方式表达出来,这是一种输入也是一种输出。我觉得这样很有效)

Some students preferred the second task, word reading, because, as one student said: "it relates (oral English) learning to test (TEM 4) and it enlarges your [my] vocabulary".

Most of the students, however, reported that they preferred the third task, role play, mainly because "it requires you to speak more and it practices our oral English". The following excerpt from a student explains the popularity of the task:

"I feel it (task 3, role play) improves our practical communication ability. We are different from English education majors; we are translation major students. [In comparison to them] We need to pay more attention to practicing English listening and speaking. I believe that the key point of [oral English] learning is communication, not learning grammar and passing TEM 4. I feel that if we can communicate with each other in a relaxing environment, our oral English proficiency would improve."

((第三个)我觉得它增强我们的实践能力。我们是英语翻译专业的学生,跟英教不大一样,我们要增强口头表达和听力这样的一些能力。我觉得学英语最重要的就是和别人交流。重点不在于过专 4,去学那些语法还有做题什么的。我觉得像今天这样,大家在一个比较轻松的环境然后以英语的形式进行交流,能对我们的能力有整体的大的提高)

This answer also demonstrated that students were aware that they needed to improve their English communicative ability, not only their test-taking skills. In addition, some students even realized that the third task could bring another benefit besides the improvement of oral English skills. As one student said:

"This kind of activity (task three) will not only improve individual oral proficiency but will also inform us how to work as a team, which is cooperation under certain circumstances."

(因为这样的活动不仅能提高个人的口语水平还能提高团队合作精神。在特定的环境下去创造一种合作)

Students performed the three AFL tasks and preferred them according to their own oral English learning difficulties. In the next section, students express their thoughts on certain features of AFL.

Students’ attitudes towards AFL feedback with no grading.

In the interviews, students also expressed their attitudes towards AFL peer feedback without grading (i.e., the aspect of providing feedback other than grades). All students had positive attitudes towards it. The reasons why students liked it can be seen in the following comments: "[AFL peer feedback] informs us about our own weaknesses and strengths in learning (AFL 的学生评价让我们了解到我们学习中的优势和弱点)"; and "It is much easier for me to accept [AFL peer feedback]. According to the (AFL) suggestions, we can improve our difficulties, solve our problems in learning (我觉得我更能接受 AFL 的学生评价。我能接受意见并且解决学习中困难)". Students also expressed the idea that AFL peer feedback without grading was encouraging. As one student said: "(Regular evaluation with) grades will put us under a lot of pressure, but AFL without grades gives no pressure. I like (AFL feedback without scores) because it would not blow our confidence and we do not feel pressure in language learning (平常的练习都是注重分数,这样就会有很多压力。但是 AFL 没有分数就能让我们学习的时候没有压力更有信心)".

To sum up, students held positive attitudes towards the AFL tasks and felt comfortable receiving feedback from their peers. The following section discusses teachers' perceptions of AFL.

The Teachers’ Interviews

Two teachers, in charge of the intermediate and high level students, were interviewed. They shared their opinions about implementing AFL in a Chinese university context by answering two questions: (1) Do you think that using AFL tasks in the classroom continuously will help students to improve their oral English? and (2) Will you use AFL tasks in the future in your teaching? The teachers' answers were analyzed descriptively, and two themes were categorized from their interview transcripts: 1) teachers' positive attitudes about AFL and 2) their concerns about AFL.

Teachers’ positive attitudes about AFL.

Both teachers confirmed the effectiveness of the AFL tasks. The teacher of intermediate level students mentioned that "it [AFL] can help [students] to identify [their] problems. I think by keeping on using (AFL tasks), my students will be more aware of their difficulties and their abilities to achieve the required proficiency level will be improved. Moreover, they become autonomous learners and will finally master the language (AFL 能帮助学生们找到自己的问题所在。我认为持续的用 AFL 活动,学生们能够更好的认识到学习中的困难并且能够帮助他们达到需要达到的英语口语程度要求。最终会帮助学生成为自主学习者)". The other teacher, of the high level students, expressed a similar opinion. In addition, she noted that "[for AFL tasks] I did not expect there were so many students. They are so motivated to participate, especially when you ask them to make conversation (我原本没有想到会有那么多的学生参加。学生们很喜欢 AFL 而且在编对话的时候积极参与)". She also noted that students could learn from their peers in the AFL tasks. She said:

"I think the best part of this method (she takes role play as one example) is that students participated when others were performing. They can listen carefully and check themselves to see whether they had made the same mistakes. A lot of students do not pay attention to what others say during oral practices. Some good students may prepare their own, but some low-level students neither prepare their own nor listen to other students' performances."

(我觉得这个方法好在让所有的学生都参与进来。他们会很专注的听并审视其他人的错误。之前大多数的同学根本不会听其他同学的发言。层次好的同学会准备自己的发言,稍差的同学既不准备也不会注意其他同学发言)

The researcher observed different attitudes in the two teachers about implementing AFL in the classroom. Both agreed to use AFL in future classroom instruction. The intermediate level teacher already had plans to implement AFL in the classroom. She said: "I need four classes to finish one unit. I think if I try, I can finish teaching within three classes and leave one lesson for students to practice oral English. So in that case, we can have oral English practice every week". The teacher of high level students said: "If I use AFL for students to practice their oral English, I will use it once a month". The difference between the two teachers was thus in the frequency of use of AFL tasks in the classroom: the intermediate level teacher would use AFL once a week whereas the high level teacher would use it only once a month.

Besides expressing their positive attitudes about AFL, teachers also expressed their concerns.

Teachers’ concerns about AFL.

Teachers' concerns about AFL mainly focused on design and implementation. The intermediate level teacher said: "AFL tasks really ask teachers to design them well. We want the students to think that the tasks are interesting, helpful, and at the same time, improving their proficiency. It really requires the teachers to spend a lot of time on the design (AFL 活动的设计对教师要求很高。我们想让学生觉得有趣,有帮助,同时能提高口语的水平。但是教师需要花很长的时间来进行设计)". The high level teacher commented on AFL implementation in the classroom context. She mentioned: "The most important thing is how to make AFL tasks fit into the classroom context. Take task two for example. I like this task. However, I do not think I will ask the students to spend so much time on pronunciation. I have the curriculum to finish. You know the situation when examinations are coming. At that time, I prepare students for those examinations through reading comprehension and grammar study. How can I use AFL during that time? (最重要的是怎样才能让 AFL 适合课堂教学。拿第二个任务来说,我个人很喜欢这个任务。问题是我觉得我不会让学生花那么多的时间在发音上。我有课程任务要完成。当考试期来临的时候,我会让学生做阅读理解和语法练习,这样我不会用 AFL)". To sum up, the concerns were how to design AFL tasks and how these AFL tasks could be implemented in the classroom.


Summary

In this chapter, the quantitative and qualitative data from Phase One and Phase Two were presented. The data from Phase One were presented in terms of descriptive statistics and frequency counts. Also in Phase One, peer feedback on task three, role play, was offered by intermediate and high level students. The data from Phase Two were presented in terms of content analysis. Themes emerging from teachers' and students' comments on AFL were also presented. The following chapter discusses the findings related to the research questions.


CHAPTER FIVE: DISCUSSION OF RESULTS

Introduction

In this chapter, the results presented in Chapter 4 are discussed. The discussion is directly related to the three research questions given in Chapter 3. The quantitative data obtained from the questionnaires and AFL tasks in Phase One were analyzed using descriptive statistics. The qualitative data from interviews with teachers and students in Phase Two were analyzed using content analysis. The triangulation of quantitative and qualitative data leads to an enriched understanding of the participants' perceptions of AFL. This chapter also includes a section discussing the consistency and inconsistency of these results with those presented in the literature review.

Answers to Research Question One

Research question one: Does AFL help students of varying proficiency levels to improve their oral English skills?

The results for question one can be explained from the quantitative analyses of the pre- and post-study student questionnaires. In the pre-study student questionnaire, difficulties in learning oral English were identified by all three levels of students. Over 25% of the students claimed that they lacked "Confidence in talking" and encountered "Fluency" and "Pronunciation" difficulties. In the post-study student questionnaire, intermediate and high level students reported having fewer difficulties in learning oral English. A majority of intermediate and high level students, an overwhelming 97.8%, claimed that the AFL tasks helped them identify their own difficulties in oral English learning. In their self-reports, their learning in the areas of "Confidence in talking", "Fluency", and "Pronunciation" improved by 22%, 20%, and 8.8% respectively after participating in the AFL tasks.

However, in this study, only certain levels of students thought they had improved their oral skills through AFL. Specifically, intermediate and high level students benefited from doing the AFL tasks. On the other hand, it is unclear whether or not beginner level students could have benefited from the AFL tasks because no data were collected from them.

Answers to Research Question Two

Research question two: What evidence can be found that AFL benefits learning?

The answer to this second question is demonstrated by the quantitative analysis of the post-study student questionnaires and the qualitative analysis of the student and teacher interviews.

In the post-study student questionnaires, students reported that the AFL tasks not only allowed them to identify their weaknesses and difficulties in oral English learning but also provided opportunities to speak English more confidently, fluently, and accurately. In addition, students reported that AFL tasks that offered peer feedback instead of marks helped them to reduce or even eliminate the heavy psychological burden brought on by traditional evaluation methods. Furthermore, students stated that the AFL tasks adapted from the format of the TEM 4 oral English test not only prepared them for the TEM test but also helped them to collaborate with their classmates and improved their communicative skills in a friendly and supportive classroom environment.

In the student interviews, students expressed six reasons why they liked the three AFL activities. Students claimed that the AFL tasks could enhance: 1) English speaking, 2) English listening, 3) vocabulary learning, 4) the language learning environment, which became more authentic and friendly, 5) peer collaboration in the classroom, and 6) TEM 4 oral test preparation.

In the teacher interviews, teachers stated that AFL tasks could identify students' learning difficulties, motivate students to practice oral English, and help them improve their oral English skills. Since they had witnessed the effectiveness of AFL with their own students, teachers also expressed their willingness to use AFL tasks in their future instruction.

Answers to Research Question Three

Research question three: What are teacher and student perceptions of formative assessment?

The answer to this third research question is demonstrated by the quantitative analyses of the teacher and post-study student questionnaires and the qualitative analysis of the student and teacher interviews.

In the teacher questionnaires, teachers believed that different types of assessment approaches including formative assessment could benefit English language learning.

In addition, most of the teachers believed that various assessment methods should be used widely.

In the post-study student questionnaires, students held positive attitudes about the AFL tasks. Students reported that doing the AFL tasks was effective in helping them identify their own weaknesses and their future targets in oral language learning. After completing the three tasks, over 84% of the students claimed that the AFL tasks were useful, interesting, and motivating.

In the student and teacher interviews, both groups showed positive attitudes towards AFL. In the student interviews, students' preferences for the different AFL tasks reveal the effectiveness of the tasks in resolving various learning difficulties. In addition, students expressed the opinion that AFL peer feedback encouraged learning better than traditional grading did. In the teacher interviews, teachers commented that AFL could point out students' weaknesses and promote their learning through peer collaboration. The following section will discuss the consistency and inconsistency of the results with the literature.

Consistency and Inconsistency of Results and Literature

Consistency of Results and Literature

Consistency between questionnaire results and literature review.

The results of all three questionnaires are found to be consistent with those discussed in the previous literature review from three perspectives: 1) test-driven study; 2) failure to achieve required criteria; and 3) the effectiveness of AFL.

In the teachers' questionnaire, 89% of the teachers claimed that their students were extremely grade-driven English learners. In other words, teachers believed that students learn English only to receive high grades and pass tests, while their practical communicative skills do not develop. This finding corresponds to studies reporting that students pay attention to the marks that determine whether they pass or fail national examinations (Liao & Wolff, 2009; Wang, 2005; Zhao & Cheng, 2002).

In the pre-study student questionnaires, 70% of the students claimed that they could not achieve the required standard of the curriculum. This indicates that the three levels of English major students perceived their ability to be below the required level of oral proficiency. This result corresponds to the findings of He (1999) and Wen (2001) about Chinese students' frequent failure to reach the required standards of the curriculum.

In the post-study student questionnaires, over 80% of the intermediate and high level students reported that the AFL tasks helped them to reduce or eliminate their difficulties in learning oral English. For example, in the pre-study student questionnaires, 27.3% of intermediate and 35.5% of high level students reported "Confidence in talking", "Fluency", and "Pronunciation" to be problem areas in learning English. However, after doing the three AFL tasks, these same students reported that their "Confidence in talking" (22%), "Fluency" (20%), and "Pronunciation" (8.8%) improved. These results correspond to the findings of Black and Wiliam (1998a), which indicate that formative assessment can effectively improve students' performance and learning.

Consistency between interview results and literature review.

The results of both the student and teacher interviews are consistent with those reported in the literature review.

In the student interviews, students expressed their dissatisfaction with the current test-oriented system and non-functional oral English instruction. Students described themselves as simply learning machines for passing various English examinations without mastering the English language. This corresponds to the findings of Liu and Carless (2006) and Zhao (2007) about the problems of EFL pedagogy in China, which criticize the misuse of and issues related to large-scale tests.

In the teacher interviews, teachers explained their inability to promote the practice of spoken English in the classroom. Because vocabulary and grammar teaching was preferred as preparation for the large-scale tests, teachers gradually replaced the instructional materials with examination practice papers, which leaves limited time for oral English instruction. This finding also corresponds to the results of studies of current oral English teaching in Chinese classrooms (He, 1999; Wen, 2001; Zhao & Cheng, 2002).

Inconsistency of Results and Literature

Some inconsistency is found between the results of research question one and those reported in the literature. This first question asks whether AFL could help students of varying proficiency levels to improve their oral English skills. The results indicate that AFL can benefit intermediate and high level students. Black and Wiliam (1998b) reported that low level students benefited more than students at other levels from learning through formative assessment. However, since neither the teacher nor the students at the beginner level participated in the AFL tasks, it is impossible to know the effectiveness of AFL for beginner level students and teachers, or their perceptions of AFL, in this study.

Sun and Cheng (2002) identified four factors that cause the low motivation of low level students to participate in a new activity: 1) their lack of ability to express themselves in English, 2) their suspicious attitudes towards the effectiveness of a new methodology, 3) the overwhelming pressure of examinations characterized by discrete language points, and 4) "they simply don't know for sure what's going on" (p. 12).

The researcher believes the principal reason is that low level students hold erroneous beliefs about learning oral English. For example, most of the low level students in this study seemed to believe that they could successfully pass the TEM 4 oral test by practicing intensively for only one week before taking the test. In addition, they claimed that it was more important to focus on preparation for their final exam than to do the AFL tasks. They showed no interest in the three AFL tasks and refused to participate in them.

To sum up, students and teachers from the intermediate and high level classes showed positive attitudes towards AFL. They believed that AFL could identify students' learning difficulties and help them to improve their oral English skills. In contrast, the low level students showed negative attitudes, as they did not agree to participate in the three AFL tasks.

Summary

This chapter discussed the results presented in Chapter Four. Three research questions were answered based on the triangulation of various data sources.

Consistency and inconsistency of the results in this study with the literature were discussed. In the following chapter, the conclusions, implications, limitations, and suggestions for future directions based on the findings of this study will be discussed.


CHAPTER SIX: CONCLUSIONS

Introduction

In this chapter, the findings for the research questions and related findings are provided. The implications, limitations, further research directions, and contributions to AFL and assessment approaches in the Chinese context will be discussed. The questions proposed and answered in this study will be addressed in a summary of findings and implications. Following that, the limitations and further directions of this study will be discussed. In the final section, a brief description of the contribution of this study will be set forth.

Summary of Findings

Main Findings

This mixed methods study was designed to examine the effectiveness of AFL tasks in learning oral English and to explore student and teacher perceptions of AFL in a Chinese university context. Quantitative and qualitative data were collected from classroom observations, questionnaires, AFL task results, and interviews. Findings from a triangulation analysis of the data reveal answers to the three research questions asked.

Firstly, findings from the pre-study and post-study student questionnaires demonstrate that AFL is effective in reducing oral learning difficulties amongst the intermediate and high level university English majors who did the tasks. After students' weaknesses were identified, the researcher implemented three AFL tasks to help students improve their oral English learning. Over 84% of the intermediate and high level students who participated claimed that the AFL tasks were helpful in improving their learning. No data are available on the beginners, as they did not do the tasks.

Secondly, findings from the post-study student questionnaires and the student and teacher interviews support the argument that the AFL tasks can benefit the oral English learning of Chinese university level English as a Foreign Language (EFL) students. In the results of the post-study student questionnaires, students claim that the three AFL tasks helped to improve their speaking confidence, oral fluency, and pronunciation accuracy. In addition, in the student interviews, students identified which of the three AFL tasks they prefer and stated that peer feedback lessened their psychological stress about language learning and testing. Furthermore, in the teacher interviews, teachers confirm the effectiveness of AFL and express their willingness to apply AFL in their classrooms.

Thirdly, findings from the teacher and post-study student questionnaires demonstrate that both teachers and students (at the intermediate and high levels but not at the beginner level) hold positive attitudes towards AFL. Teachers agree in the questionnaire that various assessment approaches including formative assessment should be applied to encourage oral English learning. This statement is further confirmed in the teacher interviews. Students claim in the post-study student questionnaires that doing the AFL tasks not only helped them identify their difficulties in spoken English but also helped them to improve in their areas of difficulty.

Related Findings

In this study, the intermediate and high level students show more enthusiasm about participating in the three AFL tasks than the beginners. The researcher believes that this phenomenon is caused by the mistaken beliefs of these lower level students about learning oral English. From the researcher's observation, these students are test-driven learners who believe that their oral English proficiency can be improved within a short period of time (i.e., one week before the TEM 4 oral examination) and with little effort. Because of these kinds of beliefs about oral English learning, the beginner level students never practiced any of the AFL tasks.

The teacher interviews show that teachers believe that in order to speak English fluently, students should not rely only on their teachers and on practice inside the classroom. They suggest that students need to learn more by themselves and gradually become masters of their own language learning. For example, the teacher of the high level students said:

"At present, they [students] are not under pressure and they do not realize the importance of oral English. They always rely on teachers, schools, classroom learning. If you only rely on 90 minutes of classroom learning, it is impossible to learn and practice a lot. The biggest challenge is how to make students autonomous learners, how to guide students to learn and practice by themselves."

(现阶段,学生们并没有意识到英语口语的重要性。他们通常依赖老师,学校的学习。可是实际上如果只是通过课堂上的 90 分钟学习还远不够。最大的挑战是学会自学)

This indicates that teachers are aware of the importance of autonomous learning and are concerned about the challenges of turning their students into autonomous learners.

Implications

Before this study was conducted, research into applying AFL to improve students' oral English skills at the university level in China was limited, mainly because most studies focus on the effects of large-scale testing. However, China needs to adopt various evaluation methods to better understand its complex linguistic phenomena. This research is therefore a pioneering study of implementing the formative assessment approach. The findings of this study have implications for counterbalancing test-driven oral English instruction and for developing communicative skills.

As reported in the literature review chapter, Liao and Qin (2000) mentioned that 57% of English major students were not satisfied with test-driven oral English instruction, and 94% of the participants reported that they learned no communicative skills from their compulsory oral English courses at university. In other words, Liao and Qin's survey revealed the contradiction between instruction for the development of test-taking proficiency and for the development of students' practical communicative skills.

This study explores solutions for this kind of contradiction through adapted AFL tasks. Two of the AFL tasks, story retelling and role play, were adapted from the original TEM4 oral examination. When doing the two adapted AFL tasks, students were required to make comments and offer feedback on their classmates' performances instead of making traditional grade evaluations. Providing peer feedback instead of marks not only reduces the students' levels of stress and psychological burdens but also creates a much friendlier environment encouraging English speaking. Therefore, on the one hand, the adapted AFL tasks keep the function of developing students' TEM4 oral test proficiency; on the other hand, the tasks encourage more speaking, thus developing students' communicative skills.


Limitations

This study has some limitations that the researcher hopes can be improved upon in future studies.

Firstly, the data were collected from a limited population in a Chinese context. The nine teachers and 74 students in this study (not all of whom participated in the whole study) may not be representative of all Chinese teachers and students, even though the case study approach can provide in-depth insight into a social phenomenon (Yin, 2009). Therefore, the results cannot be generalized to similar contexts without further research.

Secondly, beginner level students did not complete the whole study, so some data are missing for this level; their exclusion is a limitation of the study. In fact, they finished only the pre-study student questionnaire and did not participate in the three AFL tasks, complete the post-study student questionnaire, or do the interview. Therefore, only their oral learning difficulties are identified, in the pre-study student questionnaire. Their attitude about doing the AFL tasks is clearly negative.

Thirdly, although delayed peer feedback was provided to students for the TEM 4 oral test, the effect of this feedback is unclear. The researcher sent delayed feedback because it can benefit high level students more than immediate feedback (Kulik & Kulik, 1988). However, since none of the students who participated in the study received a mark over 80 in the TEM 4 written exam, none became eligible to take the TEM 4 oral test, and the effect of the delayed feedback therefore remains unclear.


Future Research Directions

This study answers three research questions, and it also points out further research directions. First, how to motivate low level students to speak English more, and whether AFL can effectively improve their oral English, needs more exploration. In this study, the only data collected from low level students concerned their difficulties. Since they did not do the tasks, the effectiveness of AFL and these students' perceptions remain unknown. Therefore, if teachers plan to use AFL tasks in their instruction in the future, there are some questions to ask: Do they need to make the tasks compulsory and require low level students to complete them? And if so, would the AFL tasks be as effective for them as for the students at higher levels who participated in this study voluntarily?

Secondly, another future research direction is whether the findings of this study can extend beyond English majors to non-English major students. Compared to English majors, non-English majors are reported to be facing bigger challenges in learning oral English, one of which is known as the "Mute English" phenomenon (Liao & Wolff, 2009; Liu & Carless, 2006; Zhao, 2007) described in the second chapter. Currently, oral English instruction remains one of the most heated issues in English instruction at the university level in China. Perhaps adapting and using AFL tasks in the oral English instruction of non-English majors could benefit their learning.

Thirdly, the exploratory implementation of AFL in a Chinese university level classroom requires further research into combining assessment, teaching, and learning. Colby-Kelly and Turner (2007) and Pellegrino et al. (2001) claimed that assessment cannot be designed and implemented without consideration of three important elements, learning, teaching, and testing, linked together as an assessment bridge. To achieve this goal, two challenges that emerged from the results of this study are 1) how teachers can manage the time-consuming design of AFL tasks and 2) how students can focus on developing their oral communicative capacity in addition to test-taking skills.

Contribution of This Study

Contribution to AFL

This study contributes to the literature on formative assessment, especially by conducting AFL to improve oral English skills in a particular context. In the Chinese context, most research focuses on the consequences of large-scale tests, whereas the effects of formative assessment remain under-investigated, and studies of large-scale tests cannot broaden the investigative scope of formative assessment. As a result, this study investigated the effects of formative assessment in improving the oral English language skills of Chinese university English majors and explored teachers' and students' perceptions of this type of assessment as well. The data collected demonstrate that AFL can benefit the intermediate and high level students who did the tasks. In addition, the teachers and the students who did the tasks show positive attitudes towards AFL and confirm the effects of the AFL tasks in improving learning. Therefore, this study broadens the potential use of formative assessment by conducting formative assessment tasks with Chinese university level students in a classroom context (Cumming, 2004).

Contribution to Assessment Approaches in the Chinese Context

This study also contributes to the idea of using classroom-based alternative assessment to improve students' oral English skills. Teachers often use test materials in the classroom for students to practice for their oral English exams, but these materials neglect the development of students' practical oral English (He, 1999; Liao & Qin, 2000; Wen, 2001). In this study, the researcher used adapted AFL tasks to fulfill the dual purposes of preparing students for the TEM4 oral tests and, at the same time, developing students' communicative language skills. The positive feedback from the intermediate and high level students and teachers who participated in the study is encouraging for the future use of adapted AFL tasks in a Chinese classroom.

It is hoped that these contributions will motivate others to pursue formative assessment in contexts where teachers seldom apply it in instruction. Moreover, the researcher believes that more learners who are facing difficulties with oral English learning would benefit from formative assessment.


References

Abu-Alhija, F. N. (2007). Large-scale testing: Benefits and pitfalls. Studies in Educational Evaluation, 33(1), 50-68.

Assessment Reform Group. (1999). Assessment for learning: Beyond the black box. Cambridge: University of Cambridge School of Education.

Assessment Reform Group. (2002). Testing, motivation and learning. Retrieved from www.assessment-reformgroup.org

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Atkin, J.M., Black, P., & Coffey, J.E. (Eds.). (2001). Classroom assessment and the National Science Education Standards. Washington, DC: National Academy Press.

Babbie, E. (1979). The practice of social research (3rd ed.). Belmont, CA: Wadsworth/Thomson Learning.

Babbie, E. (2001). The practice of social research (9th ed.). Belmont, CA: Wadsworth/Thomson Learning.

Ballantyne, R., Hughes, K., & Mylonas, A. (2002). Developing procedures for implementing peer assessment in large classes using an action research process. Assessment & Evaluation in Higher Education, 27(5), 427-441.

Bangert-Drowns, R. L., Kulik, C.-L. C., Kulik, J. A., & Morgan, M. (1991). The instructional effect of feedback in test-like events. Review of Educational Research, 61, 213-238.

Benson, P. (2005). (Auto)biography and learner diversity. In P. Benson & D. Nunan (Eds.), Learners' stories: Difference and diversity in language learning (pp. 4-21). Cambridge: Cambridge University Press.

Black, P., & Wiliam, D. (1998a). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7-74.

Black, P., & Wiliam, D. (1998b). Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan, 80(2), 139-148.

Black, P., Harrison, C., Lee, C., Marshall, B., & Wiliam, D. (2003). Assessment for learning: Putting it into practice. Buckingham: Open University Press.


Block, D. (1998). Tale of a language learner. Language Teaching Research, 2(2), 148-176.

Brackbill, Y., Blobitt, W. E., Davlin, D., & Wagner, J. E. (1963). Amplitude of response and the delay-retention effect. Journal of Experimental Psychology, 66(1), 57-64.

Brookhart, S. (2003). Developing measurement theory for classroom assessment purposes and uses. Educational Measurement: Issues and Practice, 22(4), 5-12.

Brookhart, S. M. (2004). Classroom assessment: Tensions and intersections in theory and practice. Teachers College Record, 106(3), 429.

Brown, H. D. (1991). Breaking the language barrier. Yarmouth, ME: Intercultural Press.

Brown, S. (Ed.). (1998). Peer assessment in practice. Birmingham: SEDA Publications.

Brown, J.D. (2001). Using surveys in language programs. Cambridge: Cambridge University Press.

Buhagiar, M. (2007). Classroom assessment within the alternative assessment paradigm: revisiting the territory. Curriculum Journal, 18(1), 39-56.

Burnard, P. (1991). A method of analysing interview transcripts in qualitative research. Nurse Education Today, 11(6), 461-466.

Butler, D., & Winne, P. (1995). Feedback and self-regulated learning: A theoretical synthesis. Review of Educational Research, 65(3), 245-281.

Carless, D. (2011). From testing to productive student learning: Implementing formative assessment in Confucian-heritage settings. New York: Routledge.

Cheng, L. (2009). The history of examinations: Why, how, what and whom to select? In L. Cheng & A. Curtis (Eds.), English language assessment and the Chinese learner (pp. 13-25). New York and London: Taylor & Francis.

Cheng, W., & Warren, M. (1997). Having second thoughts: Student perceptions before and after a peer assessment exercise. Studies in Higher Education, 22(2), 233-239.

Cheng, W., & Warren, M. (2005). Peer assessment of language proficiency. Language Testing, 22(1), 93-121.

Cizek, G. J. (2009). An Introduction to Formative Assessment. In H. L. Andrade & G. J. Cizek (Eds.), Handbook of Formative Assessment (pp. 3-17). New York: Routledge.


Clariana, R. B., Wagner, D., & Rohrer-Murphy, L. C. (2000). A connectionist description of feedback timing. Educational Technology Research and Development, 48(3), 5-21.

Cole, D. (1991). Change in self-perceived competence as a function of peer and teacher evaluation. Developmental Psychology, 27(4), 682-688.

Colby-Kelly, C., & Turner, C. E. (2007). AFL research in the L2 classroom and evidence of usefulness: Taking formative assessment to the next level. Canadian Modern Language Review, 64(1), 9-38.

Couchman, W., & Dawson, J. (1990). Nursing and health-care research: The use and applications of research for nurses and other health care professionals. London: Scutari.

Creswell, J., & Plano Clark, V. L. (2009). Designing and conducting mixed methods research. Thousand Oaks, CA: Sage.

Crooks, T. J. (1988). The impact of classroom evaluation practices on students. Review of Educational Research, 58, 438-481.

Cumming, A. (2004). Broadening, deepening, and consolidating. Language Assessment Quarterly, 1, 5-18.

Dong, H. (1998). Do not learn "Mute English". People's Daily. Retrieved from http://www.people.com.cn/9807/10/current/newfiles/e1070.html

Dörnyei, Z., & Taguchi, T. (2010). Questionnaires in second language research: Construction, administration, and processing (2nd Ed.). New York: Routledge.

Elliot, S., & Branden, P. J. (2000). Educational assessment and accountability for all students: Facilitating the meaningful participation of students with disabilities in district and statewide assessment programs. Madison, WI: Wisconsin Department of Public Instruction.

Erickson, F., & Wilson, J. (1982). Sights and sounds of life in schools: A resource guide to film and videotape for research and education. East Lansing, MI: Institute for Research on Teaching, College of Education, Michigan State University.

Erickson, F. (1992). Ethnographic microanalysis of interaction. In M. D. LeCompte (Ed.), The handbook of qualitative research in education (pp. 201-225). San Diego, CA: Academic Press.

Falchikov, N. (1995). Peer feedback marking: Developing peer assessment. Innovations in Education and Training International, 32(2), 175-187.

Falchikov, N. (2001). Learning together: Peer tutoring in higher education. New York: Routledge.


Fenwick, T., & Parsons, J. (1999). Incorporating peer assessment in adult education. Retrieved from http://www.ualberta.ca/~tfenwick/ext/pubs/peereval.htm

Firestone, W., Mayrowetz, D., & Fairman, J. (1998). Performance-based assessment and instructional change: The effects of testing in Maine and Maryland. Educational Evaluation and Policy Analysis, 20(2), 95-113.

Fishbach, A., Eyal, T., & Finkelstein, S. R. (2010). How positive and negative feedback motivate goal pursuit. Social and Personality Psychology Compass, 4, 517-530.

Fox, D. J. (1982). Fundamentals of research in nursing (4th ed.). Norwalk, CT: Appleton-Century-Crofts.

Fulcher, G., & Davidson, F. (2007). Language testing and assessment: An advanced resource book. London: Routledge.

Fulcher, G. (2010, April). An introduction to assessment for learning. Retrieved from: http://languagetesting.info/features/afl/formative.html

Gao, Y., Li, Y., & Li, W. (2002). EFL learning and self-identity construction: Three cases of Chinese college English majors. Asian Journal of English Language Teaching, 12. Retrieved from http://cup.cuhk.edu.hk/ojs/index.php/AJELT/article/view/657

Gardner, J. (2006). Assessment and learning. Thousand Oaks, CA: SAGE.

Genesee, F., & Upshur, J. A. (1996). Classroom-based evaluation in second language education. Cambridge: Cambridge University Press.

Gibbs, G., & Simpson, C. (2005). Conditions under which assessment supports students' learning. Learning and Teaching in Higher Education, 1(1), 3-31.

Gillham, B. (2000). Case study research methods. London: Continuum.

Gipps, C. (1999). Socio-cultural aspects of assessment. Review of Research in Education, 24, 355-392.

Gipps, C. V. (1994). Beyond Testing: towards a theory of educational assessment. London: Falmer Press.

Goodman, R. M., & McGrath, P. (2002). Editing digital video. New York: McGraw-Hill.

Goodwin, J. (2001). Teaching pronunciation. In M. Celce-Murcia (Ed.), Teaching English as a second or foreign language (3rd ed., pp. 117-138). Boston: Heinle & Heinle.

Greene, J. C. (2007). Mixed methods in social inquiry. San Francisco: Jossey-Bass.


Haladyna, T. M. (2002). Supporting documentation: Assuring more valid test score interpretation and uses. In G. Tindal & T. M. Haladyna (Eds.), Large-scale assessment programs for all students: Validity, technical adequacy, and implementation (pp. 89-108). Mahwah, NJ: Lawrence Erlbaum.

Hanrahan, S., & Isaacs, G. (2001). Assessing self- and peer-assessment: The students' views. Higher Education Research & Development, 20(1), 53-70.

Harlen, W., & Winter, J. (2004). The development of assessment for learning: Learning from the case of science and mathematics. Language Testing, 21(3), 390.

Hattie, J. A. (1999, June). Influences on student learning. (Inaugural professorial address, University of Auckland, New Zealand). Retrieved from

Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81-112.

He, N. (1999). A discussion of English majors' oral English instruction. Foreign Language World, 3, 40-43.

Herman, J.L. & Golan, S. (1993). The effects of standardized testing on teaching and schools. Educational Measurement: Issues and Practice, 12 (2), 20-26.

Heritage, M. (2007). Formative assessment: What do teachers need to know and do? Phi Delta Kappan, 89(2), 140-145.

Hill, L., & Mallet, D. (1968). Advanced stories for reproduction. New York: Oxford University Press.

Hu, D. (2011). Washback effects of TEM4. The World and Chongqing, 28(1), 103-105.

Irons, A. (2007). Enhancing learning through formative assessment and feedback. London: Routledge.

Johnson, R. B., & Onwuegbuzie, A. J. (2004). Mixed methods research: A research paradigm whose time has come. Educational Researcher, 33(7), 14-26.

Klenowski, V. (1995). Student self-evaluation processes in student-centred teaching and learning contexts of Australia and England. Assessment in Education: Principles, Policy & Practice, 2(2), 145-154.

Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254-284.

Koretz, D. M. (2008). Measuring up: What educational testing really tells us. Cambridge, MA: Harvard University Press.

Kulhavy, R. W. (1977). Feedback in written instruction. Review of Educational Research, 47(2), 211-232.


Kulik, J. A., & Kulik, C.-L. C. (1988). Timing of feedback and verbal learning. Review of Educational Research, 58, 79-97.

Leung, C. (2004). Developing formative teacher assessment: Knowledge, practice and change. Language Assessment Quarterly, 1(1), 19-41.

Li, S.H. (2005). English and Localized . Yinchuan: Ningxia Renmin Press.

Liao, L., & Qin, A. (2000). Investigation report of English major instruction. Foreign Language World, 3, 26-30.

Liao, Y., & Wolff, M. (2009). Mute English: The Latin of China. Retrieved from http://chinaholisticenglish.com/?s=mute+English

Linn, R. L. (2002). Validation of the uses and interpretations of results of state assessment and accountability systems. In G. Tindal & T. H. Haladyna (Eds.), Large-scale assessment programs for all students: Validity, technical adequacy, and implementation (pp. 27-48). Mahwah, NJ: Lawrence Erlbaum Associates.

Liu, N., & Carless, D. (2006). Peer feedback: The learning element of peer assessment. Teaching in Higher Education, 11(3), 279-290.

National College English Testing Committee. (1999). CET-SET Syllabus. Retrieved from http://www.en.cet.edu.cn/kouyudisplay.asp?id=281

National Test for English Major Testing Committee. (2000). TEM Oral Test Syllabus. Retrieved from http://wenku.baidu.com/view/bef0341b227916888486d729.html

Madaus, G. F. (1985). Public policy and the testing profession: You've never had it so good? Educational Measurement: Issues and Practice, 4(4), 5-11.

McMillan, J. H. (2003). Understanding and improving teachers' classroom assessment decision making: Implications for theory and practice. Educational Measurement: Issues and Practice, 22(4), 34-43.

McNamara, T., & Shohamy, E. (2008). Language tests and human rights. International Journal of Applied Linguistics, 18(1), 89-95.

Miller, D., & Lavin, F. (2007). 'But now I feel I want to give it a try': Formative assessment, self-esteem and a sense of competence. Curriculum Journal, 18(1), 3-25.

Moss, P. (2003). Reconceptualizing validity for classroom assessment. Educational Measurement: Issues and Practice, 22, 13-25.

Murphy, R., & Broadfoot, P. (Eds.). (1995). Effective assessment and the improvement of education. London: Falmer.


Nagy, P. (2000). The three roles of assessment: Gate-keeping, accountability, and instructional diagnosis. Canadian Journal of Education, 25, 262-279.

NCEE (2011). Classroom Assessment for Student Learning: Impact on Elementary School Mathematics in the Central Region: Final Report.

O'Donnell, A. M., & Topping, K. J. (1998). Peers assessing peers: Possibilities and problems. In K. J. Topping & S. Ehly (Eds.), Peer-assisted learning (pp. 255-278). Mahwah, NJ & London, UK: Lawrence Erlbaum.

Oliver, D. G., Serovich, J. M., & Mason, T. L. (2005). Constraints and opportunities with interview transcription: Towards reflection in qualitative research. Social Forces, 84(2), 1273-1289.

Olson, V. (1990). The revising processes of sixth-grade writers with and without peer feedback. Journal of Educational Research, 84(1), 22-29.

Olson, J., Bond, L., & Andrews, C. (1999). Annual survey of state student assessment programs: A summary report. Washington, DC: Council of Chief State School Officers.

Patri, M. (2002). The influence of peer feedback on self-and peer-assessment of oral skills. Language Testing, 19(2), 109.

Patton, M. Q. (2002). Qualitative research & evaluation methods (3rd ed.). Thousand Oaks, CA: Sage Publications.

Pellegrino, J., Chudowsky, N., & Glaser, R. (Eds.). (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academy Press.

Phelps, R. P. (2000). Trends in large-scale testing outside the United States. Educational Measurement: Issues and Practice, 19(1), 11-21.

Pinnegar, S., & Hamilton, M. L. (2009). Self-study of practice as a genre of qualitative research: Theory, methodology, and practice. Dordrecht, The Netherlands: Springer.

Pond, K., Ul-Haq, R., & Wade, W. (1995). Peer review: A precursor to peer assessment. Innovations in Education and Teaching International, 32(4), 314-323.

Race, P. (1998). Practical pointers on peer assessment. In S. Brown (Ed.), Peer assessment in practice. Birmingham: SEDA.

Rea-Dickins, P., & Gardner, S. (2000). Snares and silver bullets: disentangling the construct of formative assessment. Language Testing, 17(2), 215-243.

Rea-Dickins, P. (2006). Currents and eddies in the discourse of assessment: A learning-focused interpretation. International Journal of Applied Linguistics, 16(2), 163-188.

Ryan, K. E., & Shepard, L. A. (2008). The future of test-based educational accountability. Mahwah, NJ: Lawrence Erlbaum Associates.

Salend, S. J., Whittaker, C. R., & Reeder, E. (1993). Group evaluation: A collaborative, peer-mediated behavior management system. Exceptional Children, 59(8), 203-209.

Schroth, M. L., & Lund, E. (1993). Role of delay of feedback on subsequent pattern recognition transfer tasks. Contemporary Educational Psychology, 18, 15-22.

Shepard, L. A. (1991). Will national tests improve student learning? The Phi Delta Kappan, 73(3), 232-238.

Shepard, L. A. (2000). The role of assessment in a learning culture. Educational Researcher, 29(7), 4-14.

Shepard, L. A. (2003). Reconsidering large-scale assessment to heighten its relevance to learning. In J. M. Atkin & J. E. Coffey (Eds.), Everyday assessment in the science classroom (pp. 121-146). Arlington, VA: NSTA Press.

Shohamy, E. (1994). The use of language tests for power and control. In J. Alatis (Ed.), Georgetown University Round Table on Languages and Linguistics (pp. 57-72). Washington, DC: Georgetown University Press.

Smith, J. K. (2003). Reconsidering reliability in classroom assessment and grading. Educational Measurement: Issues and Practice, 22(4), 26-33.

Stecher, B., Barron, S., Chun, T., & Ross, K. (2000). The effects of the Washington State Education Reform on schools and classrooms (CSE Tech. Rep. No. 525). Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).

Stiggins, R. J. (2002). Assessment Crisis: The Absence of Assessment FOR Learning. The Phi Delta Kappan, 83(10), 758-765.

Stiggins, R. J. (1995). Assessment literacy for the 21st century. Phi Delta Kappan, 77, 238–245.

Sturges, P.T. (1972). Information delay and retention: Effect of information in feedback and tests. Journal of Educational Psychology, 63(1), 32–43.

Sturges, P.T. (1978). Delay of informative feedback in computer-assisted testing. Journal of Educational Psychology, 70(3), 378–387.

Swindell, L. K., & Walls, W. F. (1993). Response confidence and the delay retention effect. Contemporary Educational Psychology, 18, 363-375.

Thurlow, M. L., Ysseldyke, J. E., Gutman, S., & Geenen, K. (1998). An analysis of inclusion of students with disabilities in state standards documents (Tech. Rep. No. 19). Minneapolis: University of Minnesota, National Center on Educational Outcomes.

Topping, K., & Ehly, S. (1998). Peer assisted learning. Mahwah, NJ: Erlbaum.

Topping, K. (2005). Trends in peer learning. Educational Psychology, 25(6), 631-645.

Topping, K. J. (2009). Peers as a source of formative assessment. In H. L. Andrade & G. J. Cizek (Eds.), Handbook of Formative Assessment (pp. 61-74). New York: Routledge.

Tuckman, B. W. (2003). The effect of learning and motivation strategies training on college students' achievement. Journal of College Student Development, 44(3), 430-437.

Turner, C. E. (in press). Classroom assessment. In G. Fulcher & F. Davidson (Eds.), The Routledge Handbook of Language Testing. London and New York: Routledge.

Underhill, N. (1987). Testing spoken language: A handbook of oral testing techniques. Cambridge: Cambridge University Press.

Wajnryb, R. (1992). Classroom observation tasks: A resource book for language teachers and trainers. Cambridge: Cambridge University Press.

Wang, L. (2004). Teaching Foreign Languages in Communicative Classroom. Foreign Languages and Their Teaching, 187(10), 22-25.

Wang, J. (2005). Reasons for and solutions to the "deaf and mute English" problem. Journal of Jiangxi Radio & TV University, 3, 69-71.

Way, W. D., Dolan, R. P., & Nichols, P. (2009). Psychometric challenges and opportunities in implementing formative assessment. In H. L. Andrade & G. J. Cizek (Eds.), Handbook of formative assessment. New York: Routledge.

Wen, Q. (2001). National Spoken English Test for English Majors and teaching spoken English. Foreign Language World, 4, 24-28.

White, B., & Frederiksen, J. (2000). Metacognitive facilitation: An approach to making scientific inquiry accessible to all. In J. Minstrell & E. van Zee (Eds.), Inquiring into inquiry learning and teaching in science (pp. 331-370). Washington, DC: American Association for the Advancement of Science.

Winne, P. H., & Butler, D. L. (1994). Student cognition in learning from teaching. In T. Husen & T. Postlethwaite (Eds.), International encyclopedia of education (2nd ed., pp. 5738-5745). Oxford, England: Pergamon.

Yin, B. (2007). Causes and strategies of Chinese college students' mute English. Journal of Chizhou Teachers College, 21(1), 109-112.


Yorke, M. (2003). Formative assessment in higher education: Moves towards theory and the enhancement of pedagogic practice. Higher Education, 45(4), 477-501.

Zhao, J., & Cheng, L. (2002). Exploring the relationship between Chinese university students' attitudes towards the College English Test and their test performance. English Language Assessment and the Chinese Learner, 190.

Zhao, B. (2007). Thoughts on the mute English phenomenon. Journal of Ningbo University Educational Science, 29(3), 108-110.


Appendices

Appendix A: Teacher Participation Consent Form

CONSENT DOCUMENT FOR PARTICIPATION IN RESEARCH
McGill University, Integrated Studies in Education
Investigator: Yang Song

1. Who is conducting the research? The research is being done with the approval of McGill University, where the main researcher is currently a master's student under Dr. Carolyn Turner's supervision.

2. Why is this study being done? The researchers want to find out Chinese teachers' and students' perceptions of Assessment for Learning (AFL) and the effectiveness of AFL in improving oral English learning.

3. What will happen? You will answer a questionnaire composed of two parts. Part one asks for background information, and part two asks for your opinions about classroom-based formative assessment.

4. Confidentiality The researcher will ensure the confidentiality of your records. In her master's thesis, she will not reveal your name or anything that could identify you.

5. Voluntary participation Participation is completely voluntary and you can withdraw at any time by choosing not to complete the study.

6. Questions? If you have any questions or concerns regarding this study, or if any problems come up, you may contact the principal investigator, Yang Song, by email at [email protected], or reach her supervisor, Dr. Carolyn Turner, by emailing her at [email protected], calling her at 514-398-6984, or writing to her at: Department of Integrated Studies in Education, McGill University, 3700 McTavish Street, Montreal QC, Canada H3A 1Y2

I agree to participate in this study

Signature: ______ Date: ______

Investigator's Signature: ______ Date: ______


Appendix B: Teacher Interview Participation Consent Form

TEACHER CONSENT DOCUMENT FOR INTERVIEW IN RESEARCH
McGill University, Integrated Studies in Education
Investigator: Yang Song

1. Who is conducting the research? The research is being done with the approval of McGill University, where the main researcher is currently a master's student under Dr. Carolyn Turner's supervision.

2. Why is this study being done? The researchers want to find out Chinese teachers' and students' perceptions of Assessment for Learning (AFL) and the effectiveness of AFL in improving oral English learning.

3. What will happen? You will be asked some questions about AFL and your thoughts about oral English teaching. Your interview will be videotaped.

4. Confidentiality The researcher will ensure the confidentiality of your records. If she writes a master's thesis about this research, she will not reveal your name or anything that could identify you.

5. Voluntary participation Participation is completely voluntary and you can withdraw at any time by choosing not to complete the study.

6. Questions? If you have any questions or concerns regarding this study, or if any problems come up, you may contact the principal investigator, Yang Song, by email at [email protected], or reach her supervisor, Dr. Carolyn Turner, by emailing her at [email protected], calling her at 514-398-6984, or writing to her at: Department of Integrated Studies in Education, McGill University, 3700 McTavish Street, Montreal QC, Canada H3A 1Y2

The videotape will only be used by the researcher and will not be shown publicly. I agree to be videotaped during the interview: Yes / No. I agree to participate in this study: Yes / No.

Signature: ______ Date: ______

Investigator's Signature: ______ Date: ______


Appendix C: Student Participation Consent Form

STUDENT CONSENT DOCUMENT FOR PARTICIPATION IN RESEARCH
McGill University, Integrated Studies in Education
Investigator: Yang Song

1. Who is conducting the research? The research is being done with the approval of McGill University, where the main researcher is currently a master's student under Dr. Carolyn Turner's supervision.

2. Why is this study being done? The researchers want to find out Chinese teachers' and students' perceptions of Assessment for Learning (AFL) and the effectiveness of AFL in improving oral English learning.

3. What will happen? First, you will answer a questionnaire. Then you will do AFL tasks in Integrated English class for three weeks. During this period, you will work in small groups with your peers on oral English tasks. Your performance will be videotaped. After all the tasks, you will answer another questionnaire. Finally, some of you will be interviewed and videotaped.

4. Confidentiality The researcher will ensure the confidentiality of your records. If she writes a master's thesis about this research, she will not reveal your name or anything that could identify you.

5. Voluntary participation Participation is completely voluntary and you can withdraw at any time by choosing not to complete the study.

6. Questions? If you have any questions or concerns regarding this study, or if any problems come up, you may contact the principal investigator, Yang Song, by email at [email protected], or reach her supervisor, Dr. Carolyn Turner, by emailing her at [email protected], calling her at 514-398-6984, or writing to her at: Department of Integrated Studies in Education, McGill University, 3700 McTavish Street, Montreal QC, Canada H3A 1Y2

The videotape will only be used by the researcher and will not be shown publicly. I agree to be videotaped in class: Yes / No. I agree to participate in this study: Yes / No.

Signature: ______ Date: ______

Investigator's Signature: ______ Date: ______


Appendix D: Student Interview Consent Form

STUDENT CONSENT DOCUMENT FOR INTERVIEW IN RESEARCH
McGill University, Integrated Studies in Education
Investigator: Yang Song

1. Who is conducting the research? The research is being done with the approval of McGill University, where the main researcher is currently a master's student under Dr. Carolyn Turner's supervision.

2. Why is this study being done? The researchers want to find out Chinese teachers' and students' perceptions of Assessment for Learning (AFL) and the effectiveness of AFL in improving oral English learning.

3. What will happen? You will be interviewed to talk more about AFL. Your interview will be videotaped.

4. Confidentiality The researcher will ensure the confidentiality of your records. If she writes a master's thesis about this research, she will not reveal your name or anything that could identify you.

5. Voluntary participation Participation is completely voluntary and you can withdraw at any time by choosing not to complete the study.

6. Questions? If you have any questions or concerns regarding this study, or if any problems come up, you may contact the principal investigator, Yang Song, by email at [email protected], or reach her supervisor, Dr. Carolyn Turner, by emailing her at [email protected], calling her at 514-398-6984, or writing to her at: Department of Integrated Studies in Education, McGill University, 3700 McTavish Street, Montreal QC, Canada H3A 1Y2

The videotape will only be used by the researcher and will not be shown publicly. I agree to be videotaped during the interview: Yes / No. I agree to participate in this study: Yes / No.

Signature: ______ Date: ______

Investigator's Signature: ______ Date: ______


Appendix E: TEM 4 Oral English Exam Criteria

The Chinese Teaching Program of English Department of the Institutions of Higher Learning (http://www.bfsu.edu.cn/chinesesite/gxyyzyxxw/zywj/yyjxdg.html#1) is written in Chinese. Translations for each year level are listed as follows:

According to the teaching program, first year students: a) after listening to a passage and questions on a given text, can answer the questions and retell the passage; b) can talk about daily life topics; c) can express themselves with correct pronunciation and intonation and without significant grammar errors;

Second year students: a) can talk to English speakers in general social situations; b) can talk about daily life topics; c) can express themselves with correct pronunciation and intonation and without significant grammar errors;

Third year students: a) can communicate with others on familiar topics; b) can introduce China's places of interest and its current situation and policies fluently and accurately to foreigners; c) can express their own views systematically, deeply, and coherently;

Fourth year students: a) can communicate with foreigners about major domestic and international issues fluently and appropriately; b) can express their own views systematically, deeply, and coherently.


Appendix F: Teacher Questionnaire

This information will help us better understand your impressions of formative classroom assessment. All information will be treated in the strictest confidence. Thank you very much for your time.

Part 1: Your Background Information Please check [ ] the appropriate answer.

(1) Your gender: [ ] male [ ] female

(2) Your age: [ ] 20-29 [ ] 30-39 [ ] 40-49 [ ] above 50

(3) Number of years you have been teaching: [ ] 0-2 years [ ] 3-6 years [ ] 7-10 years [ ] 11 years or more

(4) Number of hours you teach Integrated English per week: [ ] 0-4 hours [ ] 4-10 hours [ ] 11 hours or more

(5) Your academic background: [ ] Bachelors [ ] Bachelors plus Certificate [ ] Masters [ ] PhD [ ] other, Specify: ______

Part 2: Attitude toward Formative Assessment In the brackets [ ], please mark the following on a four point scale as: [1] strongly disagree [2] disagree [3] agree [4] strongly agree

1. Student self-evaluation fosters learning. [ ]

2. Student peer review feedback is useful for learning. [ ]

3. Most students value peer feedback in learning. [ ]

4. Students should be actively involved in their assessment. [ ]

5. It is important for students to have input on how their work is assessed. [ ]

6. It is important for students to provide input on how their work is assessed. [ ]

7. Most students prefer to be assessed by various methods. [ ]

8. Varied assessment methods give more students a chance to do well. [ ]

9. Most students prefer assessment by one primary method. [ ]

10. Most students believe assessment contributes to learning. [ ]

11. Using one primary assessment method allows students to prefer their performances. [ ]

12. It's good for students to brainstorm what successful tasks should 'look like.' [ ]

13. Most students believe grades more than anything else. [ ]


14. It is helpful for students to know what each activity is worth towards the final grade. [ ]

15. Teacher feedback is effective in promoting student learning. [ ]

16. Students need to receive positive feedback in order to progress. [ ]

17. Students need to receive negative feedback in order to progress. [ ]

18. Teachers‘ comments to students are important in student learning. [ ]

19. Teachers and students should share an understanding of assessment goals. [ ]

20. Effective teachers need to be aware of student development. [ ]

21. Assessment focusing directly on student development is best. [ ]

22. Evaluation forms aid in communicating specific evaluation criteria to students. [ ]

23. Using evaluation forms aids in recording student evaluation. [ ]

24. Varied assessment methods should be used continually. [ ]

25. One primary method of assessment should be used continually. [ ]

26. Assessment may have an impact on the course of student learning. [ ]

27. Assessment can contribute to student learning.[ ]

28. Students can achieve the requirements of the teaching program. [ ]

29. Students are aware of their oral English proficiency. [ ]


Appendix G: Pre-Study Student Questionnaire

1. You are: A. female B. male

2. As a second year English-major student, do you think you can achieve the goals/objectives listed in the curriculum? A. Yes B. No

3. How often do you speak English in the classroom? A. Most of the time B. Often C. Seldom D. Never

4. How would you describe your speaking proficiency in English? A. Excellent B. Very good C. Average D. Not well

5. What is (are) the challenge(s) you face when speaking English? A. Confidence in talking B. Pronunciation C. Fluency D. Other (If your answer is not in the choices above please write down your own) My challenge(s) in spoken English is (are):


Appendix H: Post-Study Student Questionnaire

1. Have you done these kinds of tasks before? A. Yes B. No

2. Do you think these tasks have helped improve your speaking skills? A. Yes B. No If yes, what has improved? A. Confidence in talking B. Fluency C. Pronunciation D. Other (If your answer is not in the choices above please write down your own) My improvement in spoken English is:

3. Have you improved in being able to identify strengths and weaknesses in your classmates' spoken/oral English? A. Yes B. No

4. What is your view about AFL exercises? A. Useful B. Not useful Why?

A. Interesting B. Not interesting Why?

A. Motivating B. Not motivating (boring) Why?

Other: If you have other perceptions of AFL, write them down: I think AFL is ______ because ______


Appendix I: AFL Task One – Retelling the Story

Duration: 90 minutes
Aim: Oral fluency practice
Summary: Students retell the stories they listened to

Preparation: The students will listen to four short passages suited to their language level. If there is a word that few students are likely to know, an introduction will be given. Stories clearly labeled A, B, C, and D will be randomly assigned to the four group members. Evaluation forms will be handed out to groups. Enough copies will be made so that each student can receive the written narrative copies after the evaluation.

Procedure:
1. Organize the students into groups of four. If the number of students is not divisible by four, see the What to do about surplus participants section below.
2. Tell the students that you have prepared four different stories. Explain that you will ask the students to listen to all the stories, take notes, and retell the randomly assigned stories.
3. After listening and note-taking, assign the stories to the students so that each member in the group gets a different story. It is best if stories A, B, C, and D go around the group like this:

   A  B
   D  C

4. Give the students time to prepare their stories.
5. Retell the stories in the order A-B-C-D. While one student retells a story, the other group members listen carefully and evaluate the reproduction with an evaluation form.
6. Hand out the written narrative copies of the four stories to the students.

What to do about surplus participants
Unfortunately, things get quite awkward if the number of students is not divisible by 4. Suggestions on how to cope with this are:
Surplus 1 student: One group of 5, where two students work together to retell to the other students.

Surplus 2 students: One group of 6 divided into two three-student groups.
Surplus 3 students: One group of 7 divided into two groups, one of three students and the other of four.
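For illustration only (this sketch is not part of the original task materials; the function name and the guard for classes smaller than four students are my own additions), the grouping rules above can be expressed as a short function that returns the list of group sizes:

```python
def group_students(n):
    """Partition n students into groups for the retelling task.

    Default group size is four; remainders follow the rules above:
    surplus 1 -> one group of 5, surplus 2 -> two groups of 3,
    surplus 3 -> one group of 3 and one group of 4.
    """
    if n < 4:
        raise ValueError("the rules assume at least one full group of four")
    r = n % 4
    if r == 0:
        return [4] * (n // 4)
    if r == 1:
        # One group of five absorbs the extra student.
        return [4] * ((n - 5) // 4) + [5]
    if r == 2:
        # A would-be group of six splits into two groups of three.
        return [4] * ((n - 6) // 4) + [3, 3]
    # r == 3: a would-be group of seven splits into three and four.
    return [4] * ((n - 7) // 4) + [3, 4]
```

For example, the 74 students in this study would yield seventeen groups of four plus two groups of three.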


Materials
The stories are from Advanced Stories for Reproduction by L. A. Hill.

Story A (Page 22, story 27) The women's college had a very small car-park, and as several of the teachers and students, and many of the students' boy-friends, had cars, it was often difficult to find a place to park. The head of the college, whose name was Miss Baker, therefore had a special place in the car-park for her own small car. There were white lines round it, and it had a notice saying, 'Reserved for Head of College'.

One evening, however, when Miss Baker got back to the College a few minutes before the time by which all students had to be in, she found another car in her parking space. There were two people in it, one of her girl-students and a young man. Miss Baker knew that the young man would have to leave very soon, so she decided to ask him to move his car a bit, so that she could park hers in the proper place for the night before going to bed.

As the young man's car was close to the railings, Miss Baker had to drive up beside it on the other side, where the girl was sitting. She therefore came up on this side, opened her own window and tapped her horn lightly to draw attention to the fact that she was there. The girl, who had her head on the boy's shoulder, looked around in surprise. She was even more surprised when she heard Miss Baker say, 'Excuse me, but may I change places with you?'

Story B (Page 23-24, story 29) When sailors are allowed ashore after a long time at sea, they sometimes get drunk and cause trouble. For this reason, the navy always has naval police in big ports. When sailors cause trouble, the naval police come and deal with them.

One day, the naval police in one big seaport received an urgent telephone call from a bar in the town. The barman said that a big sailor had got drunk and was breaking the furniture in the bar. The petty officer who was in charge of the naval police guard that evening said that he would come immediately. Now, petty officers who had to go and deal with sailors who were violently drunk usually chose the biggest naval policeman they could find to go with them. But this particular petty officer did not do this. Instead, he chose the smallest and weakest-looking man he could find to go to the bar with him and arrest the sailor who was breaking the furniture.

Another petty officer who happened to be there was surprised when he saw the petty officer of the guard choose this small man, so he said to him, 'Why don't you take a big man with you? You may have to fight the sailor who is drunk.'

'Yes, you are quite right,' answered the petty officer of the guard. 'That is exactly why I am taking this small man. If you saw two policemen coming to arrest you, and one of them was much smaller than the other, which one would you attack?'


Story C (Page 26, story 33) Jack was young, rich, and fond of girls. He hardly did any work, and spent most of his time enjoying himself.

One summer he bought a big motor-boat. As soon as it was ready to go to sea, he telephoned to one of the girls he had met somewhere, and invited her for a trip in his new motor-boat. It was the first of many successful invitations of this kind.

The way Jack used to invite a girl for a trip in his boat was like this: he would begin by saying, 'Hullo, Laura (or whatever the girl's name was). I have just bought a beautiful new motor-boat, and I would like to take you out for a trip in it.'

The girl's answer was usually cautious, because everybody in that part of the country knew Jack's reputation with girls. She would say something like this: 'Oh, really? That's nice. What name have you given to the boat?'

Jack would then answer, 'Well, Laura, I have named it after you.'

Of course, the girl would feel very proud that Jack had chosen her name for the boat out of the names of all his many girl-friends, and she would think that Jack must really love her. She would therefore be quite willing to accept his invitation to go for a trip in his motor-boat.

It would not be until she got down to the harbor and actually saw the boat that she would understand how cleverly Jack had tricked her. Because there in neat gilded letters on the boat she would see its name---'After you'.

Story D (Page 29, story 37) Mary was very fond of television, so when she met a young man who worked for a television company, she was very interested and asked him a lot of questions. She discovered that he had also worked for a film company, so she asked him whether there was any difference between film work and television work. 'Well,' answered the young man, 'there is one very big difference. If someone makes a mistake while a film is being made, it is, of course, possible to stop and do the same scene again. In fact, one can do it over and over again a lot of times. Mistakes waste time, money and film, but the audiences who see the film when it is finished don't know that anything went wrong. In a live television show, on the other hand, the audiences can see any mistakes that are made.

'I can tell you a story about that. One day, a live television show was going on, and one of the actors was supposed to have been shot. He fell to the ground, and the camera moved somewhere else to allow time for me to run out with a bottle of tomato sauce to pour on to him to look like blood. But unfortunately the camera turned back to him before I had finished, and the audience saw me pouring the sauce on to the man.' 'Oh, how terrible!' Mary said. 'And what did you do?' 'Well,' answered the young man, 'our television director is a very strict man. If anyone makes a mistake, he dismisses him at once. So what could I do? I just had to pretend that this was part of the story, and eat the man.'


Student Retelling the Story Checklist
(Rate each item: Needs Improvement / Average / Good)

Does the retelling:
1. Have a good beginning telling when and where the situation takes place?
2. Name the person(s) involved?
3. Tell the main points of the situation?
4. Tell some supporting details?
5. Sound organized?
6. Keep the sequence of the situation?
7. Tell what the main problem was in the story?
8. Tell whether the situation was solved and how it came about?


Appendix J: AFL Task Two – Word Reading
Duration: 90 minutes
Aim: Pronunciation practice
Summary: Students speak English words

Preparation: The students will read and memorize the words on the handout, which are selected from the Integrated English textbook. During this process, students are not allowed to check textbooks or dictionaries. Word-checking sheets, clearly labeled A, B, C, and D, will be randomly assigned to the four group members. Enough word-checking sheets will be copied and given to the students. A laptop with the Longman Pronunciation Dictionary installed will be prepared to help students with this activity.

Procedure:
1. Organize the students into groups of four. If the number of students is not divisible by four, see the "What to do about surplus participants" section below.
2. Tell the students that you have prepared a sheet of English words. Ask the students to read the words and remember their Chinese meanings.
3. Give the students time to prepare.
4. After preparation, group members are randomly assigned one of four similar word-checking sheets (A, B, C, and D). During the checking process, one group member says the corresponding Chinese meaning to the student being checked, while the other two members check his/her pronunciation and stress accuracy against the checking sheets, with the help of the Longman Pronunciation Dictionary.
5. Students take turns (A-B-C-D) to finish the activity.
6. Students use the Longman Pronunciation Dictionary to correct their mistakes.

What to do about surplus participants
Unfortunately, things get quite awkward if the number of students is not divisible by 4. Suggestions on how to cope with this are:
Surplus 1 student: One group of 5 splits into two groups, one of three students and one of two.
Surplus 2 students: One group of 6 splits into two three-student groups.
Surplus 3 students: One group of 7 splits into two groups, one of three students and one of four.


Materials
1. English words handout:
argue 争论                 leather 皮革               devastated 惊慌的
psychological 心理上的      cement 水泥                cancel 取消
crew (飞机,船等)全体工作人员  geographically 在地理位置上  colonel 上校
humble 地位低下的           infinitely 无限             insult 侮辱
coon 浣熊                  ease 悠闲                  plunge (使)突然下跌
inadvertently 不经心地,非故意地  jut 突出,伸出           epileptic 患癫痫病的
alternative 替换物          jolly 快活的               contact 接触,联系
intend 想要,打算            legitimate 合理,合法        plead 承认
writhe 扭曲                contact 接触,联系           mingle (使)混合
occasion 场合,时节          genius 天资,天赋,天才       advertise 做广告
glue 胶水                  ominously 不吉利地          vain 自负的,虚荣的
breathtaking 惊人的         alternative 替换物          extort 敲诈,勒索
ultimate 最终              miraculously 奇迹般地       beloved 受爱戴的
underneath 在下面           autograph 亲笔签名


2. Four word-checking sheets

Word-checking list A
   Chinese meaning of the words | Corresponding English words and phonetics | Student response | Pronunciation problem | Stress problem
1  自负的,虚荣的      vain [veɪn]
2  场合,时节         occasion [ə'keɪʒn]
3  亲笔签名          autograph ['ɔːtəgrɑːf]
4  水泥             cement [sɪ'ment]
5  无限             infinitely ['ɪnfɪnətli]
6  在下面           underneath [ˌʌndə(r)'niːθ]
7  争论             argue ['ɑːgjuː]
8  不吉利地          ominously ['ɑmɪnəsli]
9  敲诈,勒索         extort [ɪk'stɔːt]
10 替换物           alternative [ɔːl'tɜːnətɪv]

Word-checking list B
   Chinese meaning of the words | Corresponding English words and phonetics | Student response | Pronunciation problem | Stress problem
1  惊人的           breathtaking ['breθteɪkɪŋ]
2  皮革             leather ['leðə(r)]
3  取消             cancel ['kænsl]
4  突出,伸出        jut [dʒʌt]
5  估计过高          overestimate [ˌəʊvə(r)'estɪmeɪt]
6  (飞机,船等)全体工作人员  crew [kruː]
7  侮辱             insult [ɪn'sʌlt]
8  奇迹般地          miraculously [mɪ'rækjələsli]
9  受爱戴的          beloved [bɪ'lʌvd]
10 心理上的          psychological [ˌsaɪkə'lɑdʒɪkl]


Word-checking list C
   Chinese meaning of the words | Corresponding English words and phonetics | Student response | Pronunciation problem | Stress problem
1  地位低下的        humble ['hʌmbl]
2  (使)混合         mingle ['mɪŋgl]
3  合理,合法        legitimate [lɪ'dʒɪtɪmət]
4  悠闲             ease [iːz]
5  胶水             glue [gluː]
6  想要,打算        intend [ɪn'tend]
7  (使)突然下跌     plunge [plʌndʒ]
8  患癫痫病的       epileptic [ˌepɪ'leptɪk]
9  快活的           jolly ['dʒɑlɪ]
10 在地理位置上      geographically [ˌdʒɪə'græfɪkli]

Word-checking list D
   Chinese meaning of the words | Corresponding English words and phonetics | Student response | Pronunciation problem | Stress problem
1  浣熊             coon [kuːn]
2  不经心地,非故意地  inadvertently [ˌɪnəd'vɜːtntli]
3  承认             plead [pliːd]
4  扭曲             writhe [raɪð]
5  接触,联系         contact ['kɒntækt]
6  惊慌的           devastated ['devəsteɪtɪd]
7  做广告           advertise ['ædvətaɪz]
8  天资,天赋,天才    genius ['dʒiːnɪəs]
9  上校             colonel ['kɜːnl]
10 最终             ultimate ['ʌltɪmət]

(In the Student Response column, R stands for Right, W for Wrong, and WF for Word Forgotten. In the Pronunciation Problem and Stress Problem columns, R stands for Right and M for Mistake.)


Appendix K: AFL Task Three – Role Playing
Duration: 90 minutes
Aim: Oral fluency practice
Summary: Students make conversations in realistic conversational contexts

Preparation: All groups are assigned three role-playing tasks that correspond to their language level. These tasks, labeled A, B, and C, will be randomly assigned to each group. Evaluation forms will be handed out to every group.

Procedure:
1. Organize the students into groups of four. If the number of students is not divisible by four, see the "What to do about surplus participants" section below.
2. Tell the students that you have prepared three different role-playing tasks. Ask the students to discuss them and prepare to act in front of the class.
3. Give the students time to prepare their role-playing.
4. Every group performs in front of the class and is evaluated by the other groups.
5. Hand out the written narrative copies of the four stories to the students.
6. Every group watches their performance and discusses ways for improvement using the evaluation form.

What to do about surplus participants
Unfortunately, things get quite awkward if the number of students is not divisible by 4. Suggestions on how to cope with this are:
Surplus 1 student: The group of 5 works together as one group.
Surplus 2 students: One group of 6 splits into two three-student groups.
Surplus 3 students: One group of 7 splits into two groups, one of three students and one of four.

Task A: Giving Directions

Caller: Call the receiver and ask him/her out to dinner.
Receiver: Suggest a restaurant and give directions to the restaurant.
(Adapted from http://bogglesworldesl.com/directions.htm)


Task B: Teenage Dilemmas
Son/Daughter: You're going to ask one of your parents if you can go on holiday with your friends. You really want to go but you think your parent is going to say no. Explain why s/he should let you go. Try not to get angry or upset with your parent, but continue trying to persuade him/her.
Parent: Your 17-year-old son/daughter is going to ask if s/he can go on holiday without you. You love your child and want the best for him/her, but you don't think s/he should go. Listen to what s/he says, but explain your reasons too.
(Adapted from http://www.tefllogue.com/in-the-classroom/esl-speaking-activities-two-fun-roleplays.html)

Task C: Restaurant Interview Role Playing

Interviewer: You own a restaurant near the school. You need a server (waiter or waitress). Decide if you will hire the interviewee.

Interviewee: You have never worked in a restaurant before, but you really need a job (and the money)!
(Adapted from http://www.eslgo.com/resources/sa/restaurant_interview.html)

Role-playing Checking List
(Answer each question: Yes / No)

Does this group speak audibly and clearly?
Does this group make the play understandable and reasonable?
Does this group use Chinese during the role playing? (If yes, how many times?)
Does this group pause a lot during the role playing? (If yes, how many times?)

Please give suggestions for further improvement.

Suggestions for the group:

Suggestions for individuals:


Appendix L: Student Interview Protocol

(1) Which AFL task do you like best? Why?

(2) Do you like the idea of assessment without grades? Why?

(3) Do you think that doing more similar activities in your future study will improve your oral proficiency?


Appendix M: Teacher Interview Protocol

(1) Do you think that using AFL tasks continuously in the classroom will help students to improve their oral English?

(2) Will you use AFL in your future teaching?
