USAID QUALITY READING PROJECT: KYRGYZ REPUBLIC

Early Grade Reading Assessment (EGRA) 2016 Midterm Analytic Report

September 30, 2016

A partnership with American Institutes for Research® (AIR®) and Save the Children International

Contract No.: AID-176-C-13-00001-00

USAID QUALITY READING PROJECT: KYRGYZ REPUBLIC

Early Grade Reading Assessment (EGRA) 2016 Midterm Analytic Report

Submitted by American Institutes for Research®

September 30, 2016

This midterm study of the early grade reading assessment in the Kyrgyz Republic is made possible by the support of the American people through the United States Agency for International Development (USAID). The contents are the sole responsibility of American Institutes for Research and Save the Children International and do not necessarily reflect the views of USAID or the United States Government.

TABLE OF CONTENTS

Acronyms ...... iv

Foreword ...... v

Acknowledgments ...... vi

Executive Summary ...... 1

I. Introduction ...... 3
The 2016 Midterm Data Analytic Report ...... 3
The State of Reading Outcomes in the Kyrgyz Republic ...... 3
The USAID Quality Reading Project ...... 4
Purpose and Scope of the EGRA ...... 4
What Does the EGRA Assess? ...... 5
The EGRA Subtasks in Brief ...... 6
Research Methods ...... 7
Data Collection Plan ...... 7
Description of the Sample ...... 7
Within-School Pupil Sampling ...... 8
Weighting the Sample ...... 9
Reliability and the EGRA Subtasks ...... 9
Study Limitations ...... 11
II. 2016 EGRA Results ...... 13
Comparing 2014 and 2016 (Grade 2) ...... 15
Comparing 2014 and 2016 (Grade 4) ...... 16
Oral Reading Fluency Proficiency in 2014 and 2016 ...... 17
2016 Subtask Scores ...... 20
Letter Name Recognition Results (Grade 2) ...... 21
Initial Letter Sound (Grade 2) ...... 22
Familiar Word Recognition (Grades 2 and 4) ...... 23
Nonsense Word Recognition (Grades 2 and 4) ...... 24
Oral Reading Fluency with Comprehension (Grades 2 and 4) ...... 25
Dictation (Grades 2 and 4) ...... 28
Listening Comprehension (Grades 2 and 4) ...... 29
Oral Vocabulary (Grades 2 and 4) ...... 30
III. Findings and Discussion ...... 32
Overall Results ...... 32
For Ministry and USAID Discussion ...... 33
Focus on Gender ...... 34
For Ministry and USAID Discussion ...... 37
School Location ...... 38

EGRA Midterm Data Analytic Report for the Kyrgyz Republic, September 2016 Page i

For Ministry and USAID Discussion ...... 40

Appendix 1. Kyrgyz Alphabet ...... 41

Appendix 2. Factor Analysis ...... 42

Appendix 3. The EGRA Subtasks in Full ...... 44

Appendix 4. Subtask Development and Piloting ...... 48

Appendix 5. Administrator Training ...... 49

Appendix 6. Administration and Monitoring ...... 50

Appendix 7. Data Analysis Methods ...... 51

Appendix 8. Equating Subtasks ...... 52

Appendix 9. Relationship Between ORF and Reading Comprehension ...... 54

Appendix 10. 2016 Results by Treatment/Control Status ...... 56

References...... 58

LIST OF TABLES

Table 1: Early Grade Reading Assessment Subtasks in the Kyrgyz Republic ...... 6
Table 2: Cross-Sectional and Longitudinal Design ...... 7
Table 3: EGRA Sample by School and Region, 2016 ...... 8
Table 4: EGRA School Sample by Language, 2016 ...... 8
Table 5: Reliability Estimations, 2016 ...... 10
Table 6: Approaches to Equating and Creating Equivalent Subtasks ...... 11
Table 7: Comparison of 2014 (Baseline) with 2016 (Equivalent) Results: Grade 2 ...... 15
Table 8: Comparison of 2014 (Baseline) and 2016 (Equivalent) Results: Grade 4 ...... 16
Table 9: Percentage Meeting Reading Fluency Standards by Language and Grade ...... 18
Table 10: Percentages Meeting National Reading Standards by Gender, All Languages ...... 19
Table 11: Percentages of Grade 2 Pupils Meeting National Reading Standards by Language, by School Location (All Languages) ...... 19
Table 12: Percentages of Grade 4 Pupils Meeting National Standards by Gender, All Languages ...... 20
Table 13: Percentages of Grade 4 Pupils Meeting National Standard by Language, by School Location (All Languages) ...... 20
Table 14: Percentages of Pupils Meeting Reading Standards and Reading with 80% Comprehension ...... 20
Table 15: Letter Name Recognition Results by Gender, 2016 ...... 21
Table 16: Letter Name Recognition Results by School Location, 2016 ...... 22
Table 17: Initial Letter Sound Results by Gender, 2016 ...... 22
Table 18: Initial Letter Sound Results by School Location, 2016 ...... 22
Table 19: Familiar Words Results by Gender, 2016 ...... 23
Table 20: Familiar Words Results by School Location, 2016 ...... 23
Table 21: Nonsense Words Results by Gender, 2016 ...... 24


Table 22: Nonsense Words Results by School Location, 2016 ...... 24
Table 23: Oral Reading Fluency by Gender, per Minute, 2016 ...... 25
Table 24: Reading Comprehension by Gender, Percent Correct, 2016 ...... 26
Table 25: Oral Reading Fluency per Minute by Location, 2016 ...... 26
Table 26: Reading Comprehension, Percent Correct by Location, 2016 ...... 26
Table 27: Dictation Percent Correct by Gender, 2016 ...... 28
Table 28: Dictation Percent Correct by Location, 2016 ...... 28
Table 29: Listening Comprehension Percent Correct by Gender, 2016 ...... 30
Table 30: Listening Comprehension Percent Correct by School Location, 2016 ...... 30
Table 31: Oral Vocabulary Results by Gender and School Location, 2016 ...... 31
Table 32: Oral Vocabulary Results by Gender and School Location, 2016 ...... 31
Table 33: Number and Percentage of Pupils with Zero Scores, 2016 ...... 32
Table 34: Gender Comparison, 2016 ...... 35
Table 35: Russian Language Gender Comparison, 2016 ...... 37
Table 36: Kyrgyz Language School Location Comparison, 2016 ...... 39
Table 37: Russian Language School Location Comparison, 2016 ...... 40
Table 38: Grade 2 Factor Analysis Results ...... 42
Table 39: Grade 4 Factor Analysis Results ...... 43

LIST OF FIGURES

Figure 1: Kyrgyz Grade 2: Oral Reading Fluency ...... 16
Figure 2: Relationship between Oral Reading Fluency and Reading Comprehension, 2016 ...... 19
Figure 3: Number Correct, Kyrgyz Grade 2, Reading Comprehension, 2016 ...... 27
Figure 4: Number Correct, Reading Comprehension, 2016 ...... 28
Figure 5: Number Correct, Listening Comprehension, 2016 ...... 29
Figure 6: Oral Reading Fluency Distribution by Gender (Kyrgyz Grade 2), 2016 ...... 36
Figure 7: Oral Reading Fluency Distribution by Gender (Kyrgyz Grade 4), 2016 ...... 36
Figure 8: Kyrgyz Grade 2 Reading Fluency by Reading Comprehension Score, 2016 ...... 54
Figure 9: Kyrgyz Grade 4 Reading Fluency by Reading Comprehension Score, 2016 ...... 54
Figure 10: Russian Grade 2 Reading Fluency by Reading Comprehension Score, 2016 ...... 55
Figure 11: Russian Grade 4 Reading Fluency by Reading Comprehension Score, 2016 ...... 55


ACRONYMS

AIR     American Institutes for Research
CEATM   Center for Educational Assessment and Teaching Methods
d       Cohen's d
DICT    Dictation
EGRA    Early Grade Reading Assessment
ILS     Initial Letter Sound
IRB     Institutional Review Board
IST     In-Service Teacher Training
KAE     Kyrgyz Academy of Education
LCQ     Listening Comprehension Questions
lpm     Letters per minute
LNR     Letter Name Recognition
M&E     Monitoring and Evaluation
MOES    Ministry of Education and Science
NTC     National Testing Center
OECD    Organization for Economic Cooperation and Development
ORF     Oral Reading Fluency
OV      Oral Vocabulary
PISA    Program for International Student Assessment
QRP     Quality Reading Project
RTI     Research Triangle Institute International
SD      Standard deviation
UNICEF  United Nations Children's Fund
USAID   United States Agency for International Development
wpm     Words per minute


FOREWORD

The United States Agency for International Development (USAID) is strategically focused on improving early grade reading around the world. In the Kyrgyz Republic, USAID partners with the Ministry of Education and Science (MOES) and its affiliates to improve reading instructional practices, expand the availability of age-relevant reading materials, and strengthen the culture of reading across the country.

Learning to read is a foundational building block in a child's development. Reading helps children learn about the world around them, connect with the past, explore the present, and imagine the future. And, as long as a book is in hand, reading can be enjoyed alone or cherished alongside a companion.

Learning to read is also a key predictor of future academic success; thus, it is critical to measure students' progress over time. Reading assessments such as the Early Grade Reading Assessment (EGRA) provide insight into how students are performing and can help to identify areas of strength as well as areas in which further reading instruction is needed. These assessments provide valuable data to inform decision making and strategic planning for ministries of education, international donors, and other education stakeholders.

This 2016 EGRA midterm report is the result of the strong partnership between USAID/Kyrgyz Republic and the Kyrgyz Republic Ministry of Education and Science. It presents results from the 2016 midterm EGRA and highlights progress on several key reading indicators. USAID is pleased to present this EGRA, and we hope that it will serve as a valuable tool for all partners as we work together to improve reading skills for primary students in the Kyrgyz Republic.

—Dr. Amy V. Scott, USAID/Central Asia Education Development Officer


ACKNOWLEDGMENTS

The successful implementation of the 2016 Midterm EGRA was made possible by the hard work and contributions of many individuals and organizations. The Ministry of Education and Science (MOES) of the Kyrgyz Republic, the National Testing Center (NTC), the Pedagogical University Arabaeva, and the Kyrgyz Academy of Education (KAE) have all provided vital guidance and oversight of the assessment process since the inception of USAID's Quality Reading Project. The project team wishes to express special thanks to Mr. Artur Bakirov, Director of the NTC, as well as the entire NTC staff for their constant involvement and support.

The EGRA was implemented with the technical and logistical assistance of the American Institutes for Research (AIR) and Save the Children International. AIR planned and executed the administration of the EGRA with the support of team members in both its Washington, DC-based headquarters and the USAID Quality Reading Project office in the Kyrgyz Republic. The EGRA would not have been possible without the generosity of the American people through the United States Agency for International Development (USAID).

The authors of this report also gratefully acknowledge the following individuals for their leadership and support: Nate Park, Acting Mission Director; Amy Scott, Contracting Officer's Representative/Central Asia Education Development Officer; Inna Kirulyuk, Alternate Contracting Officer's Representative; Nora Madrigal, Health and Education Advisor; Pamela Teichman, Health and Education Advisor; and Guljan Tolbaeva, Education Project Management Specialist.


EXECUTIVE SUMMARY

The 2016 Early Grade Reading Assessment (EGRA) in the Kyrgyz Republic was administered in Grades 2 and 4 (Kyrgyz and Russian languages) in the Osh, Batken, Naryn, and Issyk-Kul regions. Mean scores on key EGRA subtasks increased notably from 2014 to 2016 in both grades and languages. In 2016, a larger proportion of pupils in both grades and languages also met basic national standards in Oral Reading Fluency than in 2014. Despite the positive trends in both mean scores and proportions meeting reading proficiency levels, however, just under half of the Kyrgyz language groups and just over half of the Russian language groups are meeting basic national reading standards.

On several of the foundational subtasks, pupils in both grades are performing well. The ability to read 66.7 letters per minute (Kyrgyz Grade 2) amounts to just above one letter per second, and approximately 68% of these pupils are reading between 46 and 86 letters per minute. This indicates strong alphabetic knowledge in Grade 2. Furthermore, a 94% mean score (Kyrgyz Grade 2) and an 89% mean score (Russian Grade 2) on Initial Letter Sound indicate that phonemic awareness is not posing significant challenges.

Overall, mean scores increased on 11 of the 18 comparable Grade 2 subtasks (Cohorts 2 and 3) between 2014 and 2016, and declined on only three. For the Kyrgyz group, six of nine subtasks saw mean score increases, including the important Letter Name Recognition (+7.7 letters per minute) and Oral Reading Fluency (+4.3 words per minute) subtasks.1 The largest increases were in Dictation (+18.1 percentage points) and Familiar Word Recognition (+7.8 words per minute). For Russian Grade 2, seven of nine subtasks saw mean score increases, including the important subtasks of Letter Name Recognition (+11 letters per minute) and Oral Reading Fluency (+12.6 words per minute). The largest increase was in Familiar Word Recognition (+17.3 words per minute).

The comparative situation in Grade 4 was similar to that of Grade 2 in some respects. Overall, mean scores increased on eight of 14 subtasks across the two language groups. In the Kyrgyz group, scores on five of seven subtasks increased, with the largest gains in Familiar Word Recognition (+19.3 words per minute), Oral Reading Fluency (+14.5 words per minute), and Reading Comprehension (+13.8 percentage points). The mean score decline in Oral Vocabulary was small (-1.3 percentage points). In the Russian Grade 4 group, mean scores declined on three of seven subtasks; however, large gains were seen in two key subtasks: Familiar Word Recognition (+40.1 words per minute) and Oral Reading Fluency (+29.7 words per minute).

Girls outscored boys on 14 of the 16 Kyrgyz-language subtasks in 2016 (both grades). Not only did girls outperform boys on most subtasks, but these gender gaps appear to widen between Grades 2 and 4. The subtasks on which girls had the largest advantage were those that tapped into decoding skills. Score differentials in favor of girls included Familiar Word

1 Changes in mean scores on Nonsense Word Recognition are not reported, whether increases or decreases. Changes in Dictation mean scores are reported as either increases or decreases. However, as noted, while the subtask design entailed strict protocols for attaining equivalency, the nature of the task itself makes comparative inferences for Dictation somewhat tenuous.


Recognition, Grade 2 (+9.3 words per minute); Familiar Word Recognition, Grade 4 (+13.8 words per minute); Oral Reading Fluency, Grade 2 (+9.4 words per minute); Oral Reading Fluency, Grade 4 (+16.9 words per minute); and several others.

Russian-medium girls outscored boys numerically on all 16 subtasks (both grades); however, only seven of these score differences were statistically significant, five of them at Grade 2. Notably, there was no increase in the gender gap between Grades 2 and 4 for the Russian language-of-instruction (LOI) group.


I. INTRODUCTION

THE 2016 MIDTERM DATA ANALYTIC REPORT

This 2016 Midterm Data Analytic Report presents results from the April 2016 midterm Early Grade Reading Assessment (EGRA) in the Kyrgyz Republic. Following a brief summary of the state of reading in the republic, the report contains an overview of the EGRA study design; a full description of the purpose and content of the EGRA subtasks; a detailed review of all research, sampling, and data analysis methods employed in the study; information about how the EGRA tasks were developed, piloted, administered, and analyzed; EGRA results for both languages in Grades 2 and 4; and a discussion of the implications of the EGRA findings. Significantly, the report also presents 2016 results from Cohorts 2 and 3 in comparison with results from the 2014 EGRA baseline of those same cohorts.

THE STATE OF READING OUTCOMES IN THE KYRGYZ REPUBLIC

Basic literacy and the ability to read with comprehension are the foundation for all learning. Research indicates that it is essential that children acquire solid reading skills at the earliest stages of their cognitive development; falling behind can lead to gaps in learning that become increasingly difficult to overcome in later years (Patrinos & Velez, 2009). Despite high literacy attainment in the Soviet period, improving primary reading outcomes has become a national priority in the Kyrgyz Republic since 2012 (AIR, 2014).

Considerable evidence indicates a steady decline in reading outcomes at multiple grade levels in the Kyrgyz Republic over the past decade. According to a study by the United Nations Children's Fund, from 2001 to 2005, only one-half of all pupils met basic reading standards (UNICEF, 2005). The results of nationally representative assessments at the fourth- and eighth-grade levels in 2007, 2009, and 2014 revealed that a majority of pupils at both levels were performing poorly in reading (Center for Educational Assessment and Teaching Methods [CEATM], 2010; CEATM, 2014). The results of the 2006 and 2009 Program for International Student Assessment (PISA) also indicated that 83% of 15-year-olds were not achieving the minimum PISA reading standards (CEATM, 2009a; OECD, 2010).

In response to this state of affairs, the United States Agency for International Development (USAID) and the government of the Kyrgyz Republic have intensified their collaboration with the aim of improving reading achievement at the primary levels. With USAID's support, the first EGRA was administered in 2012 to a nationally representative sample of more than 4,000 pupils.2 The results led to a review of teaching practices in the republic, as almost half of the pupils assessed were unable to meet national standards in oral reading fluency, an important indicator of early reading skills (Tvaruzkova & Shamatov, 2012).
The USAID Quality Reading Project was initiated in 2013.

2 RTI International developed the EGRA for use in many countries around the world. This first EGRA study in the Kyrgyz Republic was conducted by JBS International in 2012. See Gove (2009) for more information about the purpose and history of EGRA and how it has been used in various contexts.


THE USAID QUALITY READING PROJECT

The purpose of the USAID Quality Reading Project is to build upon earlier international and domestic initiatives to improve reading achievement in the republic.3 With support from the project, the Ministry of Education and Science (MOES) has taken steps to improve teacher preparation; to develop and improve standards, materials, and resources; and to promote public awareness of the importance of early reading. For example, in 2013 and 2014, the MOES developed The Minimum Requirements for Reading in Elementary Schools, an effort to establish rigorous reading content standards for the primary school levels. The final iteration of this work was reviewed by a broad constituency of stakeholders during roundtables at the Kyrgyz Academy of Education (KAE). In-service teacher training (IST) materials and reading standards for Grades 1 through 4 were approved at the KAE Academic Council meeting in May 2014 and were amended and approved again in August 2016.

The project is also responsible for the implementation of the EGRA in the republic. More recent work has focused on developing the capacity of the National Testing Center (NTC) to sustain EGRAs in the coming years.

PURPOSE AND SCOPE OF THE EGRA

The EGRA in the Kyrgyz Republic collects data on the attainment of pre-reading and early reading skills in Grades 2 and 4. EGRA results are best used to direct attention and resources toward areas for instructional improvement. Although the EGRA is not a complete assessment of reading comprehension, its results can focus attention on the pedagogical obstacles to reading development and can stimulate new policy and classroom-level approaches to improving reading outcomes. Through identification of areas for improvement and monitoring of progress over time, evidence-based instructional approaches can be enhanced and teacher education programs improved (Dubeck & Gove, 2015). The EGRA in the Kyrgyz Republic does not serve as a skills diagnostic tool for individual pupils, nor does it serve as a high-stakes examination for individual teachers and schools.

Significantly, the design of the EGRA in the Kyrgyz Republic enables an evaluation of the effectiveness of a teacher professional development program implemented by the USAID Quality Reading Project. By comparing the EGRA results of pupils whose teachers received support (a treatment group) with those of pupils whose teachers did not (a control group), the impact of the program on pupil reading outcomes can be estimated. The results presented in Appendix 10 of this report are aggregated by treatment and control group. A full analysis of the difference in reading skills growth rates between the control and treatment populations will appear in the forthcoming 2016 project report, EGRA Impact Study in the Kyrgyz Republic.

In addition to collecting data on pre-reading and early reading skills, the EGRA gathers contextual and pupil background information through surveys of the pupils assessed as well as some of their teachers and principals.
These data can be used to determine the nature of the relationships between reading skills and selected school, home, and environmental factors.
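Treatment–control comparisons of the kind described above are commonly summarized with a standardized effect size such as Cohen's d (listed in this report's acronyms). The sketch below shows one standard pooled-standard-deviation formulation; the scores are invented, and the actual impact-study computations may differ.

```python
import statistics

def cohens_d(treatment, control):
    """Standardized mean difference (Cohen's d) using the pooled SD."""
    n1, n2 = len(treatment), len(control)
    m1, m2 = statistics.fmean(treatment), statistics.fmean(control)
    v1, v2 = statistics.variance(treatment), statistics.variance(control)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd

# Hypothetical oral reading fluency scores (words per minute)
treated = [45, 52, 48, 60, 55, 47]
controls = [40, 44, 42, 50, 46, 41]
print(round(cohens_d(treated, controls), 2))
```

A positive d indicates the treatment group's mean exceeds the control group's, expressed in pooled standard deviation units, which makes gains comparable across subtasks with different scales (letters per minute, percent correct, and so on).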

3 See AIR (2015) for more on the USAID Quality Reading Project in the Kyrgyz Republic.


The impact study will present survey results and highlight those factors that are associated with reading achievement outcomes in the Kyrgyz Republic.4

WHAT DOES THE EGRA ASSESS?

The EGRA in the Kyrgyz Republic is a battery of pre-reading and early reading skills assessments (subtasks) designed for pupils in Grades 2 and 4 in both the Kyrgyz and Russian languages. These skills are the initial building blocks of reading that are essential for the development of reading fluency and comprehension. Subtasks assess skills such as alphabetic knowledge, phonological awareness, decoding and reading fluency, listening comprehension, and vocabulary knowledge (Vaughn & Linan-Thompson, 2004). For an in-depth discussion of these core elements of reading, with references to further reading, see the USAID EGRA Toolkit, Second Edition (RTI International, 2015).

In educational assessment, certain abilities, such as math skill, logical reasoning, or reading ability, are posited to be latent constructs. The existence of these constructs must be demonstrated through the accumulation of behavioral or performance evidence that supports the claim. Data collected from the EGRA administrations in 2014, 2015, and 2016 provided researchers at AIR with the opportunity to conduct empirical analyses (factor analyses) of the underlying structure of the EGRA subtasks. The primary purpose of factor analysis is to determine the number of distinct dimensions or constructs (also referred to as factors) that theoretically underlie a domain of knowledge, trait, or ability measured by an assessment or survey instrument (Kim & Mueller, 1978). The results of factor analyses of the full 2014, 2015, and 2016 complement of EGRA subtasks (both languages) indicated that the reading subtasks were tapping into two latent constructs: decoding and linguistic comprehension.

Hoover & Gough (1990) define decoding as "efficient word recognition; the ability to rapidly derive a representation from printed input that allows access to the appropriate entry in the mental lexicon, and thus, the retrieval at the word level" (p. 130). They define linguistic comprehension as "the ability to take the lexical information (i.e., semantic information at the word level) and derive sentence and discourse interpretations" (p. 131). Accordingly, a measure of linguistic comprehension must evaluate the ability to understand spoken language. According to the Simple View of Reading, although these two constructs are distinct, both are necessary in combination for successful reading with comprehension (Hoover & Gough, 1990). In regression analyses carried out for each grade and language, the first and second most powerful predictors of reading comprehension were indeed a linguistic comprehension subtask and a decoding subtask, each explaining a considerable portion of variance. A fuller explication of the factor analysis method employed for the EGRA data analysis, as well as the results of the analyses, is presented in Appendix 2.
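As a rough illustration of how a two-construct structure can emerge from subtask data, the sketch below generates synthetic scores for six hypothetical subtasks driven by two latent abilities and applies the Kaiser criterion (eigenvalues of the correlation matrix greater than 1) to gauge the number of factors. The data, loadings, and subtask labels are invented; AIR's actual factor-analytic method and results are those described in Appendix 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two latent constructs, as in the Simple View of Reading
decoding = rng.normal(size=n)
comprehension = rng.normal(size=n)

# Six hypothetical subtask scores, each loading mainly on one construct
subtasks = np.column_stack([
    decoding + 0.3 * rng.normal(size=n),       # e.g., letter names
    decoding + 0.3 * rng.normal(size=n),       # e.g., familiar words
    decoding + 0.3 * rng.normal(size=n),       # e.g., nonsense words
    comprehension + 0.3 * rng.normal(size=n),  # e.g., listening comp.
    comprehension + 0.3 * rng.normal(size=n),  # e.g., oral vocabulary
    comprehension + 0.3 * rng.normal(size=n),  # e.g., dictation
])

# Kaiser criterion: retain factors whose correlation-matrix eigenvalue > 1
eigvals = np.linalg.eigvalsh(np.corrcoef(subtasks, rowvar=False))
print(int((eigvals > 1).sum()))  # number of retained factors
```

With three subtasks loading on each of two independent latent abilities, two eigenvalues dominate and the criterion retains two factors, mirroring the decoding/linguistic-comprehension split reported for the EGRA data.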

4 The Institutional Review Board (IRB) reviewed all EGRA materials and instruments, and participants were offered the opportunity to refuse participation or were allowed to discontinue participation at any time.


THE EGRA SUBTASKS IN BRIEF

The 2016 midterm consisted of nine subtasks for Grade 2 and seven subtasks for Grade 4.5 All EGRA subtasks were administered orally, one-on-one between a test administrator and a single pupil, over a 25-minute period. In Grade 2, four of the nine subtasks were timed; in Grade 4, three of the seven subtasks were timed (with up to 120 total seconds allowed for each). Because reading pace can vary over time (e.g., it may be faster or slower at the task outset), this metric enabled administrators to record multiple data points: the count at 1 minute, the count at 2 minutes, and a rate per minute. It also enabled administrators to determine the difference between the numbers read in the first and second minutes. The timed subtasks were Letter Name Recognition, Familiar Word Recognition, Nonsense Words, and Oral Reading Fluency. The number of letters, words, or pseudowords successfully read was converted to a per-minute rate for reporting purposes. The score calculations required the administrator to record how many items were attempted, how many were read correctly, and in what amount of time.

Table 1 presents a summary of the EGRA subtasks for both the Kyrgyz and Russian languages and describes the skills assessed by each subtask. A detailed rationale and description of each subtask is provided in Appendix 3.

Table 1: Early Grade Reading Assessment Subtasks in the Kyrgyz Republic, 2016 Midterm

Subtask (Grade) | Skills | Pupils were asked to:
1. Letter Name Recognition (2) | Alphabetic knowledge | Correctly identify letters of the alphabet in lower and upper cases (TIMED)
2. Initial Letter Sound (2) | Phonemic awareness | Identify the first phoneme from 10 commonly used words by isolating and sounding out just the first sound (phoneme) from a whole word read by the administrator
3. Familiar Word Recognition (2, 4) | Word recognition and decoding | Read aloud 40 familiar,6 grade-appropriate words (TIMED)
4. Nonsense Word Recognition (2, 4) | Letter–sound correspondence, decoding | Read aloud 40 grade-appropriate pseudowords (TIMED)
5. Oral Vocabulary (2, 4) | Receptive oral vocabulary | Identify 10 objects from a set of pictures after listening to a list of objects read by the administrator (based on the Peabody Picture Vocabulary Test format)
6. Oral Reading Fluency (2, 4) | Reading with fluency, accuracy, and speed | Demonstrate oral reading of a grade-appropriate passage (TIMED)
7. Reading Comprehension (2, 4) | Reading comprehension of text | Demonstrate comprehension of the passage by answering four or five oral questions, including at least one inferential question
8. Listening Comprehension (2, 4) | Oral language comprehension, vocabulary knowledge | Demonstrate listening comprehension of grade-appropriate text by answering four questions
9. Dictation (2, 4) | Oral language comprehension, decoding, and writing skills | Listen to a sentence and reproduce it correctly in written form

5 In 2014, the subtask Unfamiliar Words required pupils to read actual words judged to be above the pupils' grade level. In 2015 and 2016, this task was changed to Nonsense Words, an assessment of the decoding of pseudowords.

6 "Familiar" words were identified through an EGRA protocol that required word count analyses of grade-level textbooks to derive counts of the most commonly encountered words.
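The per-minute scoring convention for the timed subtasks can be sketched as a simple rate conversion. The early-finish extrapolation shown below (scaling up when a pupil completes all items before time expires) is an illustrative assumption, not the report's documented scoring rule.

```python
def per_minute_rate(items_correct, seconds_used):
    """Convert a timed subtask result into an items-per-minute score.

    If a pupil finishes every item before the time limit, the rate is
    extrapolated from the seconds actually used (illustrative assumption).
    """
    if seconds_used <= 0:
        raise ValueError("seconds_used must be positive")
    return items_correct * 60.0 / seconds_used

print(per_minute_rate(38, 60))  # 38 correct over the full minute -> 38.0
print(per_minute_rate(40, 45))  # all 40 items finished early -> higher rate
```

The same formula applies whether the items are letters (lpm), familiar words, pseudowords, or connected-text words (wpm); only the stimulus set changes.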


RESEARCH METHODS

DATA COLLECTION PLAN

In April 2014, the USAID Quality Reading Project administered EGRA subtasks at the pre-intervention stage to establish a baseline. The baseline EGRA was administered nationwide in Grades 1, 2, and 4. To monitor progress, the EGRA was administered again in the spring of 2015 (Cohort 1 midterm) and in the spring of 2016 (Cohorts 2 and 3 midterm). The 2015 EGRA midterm was administered in Grades 2 and 4 in the Bishkek, Chui, Jalal-Abad, and Talas regions of the Kyrgyz Republic (Cohort 1). The 2016 EGRA midterm was carried out in the Osh, Batken, Naryn, and Issyk-Kul regions (Cohorts 2 and 3). As will be highlighted in later sections, the 2014–2016 comparative results presented in this report include only results from Cohorts 2 and 3.

According to the study design, EGRA data are collected across the years in two ways. Data are collected from a different cohort of pupils at the same schools in the same grades in different years (cross-sectional design). Data are also collected from the same pupils in different years, in 2014, 2015, and 2017 (longitudinal design). This report presents the results of the cross-sectional data collected in 2016 for Cohorts 2 and 3. Table 2 shows the assessment data collection plan across the years of the project.

Table 2: Cross-Sectional and Longitudinal Design

Cross-Sectional Design
Cohort      | 2014   | 2015   | 2016   | 2017
1           | G2, G4 | G2, G4 | —      | G2, G4
2 and 3     | G2, G4 | —      | G2, G4 | G2, G4

Longitudinal Design
Cohort      | 2014   | 2015   | 2016   | 2017
1           | G1     | G2     | —      | G4

Source: Baseline Analytic Report (2014)

DESCRIPTION OF THE SAMPLE

The 2016 EGRA midterm was administered in 71 randomly selected schools drawn from four regions: Batken, Issyk-Kul, Naryn, and Osh. These schools, selected in 2014, were the data collection sites in 2014 and 2016.7 These four regions comprise Cohorts 2 and 3 of the intervention and were the location of project activities rolled out in 2015. Schools were sampled in proportion to the school population in these regions. The 71 schools included 35 schools receiving the USAID Quality Reading Project intervention and 36 control schools

7 In 2016, one control school was added in Issyk-Kul. This change was made to increase the balance between treatment and control for that region for the purposes of the impact study. Note also that the 2014 EGRA was conducted nationwide, while the midterm was not.


not receiving the intervention. The data presented in this report are for all 71 schools. Characteristics of the 2016 schools are summarized in Tables 3 and 4.

Schools were chosen using a stratified random sample designed to provide adequate representation from each region. First, the number of schools needed in each region, in proportion to the total number of schools (with each school having an equal probability of being selected), was determined. Then the sampling frame was divided into strata by region, and the necessary number of schools was selected within each stratum.
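The proportional-to-size allocation step can be sketched as follows. The frame counts below are hypothetical (not the actual numbers of schools per region), and the largest-remainder rounding is one common way to make the stratum shares sum exactly to the target sample size; the report does not specify which rounding rule was used.

```python
import math

def proportional_allocation(frame_sizes, total_sample):
    """Allocate a fixed sample across strata in proportion to stratum size,
    using the largest-remainder method so the shares sum to total_sample."""
    total = sum(frame_sizes.values())
    quotas = {s: total_sample * n / total for s, n in frame_sizes.items()}
    alloc = {s: math.floor(q) for s, q in quotas.items()}
    leftover = total_sample - sum(alloc.values())
    # Give remaining slots to the strata with the largest fractional parts
    for s in sorted(quotas, key=lambda s: quotas[s] - alloc[s],
                    reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc

# Hypothetical counts of schools in the sampling frame by region
frame = {"Batken": 220, "Issyk-Kul": 300, "Naryn": 200, "Osh": 700}
print(proportional_allocation(frame, 71))
```

Under these invented frame counts the allocation works out to 11, 15, 10, and 35 schools, matching the regional school counts reported in Table 3; the real allocation depends on the actual frame.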

Table 3: EGRA Sample by School and Region, 2016

Region    | Number of Schools | Grade 2 Students | Grade 4 Students | Total Students
Batken    | 11                | 217              | 217              | 434
Issyk-Kul | 15                | 298              | 297              | 595
Naryn     | 10                | 179              | 185              | 364
Osh       | 35                | 695              | 699              | 1,394
Total     | 71                | 1,389            | 1,398            | 2,787

The project administered EGRA to 2,787 pupils in 2016. Unlike the 2015 Cohort 1 midterm described in the 2015 EGRA report, this sample does not include any students from the longitudinal cohort. The longitudinal sample is only in Cohort 1 regions, which allows us to capture their progress throughout the full length of the project (Grade 1 baseline in 2014, Grade 2 after 1 year of intervention in 2015, and Grade 4 after 3 years of intervention in 2017). Table 4 below presents the EGRA sample broken down by language of instruction.

Table 4: EGRA School Sample by Language, 2016

Region    | Schools Tested in Kyrgyz | Schools Tested in Russian | Mixed Language-of-Instruction Schools
Batken    | 10 | 1 | The languages of instruction in two schools are Kyrgyz, Uzbek, and Russian. One school was tested in Kyrgyz; the other, in Russian.
Issyk-Kul | 8  | 7 | Kyrgyz and Russian are the languages of instruction in 13 schools. Six schools were tested in Kyrgyz; seven, in Russian.
Naryn     | 10 | 0 | Kyrgyz and Russian are the languages of instruction in one school, which was tested in Kyrgyz.
Osh       | 28 | 7 | In two schools, Kyrgyz and Russian are the languages of instruction; both were tested in Kyrgyz. In two other schools, Kyrgyz, Uzbek, and Russian are the languages of instruction; both were tested in Russian.
Total     | 56 | 15 |

WITHIN-SCHOOL PUPIL SAMPLING

To randomly select pupils, administrators first calculated the ratio of male to female pupils in each grade to be tested. The 20 pupils per grade were then divided between boys and girls according to that ratio, with 10 boys and 10 girls generally chosen per grade. Administrators then calculated an interval by dividing the number of pupils per grade by the number needed for EGRA, separately by gender. The interval was used to randomly select pupils from the pupil roster list. If a selected pupil was not in school that day, or did not consent to be tested, the next pupil on the roster was selected.

To ensure an adequate sample size for subgroup analysis by language, at least 10 Russian schools, or mixed-language schools in which Russian was a language of instruction, were built into the randomized school sample. If a school was designated as a Russian school for EGRA testing, pupils were given the Russian EGRA. Mixed-language schools were randomly designated (using a random number generated in Excel) as Russian schools for data collection purposes until the 10-Russian-school quota was met.
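The interval-based roster selection described above is a form of systematic sampling. A minimal sketch follows; the rosters are invented, and for brevity the sketch starts at the top of the roster, whereas in practice a random starting offset within the first interval would normally be used.

```python
# Illustrative sketch (not the project's actual tooling) of within-school
# systematic sampling: split the roster by gender, compute an interval,
# and draw every interval-th pupil.
import math

def sample_pupils(roster, needed, start=0):
    """roster: ordered pupil list for one gender in one grade.
    In practice `start` would be randomized; 0 is used here for brevity."""
    if needed >= len(roster):
        return list(roster)
    interval = len(roster) / needed                     # sampling interval
    return [roster[math.floor(start + i * interval)] for i in range(needed)]

boys  = [f"boy{i}" for i in range(30)]    # hypothetical roster of 30 boys
girls = [f"girl{i}" for i in range(26)]   # hypothetical roster of 26 girls

# 20 pupils per grade, split by the school's gender ratio (~10/10 here)
selected = sample_pupils(boys, 10) + sample_pupils(girls, 10)
```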

WEIGHTING THE SAMPLE

Sample weights were calculated and applied to the reported means to adjust for three factors: (1) oversampling of control schools compared to the population; (2) oversampling of Russian schools compared to the population; and (3) school size. The size adjustment was needed because the same number of pupils was sampled per school per grade, which oversampled pupils from small schools relative to the population. For the purposes of weighting, mixed-language schools were treated as Kyrgyz-language schools unless they were designated as Russian-language schools in the sample. The project team made this choice because many mixed-language schools tend to have a Russian-language track. All results in this report are presented using the sampling weights.
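The core idea of the adjustment, weighting each stratum by the inverse of its selection probability, can be sketched as follows. All counts and scores here are hypothetical illustrations, not the project's actual frame or estimates.

```python
# Hedged sketch of inverse-probability sampling weights for oversampled
# strata. Counts below are invented for illustration.
population = {"kyrgyz": 900, "russian": 100}   # schools in the population
sample     = {"kyrgyz": 56,  "russian": 15}    # schools drawn (Russian oversampled)

# base weight for a school in stratum s: N_s / n_s (inverse selection probability)
weights = {s: population[s] / sample[s] for s in population}

def weighted_mean(values, wts):
    """Weighted mean of school-level statistics."""
    return sum(v * w for v, w in zip(values, wts)) / sum(wts)

# an unweighted mean would overstate the oversampled Russian stratum
overall = weighted_mean([36.9, 51.6], [weights["kyrgyz"], weights["russian"]])
```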

RELIABILITY AND THE EGRA SUBTASKS

Before the 2016 results are presented, a note on reliability estimation in EGRA studies is in order. It is important to determine the level of reliability, or internal consistency, of any assessment. Internal consistency refers to the extent to which the items in a test consistently measure the same construct. As the reliability coefficient increases, the portion of a score that can be attributed to error decreases; thus, higher values (generally above .80) are desirable, although a high coefficient does not always indicate quality.

The EGRA poses two challenges to standard reliability estimation methods such as Cronbach's alpha. First, several of the subtasks (e.g., Listening Comprehension, Reading Comprehension) have a very low number of test items, making reliability estimates tenuous. Second, standard estimation approaches are not applicable when the subtasks are timed. One approach for estimating the reliability of timed tasks is to enter total subtask scores (rather than item-level data) into the estimation formula for the subtasks that measure the same construct (RTI, 2015). Factor analyses confirmed that the Letter Name Recognition, Familiar Word Recognition, Nonsense Word Recognition, and Oral Reading Fluency subtasks all load clearly on decoding; thus, their scores can be composited for an accurate reliability estimation. Coefficients on all four EGRAs for these timed subtasks were reasonably high: all above .80.

The reliability coefficients of the Reading Comprehension questions were estimated using Cronbach's alpha with each item entered into the model. These coefficients were within the expected range considering the low number of items on the subtask. Listening Comprehension and Oral Vocabulary (non-timed tasks that allow traditional estimation approaches) were composited and likewise estimated using Cronbach's alpha. Cronbach's alpha was also used to estimate reliability for the Dictation and Initial Letter Sound subtasks. The results of the reliability analyses are presented in Table 5, below.

Table 5: Reliability Estimations, 2016

Subtasks | Kyrgyz 2 | Kyrgyz 4 | Russian 2 | Russian 4
Letter Name Recognition, Familiar Word Recognition, Nonsense Word Recognition, Oral Reading Fluency (decoding) | .90 | — | .83 | —
Familiar Word Recognition, Nonsense Word Recognition, Oral Reading Fluency (decoding) | — | .85 | — | .85
Reading Comprehension (Cronbach's alpha) | .56 | .68 | .71 | .73
Listening Comprehension, Oral Vocabulary (linguistic comprehension) (Cronbach's alpha) | .56 | .72 | .79 | .57
Initial Letter Sound (Cronbach's alpha) | .59 | — | .87 | —
Dictation (Cronbach's alpha) | .80 | .83 | .81 | .77
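For readers unfamiliar with the statistic reported in Table 5, Cronbach's alpha can be computed in a few lines. The item scores below are invented for illustration (1 = correct, 0 = incorrect; one inner list per item, aligned across the same pupils); this is a generic sketch, not the project's analysis code.

```python
# Plain-Python sketch of Cronbach's alpha (internal consistency).
def variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """items: one list of scores per item, same pupils in the same order."""
    k = len(items)
    totals = [sum(pupil) for pupil in zip(*items)]   # per-pupil total score
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))

items = [[1, 0, 1, 1, 0],
         [1, 0, 1, 0, 0],
         [1, 1, 1, 1, 0]]
alpha = cronbach_alpha(items)   # ~0.79 for this toy data
```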

COMPARISON OF EGRA RESULTS ACROSS YEARS

The EGRA design must enable the comparison of scores across the administration years highlighted above. In order to draw valid inferences about observed changes in subtask scores, subtasks must be equivalent in meaning and comparable in difficulty across the years.8 This challenge can be addressed through strictly parallel test development protocols or through test form equating, which relies on empirical evidence (in the form of statistical data) to ensure equivalence. Several methods were employed in this study to ensure equivalent or equated test forms:9

1. The Letter Name Recognition, Initial Letter Sound, Familiar Word Recognition, and Reading Comprehension subtasks were judged equivalent based on the consistent application of strict parallel subtask development principles (see Table 6 below).

2. The Oral Vocabulary and Listening Comprehension subtasks were equated using a design that relied on common items across subtask forms, whereas the Oral Reading Fluency subtask was equated using a design based on an external anchor test (Algina & Crocker, 2006).

In the common items approach, each grade and language form at baseline, midterm, and endline contained a set of core items unique to that form as well as a set of common items that appeared at the exact same locations in all forms. These common items made up 20 to 37 percent of the total items on the subtasks. See Appendix 8 for more information on the equating of Oral Vocabulary and Listening Comprehension using a common items approach.

The fact that some subtasks were timed poses challenges to test form equating using common items approaches. Common items approaches are appropriate when examining "total correct score" without the transformation of a score count to a per-minute rate. See Appendix 8 for a detailed explanation of how the timed Oral Reading Fluency subtask was equated using Familiar Word Recognition as an external anchor test.

The USAID Quality Reading Project study design allows for both horizontal and vertical equating, as the common items were embedded vertically across the grades as well as horizontally within the same grade across years.10 Horizontal equating refers to the equating of tests administered to groups with similar abilities, for example, two subtasks administered to pupils at the same grade level in two consecutive assessment years. This type of equating was done for the EGRA results presented in this report, as we were primarily interested in seeing change at each grade level over the years (e.g., Grade 2 in 2014 and Grade 2 in 2016). Table 6 below presents the approaches used to ensure equivalent or equated scores across the 2014 and 2016 subtasks.

8 In theory, using the exact same test forms year after year enables valid comparison. However, this is not best practice for some subtasks, as communities inevitably become familiar with test forms once they have been administered. Therefore, new forms that can be equated to the baseline iteration need to be developed, especially for key tasks such as Oral Reading Fluency.

9 A full explication of how the various subtasks were equated in 2016 can be found in Appendix 8.

Table 6: Approaches to Equating and Creating Equivalent Subtasks

Subtask (Grades) | Method
1. Letter Name Recognition (2) | Equivalent subtask design approach
2. Initial Letter Sound (2) | Equivalent subtask design approach
3. Familiar Word Recognition (2, 4) | Equivalent subtask design approach
4. Nonsense Word Recognition (2, 4) | Different subtasks in 2014 and 2016 (not equivalent)
5. Oral Vocabulary (2, 4) | Item response theory, internal common items
6. Oral Reading Fluency (2, 4) | Classical test theory, external common items
7. Reading Comprehension (2, 4) | Equivalent subtask design approach
8. Listening Comprehension (2, 4) | Item response theory, external common items
9. Dictation (2, 4) | Equivalent subtask design approach

STUDY LIMITATIONS

The study has a number of limitations that the reader should keep in mind when interpreting the EGRA results. First, comparisons across the Kyrgyz and Russian language groups are not an appropriate way to use the results. Learning to read in Kyrgyz and learning to read in Russian are two different processes, and differences in rates of acquisition, knowledge accumulation, and effective use are functions of the properties of the language in question as well as the sociocultural context of the learner. The EGRA results by language should therefore be considered independently of each other. Second, note that the 2016 data were collected from four regions of the country, while the 2014 data were collected from all regions. Therefore, the comparative data presented here compare only Cohorts 2 and 3 in 2014 to Cohorts 2 and 3 in 2016.

10 Vertical equating can be employed when tests are administered to groups of students with different abilities such as students in different grades. It is frequently used to identify growth over time in particular cohorts and can be used to look at pupil growth in the longitudinal cohorts.


Third, several subtasks contain a very small number of test items (e.g., Reading Comprehension, Listening Comprehension) and are therefore susceptible to large changes in percent-correct scores attributable to a single item: a single item on a short test becomes "high stakes," much more so than on reading assessments that may have 40 to 60 test items. The reader should not over-interpret the results on these subtasks nor make high-stakes decisions based on them.

Another caution in interpretation relates to the number of items on the Familiar Word Recognition subtask across Grades 2 and 4. Although Familiar Word Recognition and Oral Reading Fluency both load on decoding skills and are highly correlated (ranging from .60 to .75 across both grades and languages), comparison of results across these subtasks should be made with caution, especially at Grade 4. At both Grades 2 and 4, the total number of words on the Familiar Word Recognition subtask is 40. At Grade 2, the total number of words on the Oral Reading Fluency subtask is also approximately 40; at Grade 4, however, that number doubles to approximately 80. This makes the two subtasks much less comparable (in terms of difficulty) at Grade 4 than at Grade 2, so interpretation and comparison of their results should be made with caution. For the exact number of words on the Oral Reading Fluency subtask at both grades and in both languages, see Appendix 3.

A more specific qualification concerns the Russian Grade 4 Familiar Word Recognition subtask. The large score increases noted on this subtask in 2016 were investigated post-administration. Researchers discovered that six of the 40 items on the 2016 subtask consisted of shorter words than the corresponding items on the 2014 subtask, plausibly making the 2016 subtask easier. This could account for some portion of the large score increases from 2014 to 2016; however, a slightly easier test form would not fully account for them. When the Kyrgyz Grade 4 Familiar Word Recognition subtask was investigated, the 2014 and 2016 forms were found to be parallel in terms of word length, per the subtask development protocols. Yet there were also large gains on the Kyrgyz Grade 4 Familiar Word Recognition subtask from 2014 to 2016, as well as large gains on the Grade 2 subtasks, whose test forms were also parallel.

Finally, it is important to note that increases or decreases in subtask scores, or in the proportions of students demonstrating gains in proficiency levels, cannot by themselves answer the question of whether the USAID Quality Reading Project intervention has contributed to improved pre-reading outcomes in the Kyrgyz Republic.


II. 2016 EGRA RESULTS

In the results sections, we first examine overall subtask performance of comparable cohorts between the baseline (2014) and the midterm (2016). The 2016 mean scores presented in Tables 7 and 8 are equivalent subtask scores, except for the two subtasks shaded in light gray, Nonsense Words and Dictation.11 Data from Cohort 1 (the 2015 midterm assessment) are also included in the tables for informational purposes. The EGRA subtasks for both midterms (2015, 2016) employed the same assessment instruments, and, as can be seen in the tables, the results for the two midterm samples were in fact quite similar.

On Comparison across Assessment Years

It is important to remember that the presentation of these three years of data should not be understood "longitudinally." The scores from the three years do not represent the same samples assessed across all three years; the 2015 and 2016 assessments represent midterm data collected in different years from different cohorts in the republic. Therefore, in the analysis that follows, there are no comparisons between the 2015 and 2016 results, as the trend of interest is between the 2014 baseline and the comparable 2016 midterm for the exact same cohorts (Cohorts 2 and 3).

Recall that the 2014 baseline assessment included Cohorts 1, 2, and 3. To determine whether the 2016 midterm results, which included only Cohorts 2 and 3, could be compared to the full 2014 baseline, researchers at AIR ran several statistical tests for differences between the 2015 and 2016 midterm sample groups. To test the assumption that the two samples were similar, we tested the null hypothesis that the schools representing the different cohorts would have comparable (not statistically significantly different) mean scores in the baseline data. While the results for many key subtasks such as Oral Reading Fluency indicated that the two midterm groups were similar in characteristics (no statistically significant differences between the two cohort groups at the 2014 baseline), there were several subtasks where the differences between midterm Cohort 1 (2015) and midterm Cohorts 2 and 3 (2016) were statistically significant. Therefore, to ensure valid comparative inferences across years, we employed an "apples to apples" approach in which comparisons between 2014 and 2016 were made only for the exact same cohorts: Cohorts 2 and 3 in 2014 compared with Cohorts 2 and 3 in 2016. These are the results presented in Tables 7 and 8 of this report.
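The kind of two-sample comparison of cohort baseline means described above can be illustrated with a simple Welch's t statistic. This is a generic sketch with invented scores, not the project's actual test battery or data; the project's tests would also account for weighting and school-level clustering.

```python
# Hedged sketch: Welch's t statistic for the difference between two
# cohorts' baseline mean scores (invented data, no clustering adjustment).
def welch_t(a, b):
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)   # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / (va / na + vb / nb) ** 0.5

cohort1   = [31, 35, 29, 40, 33]   # hypothetical 2014 ORF scores, Cohort 1
cohorts23 = [30, 36, 28, 41, 34]   # hypothetical 2014 ORF scores, Cohorts 2 & 3
t = welch_t(cohort1, cohorts23)    # near zero here: similar baseline means
```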

11 The term equivalent subtask encompasses both equated scores as well as scores determined to be equivalent through the application of strict, parallel subtask development protocols. Nonsense Words (2016) was actually a different subtask than Unfamiliar Words (2014), which assessed actual word knowledge instead of pseudo word decoding. Dictation development in both years used parallel test development protocols but the nature of the subtask makes it difficult to compare across iterations with certainty.


On Control and Treatment Schools

The tables and figures included throughout this report present the results from all treatment and control schools in the 2016 sample, unless otherwise noted. Mean scores for the 2016 subtasks, broken down by control and treatment group, can be found in Appendix 10 for both languages and grades. Those results are presented for descriptive purposes, and no causal attribution should be made by comparing the differences in means between the treatment and control groups. While it is possible to test for statistically significant differences between the treatment and control group mean scores through an independent-samples t-test, in the forthcoming impact report we use a difference-in-differences (DiD) analytical approach that allows us to account for all the key features of the data simultaneously.

For example, the DiD approach allows us to adjust the follow-up difference between the treatment and control groups for any existing differences in achievement between these two groups at baseline, which addresses concerns that the impact estimates might be driven by differences in the student composition of the groups rather than by the program itself. The DiD strategy also allows for the inclusion of additional explanatory variables, which contribute to the statistical precision of the program's causal estimate. The approach further allows us to take into account the sampling design and the fact that different observations had different probabilities of selection (i.e., use of sampling weights), and that students in the sample were not sampled independently but nested within schools, so that their academic outcomes are correlated with those of peers from the same school (i.e., clustering). In sum, comparisons between the treatment and control groups are discussed in full detail in the impact report to be released in December 2016.
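The core difference-in-differences logic, netting the control group's change out of the treatment group's change, reduces to one line of arithmetic. The group means below are invented for illustration; the actual impact analysis uses a regression form with covariates, weights, and clustered errors as described above.

```python
# Hedged sketch of the DiD point estimate with invented group means
# (words per minute), not project data.
def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """DiD = treatment group's change minus control group's change."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# treatment gained 6 wpm and control gained 3 wpm over the same period,
# so the shared secular trend is netted out of the estimate
effect = did_estimate(treat_pre=32.0, treat_post=38.0, ctrl_pre=33.0, ctrl_post=36.0)
```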

How to Interpret the Tables

Tables 7 and 8 include the name of each subtask and how it was scored (rate per minute, percent correct), total mean scores by language and grade, and the standard deviation for each subtask, presented in parentheses directly under the mean score. The standard deviation indicates the degree to which scores are spread across the score distribution. A smaller standard deviation indicates clustering or "bunching" of pupils' scores around the mean, whereas a larger standard deviation indicates that scores are spread out farther from the mean, both above and below.
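The "bunched versus spread" distinction can be made concrete with two invented score lists that share the same mean but differ sharply in dispersion; this is a generic arithmetic illustration, not project data.

```python
# Small numeric illustration of the standard deviation's role: both lists
# have mean 40, but very different spread.
def sample_sd(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

bunched = [39, 40, 41, 40]    # scores clustered tightly around the mean
spread  = [10, 70, 25, 55]    # same mean, widely dispersed scores
```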

Table 9 displays the percentages of pupils meeting the national reading fluency standards as set by the government of the Kyrgyz Republic. The values indicate the percentages attaining the "basic" level or above (explained below). Note that these percentages were estimated based on the comparable 2014 and 2016 subtask scores for Cohorts 2 and 3. Starting with Table 15, however, the scores presented include unequated 2016 scores, and thus they should not be compared to the 2014 scores as the scores in Tables 7 and 8 were. This means that scores for key tasks such as Oral Reading Fluency will appear to have different values depending on which table is being examined. Again, the "equivalent scores" in Tables 7 and 8 are 2016 scores expressed in 2014 terms; that is, scores that have been statistically adjusted to take into account differences in form difficulty across the years.

COMPARING 2014 AND 2016 (GRADE 2) Overall, there were mean score increases on 11 of the 18 comparable subtasks in Grade 2 for cohorts 2 and 3 between 2014 and 2016. There were mean score declines on only three comparable subtasks. As can be seen in Table 7, for Kyrgyz Grade 2, six of nine subtasks saw mean score increases, including the important Letter Name Recognition (+7.7 letters per minute) and Oral Reading Fluency (+4.3 words per minute) subtasks.12 The largest increases were in Dictation (+18.1 percentage points) and Familiar Word Recognition (+7.8 words per minute). For Kyrgyz Grade 2, the only mean score declines from 2014 were in Reading Comprehension (-.2 percentage points) and Listening Comprehension (-.8 percentage points), very small declines. For Russian Grade 2, seven of nine subtasks saw mean score increases, including the important subtasks of Letter Name Recognition (+11 letters per minute) and Oral Reading Fluency (+12.6 words per minute). The largest increase was in Familiar Word Recognition (+17.3 words per minute).

Table 7: Comparison of 2014 (Baseline) with 2016 (Equivalent) Results: Grade 2

Subtask | Kyrgyz 2014 (Cohorts 2, 3) | Kyrgyz 2015 (Cohort 1) | Kyrgyz 2016 (Cohorts 2, 3) | Russian 2014 (Cohorts 2, 3) | Russian 2015 (Cohort 1) | Russian 2016 (Cohorts 2, 3)
Letter Name Recognition, letters per minute | 59.0 (20.0)* | 69.1 (17.8) | 66.7 (19.4) | 54.7 (20.4) | 56.6 (19.3) | 65.7 (28.4)
Initial Letter Sound, percent correct | 87.9% (13.6) | 93.9% (10.9) | 94.0% (11.0) | 87.3% (15.8) | 94.4% (14.0) | 89.0% (20.9)
Familiar Words, words per minute | 50.4 (27.1) | 56.7 (25.3) | 58.2 (29.2) | 47.4 (21.9) | 65.2 (23.2) | 64.7 (30.1)
Nonsense Words, words per minute | 24.2 (14.5) | 26.0 (11.1) | 27.8 (13.9) | 26.0 (11.7) | 30.6 (11.2) | 30.1 (13.1)
Oral Vocabulary, percent correct | 91.6% (11.0) | 91.2% (10.9) | 92.0% (9.9) | 88.1% (15.3) | 90.9% (9.6) | 84.0% (16.5)
Oral Reading Fluency, words per minute | 32.6 (19.0) | 36.3 (18.1) | 36.9 (19.4) | 39.0 (20.7) | 52.4 (20.4) | 51.6 (24.5)
Reading Comprehension, percent correct | 52.4% (32.9) | 56.5% (32.3) | 52.2% (31.1) | 33.3% (41.1) | 54.7% (29.8) | 37.2% (30.0)
Listening Comprehension, percent correct | 73.7% (25.4) | 74.0% (26.0) | 72.9% (25.0) | 61.1% (34.6) | 79.3% (28.8) | 67.3% (33.6)
Dictation, percent correct | 51.4% (25.1) | 50.2% (17.9) | 69.5% (26.1) | 68.3% (26.0) | 79.4% (18.3) | 75.8% (24.3)

Sample Sizes:
2016 K2 = 1,089; 2016 R2 = 300
2015 K2 = 658; 2015 R2 = 324
2014 K2 = 984; 2014 R2 = 272

Cohort 1: Bishkek, Chui, Jalal-Abad, and Talas regions
Cohorts 2 and 3: Batken, Naryn, Issyk-Kul, and Osh regions
Subtasks in gray were not strictly comparable across 2014 and 2016.
*Standard deviations appear in parentheses ( ).

12 Change in mean scores on Nonsense Word Recognition is indicated simply as "a change," neither an increase nor a decrease. Changes in Dictation mean scores are reported as increases or decreases. However, as noted, while subtask design entailed strict protocols for attaining equivalency, the nature of the task itself makes comparative inferences for Dictation somewhat tenuous.

Figure 1 below presents the comparison of Oral Reading Fluency across the years for Kyrgyz Grade 2. Note the similar increase in mean scores across cohort 1 and cohorts 2 and 3.

Figure 1: Kyrgyz Grade 2: Oral Reading Fluency

[Bar chart, words per minute: 2014 Baseline, Cohort 1: 31.6; 2015 Midterm, Cohort 1: 36.3; 2014 Baseline, Cohorts 2 & 3: 32.6; 2016 Midterm, Cohorts 2 & 3: 36.9]

Cohort 1: Bishkek, Chui, Jalal-Abad, and Talas regions
Cohorts 2 and 3: Batken, Naryn, Issyk-Kul, and Osh regions

COMPARING 2014 AND 2016 (GRADE 4)

The comparative situation in Grade 4 was similar to that of Grade 2 in some respects. Overall, mean scores increased on eight of 14 subtasks across both language groups. In the Kyrgyz group, scores on five of seven subtasks increased, with the largest gains occurring in Familiar Word Recognition (+19.3 words per minute), Oral Reading Fluency (+14.5 words per minute), and Reading Comprehension (+13.8 percentage points). Mean score declines were small, as in Oral Vocabulary (-1.3 percentage points). In the Russian Grade 4 group, mean scores were down for three of seven subtasks. However, large gains were seen in two key subtasks: Familiar Word Recognition (+40.1 words per minute) and Oral Reading Fluency (+29.7 words per minute).13

Table 8: Comparison of 2014 (Baseline) and 2016 (Equivalent) Results: Grade 4

Subtask | Kyrgyz 2014 (Cohorts 2, 3) | Kyrgyz 2015 (Cohort 1) | Kyrgyz 2016 (Cohorts 2, 3) | Russian 2014 (Cohorts 2, 3) | Russian 2015 (Cohort 1) | Russian 2016 (Cohorts 2, 3)
Familiar Words, words per minute | 70.2 (27.7)* | 88.4 (33.1) | 89.5 (33.1) | 64.1 (24.6) | 92.9 (33.7) | 104.2 (35.5)
Nonsense Words, words per minute | 29.8 (15.1) | 33.9 (13.1) | 29.0 (16.2) | 36.6 (15.2) | 32.7 (11.9) | 35.3 (14.5)
Oral Vocabulary, percent correct | 97.0% (6.0) | 94.7% (7.2) | 95.7% (5.7) | 90.9% (10.4) | 90.9% (10.0) | 88.4% (11.9)
Oral Reading Fluency, words per minute | 66.8 (28.5) | 79.0 (30.1) | 81.3 (30.9) | 64.9 (24.2) | 82.8 (28.2) | 94.6 (31.3)
Reading Comprehension, percent correct | 64.9% (29.0) | 78.4% (25.4) | 78.7% (25.2) | 52.6% (28.6) | 63.1% (32.7) | 48.5% (31.8)
Listening Comprehension, percent correct | 66.7% (28.4) | 68.9% (29.3) | 71.2% (28.0) | 82.9% (24.2) | 83.0% (24.5) | 77.5% (33.0)
Dictation, percent correct | 78.1% (20.0) | 80.3% (20.7) | 82.7% (21.1) | 83.8% (13.4) | 87.9% (15.5) | 86.7% (15.9)

13 Although Familiar Word Recognition and Oral Reading Fluency are highly correlated and both subtasks measure decoding skills, there is a caveat regarding Familiar Word Recognition and Oral Reading Fluency interpretation for Russian Grade 4 in 2016. Please see the Study Limitations section for a more detailed discussion.

Sample Sizes:
2016 K4 = 1,101; 2016 R4 = 297
2015 K4 = 677; 2015 R4 = 312
2014 K4 = 993; 2014 R4 = 253

Cohort 1: Bishkek, Chui, Jalal-Abad, and Talas regions
Cohorts 2 and 3: Batken, Naryn, Issyk-Kul, and Osh regions
Subtasks in gray were not strictly comparable between 2014 and 2016.
*Standard deviations appear in parentheses ( ).

ORAL READING FLUENCY PROFICIENCY IN 2014 AND 2016

The Oral Reading Fluency (ORF) subtask scores shown in Tables 7 and 8 are mean scores. The metric for establishing proficiency rates, or the proportion of pupils who met the minimum acceptable standard on ORF, is based on the 2006 National Standards for Reading in the Kyrgyz Republic. Proficiency was defined as the proportion of pupils reading at a rate of 40 or more words per minute in Grade 2, and 80 or more words per minute in Grade 4 (Tvaruzkova & Shamatov, 2012).14 Note that although ORF mean scores for Kyrgyz Grade 2 (Cohorts 2 and 3) increased from 32.6 words per minute in 2014 to 36.9 in 2016, only 39.8% of pupils are meeting the national standard of 40 words per minute, indicating that more than half are still not meeting the minimum standard. Table 9 shows the trends in the proportion of pupils who met the national standards from 2014 to 2016 (Cohorts 2 and 3 only), based on equivalent subtask scores. The proportion who met the standards increased from 31.5% to 39.8% in Kyrgyz Grade 2 and from 42.2% to 65.1% in Russian Grade 2. In Kyrgyz Grade 4, the percentage of students who attained proficiency increased from 33.2% to 49.7%, a 16.5 percentage-point increase.

14 For experimental purposes, data also have been collected on reading rates based on where students find themselves “at the 1-minute” and “at the 2-minute” marks on several of the reading tasks. Some slow readers may be penalized by a metric that assesses only what they are capable of reading in the first minute. The USAID Quality Reading Project team hopes to conduct further research in this area.


Russian Grade 4 also increased from 2014; however, these Russian Grade 4 results should be treated with caution because of the limitations to interpretation highlighted earlier.15 Recall that subtask results are not comparable across languages, because the subtasks are language-dependent and assess progress in different languages. The data from the two languages are presented together for convenience.

Table 9: Percentage Meeting Reading Fluency Standards by Language and Grade

Benchmark | Reading Fluency Standard or Above | Kyrgyz 2014 | Kyrgyz 2016 | Russian 2014 | Russian 2016
Grade 2 | 40 words or above | 31.5 | 39.8 | 42.2 | 65.1
Grade 4 | 80 words or above | 33.2 | 49.7 | 22.4 | 65.6

Note: The 2014 data include only the regions that were sampled in 2016 (Cohorts 2 and 3), not the full national sample.
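A Table 9-style proficiency rate is simply the share of pupils at or above the benchmark. The sketch below uses invented scores, not project data.

```python
# Illustrative computation of the proficiency rate: percentage of pupils
# reading at or above the national fluency benchmark (invented scores).
def pct_meeting(scores, benchmark):
    return 100 * sum(s >= benchmark for s in scores) / len(scores)

grade2_wpm = [25, 41, 38, 52, 40, 19, 60, 33]   # hypothetical ORF scores
rate = pct_meeting(grade2_wpm, 40)              # Grade 2 benchmark: 40 wpm
```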

Oral Reading Fluency and Reading Comprehension

As noted in the 2016 EGRA Toolkit, many studies have indicated that ORF is positively correlated with reading comprehension (RTI, 2015). On the EGRA, reading comprehension questions were asked upon completion of the ORF subtask to ensure that readers were not simply repeating words they did not understand. The results of the Reading Comprehension subtask, however, should be interpreted in light of the fact that at the early primary levels it is difficult to provide enough text and corresponding items for a highly reliable assessment of reading comprehension: EGRA is an assessment of pre-reading and early reading skills that are the foundational elements of reading, not a comprehensive assessment of reading comprehension.

Nonetheless, as the 2016 data demonstrate, ORF and Reading Comprehension are indeed positively correlated. Note in Figure 2 the positive correlation between the number of words read per minute and the number of reading comprehension questions answered correctly for Kyrgyz Grade 4: as scores rise on one subtask, they rise on the other as well. The correlation coefficients between ORF and the comprehension questions on the four subtasks were Kyrgyz Grade 2 (.42), Kyrgyz Grade 4 (.53), Russian Grade 2 (.43), and Russian Grade 4 (.66). Thus, the ORF subtask can be considered, to a certain extent, a reasonable proxy for reading comprehension.

The edges of the boxes in the plot represent the 25th through 75th percentile range of scores, whereas the line through each box indicates the median ORF score. The lines, or "whiskers," represent the dispersion of scores.16 Figure 2 illustrates the data for Kyrgyz Grade 4 only; plots for the other grades and languages are provided in Appendix 9.
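The coefficients quoted above are Pearson correlations, which can be computed directly from paired subtask scores. The sketch below uses invented score pairs, not the project's data.

```python
# Plain-Python Pearson correlation between two paired score lists
# (invented data for illustration).
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

orf  = [12, 25, 33, 47, 58, 70]   # hypothetical words per minute
comp = [0, 20, 40, 60, 60, 100]   # hypothetical comprehension % correct
r = pearson_r(orf, comp)          # positive: fluency tracks comprehension
```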

15 The linear equating approach for nonequivalent groups used to equate ORF across years (see Appendix 8) involved the use of the Familiar Word Recognition subtasks as anchor items. One of the core assumptions of this approach is the necessity of complete, parallel test forms. As indicated in the Study Limitations section for Russian Grade 4, whether this assumption can be said to be satisfied completely is under question. The equated results for ORF are possibly overestimating score gains for Russian Grade 4. 16 In several places, scores appear to extend to the extremes where a few pupils have read very few words per minute, yet answered several questions correctly. Note that these data points at the extremes of the lines represent only a very few outliers in the data that could be attributed to error or aberration.

EGRA Midterm Data Analytic Report for the Kyrgyz Republic, September 2016 Page 18

Figure 2: Relationship between Oral Reading Fluency and Reading Comprehension, 2016

Kyrgyz Grade 4 Reading Fluency by Reading Comprehension Score, 2016

[Box plot: oral reading fluency (correct words per minute, 0 to 150) by reading comprehension score (percent correct, 0 to 100); outside values excluded.]

2016 USAID F-INDICATORS

2016 Grade 2 results are also reported below in terms of key USAID F-Indicators. In these tables, the percentages represent the proportion of pupils in project schools attaining the standard for the minimum number of words per minute at grade level. The figures in parentheses under the percentages indicate the number of pupils in the sample. Note that in Table 10 the two languages of instruction are combined for boys and girls, while Table 11 indicates the proportion of pupils in project schools attaining the minimum number of words by language of instruction (LOI) and by location of school (urban or rural), both languages combined.
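An F-Indicator percentage of this kind is a simple proportion: the share of assessed pupils whose ORF meets or exceeds the grade-level standard. A minimal sketch, using hypothetical scores and the Grade 2 benchmark of 40 words per minute cited in Table 14 (the published figures also apply sampling weights):

```python
import numpy as np

# Hypothetical ORF scores for a handful of Grade 2 pupils (words per minute)
orf = np.array([22, 45, 61, 38, 70, 40, 12, 55])
benchmark = 40  # Grade 2 fluency standard used in Table 14

# Proportion of pupils at or above the standard
pct_meeting = np.mean(orf >= benchmark) * 100  # -> 62.5
```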

Table 10: Percentages Meeting National Reading Standards by Gender, All Languages

           Total (n)      Boys (n)       Girls (n)
Grade 2    45.2% (677)    34.9% (354)    56.6% (323)

Table 11: Percentages of Grade 2 Pupils Meeting National Reading Standards by Language and by School Location (All Languages)

Language of Instruction    Total (n)
Kyrgyz                     39.4% (537)
Russian                    70.1% (140)

School Location            Total (n)
Rural                      42.4% (538)
Urban                      54.4% (139)


2016 Grade 4 results are reported below in terms of key USAID F-Indicators. In these tables, the percentages represent the proportion of pupils in project schools attaining the standard for the minimum number of words per minute at grade level. The figures in parentheses under the percentages indicate the number of pupils in the sample. Note that in Table 12 the languages of instruction are combined for boys and girls, while Table 13 indicates the proportion of pupils in project schools attaining the minimum number of words by language of instruction and by location of school (urban or rural), both languages combined. Finally, Table 14 presents the percentages of pupils attaining standards and answering 80% of the Reading Comprehension questions correctly, for both grades.

Table 12: Percentages of Grade 4 Pupils Meeting National Standards by Gender, All Languages

           Total (n)      Boys (n)       Girls (n)
Grade 4    54.5% (684)    42.3% (350)    67.1% (334)

Table 13: Percentages of Grade 4 Pupils Meeting National Standard by Language and by School Location (All Languages)

Language of Instruction    Total (n)
Kyrgyz                     51.1% (547)
Russian                    69.0% (137)

School Location            Total (n)
Rural                      52.8% (543)
Urban                      59.8% (141)

Table 14: Percentages of Pupils Meeting Reading Standards and Reading with 80% Comprehension Benchmark

Reading Fluency Standard or Above                                      Kyrgyz 2016    Russian 2016
Grade 2: ORF is 40 words or above and Reading Comp is 80% or above         7.4            12.6
Grade 4: ORF is 80 words or above and Reading Comp is 80% or above        42.2            24.5

2016 SUBTASK SCORES

Before presenting the results for each subtask, a bit more explanation of how the data are presented is in order. In the tables that follow, the mean scores and standard deviations are the 2016 subtask results that were not equated to the 2014 scores (that is, not expressed in 2014 terms). This means that the results might differ slightly from those presented in Tables 7 and 8.


As highlighted in Tables 7 and 8, the standard deviations are shown in parentheses directly under the mean scores. The standard deviation indicates the degree to which scores are spread out across the score distribution.

Recall also that when sampling from a large population, it is important to test any observed differences in outcomes (by groups such as gender) for statistical significance, to determine whether those differences could be attributed to chance in our samples. The null hypothesis in all our estimations was one of "no difference" between compared groups. In addition to presenting mean scores and standard deviations in the tables that follow, we also note the "Diff." value, which is the numerical difference between two scores. Differences that were statistically significant at the .05 or .01 levels are marked with one or two asterisks, respectively. For example, a p-value of less than .05 indicates that, given the results, there is less than a 5% chance that the mean scores of boys and girls are equal in the population under study; a p-value of less than .01 indicates less than a 1% chance. In such cases, we can reject the null hypothesis of no difference and state with relative certainty that the true values in the population are, in fact, different.

With large sample sizes, however, significance tests can sometimes flag differences whose practical meaning is negligible. In Table 15, the final column, "D," represents Cohen's d, a coefficient expressed in units of standard deviation that indicates the strength of the practical effect of a statistically significant mean score difference. Thus, for example, an effect size of .5 indicates that the difference between mean scores is one half of a standard deviation. Cohen's d was estimated and reported only when mean score differences were statistically significant. For the rationale for employing an effect size measure and for the formulas used in these calculations, see Appendix 7.
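The mechanics of the effect-size calculation can be sketched as follows. This is an illustrative, unweighted implementation of pooled-standard-deviation Cohen's d with toy numbers; the report's own estimates, detailed in Appendix 7, account for the sampling design.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d with pooled standard deviation for two independent samples."""
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1))
                        / (na + nb - 2))
    return (np.mean(a) - np.mean(b)) / pooled_sd

# Toy example: the means differ by 1.0 and the pooled SD is 2.0, so d = 0.5,
# a "moderate" effect by the conventions used in the tables that follow.
girls = np.array([2.0, 4.0, 6.0])
boys = np.array([1.0, 3.0, 5.0])
d = cohens_d(girls, boys)  # -> 0.5
```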

LETTER NAME RECOGNITION RESULTS (GRADE 2)

In 2016, the mean score on the Kyrgyz Letter Name Recognition (LNR) subtask (n = 1,084) was 66.7 letters per minute (SD 19.4). The total mean score on the Russian LNR subtask (n = 299) was 65.7 letters per minute (SD 28.4). In Table 15, we present the results for the LNR subtask by gender in both languages.

Table 15: Letter Name Recognition Results by Gender, 2016

                         Kyrgyz Language                           Russian Language
Subtask         Total     Male     Fem.     Diff.   D      Total    Male     Fem.     Diff.   D
                N=1,084   n=576    n=508                   N=299    n=141    n=158
Letter Name     66.7      63.5     70.3     6.8**   0.4    65.7     62.7     68.3     5.7
(per minute)    (19.4)    (20.3)   (17.7)                  (28.4)   (32.0)   (24.5)

* Significant at .05 level; ** Significant at .01 level. Standard deviations appear in parentheses. D is Cohen's d, shown only when a difference is statistically significant (small = 0.2, moderate = 0.5, large = 0.8).


For LNR, girls identified 70.3 letters per minute on the Kyrgyz Grade 2 task, 6.8 more letters per minute on average than boys. This difference was statistically significant at the .01 level, and the effect size of .4 can be considered a small-to-moderate effect. Girls in the Russian group also outscored their male peers, identifying 5.7 more letters per minute, but this difference was not statistically significant. As Table 16 shows, on the Kyrgyz LNR subtask, urban pupils performed just slightly better than their rural peers, but the difference was not statistically significant. In Russian Grade 2, urban pupils scored 7.9 letters per minute higher than their rural peers, but this difference was also not statistically significant. Note the large standard deviations for the Russian groups, both urban and rural, indicating considerable spread of scores across the distribution.

Table 16: Letter Name Recognition Results by School Location, 2016

                         Kyrgyz Language                  Russian Language
Subtask         Urban     Rural    Diff.   D      Urban    Rural    Diff.   D
                n=139     n=945                   n=60     n=239
Letter Name     67.9      66.4     1.5            69.5     61.6     7.9
(per minute)    (20.3)    (19.3)                  (30.9)   (25.1)

* Significant at .05 level; ** Significant at .01 level. Standard deviations appear in parentheses. D is Cohen's d, shown only when a difference is statistically significant (small = 0.2, moderate = 0.5, large = 0.8).

INITIAL LETTER SOUND (GRADE 2)

For the Initial Letter Sound (ILS) subtask, the pupil subtask booklet included a list of the 10 most frequently used letters in the Kyrgyz or Russian alphabet, randomly arranged. The frequency of letters in everyday use was determined during development of the assessment through text analysis and word-frequency counts. Words beginning with these letters were provided on a list for the administrator, who read each word twice and asked the pupil to say the first sound of the word. If a pupil did not answer within 3 seconds, a "no answer" response was recorded. The maximum score for this section was 10 points, with one point assigned to each correct answer.

Table 17: Initial Letter Sound Results by Gender, 2016

                         Kyrgyz Language                           Russian Language
Subtask         Total     Male     Fem.     Diff.   D      Total    Male     Fem.     Diff.   D
                N=1,089   n=579    n=510                   N=300    n=142    n=158
Initial Letter  94.0      93.2     94.8     1.6*    0.1    89.0     87.7     90.2     2.5
Sound (% corr.) (11.0)    (11.5)   (10.3)                  (20.9)   (22.8)   (18.9)

* Significant at .05 level; ** Significant at .01 level. Standard deviations appear in parentheses. D is Cohen's d, shown only when a difference is statistically significant (small = 0.2, moderate = 0.5, large = 0.8).

Table 18: Initial Letter Sound Results by School Location, 2016

                         Kyrgyz Language                  Russian Language
Subtask         Urban     Rural    Diff.   D      Urban    Rural    Diff.   D
                n=140     n=949                   n=60     n=240
Initial Letter  93.2      94.1     0.9            93.8     84.0     9.8
Sound (% corr.) (12.6)    (10.7)                  (16.3)   (23.9)

* Significant at .05 level; ** Significant at .01 level. Standard deviations appear in parentheses. D is Cohen's d, shown only when a difference is statistically significant (small = 0.2, moderate = 0.5, large = 0.8).

As in 2014, results on the Initial Letter Sound subtask in 2016 indicated high performance across gender and language groups, with mean scores at or near 90% correct (Tables 17 and 18). There was a statistically significant difference between boys and girls in the Kyrgyz group, but the effect size was very small, making this difference almost negligible. Girls also scored higher in the Russian group, but that difference was not statistically significant. In the Kyrgyz group, rural pupils scored 0.9 points higher, whereas in the Russian group a 9.8-point difference favored urban pupils, although neither difference was statistically significant.

FAMILIAR WORD RECOGNITION (GRADES 2 AND 4)

The total mean score on the 2016 Kyrgyz Grade 2 Familiar Word Recognition (FWR) subtask (n = 1,086) was 58.2 words per minute (SD 29.2). The total mean score on the Russian Grade 2 subtask (n = 300) was 64.7 words per minute (SD 30.1). For both groups, the spread of scores was fairly wide, as indicated by the high standard deviations. The total mean score on the Kyrgyz Grade 4 subtask (n = 1,100) was 89.5 words per minute (SD 33.1), and the total mean score on the Russian Grade 4 subtask (n = 297) was 104.2 words per minute (SD 35.5); again, note the relatively large standard deviations. Tables 19 and 20 present the full data for 2016.

Table 19: Familiar Words Results by Gender, 2016

                 Kyrgyz Language                               Russian Language
Grade    Total     Male     Fem.     Diff.     D       Total    Male     Fem.     Diff.     D
2        N=1,086   n=577    n=509                      N=300    n=142    n=158
         58.2      53.8     63.1     9.27**    0.32    64.7     56.8     71.9     15.08**   0.50
         (29.2)    (30.0)   (27.5)                     (30.1)   (29.7)   (28.8)
4        N=1,100   n=578    n=522                      N=297    n=145    n=152
         89.5      82.9     96.7     13.8**    0.4     104.2    101.9    106.4    4.5
         (33.1)    (33.8)   (30.7)                     (35.5)   (39.3)   (31.3)

* Significant at .05 level; ** Significant at .01 level. Standard deviations appear in parentheses. D is Cohen's d, shown only when a difference is statistically significant (small = 0.2, moderate = 0.5, large = 0.8).

Table 20: Familiar Words Results by School Location, 2016

                 Kyrgyz Language                          Russian Language
Grade    Total     Urban    Rural    Diff.   D    Total    Urban    Rural    Diff.   D
2        N=1,086   n=140    n=946                 N=300    n=60     n=240
         58.2      58.2     58.2     0.0          64.7     72.2     56.9     15.3
         (29.2)    (31.5)   (28.8)                (30.1)   (32.0)   (26.1)
4        N=1,100   n=140    n=960                 N=297    n=60     n=237
         89.5      89.8     89.4     0.4          104.2    110.8    97.3     13.5
         (33.1)    (34.8)   (32.8)                (35.5)   (32.8)   (37.1)

* Significant at .05 level; ** Significant at .01 level. Standard deviations appear in parentheses. D is Cohen's d, shown only when a difference is statistically significant (small = 0.2, moderate = 0.5, large = 0.8).

On the FWR subtasks, girls outperformed boys in all four groups, although the difference was not statistically significant for Russian Grade 4. For the other three groups, the differences were numerically large and statistically significant at the .01 level. In both Kyrgyz grades, there was virtually no difference in performance between rural and urban pupils on the FWR subtask. In the Russian cohorts, urban pupils scored 15.3 (Grade 2) and 13.5 (Grade 4) words per minute higher than their rural peers, but these differences were not statistically significant.

NONSENSE WORD RECOGNITION (GRADES 2 AND 4)

As Table 21 illustrates, the total mean score on the Kyrgyz Grade 2 Nonsense Word Recognition (NWR) subtask (n = 1,058) was 27.8 pseudowords per minute (SD 13.9). The total mean score on the Russian Grade 2 NWR subtask (n = 299) was 30.1 pseudowords per minute (SD 13.1). Note that there was considerably less dispersion around the mean for this subtask than for FWR. The total mean score on the Kyrgyz Grade 4 NWR subtask (n = 1,099) was 29.0 pseudowords per minute (SD 16.2). The total mean score on the Russian Grade 4 NWR subtask (n = 296) was 35.3 pseudowords per minute (SD 14.5). Table 22 presents the results by school location.

Table 21: Nonsense Words Results by Gender, 2016

                 Kyrgyz Language                              Russian Language
Grade    Total     Male     Fem.     Diff.     D       Total    Male     Fem.     Diff.     D
2        N=1,058   n=560    n=498                      N=299    n=141    n=158
         27.8      25.6     30.3     4.73**    0.34    30.1     27.7     32.3     4.59**    0.35
         (13.9)    (13.1)   (14.4)                     (13.1)   (13.6)   (12.4)
4        N=1,099   n=577    n=522                      N=296    n=145    n=151
         29.0      25.6     32.7     7.1**     0.4     35.3     34.6     35.9     1.3
         (16.2)    (15.5)   (16.3)                     (14.5)   (15.0)   (13.9)

* Significant at .05 level; ** Significant at .01 level. Standard deviations appear in parentheses. D is Cohen's d, shown only when a difference is statistically significant (small = 0.2, moderate = 0.5, large = 0.8).

Table 22: Nonsense Words Results by School Location, 2016

                 Kyrgyz Language                          Russian Language
Grade    Total     Urban    Rural    Diff.   D    Total    Urban    Rural    Diff.   D
2        N=1,058   n=134    n=924                 N=299    n=60     n=239
         27.8      28.1     27.7     0.4          30.1     33.1     27.0     6.1
         (13.9)    (14.2)   (13.9)                (13.1)   (13.5)   (12.1)
4        N=1,099   n=140    n=959                 N=296    n=60     n=236
         29.0      29.0     29.0     0.0          35.3     36.6     33.9     2.7
         (16.2)    (16.9)   (16.1)                (14.5)   (15.3)   (13.5)

* Significant at .05 level; ** Significant at .01 level. Standard deviations appear in parentheses. D is Cohen's d, shown only when a difference is statistically significant (small = 0.2, moderate = 0.5, large = 0.8).

A gender gap is evident at both grades in both languages on the Nonsense Word Recognition subtask. Girls outscored boys in both languages and at both grade levels; the differences were statistically significant in three of the four groups, with only the Russian Grade 4 difference, though numerically favoring girls, failing to reach statistical significance. Urban pupils outscored rural pupils on three of the four subtasks; however, none of these differences were large or statistically significant.

ORAL READING FLUENCY WITH COMPREHENSION (GRADES 2 AND 4)

Recall that the results in the following section differ slightly from those presented in Tables 7, 8, and 9, because these 2016 ORF results have not been equated with the 2014 subtasks. Recall also that unlike the ORF results, the Reading Comprehension (RC) results were recorded as percent correct (Table 24). As Table 23 shows, the total mean score on the Kyrgyz Grade 2 Oral Reading Fluency subtask (n = 1,086) was 40.6 words per minute (SD 23.5). The total mean score on the Russian Grade 2 subtask (n = 298) was 47.7 words per minute (SD 24.9). The total mean score on the Kyrgyz Grade 4 ORF subtask (n = 1,098) was 61.2 words per minute (SD 28.1). The mean score on the Russian Grade 4 ORF subtask (n = 292) was 65.1 words per minute (SD 28.1).

As in 2014 and on the 2015 midterm, there were large mean score differences favoring girls on this subtask (Table 23). In 2016, these differences were statistically significant with moderate effect sizes on all four assessments. As in 2015, the largest gender gap was in Kyrgyz Grade 4, where girls were favored by 16.9 words per minute, on average (effect size = .6). They read at a rate of 70 words per minute whereas their male counterparts read at a rate of only 53 words per minute. Gender gaps on the other grades and languages were also large. On the Reading Comprehension questions that follow the ORF subtask, girls outscored boys numerically in all four groups, yet only the two Kyrgyz subtasks had statistically significant, yet small, differences (Table 24).

Table 23: Oral Reading Fluency by Gender, per Minute, 2016

                 Kyrgyz Language                              Russian Language
Grade    Total     Male     Fem.     Diff.     D       Total    Male     Fem.     Diff.     D
2        N=1,086   n=576    n=510                      N=298    n=140    n=158
         40.6      36.2     45.6     9.36**    0.40    47.7     41.1     53.6     12.48**   0.5
         (23.5)    (22.8)   (23.3)                     (24.9)   (24.2)   (24.0)
4        N=1,098   n=576    n=522                      N=292    n=140    n=152
         61.2      53.1     70.0     16.9**    0.6     65.1     58.7     71.1     12.4*     0.4
         (28.1)    (25.9)   (27.9)                     (28.1)   (29.4)   (25.5)

* Significant at .05 level; ** Significant at .01 level. Standard deviations appear in parentheses. D is Cohen's d, shown only when a difference is statistically significant (small = 0.2, moderate = 0.5, large = 0.8).

Table 24: Reading Comprehension by Gender, Percent Correct, 2016

                 Kyrgyz Language                              Russian Language
Grade    Total     Male     Fem.     Diff.     D       Total    Male     Fem.     Diff.     D
2        N=1,089   n=579    n=510                      N=300    n=142    n=158
         52.2      49.6     55.1     5.42*     0.17    37.2     34.5     39.6     5.08
         (31.1)    (32.4)   (29.3)                     (30.0)   (31.5)   (28.5)
4        N=1,101   n=579    n=522                      N=297    n=145    n=152
         78.7      75.2     82.5     7.3**     0.3     48.5     41.4     55.3     13.9
         (25.2)    (27.1)   (22.3)                     (31.8)   (32.0)   (30.2)

* Significant at .05 level; ** Significant at .01 level. Standard deviations appear in parentheses. D is Cohen's d, shown only when a difference is statistically significant (small = 0.2, moderate = 0.5, large = 0.8).

Tables 25 and 26 present the data by school location. Urbanites outscored their rural counterparts on all ORF tests in both languages but not by numerically large or statistically significant margins in the Kyrgyz cohort. The differences between the urban and rural pupils in the Kyrgyz cohort were especially negligible, although urbanites in the Russian groups scored as high as 12.5 more words per minute than their rural peers. Gaps in Reading Comprehension between urban and rural pupils were also not large, with the exception of Russian Grade 4, in which urbanites correctly answered 54.8% of the questions whereas their rural counterparts correctly answered 41.9%.

Table 25: Oral Reading Fluency per Minute by Location, 2016

                 Kyrgyz Language                          Russian Language
Grade    Total     Urban    Rural    Diff.   D    Total    Urban    Rural    Diff.   D
2        N=1,086   n=139    n=947                 N=298    n=60     n=238
         40.6      41.6     40.4     1.2          47.7     52.9     42.1     10.9
         (23.5)    (26.2)   (23.0)                (24.9)   (25.0)   (23.7)
4        N=1,098   n=140    n=958                 N=292    n=60     n=232
         61.2      61.5     61.2     0.3          65.1     71.1     58.6     12.5
         (28.1)    (27.8)   (28.2)                (28.1)   (28.1)   (26.9)

* Significant at .05 level; ** Significant at .01 level. Standard deviations appear in parentheses. D is Cohen's d, shown only when a difference is statistically significant (small = 0.2, moderate = 0.5, large = 0.8).

Table 26: Reading Comprehension, Percent Correct by Location, 2016

                 Kyrgyz Language                          Russian Language
Grade    Total     Urban    Rural    Diff.   D    Total    Urban    Rural    Diff.   D
2        N=1,089   n=140    n=949                 N=300    n=60     n=240
         52.2      48.6     52.8     4.2          37.2     39.9     34.3     5.7
         (31.1)    (30.6)   (31.2)                (30.0)   (26.9)   (32.9)
4        N=1,101   n=140    n=961                 N=297    n=60     n=237
         78.7      79.7     78.5     1.2          48.5     54.8     41.9     12.9
         (25.2)    (23.6)   (25.5)                (31.8)   (30.4)   (32.1)

* Significant at .05 level; ** Significant at .01 level. Standard deviations appear in parentheses. D is Cohen's d, shown only when a difference is statistically significant (small = 0.2, moderate = 0.5, large = 0.8).

Another way of examining the Reading Comprehension subtask is to look at the proportion of pupils answering a given number of questions correctly. Recall that each pupil was asked a short series of questions as a comprehension check on the ORF subtask. Kyrgyz Grade 2 results are presented separately in Figure 3 because the Kyrgyz Grade 2 ORF subtask included a total of four comprehension questions, whereas the other subtasks included five. As Figure 3 illustrates, 37% of Kyrgyz Grade 2 respondents correctly answered three or four of the four questions. Only 16% answered no questions correctly, and a large share of pupils (47%) answered one or two questions correctly. In Kyrgyz Grade 4 (Figure 4), the percentage of pupils answering four or five of five questions correctly rises to 67%, whereas only 3% answered no questions correctly.

Figure 3: Number Correct, Kyrgyz Grade 2, Reading Comprehension, 2016

[Stacked bar: 0 of 4 correct, 16%; 1 of 4, 20%; 2 of 4, 27%; 3 of 4, 25%; 4 of 4, 12%.]

The trend within the Russian group was similar: in Grade 2, 30% of pupils correctly answered three, four, or five of the five questions, a share that grew to 41% in Grade 4. A full 35% of Russian Grade 2 pupils answered no questions correctly, compared with only 19% of Grade 4 pupils.

EGRA Midterm Data Analytic Report for the Kyrgyz Republic, September 2016 Page 27

Figure 4: Number Correct, Reading Comprehension, 2016

[Stacked bars showing the distribution of correct answers (0 of 5 through 5 of 5) for Grade 2 Russian, Grade 4 Russian, and Grade 4 Kyrgyz.]

DICTATION (GRADES 2 AND 4)

Scores on the Dictation (DICT) subtask were recorded as percent correct (Tables 27 and 28). On this subtask, girls scored higher at statistically significant levels in all four groups, with larger average differences among the Kyrgyz second and fourth graders; in Kyrgyz Grade 2, girls scored 11 percentage points higher than boys. The differences between rural and urban pupils on the DICT subtask were smaller, and none were statistically significant (Table 28). The largest urban-rural gap, a 5.6-point difference in percent correct, was in Russian Grade 2.

Table 27: Dictation Percent Correct by Gender, 2016

                 Kyrgyz Language                              Russian Language
Grade    Total     Male     Fem.     Diff.     D       Total    Male     Fem.     Diff.     D
2        N=1,089   n=579    n=510                      N=300    n=142    n=158
         69.5      64.4     75.4     11.05**   0.42    75.8     72.1     79.2     7.12*     0.29
         (26.1)    (28.2)   (22.0)                     (24.3)   (26.5)   (21.5)
4        N=1,101   n=579    n=522                      N=297    n=145    n=152
         82.7      79.1     86.7     7.6**     0.4     86.7     83.7     89.6     5.9*      0.4
         (21.1)    (22.8)   (18.3)                     (15.9)   (18.8)   (12.0)

* Significant at .05 level; ** Significant at .01 level. Standard deviations appear in parentheses. D is Cohen's d, shown only when a difference is statistically significant (small = 0.2, moderate = 0.5, large = 0.8).

Table 28: Dictation Percent Correct by Location, 2016

                 Kyrgyz Language                          Russian Language
Grade    Total     Urban    Rural    Diff.   D    Total    Urban    Rural    Diff.   D
2        n=1,089   n=140    n=949                 n=300    n=60     n=240
         69.5      68.6     69.7     1.0          75.8     78.6     73.0     5.6
         (26.1)    (27.6)   (25.8)                (24.3)   (21.8)   (26.4)
4        n=1,101   n=140    n=961                 n=297    n=60     n=237
         82.7      85.4     82.2     3.2          86.7     88.0     85.4     2.5
         (21.1)    (20.0)   (21.3)                (15.9)   (13.5)   (18.2)

* Significant at .05 level; ** Significant at .01 level. Standard deviations appear in parentheses. D is Cohen's d, shown only when a difference is statistically significant (small = 0.2, moderate = 0.5, large = 0.8).

LISTENING COMPREHENSION (GRADES 2 AND 4)

For this subtask, pupils were asked four Listening Comprehension (LC) questions after a short text was read aloud. The results for both grades were similar: 57% of Kyrgyz Grade 2 pupils correctly answered three or four out of a total of four questions, and 58% of Kyrgyz Grade 4 pupils correctly answered three or four out of four questions. Only 6% (Grade 2) and 5% (Grade 4) answered zero questions correctly on the Kyrgyz language subtask. In the Russian groups, 40% of Grade 2 pupils correctly answered three or four out of a total of four questions, whereas 53% of Grade 4 pupils correctly answered three or four out of four questions. Figure 5 illustrates the percentage of all pupils who correctly answered zero, one, two, three, or four questions on the LC subtask.

Figure 5: Number Correct, Listening Comprehension, 2016

[Stacked bars showing the distribution of correct answers (0 of 4 through 4 of 4) for Kyrgyz Grade 2, Kyrgyz Grade 4, Russian Grade 2, and Russian Grade 4.]

Mean scores on Listening Comprehension were recorded as percentage correct and are presented in Table 29. In addition to total mean scores by language and grade, the results are broken down by gender and school location. Girls scored numerically higher on all four subtasks, yet only Kyrgyz Grade 2 had statistically significant mean score differences between boys and girls. That difference can be considered marginal, however, as the effect size was .16. Across both the Russian and Kyrgyz groups and grades, mean scores by gender

EGRA Midterm Data Analytic Report for the Kyrgyz Republic, September 2016 Page 29

on linguistic comprehension subtasks were quite similar. This contrasts with the decoding subtasks, which tended to show statistically significant differences by gender. Although none of the mean score differences by school location were statistically significant, the gap between urban and rural pupils (favoring urban pupils) was numerically large for Russian Grade 2: a 17.7 percentage point difference in mean scores. With the smaller Russian sample sizes, even numerically large differences may fail to reach statistical significance; nonetheless, this gap merits further investigation.

Table 29: Listening Comprehension Percent Correct by Gender, 2016

                 Kyrgyz Language                              Russian Language
Grade    Total     Male     Fem.     Diff.     D       Total    Male     Fem.     Diff.     D
2        n=1,089   n=579    n=510                      n=300    n=142    n=158
         67.9      65.7     70.3     4.54**    0.16    60.1     59.6     60.5     0.86
         (27.8)    (28.2)   (27.2)                     (34.4)   (35.3)   (33.6)
4        n=1,101   n=579    n=522                      n=297    n=145    n=152
         70.0      68.1     72.0     4.0               63.4     60.7     65.9     5.2
         (28.1)    (28.1)   (28.0)                     (33.2)   (35.3)   (31.0)

* Significant at .05 level; ** Significant at .01 level. Standard deviations appear in parentheses. D is Cohen's d, shown only when a difference is statistically significant (small = 0.2, moderate = 0.5, large = 0.8).

Table 30: Listening Comprehension Percent Correct by School Location, 2016

                 Kyrgyz Language                          Russian Language
Grade    Total     Urban    Rural    Diff.   D    Total    Urban    Rural    Diff.   D
2        n=1,089   n=140    n=949                 n=300    n=60     n=240
         67.9      63.4     68.7     5.3          60.1     68.7     51.0     17.7
         (27.8)    (31.1)   (27.1)                (34.4)   (30.7)   (35.8)
4        n=1,101   n=140    n=961                 n=297    n=60     n=237
         70.0      70.4     69.9     0.5          63.4     66.0     60.6     5.4
         (28.1)    (28.6)   (28.0)                (33.2)   (34.5)   (31.9)

* Significant at .05 level; ** Significant at .01 level. Standard deviations appear in parentheses. D is Cohen's d, shown only when a difference is statistically significant (small = 0.2, moderate = 0.5, large = 0.8).

ORAL VOCABULARY (GRADES 2 AND 4)

The Oral Vocabulary (OV) subtask results are presented as percent correct. Overall, mean scores ranged from 71% (Russian Grade 2) to 91% (Kyrgyz Grade 4). Boys in Kyrgyz Grade 4 scored higher than girls at a statistically significant level; OV was, in fact, the only subtask on which a male cohort outperformed its female counterpart. Girls scored higher than boys in the other three groups; the female advantage in Russian Grade 2 (+5.1 percentage points) was statistically significant, with a small effect size (.25). In the Kyrgyz groups, rural pupils scored numerically higher in both grades, but not at statistically significant levels. The opposite was true for the Russian cohorts: urban pupils scored higher, though also not at statistically significant levels.

EGRA Midterm Data Analytic Report for the Kyrgyz Republic, September 2016 Page 30

Table 31: Oral Vocabulary Results by Gender, 2016

                 Kyrgyz Language                              Russian Language
Grade    Total     Male     Fem.     Diff.     D       Total    Male     Fem.     Diff.     D
2        n=1,089   n=579    n=510                      n=300    n=142    n=158
         89.5      89.7     89.4     0.33              71.2     68.5     73.6     5.10*     0.25
         (11.4)    (11.6)   (11.2)                     (20.6)   (21.6)   (19.5)
4        n=1,101   n=579    n=522                      n=297    n=145    n=152
         91.7      92.7     90.6     2.2**     0.2     81.6     80.1     83.0     2.9
         (9.8)     (9.4)    (10.1)                     (15.6)   (16.1)   (14.9)

* Significant at .05 level; ** Significant at .01 level. Standard deviations appear in parentheses. D is Cohen's d, shown only when a difference is statistically significant (small = 0.2, moderate = 0.5, large = 0.8).

Table 32: Oral Vocabulary Results by School Location, 2016

                 Kyrgyz Language                          Russian Language
Grade    Total     Urban    Rural    Diff.   D    Total    Urban    Rural    Diff.   D
2        n=1,089   n=140    n=949                 n=300    n=60     n=240
         89.5      86.3     90.1     3.8          71.2     75.2     67.1     8.1
         (11.4)    (14.5)   (10.7)                (20.6)   (18.2)   (22.3)
4        n=1,101   n=140    n=961                 n=297    n=60     n=237
         91.7      91.2     91.8     0.5          81.6     84.6     78.4     6.3
         (9.8)     (10.3)   (9.7)                 (15.6)   (13.9)   (16.7)

* Significant at .05 level; ** Significant at .01 level. Standard deviations appear in parentheses. D is Cohen's d, shown only when a difference is statistically significant (small = 0.2, moderate = 0.5, large = 0.8).


III. FINDINGS AND DISCUSSION OVERALL RESULTS The purpose of this report is to present and analyze the 2016 midterm EGRA data. Any recommendations offered herein should be further investigated through context-based, data- driven dialogues with teachers as well as implementers engaged in efforts to increase capacity building among teachers in . In this section, we offer ideas for consideration in regard to the study’s findings and present policy-related questions for further discussion. A close examination reveals that although the results are somewhat positive, there are important areas for further consideration and discussion. There were positive trends in pupil performance between 2014 and 2016 for cohorts 2 and 3. As highlighted in the preceding sections, scores on key subtasks, such as Letter Name Recognition (LNR), Familiar Word Recognition, and Oral Reading Fluency (ORF), rose during this period. In both grades, the proportions of pupils who met national standards in ORF also grew during this period. In fact, pupils appear to be performing quite well on several of the core subtasks. The ability to read 66.7 letters per minute (Kyrgyz Grade 2) is tantamount to just above a letter per second. With a standard deviation of 19.4 and normally distributed results, approximately 68% of all pupils tested are reading between 46 and 86 letters per minute. This indicates reasonably strong alphabetic knowledge at the second-grade level. Furthermore, a 94% mean score (Kyrgyz Grade 2) on Initial Letter Sound (ILS) indicates that phonemic awareness is not posing significant challenges for second-graders. Although the Russian cohorts scored lower at 89%, this mean score value is still relatively high. Another way to look at the data is in terms of percentages of pupils with “zero scores,” or the percentage answering no question correctly. 
As Table 33 below indicates, the number of pupils who struggled enough to discontinue the assessment at the onset, or who simply scored zero, was very low. Of more than 1,000 Kyrgyz language pupils, only three received zero scores on the LNR subtask. A higher number received zero scores on the ORF subtask, yet this was only 24 out of 1,086 pupils. The low numbers of zero scores across both grades, both languages, and all subtasks tell a similar story.

Table 33: Number and Percentage of Pupils with Zero Scores, 2016

| Subtask                 | Kyrgyz Grade 2 | Russian Grade 2 |
|-------------------------|----------------|-----------------|
| Letter Name Recognition | 3 (0.3%)       | 15 (5%)         |
| Initial Letter Sound    | 0 (0%)         | 6 (2%)          |
| Oral Reading Fluency    | 24 (2%)        | 11 (4%)         |
| From Total N            | 1,086          | 300             |

| Subtask                 | Kyrgyz Grade 4 | Russian Grade 4 |
|-------------------------|----------------|-----------------|
| Oral Reading Fluency    | 13 (1%)        | 12 (4%)         |
| From Total N            | 1,101          | 297             |
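The percentages in Table 33 follow directly from the reported counts; a minimal sketch (counts and sample sizes taken from the table):

```python
# Zero-score counts and total N, as reported in Table 33
zero_scores = {
    "Kyrgyz G2, Letter Name Recognition": (3, 1086),
    "Kyrgyz G2, Initial Letter Sound": (0, 1086),
    "Kyrgyz G2, Oral Reading Fluency": (24, 1086),
    "Russian G2, Letter Name Recognition": (15, 300),
    "Russian G2, Initial Letter Sound": (6, 300),
    "Russian G2, Oral Reading Fluency": (11, 300),
    "Kyrgyz G4, Oral Reading Fluency": (13, 1101),
    "Russian G4, Oral Reading Fluency": (12, 297),
}

# Convert each count to a percentage of the tested sample
percentages = {name: 100 * zeros / n for name, (zeros, n) in zero_scores.items()}
for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}%")
```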


In a recent study funded by the World Bank, it was reported that approximately 60% of first-graders in the Kyrgyz Republic have a basic understanding of the alphabet when they enter the first grade (CEATM, 2014). These EGRA findings support the notion that slightly more than half of the population is starting school with a certain foundation in place. Furthermore, there appears to be no achievement gap by school location for either the LNR or the ILS core Grade 2 subtasks.

Grade 2 readers are also recognizing nearly one familiar word per second (Kyrgyz Grade 2, 58.2 words per minute; Russian Grade 2, 64.7 words per minute), whereas Grade 4 readers are recognizing between 15 and 40 more familiar words per minute than their Grade 2 counterparts. In addition, Kyrgyz Grade 2 readers are recognizing about 28 pseudowords per minute, and Russian Grade 2 readers some 30 pseudowords per minute, indicating basic decoding skills in both languages. As in 2015, scores on Oral Vocabulary, Dictation, and Listening Comprehension were relatively high for cohorts 2 and 3 in 2016. As noted in the results section above, there were also increases in the number of pupils attaining reading standards in 2016 across grades and languages.

The project has been working with the MOES, National Testing Center, Kyrgyz Academy of Education, and local methodologists and mentors to enhance teaching skills across the country. The project has disseminated new formative assessment materials to teachers, and teachers have had a chance to engage with supplemental reading. Ongoing in-service trainings have provided targeted training to a broad spectrum of teachers. Switching from teacher-centered to student-centered reading instruction and getting good reading material into classrooms may continue to help increase attainment. Reading clubs and libraries, together with the project's extensive media campaigns and outreach to parents, may also be increasing the time pupils spend reading.

Remaining Challenges

Despite successes, the overall results present a mixed picture. In comparison with most other countries participating in EGRA studies, the 2016 Grade 2 mean score performances of 40.6 words per minute (Kyrgyz) and 47.7 words per minute (Russian) are reasonably high. Yet, with a national expectation benchmarked at 40 words per minute, a considerable proportion of the population is not meeting national expectations. The Grade 4 situation is similar: just under half of all pupils are still not meeting national benchmarks. Further, as we present in later sections, there are large gender gaps in skills attainment that appear to be growing, not shrinking, by Grade 4. These gender gaps are especially apparent on subtasks requiring decoding skills.

FOR MINISTRY AND USAID DISCUSSION

Understanding the state of reading outcomes in the early grades in the Kyrgyz Republic is, of course, the crucial first step toward improving reading instruction and outcomes. It is essential to understand what is happening in this area, to monitor progress at all levels, and to adapt and calibrate interventions and supports as necessary. It is also essential that any initiatives, reforms, or proposed changes to the status quo in any one area of the system be tightly aligned with other parts of the system.


In the Kyrgyz Republic, reading was not taught as a stand-alone subject until 2010 (Tvaruzkova & Shamatov, 2012); however, the school subject Native Language and Literature is taught from the earliest grades. As in other school subjects in the republic, a fundamental purpose of Native Language and Literature historically has been to provide a strong knowledge base and to transmit core cultural values and ideals (Shamatov & Niyozov, 2010). Instructional emphases on mastering (memorizing) core knowledge, accuracy in orthography, and text and oral reproduction are manifest in the considerable time invested in pupil reproduction of knowledge during daily lessons. The oral reproduction of texts (memorization of key portions of literature, poems, essays, and so forth) occupies a prominent place in classrooms, as does orally answering questions posed by the teacher (Shamatov & Niyozov, 2010). In this context, questions for further discussion include:

• Is there a connection between traditional teaching practices (heavy reliance on rote learning) and the apparent slowdown in reading progress after Grade 2? If so, how can this be overcome?
• How have teacher attitudes changed or not changed since reading became a stand-alone subject in 2010? In other words, what is the actual level of employment of newly learned instructional practices by teachers (use of new decoders, etc.)?
• Is "time on task" adequate, or are there gaps between intended teaching time and actual time due to resource or other constraints on teachers' time?
• How successful has the MOES been in introducing new instructional practices that focus more on reading outcomes? Examples include differentiated instruction, more individual pupil reading time, reading circles, leveled readers, allocation of resources and materials to students based on reading levels, exposure to content "outside the curricula" that may increase motivation, and other creative classroom strategies that promote interest in reading.
• How is formative assessment in the early grades conducted to ensure continuous reading improvement? What are the training needs in this area?
• How have results from the 2014 and 2015 EGRAs been relayed to teachers throughout the country? Have there been opportunities for teachers to reflect on EGRA results and articulate change strategies based on the results (especially for Grade 4)?
• How can professional development programming become more focused on increasing student reading outcomes? Can some non-priority types of professional training be reduced in favor of more concentration on the professional development of instructional leaders in reading?

FOCUS ON GENDER

KYRGYZ LANGUAGE

As in 2015, the clear pattern that emerges from the 2016 results in relation to gender is that girls outperformed boys on a large number of subtasks, and these gaps appear to be increasing over time in the Kyrgyz group. The gaps are especially wide on subtasks in which decoding skills are assessed.


These overall findings on gender are consistent with the 2012 EGRA, the 2014 baseline EGRA, and the first 2015 midterm administered in Cohort 1 (Tvaruzkova & Shamatov, 2012; AIR, 2014; AIR, 2015). On the EGRA 2016 Kyrgyz subtasks, girls numerically outscored boys on 14 of the 16 subtasks. Independent-sample t-tests revealed statistically significant differences in scores by gender on 14 of the 16 subtasks, although only nine of these had effect sizes that could be considered small or moderate. Grade 4 boys outscored their female peers at a statistically significant level only on Oral Vocabulary, whereas the four-point advantage of Grade 4 girls on Listening Comprehension was not statistically significant (both are linguistic comprehension skills).

As on the 2015 midterm, the areas in which Kyrgyz language girls demonstrated the largest advantages were subtasks that tapped into decoding skills. The largest gender gap was on Kyrgyz Grade 4 Oral Reading Fluency, where girls were favored by 16.9 words per minute, on average (effect size = .6): they read at a rate of 70 words per minute, whereas their male counterparts read at a rate of only 53 words per minute. Other score differentials in favor of girls included +9.3 wpm on Grade 2 FWR, +13.8 wpm on Grade 4 FWR, +9.4 wpm on Grade 2 ORF, +16.9 wpm on Grade 4 ORF, +11.1 percentage points on Grade 2 Dictation, and +7.6 percentage points on Grade 4 Dictation. Note that, as in 2015, the gaps between boys and girls on most subtasks grew between Grades 2 and 4. This growth in the gender gap is visible in the differences across grades shown in Table 34.

Table 34: Kyrgyz Language Gender Comparison, 2016

| Subtask                                | Grade 2: Favored | Difference | Effect Size     | Grade 4: Favored | Difference | Effect Size     |
|----------------------------------------|------------------|------------|-----------------|------------------|------------|-----------------|
| Familiar Words (per minute)            | Girls            | +9.3**     | Moderate        | Girls            | +13.8**    | Small           |
| Nonsense Words (per minute)            | Girls            | +4.7**     | Moderate        | Girls            | +7.1**     | Small           |
| Oral Vocabulary (percent correct)      | Boys             | +0.3       | Not significant | Boys             | +2.2**     | Small           |
| Oral Reading Fluency (per minute)      | Girls            | +9.4**     | Moderate        | Girls            | +16.9**    | Moderate        |
| Reading Comp (percent correct)         | Girls            | +5.4*      | Small           | Girls            | +7.3**     | Small           |
| Listening Comp (percent correct)       | Girls            | +4.5**     | Small           | Girls            | +4.0       | Not significant |
| Dictation (percent correct)            | Girls            | +11.1**    | Moderate        | Girls            | +7.6**     | Small           |
| Initial Letter Sound (percent correct) | Girls            | +1.6*      | Small           | –                | –          | –               |
| Letter Name (per minute)               | Girls            | +6.8**     | Moderate        | –                | –          | –               |

* Significant at .05 level; ** Significant at .01 level. Cohen's d effect size: Small = 0.2; Moderate = 0.5; Large = 0.8.
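The effect-size labels in the gender tables follow Cohen's d. A minimal sketch of the computation (the group means match the Grade 4 ORF figures quoted in the text, but the standard deviations and group sizes are invented for illustration):

```python
from math import sqrt

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

def effect_label(d):
    """Conventional thresholds: 0.2 small, 0.5 moderate, 0.8 large."""
    d = abs(d)
    if d >= 0.8:
        return "Large"
    if d >= 0.5:
        return "Moderate"
    if d >= 0.2:
        return "Small"
    return "Negligible"

# Kyrgyz Grade 4 ORF: girls ~70 wpm vs. boys ~53 wpm (means from the text);
# the SDs of 28 wpm and the equal group sizes are hypothetical.
d = cohens_d(70.0, 28.0, 550, 53.0, 28.0, 550)
print(round(d, 2), effect_label(d))
```

Under these assumed SDs, the 17 wpm gap yields d of about 0.6, consistent with the "Moderate" label the report assigns.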


In a comparison of the score distributions of boys and girls, Figures 6 and 7 visually illustrate the large gender gaps in both grades in the Kyrgyz groups. Note that the pattern of girls outperforming boys is especially apparent at Grade 4.

Figure 6: Oral Reading Fluency Distribution by Gender (Kyrgyz Grade 2), 2016

[Histogram of ORF scores: correct words per minute (0 to 150) on the horizontal axis, frequency on the vertical axis, with separate distributions for boys and girls.]

Figure 7: Oral Reading Fluency Distribution by Gender (Kyrgyz Grade 4), 2016

[Histogram of ORF scores: correct words per minute (0 to 250) on the horizontal axis, frequency on the vertical axis, with separate distributions for boys and girls.]


RUSSIAN LANGUAGE

As in the Kyrgyz groups, the trend of girls outscoring boys held for the Russian LOI groups. Girls outscored boys numerically on all 16 subtasks; however, only seven of these score differences (five of them at Grade 2) were statistically significant. Note that neither Letter Name Recognition nor Initial Letter Sound showed statistically significant differences by gender.

What is notable about the Russian cohort is that, unlike in the Kyrgyz cohort, the gaps do not increase between Grades 2 and 4, with the exception of the Reading Comprehension task, which consisted of very few items. For the Russian groups, the gender gap actually closes somewhat by Grade 4. For example, the gap decreased from 15.1 to 4.5 wpm for Familiar Word Recognition, from 4.6 to 1.3 wpm for Nonsense Word Recognition, from 5.1 to 2.9 percentage points for Oral Vocabulary, and from 7.1 to 5.9 percentage points for Dictation. The gap remained steady only for Oral Reading Fluency, at 12.5 versus 12.4 wpm. As in the Kyrgyz group, the largest gaps for the Russian cohort were on subtasks that loaded on decoding skills.

Table 35: Russian Language Gender Comparison, 2016

| Subtask                                | Grade 2: Favored | Difference | Effect Size     | Grade 4: Favored | Difference | Effect Size     |
|----------------------------------------|------------------|------------|-----------------|------------------|------------|-----------------|
| Familiar Words (per minute)            | Girls            | +15.1**    | Moderate        | Girls            | +4.5       | Not significant |
| Nonsense Words (per minute)            | Girls            | +4.6**     | Small           | Girls            | +1.3       | Not significant |
| Oral Vocabulary (percent correct)      | Girls            | +5.1*      | Small           | Girls            | +2.9       | Not significant |
| Oral Reading Fluency (per minute)      | Girls            | +12.5**    | Moderate        | Girls            | +12.4*     | Small           |
| Reading Comp (percent correct)         | Girls            | +5.1       | Not significant | Girls            | +13.9      | Not significant |
| Listening Comp (percent correct)       | Girls            | +0.9       | Not significant | Girls            | +5.2       | Not significant |
| Dictation (percent correct)            | Girls            | +7.1*      | Small           | Girls            | +5.9*      | Small           |
| Initial Letter Sound (percent correct) | Girls            | +2.5       | Not significant | –                | –          | –               |
| Letter Name (per minute)               | Girls            | +5.7       | Not significant | –                | –          | –               |

* Significant at .05 level; ** Significant at .01 level. Cohen's d effect size: Small = 0.2; Moderate = 0.5; Large = 0.8.

FOR MINISTRY AND USAID DISCUSSION

It seems logical that attention should be focused on issues such as large gender gaps, especially on decoding tasks and, in particular, at Grade 4. These questions should be considered at the policy level in the Kyrgyz Republic, given that only strong instructional leadership in a highly centralized education system is likely to result in progress over time. This third wave of EGRA results in the Kyrgyz Republic could serve as a means by which to mediate important conversations on the need to focus action in the areas of pedagogy, gender issues, focus and resource allocation, information and progress monitoring, and creating a reading culture.

In recent decades, USAID and other international donors have focused on identifying and rectifying performance gaps by gender throughout the world. In many countries, emphasis has appropriately focused on closing access and achievement gaps that favor boys over girls. Caution should be taken, however, in extrapolating results from neighboring regions; data trends in each country should determine the trajectory of policy recommendations in each context. In countries of the former Soviet Union, data from a host of sources have consistently indicated that girls rather than boys have been outperforming their counterparts in both achievement and educational access (Young, Reeves, & Valyaeva, 2006; Brunner & Tillett, 2007; CEATM, 2009; CEATM, 2010; Drummond, 2011).

Important questions to consider at the policy level relate to how the Kyrgyz Republic plans to address gender gaps and improve reading and other educational outcomes for boys, beginning in the early grades. The data on these gaps, and the results for Kyrgyz boys in particular, would not necessarily warrant urgent measures for Grade 2 if they merely indicated a temporary lag from which recovery in later grades was possible. However, the available data on educational attainment by gender and demographics in the Kyrgyz Republic indicate that the situation for boys does not improve by Grade 4. Therefore, we recommend the following questions for consideration and discussion:

• Are there instructional practices that hinder the growth of boys' skills? Are boys called upon in classrooms as frequently as girls?
• Are there any issues with learning styles that hinder boys' participation? Are teaching methods active enough to keep boys engaged?
• Are there activities and materials that could better engage boys in learning?
• Are there cultural norms or domestic issues that disproportionately distract boys from active participation in learning activities, attending class, or completing homework assignments?

SCHOOL LOCATION

For both Kyrgyz LOI grades, the EGRA results indicate that six of the subtasks favored rural cohorts, while eight favored urban cohorts. However, overall differences in mean scores between pupils in rural and urban settings were negligible, and no score differences were statistically significant. Mean percentage or per-minute score differences were one or less for nine of the 16 subtasks. The largest gaps were in Listening Comprehension (+5.3 percentage points) and Reading Comprehension (+4.2 percentage points), both of which favored rural pupils. Again, however, these differences were not statistically significant. In Grade 4, only one of seven tasks favored rural pupils, and this difference was not statistically significant; the largest score gap was for Dictation, at 3.2 percentage points.


Table 36: Kyrgyz Language School Location Comparison, 2016

| Subtask                                | Grade 2: Favored | Difference | Effect Size     | Grade 4: Favored | Difference | Effect Size     |
|----------------------------------------|------------------|------------|-----------------|------------------|------------|-----------------|
| Familiar Words (per minute)            | None             | –          | Not significant | Urban            | +0.4       | Not significant |
| Nonsense Words (per minute)            | Urban            | +0.4       | Not significant | None             | –          | Not significant |
| Oral Vocabulary (percent correct)      | Rural            | +3.8       | Not significant | Rural            | +0.5       | Not significant |
| Oral Reading Fluency (per minute)      | Urban            | +1.2       | Not significant | Urban            | +0.3       | Not significant |
| Reading Comp (percent correct)         | Rural            | +4.2       | Not significant | Urban            | +1.2       | Not significant |
| Listening Comp (percent correct)       | Rural            | +5.3       | Not significant | Urban            | +0.5       | Not significant |
| Dictation (percent correct)            | Rural            | +1.0       | Not significant | Urban            | +3.2       | Not significant |
| Initial Letter Sound (percent correct) | Rural            | +0.9       | Not significant | –                | –          | –               |
| Letter Name (per minute)               | Urban            | +1.5       | Not significant | –                | –          | –               |

* Significant at .05 level; ** Significant at .01 level. Cohen's d effect size: Small = 0.2; Moderate = 0.5; Large = 0.8.

The Kyrgyz results contrast with the Russian results in that for each subtask in both Grades 2 and 4, urban pupils were favored. These urban-rural gaps were numerically larger than those for the Kyrgyz language pupils. The largest gaps in Grade 2 were in Listening Comprehension (+17.7 percentage points) and Familiar Words (+15.3 words per minute), both of which favored urban pupils. Note, however, that according to the results of the t-tests, none of these differences were estimated to be statistically significant.


Table 37: Russian Language School Location Comparison, 2016

| Subtask                                | Grade 2: Favored | Difference | Effect Size     | Grade 4: Favored | Difference | Effect Size     |
|----------------------------------------|------------------|------------|-----------------|------------------|------------|-----------------|
| Familiar Words (per minute)            | Urban            | +15.3      | Not significant | Urban            | +13.5      | Not significant |
| Nonsense Words (per minute)            | Urban            | +6.1       | Not significant | Urban            | +2.7       | Not significant |
| Oral Vocabulary (percent correct)      | Urban            | +8.1       | Not significant | Urban            | +6.3       | Not significant |
| Oral Reading Fluency (per minute)      | Urban            | +10.9      | Not significant | Urban            | +12.5      | Not significant |
| Reading Comp (percent correct)         | Urban            | +5.7       | Not significant | Urban            | +12.9      | Not significant |
| Listening Comp (percent correct)       | Urban            | +17.7      | Not significant | Urban            | +5.4       | Not significant |
| Dictation (percent correct)            | Urban            | +5.6       | Not significant | Urban            | +2.5       | Not significant |
| Initial Letter Sound (percent correct) | Urban            | +9.8       | Not significant | –                | –          | –               |
| Letter Name (per minute)               | Urban            | +7.9       | Not significant | –                | –          | –               |

* Significant at .05 level; ** Significant at .01 level. Cohen's d effect size: Small = 0.2; Moderate = 0.5; Large = 0.8.

FOR MINISTRY AND USAID DISCUSSION

Despite the lack of statistically significant mean score differences, some of the numerical differences noted above merit further attention, especially for the larger gaps in the Russian groups. Some questions for consideration include:

• How does the MOES signal to the oblast (region) and local educational administrative units that the focus on reading is a new national priority? Are there low-cost opportunities to enhance the strength and frequency of such signaling?
• Does the MOES (or do local authorities) monitor which teachers are engaged in professional development for reading improvement? Are there ways to leverage the strengths of well-trained cadres so that they may serve as a cost-efficient resource for other teachers who might need mentoring or training support in remote regions?
• Do regional and local education departments have the authority and resources to collect data and information on the activities and initiatives in their communities with regard to improving reading instruction, and to collect data at the pupil level?
• How does the MOES ensure that best practices in reading instruction are maintained throughout the system, including in the most rural areas of the republic?


APPENDIX 1. KYRGYZ ALPHABET

The Kyrgyz alphabet (shown below) has 36 letters, and its current iteration is considered to be an "adapted Cyrillic alphabet" given its commonality with the Russian Cyrillic alphabet (Jusayeva, 2004).17 Kyrgyz, however, has several additional letters that are not found in most Slavic-language alphabets. Although few, if any, Letter Name Recognition (LNR) studies on Kyrgyz-speaking populations are available to English-speaking audiences, studies of LNR in the English and French languages have demonstrated that early LNR is a strong predictor of long-term reading development (Chiappe, Siegel, & Wade-Woolley, 2002).

17 Both the Arabic and Latin alphabets have been used in recent times. For more on the history of alphabet politics in the Kyrgyz Republic, see Hu & Imart (1989).


APPENDIX 2. FACTOR ANALYSIS

Using subtask scores from both grades and languages, the underlying structure of the early grade reading data was analyzed by employing principal axis factor analysis. The factor analyses were conducted with subtask results by each grade and language to assess the dimensionality of the entirety of each assessment battery. Factor analysis interpretation was guided by examining factor loadings in a rotated factor matrix. Based on previous research on reading ability, it was plausible that our underlying factors of interest would be correlated; thus, oblique (Oblimin) rotation was selected.
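The report's analysis cannot be reproduced here without the pupil-level data, but the dimensionality question it answers can be illustrated with simulated data. The sketch below (Python with numpy; all numbers are invented) generates five "subtasks" from two correlated latent constructs and counts how many eigenvalues of the correlation matrix exceed 1 (the Kaiser criterion). The report's actual method, principal axis factoring with Oblimin rotation, would require a dedicated package such as factor_analyzer.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two correlated latent constructs (decoding, linguistic comprehension), r = .4,
# mirroring the inter-factor correlations of .36-.48 reported in the text.
latent = rng.multivariate_normal([0, 0], [[1.0, 0.4], [0.4, 1.0]], size=n)
decoding, comprehension = latent[:, 0], latent[:, 1]

# Simulated subtasks, each loading mainly on one construct (illustrative only)
observed = np.column_stack([
    decoding + rng.normal(0, 0.3, n),       # e.g. Oral Reading Fluency
    decoding + rng.normal(0, 0.3, n),       # e.g. Familiar Words
    decoding + rng.normal(0, 0.3, n),       # e.g. Nonsense Words
    comprehension + rng.normal(0, 0.6, n),  # e.g. Oral Vocabulary
    comprehension + rng.normal(0, 0.6, n),  # e.g. Listening Comprehension
])

corr = np.corrcoef(observed, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sorted descending
n_factors = int((eigenvalues > 1).sum())      # Kaiser criterion
print("eigenvalues:", np.round(eigenvalues, 2), "-> factors retained:", n_factors)
```

With this structure, two eigenvalues dominate, matching the two-construct picture (decoding plus linguistic comprehension) that the appendix describes.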

The results from the analyses in the Russian and Kyrgyz languages were consistent with those of other research. The results indicated two primary underlying constructs, linguistic comprehension and decoding, for both grades and languages (Hoover & Gough, 1990). Although the data also indicated that the two distinct factors were correlated with each other on each of the four versions—Kyrgyz 2 (.38), Russian 2 (.36), Kyrgyz 4 (.44), Russian 4 (.48)—they were not correlated highly enough to indicate the presence of only a single factor. These results support the conclusions reached by Hoover and Gough (1990).

Note the consistently high loadings and clustering together of the Oral Reading Fluency, Familiar Words, and Nonsense Words subtasks for both languages. We can denote the convergence of these subtasks as Factor 1, the construct of decoding. The clustering and high loadings of these subtasks indicate that each of them taps into this latent construct. For the Kyrgyz language, Letter Name and Dictation also loaded on decoding, with Dictation loading on both factors, although only at .374 on linguistic comprehension. In the Russian language group, Letter Name Recognition did not load above .400 on either factor, whereas Dictation loaded only on Factor 1.

A second factor was also identified in the analysis. Note the relatively high loadings and clustering of the Listening Comprehension and Oral Vocabulary subtasks. We can denote Factor 2 as the construct of linguistic comprehension, which, as noted in the text, entails listening to stimuli and correctly responding to them. Linguistic comprehension is defined as the ability to understand oral language information without having to rely on decoding (Hoover & Gough, 1990). Note that for Grade 2, the Dictation subtask tapped into both decoding and linguistic comprehension on the Kyrgyz language assessment, although not strongly on either. The dictation task may be somewhat multidimensional, requiring skill in both constructs. The Dictation subtask loaded only on decoding for the Russian assessment, although weakly at .419.

Table 38: Grade 2 Factor Analysis Results

| Kyrgyz Language 2015  | Factor 1 | Factor 2 | Russian Language 2015 | Factor 1 | Factor 2 |
|-----------------------|----------|----------|-----------------------|----------|----------|
| Nonsense Words        | .974     | –        | Nonsense Words        | .954     | –        |
| Oral Reading Fluency  | .958     | –        | Oral Reading Fluency  | .948     | –        |
| Familiar Words        | .947     | –        | Familiar Words        | .951     | –        |
| Letter Name           | .617     | –        | Letter Name           | –        | –        |
| Dictation             | .481     | .374     | Dictation             | .419     | –        |
| Oral Vocabulary       | –        | .598     | Oral Vocabulary       | –        | .811     |
| Initial Letter Sound  | –        | .420     | Initial Letter Sound  | –        | –        |
| Listening Comp        | –        | .350     | Listening Comp        | –        | .755     |

Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization. – indicates a loading less than .300.

Table 39: Grade 4 Factor Analysis Results

| Kyrgyz Language 2015    | Factor 1 | Factor 2 | Russian Language 2015   | Factor 1 | Factor 2 |
|-------------------------|----------|----------|-------------------------|----------|----------|
| Nonsense Words          | .979     | –        | Nonsense Words          | .895     | –        |
| Familiar Words          | .900     | –        | Familiar Words          | .819     | –        |
| Oral Reading Fluency    | .853     | –        | Oral Reading Fluency    | .744     | –        |
| Dictation               | .504     | –        | Dictation               | .384     | .364     |
| Oral Vocabulary         | –        | .577     | Oral Vocabulary         | –        | .772     |
| Listening Comprehension | –        | .574     | Listening Comprehension | –        | .639     |

Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization. – indicates a loading less than .300. a. Rotation converged in 4 iterations.


APPENDIX 3: THE EGRA SUBTASKS IN FULL

LETTER NAME RECOGNITION (GRADE 2)

The Letter Name Recognition (LNR) subtask assessed knowledge of the alphabetic principle, the foundation of learning to read. The alphabetic principle is defined as the understanding that words are made up of sounds (i.e., phonemes) and that letters (i.e., graphemes) are symbols that represent those sounds. When pupils understand that sounds correspond to letters (i.e., when they develop phonological awareness), they can begin to learn to decode words (McBride-Chang & Kail, 2002; 2004; McBride-Chang & Ho, 2000). For more on the Russian and Kyrgyz alphabets and their unique characteristics, see the USAID Quality Reading Project: Kyrgyzstan: Early Grade Reading Assessment (EGRA) Midterm Analytic Report (AIR, 2015).

The timed LNR task in the Kyrgyz Republic determined (a) whether pupils could correctly identify and read aloud both uppercase and lowercase letters and (b) the pace at which pupils were able to read letters (letters per minute). The identification of both upper- and lowercase letters was employed because research in other languages has suggested that reading skills progress only after 80% of letters in both cases are mastered (Seymour, Aro, & Erskine, 2003). For this subtask in both the Kyrgyz and Russian languages, each pupil received a booklet containing all letters of the alphabet. Upper- and lowercase letters were presented in random order to prevent recall and recitation from memory.

INITIAL LETTER SOUND (GRADE 2)

The Initial Letter Sound (ILS) subtask was an assessment of phonemic awareness. A phoneme is the smallest linguistically distinctive unit of sound that allows for differentiation between two words in a language. The 2000 National Reading Panel meta-analysis of the literacy research (conducted primarily on literacy in the English language) determined that skill in phoneme identification and phonological awareness is strongly associated with good reading comprehension. Phonemic awareness is the foundation for learning phonological awareness, a domain that includes skills in hearing and manipulating onsets, rimes, and syllables (Snow et al., 1998; NICHD, 2006).

For the ILS subtask, the pupil booklet included a list of the 10 most frequently used letters in the Kyrgyz or Russian alphabet, randomly arranged. The frequency of letters in everyday use was determined during development of the subtask by text analysis and calculations of word count frequencies. The administrator read each word aloud twice and asked the pupil to say the first sound of the word. If a pupil did not answer within 3 seconds, a "no answer" response was recorded. The maximum score for this section was 10 points, with one point assigned for each correct answer.

FAMILIAR WORD RECOGNITION (GRADES 2 AND 4)

The Familiar Word Recognition (FWR) subtask assessed the ability to recognize and read frequently occurring words. The frequency of words at both grades was determined through a word count analysis of the most commonly used words in textbooks of the appropriate level. Through this task, EGRA administrators were able to obtain a measure of decontextualized decoding skills, which are distinct from reading comprehension skills (Gove, 2009). The FWR subtask examined whether pupils in Grades 2 and 4 were able to read aloud 40 familiar words at grade level. Unlike Oral Reading Fluency, this subtask featured a list of unrelated words that were not presented in the context of a story or a complete text. The words were randomly arranged in the pupil stimulus. The FWR tasks were scored on a timed, words-per-minute calculation in which the administrator determined the number of words that were attempted, the number that were read correctly, and the amount of time needed to correctly read the words in a 120-second period.
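The timed words-per-minute scoring rule used by the fluency subtasks can be sketched as a small function (an illustrative reconstruction, not the official EGRA scoring code; handling of early discontinuation may differ):

```python
def words_per_minute(correct_words: int, seconds_used: float) -> float:
    """Scale a count of correctly read words to a 60-second fluency rate.

    `seconds_used` is the time the pupil actually spent, capped at the
    120-second limit described in the text.
    """
    if seconds_used <= 0:
        raise ValueError("seconds_used must be positive")
    return correct_words * 60.0 / min(seconds_used, 120.0)

# A pupil who reads 38 words correctly in 56 seconds scores ~40.7 wpm;
# one who uses the full time reads at correct_words / 2 wpm.
print(round(words_per_minute(38, 56), 1))
print(words_per_minute(30, 120))
```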

NONSENSE WORD RECOGNITION (GRADES 2 AND 4)

The Nonsense Word Recognition (NWR) subtask assessed the ability of pupils to decode one- and two-syllable nonwords that could plausibly exist orthographically in the language in question. The NWR task provided a measure of decoding related to that of the Familiar Word Recognition task but had the advantage of ensuring that respondents were applying grapheme-phoneme correspondence rules to decode the nonwords and were not merely reciting a memorized word. According to Hirsch (2003), there is significant evidence that an overreliance on sight word vocabulary often leads to regression in reading development by age 9 or 10. For the NWR subtask, 40 nonwords were randomly arranged in a list in the pupil booklets for both grades, and participants were asked to read as many as they could. The subtask was graded on a timed, words-per-minute calculation in which the administrator determined the number of words that were attempted, the number that were read correctly, and the amount of time needed to correctly read the words during the 120-second task.

ORAL VOCABULARY (GRADES 2 AND 4)

The Oral Vocabulary (OV) subtask examined whether pupils in Grades 2 and 4 were able to understand the meanings of familiar, spoken words at grade level. Factor analysis conducted by AIR on 2014 and 2015 subtask results, together with the results of other studies, indicates that OV skill is a predictor of reading comprehension (Roth, Spencer, & Cooper, 2002; Share & Leiken, 2004). The OV task required receptive oral vocabulary knowledge: the administrator read aloud a list of 10 words, one word at a time. Pupils were presented with a set of four pictures for each word read aloud and were asked to identify the picture that best matched the spoken word. This is a commonly administered instrument for assessing receptive oral vocabulary knowledge, based on the Peabody Picture Vocabulary Test (PPVT-R) (Dunn & Dunn, 1981). Raw scores (one point per correct answer) were calculated and converted to a percentage score for this subtask.


ORAL READING FLUENCY WITH READING COMPREHENSION (GRADES 2 AND 4) Reading with comprehension is complex endeavor that involves both extracting and constructing meaning from text. Assessing comprehension is challenging, especially when using large texts that are not developmentally appropriate for pupils of a young age. As noted earlier in reference to the empirical investigations conducted on data from the Kyrgyz Republic in previous EGRA iterations, both decoding and linguistic comprehension skills are important for reading with comprehension. Oral Reading Fluency can best be understood as the ability to read with speed, accuracy, and proper expression. The purpose of the timed Oral Reading Fluency (ORF) subtask was to examine whether pupils in Grades 2 and 4 were able to read a passage with speed and accuracy when grade-appropriate words (familiar words) were presented in the pupil booklets. The ORF subtask is “oral” in that pupils read the passage aloud. Oral reading was assessed because empirical studies in many contexts have demonstrated that there is a strong correlation between oral fluency and reading comprehension (Fuchs, Fuchs, Hosp, & Jenkins, 2001). Although ORF is considered an important precursor to reading comprehension, fluency alone is not an indicator of reading comprehension; nonetheless, it is an important foundational skill. Some studies have demonstrated that as the reader progresses, the importance of ORF declines in relation to the importance of linguistic comprehension (Yovanoff, Duesbery, Alonzo, & Tindall, 2005). This finding is in alignment with AIR’s factor analyses of the data, which indicate that linguistic comprehension tasks account for a greater proportion of variance in reading comprehension than tasks that load highly on decoding, such as ORF. In 2016, the ORF subtask included paragraphs with 41 words (Kyrgyz Grade 2), 48 words (Russian Grade 2), 78 words (Kyrgyz Grade 4), and 91 words (Russian Grade 4). 
During the subtask design, test developers conducted textbook reviews to determine what words could be considered grade appropriate. The subtask was scored on a words-per-minute calculation in which the administrator determined the number of words that were attempted and the number of words that were read correctly over a 120-second period. The total number of words read at minute 1 and minute 2 also was collected. The Reading Comprehension subtask, which relied on questions about the text that was read in the ORF subtask, assessed pupils’ understanding of the text along with their ability to answer factual questions and make inferences based on what they read. After a pupil completed the ORF subtask, the administrator moved to the Reading Comprehension task, which consisted of a series of questions about the passage that had just been read. This subtask consisted of four questions for Kyrgyz Grade 2, five questions for Russian Grade 2, five questions for Kyrgyz Grade 4, and five questions for Russian Grade 4.
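As a minimal sketch of the words-per-minute scoring just described (illustrative only, not project code; the function name and the convention of prorating by time actually used are assumptions):

```python
def orf_score(words_correct: int, seconds_used: float) -> float:
    """Correct words per minute over the 120-second ORF window; pupils who
    finish the passage early are prorated by the time they actually used
    (an assumed convention for this sketch)."""
    if seconds_used <= 0 or seconds_used > 120:
        raise ValueError("seconds_used must be in (0, 120]")
    return words_correct / (seconds_used / 60.0)

# A pupil who reads 38 words correctly over the full 120-second window:
print(orf_score(words_correct=38, seconds_used=120))  # 19.0
```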

LISTENING COMPREHENSION (GRADES 2 AND 4)
Research indicates that the ability to correctly understand and interpret oral stimuli (linguistic comprehension) and make meaning from what is heard is a core skill related to reading comprehension (Hoover & Gough, 1990; Kamhi & Catts, 1991). This skill also draws on vocabulary knowledge, syntactic knowledge, and background knowledge. In this EGRA subtask, the pupil demonstrated Listening Comprehension by answering several questions about a simple oral story (a series of sentences) read aloud by the administrator (an interactive situation). According to O’Maggio (1986), some of the core dimensions of listening involve retaining parts of language in short-term memory, discriminating among distinctive sounds, detecting key ideas, and deducing meaning from context. The subtasks included a paragraph of approximately 40 words for Grade 2 and approximately 80 words for Grade 4. The test administrator read the passage aloud only once, at a pace of about one word per second. For both grades and languages, there was a total of four questions per text.

DICTATION (GRADES 2 AND 4)
Dictation is a commonly used pedagogical tool in the Kyrgyz Republic and throughout the former Soviet Union (Tvaruzkova & Shamatov, 2012). It is frequently employed to assess listening comprehension as well as writing (reproductive) ability. Pupils’ ability to hear sounds and to correctly recreate the letters and words that correspond to those sounds indicates alphabet knowledge and word formation skills. The Kyrgyz Republic subtask for this particular assessment was adapted from the EGRA main study, the specific design of which has been validated in other contexts (Denton, Ciancio, & Fletcher, 2006). Pupils were graded on spelling, size, symbols, capitalization, punctuation, spacing, direction, and accuracy in vowel and consonant sounds. Each category had a total of two possible points for a complete answer, one point for partial credit, and zero points for incorrect answers. The Dictation subtask for Kyrgyz Grades 2 and 4 consisted of eight and 10 items, respectively; the Russian versions consisted of nine and 11 items, respectively. In Kyrgyz, the maximum possible scores for Grades 2 and 4 were 16 and 20, respectively. In Russian, the maximum possible scores were 18 and 22 for Grades 2 and 4, respectively.
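The 2/1/0 scoring scheme above can be sketched as follows (a hypothetical illustration; the item scores shown are invented):

```python
def dictation_percent(item_scores, max_points):
    """Percent-correct Dictation score: each item is marked 2 (complete),
    1 (partial credit), or 0 (incorrect), then totaled against the maximum."""
    if any(s not in (0, 1, 2) for s in item_scores):
        raise ValueError("item scores must be 0, 1, or 2")
    return 100.0 * sum(item_scores) / max_points

# Kyrgyz Grade 2: eight items, maximum possible score of 16
print(dictation_percent([2, 2, 1, 2, 0, 1, 2, 2], max_points=16))  # 75.0
```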


APPENDIX 4. SUBTASK DEVELOPMENT AND PILOTING

The EGRA has been customized to be linguistically and culturally appropriate for use in the Kyrgyz Republic. For the baseline, midterm, and endline, the USAID Quality Reading Project team adapted EGRA subtasks for the country context using a protocol for their localization. After reviewing Kyrgyz and Russian language primary grade reading standards, textbooks, and other related materials, item writers participated in item and subtask development workshops in which subtasks were developed for all three administrations. Subtask development was initiated with a 2-week workshop but required ongoing attention to developing quality items over the course of the past 3 years. After the subtasks were developed and all items were printed, along with detailed protocols and instructions for administrators, pilot tests were administered several months before the actual administration to collect data about the performance and quality of the items and subtasks in authentic assessment conditions.

After the subtask data were collected, the initial psychometric properties of all subtask items were examined. Each item was reviewed and analyzed to ensure fairness and balance based on gender and other criteria. Item p-values (difficulty levels) were evaluated along with item-test correlation coefficients (item discrimination indices); in addition, reliability and factor analyses were performed where appropriate. Following the pilot, review committees examined several items from each of the language versions to identify potentially ambiguous items. Subtask items were replaced or edited as necessary, based on the reviewers’ recommendations. As noted in other sections, special attention was accorded to producing balanced test maps with matrices that indicated the location of common “anchor items” horizontally across all pilot versions in the same grade as well as vertically across grades. See Appendix 8 for more details on item equating for the baseline and subsequent EGRA iterations.
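The pilot item statistics described above, p-values (difficulty) and item-test (point-biserial) correlations (discrimination), can be computed as in this sketch; the data are invented and the helper names are illustrative:

```python
import statistics

def p_value(item):
    """Item difficulty: proportion of pupils answering correctly (0/1 data)."""
    return sum(item) / len(item)

def point_biserial(item, totals):
    """Item discrimination: Pearson correlation between a 0/1 item
    and the pupil's total test score."""
    n = len(item)
    mi, mt = statistics.fmean(item), statistics.fmean(totals)
    cov = sum((i - mi) * (t - mt) for i, t in zip(item, totals)) / n
    return cov / (statistics.pstdev(item) * statistics.pstdev(totals))

# Invented pilot responses for one item across eight pupils:
item = [1, 1, 0, 1, 0, 1, 1, 0]
totals = [9, 8, 4, 7, 5, 8, 9, 3]
print(round(p_value(item), 3))               # 0.625
print(round(point_biserial(item, totals), 3))  # 0.934
```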


APPENDIX 5. ADMINISTRATOR TRAINING
From March 28 to April 1, 2016, the USAID Quality Reading Project team trained supervisors and regional monitoring and evaluation (M&E) coordinators (14 people) on the one-to-one EGRA administration and M&E instrument administration procedures (lesson observation, teacher interviews, librarian interviews, and parent interviews). The team members selected from various regions were also trained on the organization of the midterm EGRA, logistical issues, project protocols, and communications with the home office in Bishkek. Afterward, the supervisors and M&E coordinators conducted 5-day training sessions for the test administrators in the Batken, Naryn, Osh, and Issyk-Kul oblasts. The training for all EGRA administrators was delivered from April 4–8, 2016. In addition to the 14 supervisors, 90 test administrators (administrators) were trained. Although EGRA administration took place in different regions than in 2015, efforts were made to retain experienced administrators from the 2014 and 2015 EGRA administrations for quality reinforcement. The administration training sessions were conducted in two groups by language of subtask administration. The training sessions were conducted in accordance with AIR procedures and the USAID Toolkit for EGRA Administration. On the first day, participants were introduced to the Institutional Review Board (IRB) forms that outlined the main principles: respect, justice, privacy, confidentiality, and the right to refuse participation at any time. The first 2 days were devoted to learning the EGRA forms for Grades 2 and 4 and practicing subtask administration. The 2016 EGRA contained improvements such as the “early stop rule” in the event that no answers were given within the first ten items. Participants also received a refresher on how to use stopwatches and mark protocols correctly.
Throughout the training, participants had opportunities to practice EGRA administration through activities that included video observation, role-plays, whole-group practice, and work in pairs, often with a purposeful mix of veteran and new administrators. All sessions provided opportunities to discuss “what to do” in a variety of challenging scenarios. Significantly, in-school visits were also organized, affording the participants opportunities to practice administering the EGRA in conditions similar to those likely to be encountered during data collection. The participants also practiced in groups and observed the same one-to-one administrations to determine the level of consistency in their ratings. The participants practiced two scenarios involving student sampling procedures described in the training manual. Over the next 2 days, the participants received a detailed overview of, and practiced working with, the other project M&E instruments, such as the lesson observation tool and the teacher, librarian, and parent surveys. Before the participants practiced using the lesson observation instrument, they observed a video of a Grade 3 reading lesson and then discussed it as a whole group. The participants filled in the observation forms and participated in a debriefing afterward.


APPENDIX 6: ADMINISTRATION AND MONITORING
About 90 administrators were deployed to collect data in 71 schools (35 treatment and 36 control schools) in the Cohort 2 and 3 regions (Batken, Naryn, Osh, and Issyk-Kul oblasts). Each team of four administrators was responsible for administering the assessment in two to four schools. Data collection commenced on April 11, 2016, and was completed on April 22, 2016. The NTC supervisors and project staff were mobilized to conduct monitoring visits to ensure proper administration and to provide troubleshooting and support as necessary. For data collection, the teams were provided with all the necessary equipment: stopwatches, pencils with erasers, EGRA forms (for Grades 2 and 4), M&E surveys and forms, and protocols. Upon arrival at the schools, team leaders introduced their teams to the school principal, explained the purpose and objectives of the midterm assessment, and thanked the school principal for the school’s participation in the assessment. They emphasized that the purpose of their visits was not to evaluate the school, the principal, or the teachers and that all information collected would remain anonymous. They also provided the schools with fact sheets on the results of the baseline assessment conducted in 2014. In turn, the school administrators were supportive and created the necessary conditions for successful administration of the assessment. The teams selected the pupil sample (Grades 2 and 4) for EGRA testing according to the procedures described in the manual. During the first week of assessment, the teams conducted one-on-one assessment (parallel testing) in pairs to simultaneously administer the EGRA to one student in Grade 2 and one student in Grade 4. One data collector administered the assessment while the other silently observed. This kind of testing ensured the quality of the data by assessing the consistency of marking. This evaluation took place in the first two schools during the first week.
The supervisors visited each school in their respective regions. Observations of parallel testing, pupil sampling, pupil testing, and interviews with teachers, parents, and librarians were conducted. Support and consultations for the teams were provided on different issues during the data collection process. The EGRA and M&E forms were checked several times: at the supervisor, leader, and data collector levels and during cross-checking. During the midterm assessment, Matthew Jukes, senior education evaluation specialist at RTI, visited the schools in Osh region. He held discussions with the team leaders, checked the student sampling for Grades 2 and 4, observed teacher interviews, conducted interviews with the data collectors, and checked some of the M&E surveys. He evaluated the work of the data collectors as “well organized and fantastic.”


APPENDIX 7. DATA ANALYSIS METHODS
Mean scores and standard deviations for each subtask are presented by language group, gender, and demographic location of school (urban or rural) for both grades. Because numerical differences in mean scores can be misleading, score differences were tested for statistical significance by conducting independent samples t-tests using Stata software. The t-test assumes a null hypothesis of equality of means between groups under study (e.g., male/female). Because tests for statistical significance frequently result in the rejection of the null hypothesis when the sample sizes are large, an effect size measure was also estimated to determine whether there was any practical significance of the differences in mean scores estimated (Cohen, 1992). An iteration of what is frequently referred to as Cohen’s d was employed to avoid overestimation of any differences between groups that could, in fact, be the result of a statistical artifact. Cohen’s d is estimated using the following formula:

d = (M1 − M2) / s_pooled, where s_pooled = √[ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ]

Where:

M = the mean of Samples 1 and 2

n = the number in Samples 1 and 2

s² = the variance in Samples 1 and 2

Cohen’s d is a standardized measure of effect size that can be applied to weighted samples and yet reports on a standard, recognizable scale (in practice, it reports the distance that two means are from each other in standard deviation terms). The effect size values (determined by Cohen) are as follows:

.2 (small effect)

.5 (medium effect)

.8 (large effect)

If the null hypothesis of no difference was retained, there was no need to calculate the effect size measure. For brevity, Cohen’s d is referenced as “D” in the data tables. Because many analyses were run to interpret the subtask data, the model conditions (null hypotheses) for each analysis were not repeated in the text: the null hypothesis in all the statistical tests was that the means were equal across groups (male and female, urban and rural). Statistical significance at the .05 and .01 levels was reported in the tables throughout the report.
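The pooled-standard-deviation effect-size calculation described above can be sketched as follows. The report's analyses were run in Stata; this Python version is a minimal illustration with invented scores:

```python
import math

def cohens_d(x, y):
    """Cohen's d: difference in means divided by the pooled standard
    deviation computed from the two sample variances."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((a - m1) ** 2 for a in x) / (n1 - 1)  # sample variance, group 1
    v2 = sum((b - m2) ** 2 for b in y) / (n2 - 1)  # sample variance, group 2
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Invented fluency scores for two groups:
girls = [48.0, 44.0, 50.0, 46.0]
boys = [40.0, 35.0, 38.0, 42.0]
print(round(cohens_d(girls, boys), 2))  # 2.96 (a positive, large effect)
```

By Cohen's benchmarks (.2 small, .5 medium, .8 large), a value this size would count as a large effect, though real EGRA group differences are far smaller than this toy example.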


APPENDIX 8. EQUATING SUBTASKS

ITEM-RESPONSE THEORY APPROACHES
The differences in the types of EGRA subtasks required the use of different equating approaches. Common-items approaches using item response theory (IRT) can be employed for non-timed subtasks such as Initial Letter Sound and Oral Vocabulary.18 Listening Comprehension, however, does not contain common items because listening texts and corresponding questions are different across administrations. This challenge was resolved by employing an “external anchor” equating method for this subtask. Because the Listening Comprehension and Oral Vocabulary subtasks tap into the same latent construct of linguistic comprehension (see Appendix 2: Factor Analysis), items on the Oral Vocabulary subtask served as external anchor items for equating the Listening Comprehension subtasks across 2014 and 2016. The first step in this equating process entailed locating the commonly shared items on both the 2014 and 2016 test maps that were created at the EGRA design stage. A two-parameter IRT model was then employed to equate across testing years. A check of the stability of anchor-item parameters was followed by a fixed-parameter calibration method that placed the parameters of the 2016 test on the reference scale of the 2014 test. IRT calibration of items was carried out with PARSCALE software. Some items were removed from the analysis because of zero variance (i.e., all answers on a given item were either correct or incorrect). Finally, a score equivalence table was generated that provided 2016 Listening Comprehension scores equated with 2014 scores.
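The two-parameter model referenced above gives the probability of a correct response as a logistic function of pupil ability (theta), item discrimination (a), and item difficulty (b). In fixed-parameter calibration, the anchor items' (a, b) values from the 2014 scale are held fixed while the remaining 2016 items are estimated. A minimal sketch of the item response function itself (illustrative parameter values; the actual calibration was done in PARSCALE):

```python
import math

def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL IRT model:
    P = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An examinee whose ability equals the item difficulty answers correctly
# with probability 0.5, regardless of the discrimination parameter a:
print(p_correct_2pl(theta=0.0, a=1.2, b=0.0))  # 0.5
```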

CLASSICAL EQUATING FOR NONEQUIVALENT GROUPS
The other critical subtask with no common items was Oral Reading Fluency. Because this was a timed subtask, it could not be equated using the same IRT approach described for Listening Comprehension in the previous section (Kolen & Brennan, 1999). Instead, classical linear equating for nonequivalent groups was carried out using a procedure proposed by Algina and Crocker (2006). The approach also employs “external anchor items” (as described above), with the Familiar Word Recognition subtask serving as the anchor test. The rationale for employing the Familiar Word Recognition subtask as an anchor test for Oral Reading Fluency was that both these subtasks loaded on the same factor, decoding, across all four subtests.19 In other words, both of these subtasks were assessing the same skill. In this design, the two groups (2014 Y, 2016 X) do not necessarily need to be formed by random assignment. A linear equating transformation was established on the basis of the regressions between an anchor test and each of the two equated test forms. The single anchor test used in both groups contained 37% common items across 2014 and 2016 for all subtask forms. This approach assumed the strict equivalence of the Familiar Word Recognition forms. The fundamental assumption made in linear equating is that the slope, intercept, and standard error of estimate for the regression of Oral Reading Fluency on Familiar Word Fluency in Subpopulation 1 are equal to those for the regression of Oral Reading Fluency on Familiar Word Fluency in the total population. With the mean scores, standard deviations, and unstandardized regression coefficients (B) of Oral Reading Fluency and Familiar Word Fluency for both years, a linear transformation was established that allowed equating of the 2014 and 2016 instruments and the derivation of 2016 results expressed in 2014 terms. The formula for equating (Algina & Crocker, 2006, p. 458) was as follows:

18 The subtasks Nonsense Words and Dictation were not equated as described in the text.
19 As noted in the limitations, post-test investigation revealed that this assumption may have been violated for Russian Familiar Words Grade 4; therefore, the results of the equating for this grade and language are not presented.

Y* = a(X – C) + d,

Where:

b = the unstandardized regression coefficient of Oral Reading Fluency on Familiar Word Fluency
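The final step applies a linear transformation to place 2016 scores (X scale) on the 2014 reference scale (Y). The sketch below uses a simplified mean/SD-matching form of that transformation with invented summary statistics; the report's actual coefficients come from the anchor-test regressions and are not reproduced here:

```python
def linear_equate(x_score, mean_x, sd_x, mean_y, sd_y):
    """Place a score from the X (2016) scale onto the Y (2014) reference
    scale by matching means and standard deviations: a simplified stand-in
    for the anchor-based linear transformation Y* = a(X - C) + d."""
    slope = sd_y / sd_x
    return slope * (x_score - mean_x) + mean_y

# Hypothetical summary statistics for the two administrations:
equated = linear_equate(x_score=45.0, mean_x=40.0, sd_x=10.0,
                        mean_y=38.0, sd_y=12.0)
print(equated)  # 44.0
```

A pupil half a standard deviation above the 2016 mean maps to half a standard deviation above the 2014 mean, which is the property any linear equating must preserve.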


APPENDIX 9. RELATIONSHIP BETWEEN ORF AND READING COMPREHENSION

Figure 8: Kyrgyz Grade 2 Reading Fluency by Reading Comprehension Score, 2016. [Box plot of correct words per minute (0–150) by reading comprehension percent correct (0–100); outside values excluded.]

Figure 9: Kyrgyz Grade 4 Reading Fluency by Reading Comprehension Score, 2016. [Box plot of correct words per minute (0–150) by reading comprehension percent correct (0–100); outside values excluded.]


Figure 10: Russian Grade 2 Reading Fluency by Reading Comprehension Score, 2016. [Box plot of correct words per minute (0–100) by reading comprehension percent correct (0–100); outside values excluded.]

Figure 11: Russian Grade 4 Reading Fluency by Reading Comprehension Score, 2016. [Box plot of correct words per minute (0–150) by reading comprehension percent correct (0–100); outside values excluded.]


APPENDIX 10. 2016 RESULTS BY TREATMENT/CONTROL STATUS

Grade 2 Kyrgyz (values are Mean (Std. Dev.); columns: Treatment, n=539 | Control, n=550 | Total, n=1,089)
Letter Name Recognition (letters per minute): 67.1 (19.1) | 69.5 (18.0) | 67.6 (18.9)
Initial Letter Sound (percent correct): 94.1 (11.0) | 94.7 (9.0) | 94.2 (10.6)
Familiar Words (words per minute): 59.3 (28.4) | 60.7 (29.2) | 59.6 (28.5)
Nonsense Words (words per minute): 27.7 (14.0) | 28.3 (13.5) | 27.8 (13.9)
Oral Vocabulary (percent correct): 92.6 (9.5) | 91.3 (10.3) | 92.3 (9.6)
Oral Reading Fluency (words per minute): 37.6 (18.7) | 38.8 (20.1) | 37.8 (18.9)
Reading Comprehension (percent correct): 55.3 (30.2) | 45.0 (30.7) | 53.4 (30.6)
Listening Comprehension (percent correct): 75.4 (23.6) | 65.2 (27.4) | 73.5 (24.6)
Dictation (percent correct): 71.0 (24.9) | 71.3 (24.3) | 71.0 (24.8)

Grade 4 Kyrgyz (values are Mean (Std. Dev.); columns: Treatment, n=539 | Control, n=550 | Total, n=1,089)
Familiar Words (words per minute): 91.1 (32.0) | 84.7 (36.4) | 89.9 (33.0)
Nonsense Words (words per minute): 29.4 (15.8) | 27.9 (17.9) | 29.1 (16.2)
Oral Vocabulary (percent correct): 96.0 (5.2) | 94.5 (7.4) | 95.7 (5.7)
Oral Reading Fluency (words per minute): 82.3 (29.8) | 77.6 (34.7) | 81.4 (30.8)
Reading Comprehension (percent correct): 80.6 (23.7) | 71.4 (29.6) | 78.8 (25.2)
Listening Comprehension (percent correct): 73.1 (27.1) | 64.2 (29.9) | 71.4 (27.9)
Dictation (percent correct): 83.7 (20.2) | 78.4 (24.7) | 82.7 (21.2)


Grade 2 Russian (values are Mean (Std. Dev.); columns: Treatment, n=140 | Control, n=160 | Total, n=300)
Letter Name Recognition (letters per minute): 67.9 (28.7) | 56.2 (25.0) | 65.8 (28.3)
Initial Letter Sound (percent correct): 90.4 (19.2) | 83.8 (24.8) | 89.2 (20.5)
Familiar Words (words per minute): 67.9 (30.2) | 51.8 (25.8) | 65.0 (30.0)
Nonsense Words (words per minute): 31.4 (13.0) | 24.7 (12.3) | 30.2 (13.1)
Oral Vocabulary (percent correct): 85.5 (15.5) | 78.1 (16.6) | 84.2 (15.9)
Oral Reading Fluency (words per minute): 54.1 (24.0) | 40.7 (23.6) | 51.7 (24.4)
Reading Comprehension (percent correct): 40.0 (29.4) | 25.5 (29.9) | 37.4 (30.0)
Listening Comprehension (percent correct): 72.3 (31.7) | 46.5 (32.9) | 67.6 (33.4)
Dictation (percent correct): 78.1 (22.6) | 66.8 (28.2) | 76.0 (24.1)

Grade 4 Russian (values are Mean (Std. Dev.); columns: Treatment, n=539 | Control, n=550 | Total, n=1,089)
Familiar Words (words per minute): 107.3 (34.6) | 94.7 (37.0) | 104.9 (35.3)
Nonsense Words (words per minute): 36.2 (14.5) | 32.5 (13.7) | 35.5 (14.4)
Oral Vocabulary (percent correct): 89.2 (10.6) | 84.7 (16.3) | 88.4 (12.0)
Oral Reading Fluency (words per minute): 97.4 (31.2) | 82.0 (28.7) | 94.6 (31.3)
Reading Comprehension (percent correct): 50.5 (32.1) | 41.3 (30.1) | 48.8 (31.9)
Listening Comprehension (percent correct): 78.5 (33.6) | 74.0 (31.2) | 77.6 (33.1)
Dictation (percent correct): 87.4 (15.9) | 84.2 (16.3) | 86.8 (16.0)


REFERENCES

Algina, J., & Crocker, L. (2006). Introduction to classical and modern test theory. Thomson Wadsworth.

American Institutes for Research. (2014, October). USAID Quality Reading Project: Kyrgyzstan: Early Grade Reading Assessment (EGRA) Baseline Data Report. Washington, DC: United States Agency for International Development.

American Institutes for Research (2015, October). USAID Quality Reading Project: Kyrgyzstan: Early Grade Reading Assessment (EGRA) Midterm Analytic Report. Washington, DC: United States Agency for International Development.

Brunner, J., & Tillett, A. (2007). Higher education in Central Asia: The challenges of modernization (case studies from Kazakhstan, Tajikistan, the Kyrgyz Republic and Uzbekistan). Washington, DC: The International Bank for Reconstruction and Development/The World Bank.

Chiappe, P., Siegel, L. S., Wade-Woolley, L. (2002). Linguistic diversity and the development of reading skills: A longitudinal study. Scientific Studies of Reading, 6(4), 369–400.

Cizek, G., & Bunch, M. (2007). Standard setting: A guide to establishing and evaluating performance standards on tests. Thousand Oaks, CA: Sage.

Center for Educational Assessment and Teaching Methods. (2009a). PISA 2009: International survey of reading literacy for 15-year-olds. Retrieved from http://www.testing.kg/ru/projects/pisa2009/

Center for Educational Assessment and Teaching Methods. (2009b). Natsional’noye otsenivanie obrazovatel’nix dostizhenii uchashixsya. [National Assessment of Educational Quality]. Bishkek: CEATM. Retrieved from http://www.testing.kg/ru/projects/NSBA2009/

Center for Educational Assessment and Teaching Methods. (2010). Rezul’tati obsherespublikanskova testirovaniya i zachisleniya na grantovie mesta vuzov goda Kirgizskoi Respubliki v 2010 godu. [Results of national scholarship testing and enrollment in university grant places in the Kyrgyz Republic in 2010]. Bishkek: CEATM. Retrieved from www.testing.kg

Center for Educational Assessment and Teaching Methods. (2014). Natsional’noye otsenivanie obrazovatel’nix dostizhenii uchashixsya. [National Assessment of Educational Quality]. Bishkek: CEATM. Retrieved from http://testing.kg/ru/nashi- proekty/noodu/rezultaty-issledovanija-noodu-2014-goda.html

Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.


Denton, C. A., Ciancio, D. J., & Fletcher, J. M. (2006). Validity, reliability, and utility of the Observation Survey of Early Literacy Achievement. Reading Research Quarterly, 41(1), 8–34.

De Young, A., Reeves, M., & Valyaeva, G. (2006). Surviving the transition? Case studies and schooling in the Kyrgyz Republic since independence. Greenwich, CT: Information Age Publishing.

Drummond, T. (2011). Predicting differential item functioning in cross-lingual assessments: The case of a high stakes admissions test in the Kyrgyz Republic. (Doctoral dissertation). Michigan State University, East Lansing, MI.

Dubeck, M. M., & Gove, A. (2015). The early grade reading assessment (EGRA): Its theoretical foundation, purpose, and limitations. International Journal of Educational Development, 40, 315–322. http://dx.doi.org/10.1016/j.ijedudev.2014.11.004

Dunn, Lloyd M., & Dunn, Leota M. (1981). Manual for the Peabody Picture Vocabulary Test–Revised. Circle Pines, MN: American Guidance Service.

Fuchs, L., Fuchs, D., Hosp, K., & Jenkins, J. (2001). Oral reading fluency as an indicator of reading competence: A theoretical, empirical, and historical analysis. Scientific Studies of Reading, 5(3), 239–256

Gove, A. (2009). Early grade reading assessment toolkit. RTI International.

Gove, A., & Wetterberg, A. (2011). The Early Grade Reading Assessment: An introduction. In A. Gove & A. Wetterberg (Eds.), The Early Grade Reading Assessment: Applications and interventions to improve basic literacy (pp. 1–37). Research Triangle Park, NC: RTI Press. http://www.rti.org/pubs/bk-0007-1109-wetterberg.pdf

Hirsch, E. D., Jr. (2003, Spring). Reading comprehension requires knowledge of words and the world: Scientific insights into the fourth-grade slump and the nation’s stagnant comprehension scores. American Educator. Retrieved from http://www.aft.org/sites/default/files/periodicals/Hirsch.pdf

Hoover, W. A., & Gough, P. B. (1990). The simple view of reading. Reading and Writing: An Interdisciplinary Journal, 2, 127–160.

Jusayeva, V. (2004). Kyrgyz grammar supplement. U.S. Peace Corps Kyrgyz Republic. Bishkek, Kyrgyz Republic.


Kamhi, A. G., & Catts, H. W. (1991). Language and reading: Convergences, divergences, and development. In A. G. Kamhi & H. W. Catts (Eds.), Reading disabilities: A developmental language perspective (pp. 1–34). Toronto, Ontario, Canada: Allyn & Bacon.

Kim, J., & Mueller, C. (1978). Introduction to factor analysis: What it is and how to do it. Quantitative Applications in Social Sciences. Sage University Press.

Kolen, M. J., & Brennan, R. L. (1999). Test equating: Methods and practices. New York, NY: Springer.

McBride-Chang, C. & Ho, C. S.-H. (2005). Predictors of beginning reading in Chinese and English: A 2-year longitudinal study of Chinese kindergarteners. Scientific Studies of Reading, 9, 117–144.

McBride-Chang, C., & Kail, R. (2002). Cross-cultural similarities in the predictors of reading acquisition. Child Development, 73, 1392–1407.

National Institute of Child Health and Human Development (NICHD). (2006). Report on the findings of the national reading panel. Retrieved from https://www.nichd.nih.gov/publications/pubs/nrp/Pages/findings.aspx

O’Maggio, A. (1986). A proficiency-oriented approach to listening and reading. In A. O’Maggio (Ed.), Teaching language in context (pp. 121–174). Boston, MA: Heinle & Heinle.

Organisation for Economic Co-operation and Development (OECD). (2010). Kyrgyz Republic 2010: Lessons from PISA. OECD Publishing. Retrieved from http://www.keepeek.com/Digital-Asset-Management/oecd/education/reviews-of-national-policies-for-education-kyrgyz-republic-2010_9789264088757-#page1

Patrinos, H. A, & Velez, E. (2009). Costs and benefits of bilingual education in Guatemala: A partial analysis. International Journal of Educational Development, 29(6), 594–598.

Roth, F. P., Speece, D. L., & Cooper, D. H. (2002). A longitudinal analysis of the connection between oral language and early reading. Journal of Educational Research, 95, 259– 272.

RTI International (2015). Early Grade Reading Assessment (EGRA) Toolkit, Second Edition. Washington, DC: United States Agency for International Development.

Shamatov, D., & Niyozov, S. (2010). Teachers surviving to teach: Implications for post-Soviet education and society in Tajikistan and Kyrgyzstan. In J. Zajda (Ed.), Globalization, ideology and education policy reforms.


Share, D. L. (2008). On the Anglocentricities of current reading research and practice: The perils of overreliance on an "outlier" orthography. Psychological Bulletin, 134(4), 584–615.

Share, D. L., & Leikin, M. (2004). Language impairment at school entry and later reading disability: Connections at lexical versus supralexical levels of reading. Scientific Studies of Reading, 8, 87–110.

Snow, C. E., Burns, M. S., & Griffin, P. (Eds.). (1998). Preventing reading difficulties in young children. Prepared on behalf of the Committee on the Prevention of Reading Difficulties in Young Children under Grant No. H023S50001 of the National Academy of Sciences and the U.S. Department of Education. Washington, DC: National Academy Press.

Snow, C., & the RAND Reading Study Group. (2002). Reading for understanding: Toward an R&D program in reading comprehension. Research prepared for the Office of Educational Research and Improvement (OERI), U.S. Department of Education. Santa Monica, CA: RAND Corporation.

Spencer, L. H., & Hanley, J. R. (2003). Effects of orthographic transparency on reading and phoneme awareness in children learning to read in Wales. British Journal of Psychology, 94(1), 1–28.

Seymour, P. H. K., Aro, M., & Erskine, J. M. (2003). Foundation literacy acquisition in European orthographies. British Journal of Psychology, 94, 143–174.

Tvaruzkova, M., & Shamatov, D. (2012). Review of early grade teaching and skills: The Kyrgyz Republic and Tajikistan. Aguirre Division of JBS International, Inc., with support from the United States Agency for International Development (USAID).

United Nations Children’s Fund (UNICEF). (2005). Monitoring learning achievement: Nationwide study of the quality of education in primary schools.

Vaughn, S., & Linan-Thompson, S. (2004). Research-based methods of reading instruction grades K-3. Alexandria, VA: Association for Supervision and Curriculum Development.

Yovanoff, P., Duesbery, L., Alonzo, J., & Tindall, G. (2005). Grade-level invariance of a theoretical causal structure predicting reading comprehension with vocabulary and oral reading fluency. Educational Measurement: Issues and Practice, 24(3), 4–12.


U.S. Agency for International Development 1300 Pennsylvania Avenue, NW Washington, DC 20523 Tel: (202) 712-0000 Fax: (202) 216-3524 www.usaid.gov