TECHNICAL MANUAL—GRADES 3–12

READING ASSESSMENT for PRESCRIPTIVE INSTRUCTIONAL DATA

Rev. July 24, 2019, v5.0

LEXIA® RAPID™ ASSESSMENT

Technical Manual for RAPID Version 5.0: Grades 3-12

© 2019 Lexia Learning, a Rosetta Stone Company. Lexia®, RAPID™, and other trademarks, names, and logos used herein are the property of Rosetta Stone Ltd. and/or its subsidiaries, and are registered and/or used in the United States and other countries. METAMETRICS® and LEXILE® are trademarks of MetaMetrics, Inc., and are registered in the United States and abroad. Additional trademarks included herein are the property of their respective owners.

Information in this document is subject to change without notice and does not represent a commitment on the part of Lexia Learning Systems LLC. The software described in this document is furnished under a license agreement or non-disclosure agreement. The software may be used only in accordance with the terms of the agreement. It is against the law to copy the software on any medium except as specifically allowed in the license or non-disclosure agreement. No part of this manual may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, for any purpose without the express written permission of Lexia Learning Systems LLC.

www.lexialearning.com
300 Baker Avenue • Suite 320 • Concord, MA 01742
tel: 800-435-3942 / 978-405-6200
fax: 978-287-0062

Contents

Acknowledgments ...... 5
Chapter 1 Introduction ...... 6
    Purpose ...... 7
Chapter 2 3-12 Tasks ...... 8
    Task Rules ...... 8
    Screening Assessment Periods ...... 8
    Recommended Comprehension Tasks ...... 9
    Screening Task Descriptions ...... 9
        Word Recognition Task ...... 9
        Vocabulary Knowledge Task ...... 10
        Syntactic Knowledge Task ...... 10
        Reading Comprehension Task ...... 11
Chapter 3 Score Definitions ...... 12
    Reading Success Probability ...... 12
    Percentile Ranks ...... 13
    Performance Scores ...... 13
    Lexile Measures ...... 14
        Description of Sample ...... 14
        Study Design ...... 15
        Reported Lexile Reader Measures and Lexile Text Ranges ...... 16
Chapter 4 Content and Item Development ...... 17
    Guidelines for Retaining Items ...... 18
Chapter 5 Psychometric Development ...... 20
    Item Norming Sample ...... 20
    Performance Score Alignment Sample ...... 22
    Vertical Scaling and Data Collection Design ...... 22
    Item Response Theory ...... 24
    IRT and Its Role in Computer Adaptive Testing ...... 27
Chapter 6 Reliability ...... 30
    Marginal Reliability ...... 30
    Standard Error of Measurement ...... 31
    Test-Retest Reliability ...... 31
Chapter 7 Validity ...... 32
    Evidence Based on the Internal Structure ...... 32
    Evidence Based on Relations to Other Variables ...... 34
        Concurrent Validity ...... 34
        Predictive Validity ...... 34
        Differential Accuracy of Prediction ...... 36
    Construct Validity ...... 37
        Convergent Validity ...... 37
            Word Recognition ...... 37
            Vocabulary Knowledge ...... 38
            Syntactic Knowledge ...... 38
            Reading Comprehension ...... 38
        Discriminant Validity ...... 40
    Growth Reference Charts ...... 40

References ...... 41
Appendix A ...... 45
Appendix B ...... 50
Appendix C ...... 56
Appendix D ...... 79

Acknowledgments

The Technical Manual for Lexia RAPID Assessment Version 5.0: Grades 3-12, the Lexia Reading Assessment for Prescriptive Instructional Data, was written by the following authors in collaboration with Lexia Learning:

Barbara R. Foorman, Ph.D.

Yaacov Petscher, Ph.D.

Chris Schatschneider, Ph.D.

Florida Center for Reading Research, Florida State University

The items, dynamic flow, computer-adaptive task flow and placement algorithms, creation of the development application, and psychometric work for this component skills battery were funded by grants from the Institute of Education Sciences (IES) to Florida State University:

• Institute of Education Sciences, USDOE (R305F100005; $4,447,990), entitled “Assessing Reading for Understanding: A Theory-Based, Developmental Approach,” subcontract to the Educational Testing Service for five years, 7/1/10-6/30/15 (Foorman, PI on subcontract; Petscher and Schatschneider, Co-Is).

• Institute of Education Sciences, USDOE (R305A100301; $1,499,741), entitled “Measuring Reading Progress in Struggling Adolescents,” awarded for four years, 3/1/10-2/28/14. (Foorman, PI; Petscher and Schatschneider, Co-Is).

The RAPID authors and Lexia Learning would like to acknowledge the following individuals for their leadership in executing the work funded by the above two IES grants: Dr. Adrea Truckenmiller, Karl Hook, and Nathan Day. We would like to thank the numerous school districts, administrators, and teachers who participated in the research funded by these two grants. We would also like to acknowledge the Lexia Content, Development, and Quality Assurance teams for their continued work on RAPID.


Chapter 1 Introduction

Comprehension of written text is a complex process that requires the coordination of a number of skills (Hudson, Pullen, Lane, & Torgesen, 2009; Scarborough, 2001). Students must be able to identify sounds, letters, and their correspondence to one another in order to decode words. When reading multisyllabic words or those with non-English origins, awareness of common spelling patterns and word parts also becomes essential (Henry, 2010). Beyond word recognition, understanding word meanings is crucial to comprehending what is read. Grasping the meaning of a word often requires more than knowing its definition in a particular passage. Knowing the meaning of a word translates to knowing its full lexical entry in a dictionary: pronunciation, spelling, multiple meanings in a variety of contexts, synonyms, antonyms, idiomatic use, related words, etymology, and morphological structure. Further, understanding syntax and the rules that govern how words are ordered to make meaningful sentences is also important for comprehension (Cain & Nash, 2011; Crosson & Lesaux, 2013). These word recognition, language, and comprehension processes must coalesce to result in skilled comprehension.

By third grade and beyond, extracting relevant meaning from text becomes more challenging for students as they encounter sources that are increasingly complex and diverse (Nagy & Anderson, 1984). To support students' reading development, assessment methods must measure the foundational and underlying components of reading, monitor changes in student skills over time, and provide instructionally actionable intervention strategies. A consistent goal of assessment in reading is to provide useful information for instructional decision-making, while minimizing unnecessary, time-consuming testing (Torgesen, 2006). While educators and researchers acknowledge this goal, the need persists for precise screening and diagnostic tools of reading, particularly at the secondary level (Hock et al., 2009).

The Lexia Reading Assessment for Prescriptive Instructional Data (RAPID) was designed to meet this goal by answering meaningful questions for teachers and administrators about students’ performance and providing instructional strategies to support students’ progress in reading comprehension. Ultimately, teachers need to know which students are likely to be successful or at-risk for difficulty, what areas to target to support at-risk students, and how to group students based on their instructional needs.

RAPID consists of computer-administered adaptive screening tasks that measure development over time and create a profile of student performance in different skill areas. RAPID for grades 3-12 assesses complex knowledge, understanding, and application of word recognition, academic language, and reading comprehension skills using four tasks. Each screening task in RAPID is independently measured on its own scale, allowing for sensitive measurement of changes in student performance.

By including tasks that target vocabulary and syntactic knowledge across the grade levels, RAPID has excellent construct coverage of academic language, which has been shown to account for significant individual differences in reading comprehension in grades one through ten (Foorman, Herrera, Petscher, Mitchell, & Truckenmiller, 2014; Foorman, Koon, Petscher, Mitchell, & Truckenmiller, 2015). Computer administration allows for a high degree of standardization, while adaptability of the items based on individual student performance allows a highly reliable score to be reached in a relatively short amount of time, minimizing the testing time needed for each task.

PURPOSE

The Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014) require test developers and publishers to produce a technical manual that provides information documenting the technical quality of an assessment and its items.

Therefore, the purpose of this manual is to provide detailed information on the content, item, and psychometric development of RAPID for grades three through twelve. It provides a description of the tasks students will take and an explanation of each score that the assessment generates. It also describes the normative sample, outlines the data collection design used to estimate item parameters, and explains the computer-adaptive process. Multiple measures of reliability and sources of validity evidence are also documented. Similar information for RAPID for grades kindergarten to two can be found in the Technical Manual for RAPID Version 4.0: Grades K-2.

The information included in this manual is intended for use by those who evaluate tests. While this manual describes RAPID tasks, scores, and psychometric features, it does not provide instructions on administration procedures or interpretation of reports. Please consult the User Guide for RAPID Version 4.0 for in-depth administration guidelines.


Chapter 2 3-12 Tasks

TASK RULES

For students in grades three through twelve, RAPID consists of four required screening tasks. Each task is computer-administered and provides unique information regarding a student's skills. The first three tasks—Word Recognition, Vocabulary Knowledge, and Syntactic Knowledge—are item-adaptive. Stop rules determine when administration of each task is complete:

1. The student has responded correctly to the first 8 items.
2. The student has responded incorrectly to the first 8 items.
3. A reliable estimate of the student's abilities has been reached (i.e., standard error is less than 0.42).
4. The student has responded to 30 items.

Student placement within the last task, Reading Comprehension, is based on performance on the prior three adaptive tasks: students are placed in a passage at their reading level. Students may receive one or more passages on the Reading Comprehension task. Stop rules determine when administration of the Reading Comprehension task is complete:

1. The student has completed three passages.
2. A reliable estimate of the student's abilities has been reached (i.e., standard error is less than 0.42).

At subsequent administrations of RAPID within the same school year, the student's prior scores on each of the four tasks will determine the initial set of items administered at that administration period.
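The stop rules above can be read as a termination check applied after each item is scored. The following sketch is illustrative only; the function and constant names are hypothetical and do not reflect Lexia's implementation.

```python
# Illustrative check of the adaptive-task stop rules described above.
SE_THRESHOLD = 0.42   # rule: standard error of the ability estimate below 0.42
MAX_ITEMS = 30        # rule: no more than 30 items
RUN_LENGTH = 8        # rule: first 8 responses all correct or all incorrect

def task_is_complete(responses, standard_error):
    """responses: list of 0/1 item scores in administration order."""
    n = len(responses)
    if n >= RUN_LENGTH and (all(responses[:RUN_LENGTH]) or not any(responses[:RUN_LENGTH])):
        return True   # stop rules 1 and 2
    if standard_error is not None and standard_error < SE_THRESHOLD:
        return True   # stop rule 3: reliable estimate reached
    return n >= MAX_ITEMS   # stop rule 4
```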

Screening Assessment Periods To provide a guideline for the timeframes when students should be tested, there are three Assessment Periods for RAPID.

The three periods are bound by these dates for the 2019-2020 School Year:

Assessment Period    Dates
Fall                 July 13, 2019 - November 30, 2019
Winter               December 1, 2019 - February 29, 2020
Spring               March 1, 2020 - mid-summer*

*This date will be set during the 2019-2020 school year to accommodate the annual release of RAPID.


Recommended Comprehension Tasks

The Sentence Comprehension task is also item-adaptive and begins with the same five items regardless of student grade level. Stop rules determine when administration of this task is complete:

1. The student has responded correctly to the first 8 items.

2. The student has responded incorrectly to the first 8 items.

3. A reliable estimate of the student’s abilities has been reached (i.e., standard error is less than 0.42).

4. The student has responded to 30 items.

At subsequent administrations of the tasks within the same school year, the student's prior score on that task will be used to determine the initial set of items administered to the student at that administration period. In Passage Comprehension, students are placed into a passage based on their performance on the Word Reading task. Each passage has five comprehension questions that do not depend on student grade level. A student completes Passage Comprehension once they have successfully:

1. Read one passage without invoking a passage Stop Rule.
2. Answered five comprehension questions.

Refer to the Passage Comprehension section for a detailed explanation of passage levels and stop rules.

SCREENING TASK DESCRIPTIONS

Word Recognition Task

In this item-adaptive task, the student listens to a word and identifies its written form. The screen displays a drop-down menu with the correctly spelled word and two distractor choices that are spelled incorrectly.

The student may replay the audio for the word up to three times. The student has unlimited time to respond to each item.

This task incorporates phonological awareness (sound identification), decoding (reading), and encoding (spelling) processes by requiring the student to listen and identify the correct word among a series of choices. About 10% of target words are nonsense words in order to tap phonological decoding skills. When the target is a real word, distractors tap orthographic knowledge. For example, a distractor for "prerogative" might be "perogative." By tapping orthographic knowledge in this task, the quality of a student's lexical representation for a printed word is assessed. The more complete and accurate the lexical representation of a word is, the more efficient the student's word recognition and reading comprehension (Perfetti & Stafura, 2014).

Vocabulary Knowledge Task

In this item-adaptive task, the student reads a sentence that is missing a word. The student selects the word that best completes the sentence among three choices. The student has unlimited time to respond to each item. This task only includes audio for the directions section.

Beyond addressing vocabulary breadth, this task also addresses depth of word knowledge. The distractor and target words in this task vary in their morphological structure (i.e., prefixes or suffixes consisting of inflectional or derivational morphemes). For example, "He likes to [collector, collect, collection] coins." By tapping knowledge of common prefixes and suffixes, this task relates to a subdomain of word knowledge that connects to the ability to understand new words when they are encountered in print. Student performance on the Vocabulary Knowledge task accounted for up to 9% unique variance beyond prior reading comprehension, text reading efficiency, and spelling when predicting students' end-of-year reading comprehension (Foorman, Petscher, & Bishop, 2012).

Syntactic Knowledge Task

In this item-adaptive task, the student listens to a sentence or set of sentences that is missing a word or phrase. The screen displays the sentence(s) for the student to read. The student must choose the word or phrase that best completes the sentence from three choices. The student has unlimited time to respond to each item.

This task addresses text cohesion. The answer choices are a set of verbs, pronouns, or signal words. Signal words, which are also called connectives or cohesive ties, are words that represent causal (e.g., because), temporal (e.g., when), logical (e.g., if-then), additive (e.g., in addition), or adversative (e.g., although) relationships. They are important linguistic devices for linking ideas and information within and across sentences. They link back to information already read through pronoun reference (anaphora) or repetition of nouns and verbs, and provide clues to future meaning (e.g., therefore, nonetheless). Knowledge of the meaning and use of connectives is an important aid to comprehension (Cain & Nash, 2011; Crosson & Lesaux, 2013).


Reading Comprehension Task

In this passage-adaptive task, students are placed into their first reading comprehension passage based on their performance on the prior three tasks. The Reading Comprehension task consists of passages that are between approximately 200 and 1300 words in length. Each passage has between six and nine multiple-choice questions. Each question has one correct response and three distractors. The passage is presented to students first. Students can continue to view the passage on the left side of the screen once they begin answering questions on the right side of the screen.

Each question has been tested and assigned a difficulty value. The set of questions for each passage has an overall average item difficulty. Students continue to receive passages until a precise estimate of reading comprehension is achieved (i.e., reliability > .85) or until the student has received three passages. Students have an unlimited amount of time to read the passage and respond to questions.


Chapter 3 Score Definitions

Some scale scores are developed primarily to support norm-referenced interpretations; others support criterion-referenced interpretations. In practice, there is not always a sharp distinction and thus the differences between the two interpretations are delineated prior to delving into RAPID score interpretations.

Norm-referenced: measures broad skill areas, then ranks students with respect to how peers or other students (the norm group) performed on the same test. Students' scores are typically reported in percentiles, percentile ranks, stanines, or normal curve equivalents. Percentiles and percentile ranks are often used interchangeably—that is, the percentile rank of a particular score is defined as the percentage of the scores in a distribution which are at or below the particular score. When a score is identified by its percentile rank, the score is called a percentile. Historically, norms are established so that 25% of students score at or below the 25th percentile, 50% at or below the 50th percentile, 75% at or below the 75th percentile, and so on.

Criterion-referenced: determines whether students have achieved certain defined skills. An individual is compared with a preset standard for expected achievement. The performance of other students is not important to the interpretation of a particular student’s score. Criterion-referenced scores provide a number that indicates how well someone performs on a task, regardless of how anyone else does: e.g., the student is performing above grade level, the student is performing at grade level, or the student is performing below grade level.

RAPID provides several scores to answer different questions about student reading performance. In this section, we define each score and indicate its primary utility for educational decision-making.

READING SUCCESS PROBABILITY

The Reading Success Probability (RSP) is a percentage that indicates the likelihood that a student will reach grade-level expectations in reading comprehension. For the purposes of RAPID in the 2019-2020 school year for grades three to twelve, reaching expectations is defined as performing at or above the 50th percentile for the student's grade level on the Stanford Achievement Test, Tenth Edition, or Stanford 10 (Harcourt, 2003). The RSP is calculated from a combination of a student's performance in the Word Recognition, Vocabulary Knowledge, Syntactic Knowledge, and Reading Comprehension tasks. This formula is based on the student's grade level, since the factors that are most predictive of comprehension success change as a student grows older.

The RSP is used to determine which students are at risk of not meeting grade-level expectations by the end of the school year.

PERCENTILE RANKS

Each task in RAPID 3-12 has an associated percentile rank. Percentile ranks range from 1 to 99, and they divide the distribution of scores from a large standardization sample into 100 groups that contain approximately the same number of students in each group. Thus, a sixth grade student who scored at the 60th percentile would have obtained a score better than 60% of sixth graders in the norming sample. The median percentile rank on all tasks is 50, which means that half the students in the norming sample for a particular grade obtained a score above that point, and half scored below it. The percentile rank is an ordinal variable, meaning that it cannot be added, subtracted, used to create a mean score, or in any other way be mathematically manipulated. The median is always used to describe the midpoint of a distribution of percentile ranks. Since this score compares a student's performance to other students within a grade level, it is intended to inform the skill strengths and weaknesses for a student as compared to others in his or her grade. For the 2019-2020 school year, all percentile ranks are based on normative sample performance for the winter period.
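As a concrete illustration of the definition above, the following snippet computes a percentile rank for a single score against a set of norming-sample scores. It is a generic sketch with made-up scores, not RAPID's norming procedure.

```python
def percentile_rank(score, norm_scores):
    """Percentage of norming-sample scores at or below the given score, reported in the 1-99 range."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    rank = round(100 * at_or_below / len(norm_scores))
    return min(max(rank, 1), 99)

# Example with a small, hypothetical norm group
print(percentile_rank(540, [410, 455, 480, 500, 520, 535, 560, 590, 610, 650]))  # -> 60
```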

PERFORMANCE SCORES All tasks in RAPID also produce an associated Performance Score1. Performance Scores for grades 3-12 span the development of each of four important skills: Word Recognition, Vocabulary Knowledge, Syntactic Knowledge, and Reading Comprehension. The Performance Score provides an estimate of a student’s development in a particular skill. This score should change as a student’s skill level increases or decreases. Each task’s vertical scale has a range of 0-1000, a mean of 500, and a standard deviation of 100. Performance Scores are on an equal interval scale, so they can be added, subtracted, and used to create a mean score. Therefore, Performance Scores (rather than RSP or percentile scores) should be used to determine the degree of growth in a skill for individual students. The Performance Score can be displayed against grade-level percentile ranks to evaluate whether students are making personal growth across the year, regardless of whether they are performing far below or above grade-level peers.

1 In RAPID Version 1.0, Performance Scores were called “Ability Scores.”


LEXILE MEASURES

MetaMetrics and Lexia Learning conducted a linking study for the purpose of matching students with books and texts—to predict the books and texts a student should be matched with for successful reading experiences, given their performance on the Lexia RAPID Assessment.

The Lexile Framework for Reading. The Lexile Framework is a tool that helps teachers, parents, and students locate appropriate reading materials. Text complexity (difficulty) and reader ability are measured in the same unit—the Lexile. Text complexity is determined by examining such characteristics as word frequency and sentence length. Items and text are calibrated using the Rasch model. The typical range of the Lexile Scale is from 200L to 1600L, although actual Lexile measures can range from below zero (BR) for beginning reader materials to above 1600L for advanced materials. The “L” suffix identifies the score as a Lexile measure. Where it appears, the “BR” prefix indicates a score that is below 0 on the Lexile scale; such scores are typical of beginning readers.

The Lexile Framework for Reading is designed to:

• Measure reading ability by focusing on skills that readers apply when studying written materials sampled from various content areas, including both literary and informational text
• Include test items that consist of a passage that is response-illustrated
• Measure reading comprehension skills, such as referring to details in the passage, drawing conclusions, and making comparisons and generalizations
• Not require prior knowledge related to the passage, vocabulary taken out of context, or formal logic

Description of Sample

The sample included students from ten states (Connecticut, Illinois, Iowa, Kentucky, Massachusetts, New York, Ohio, Pennsylvania, Rhode Island, and Texas) consisting of 10 districts and 24 schools. For the Lexia RAPID Assessment Reading Comprehension link in Grades 3 through 10, 2,806 student records matched scores on both the Lexia RAPID Assessment Reading Comprehension task and Lexile Linking Test, or 93.3% of the Lexile Linking Test sample.


Study Design

A single-group/common-person design was chosen for this study (Kolen and Brennan, 2014). This design is most useful "when (1) administering two sets of items to examinees is operationally possible, and (2) differential order effects are not expected to occur" (pp. 16–17). Most students in the sample were administered the Lexile Linking Test between February 8, 2018 and April 19, 2018, and administered the Lexia RAPID Assessment Reading Comprehension task between January 10, 2018 and April 19, 2018. An additional phase of testing for Grades 3 through 10 occurred between May 24, 2018 and June 15, 2018 for both the Lexile Linking Test and the Lexia RAPID Assessment Reading Comprehension task.

The Lexile Linking Tests were paper-pencil tests constructed to be appropriate measures of reading skills to align with the Lexia RAPID Reading Comprehension task in Grades 3 through 12. Linking test forms were developed for Grades 3, 5, 7, 9, and 10. The linking test forms in Grades 3 and 5 contained 35 items, and the test forms for Grades 7, 9, and 10 contained 40 items. Each Lexile Linking Test contained items with a range of text complexity based on grade level norms. Common items were included across grades to provide connectivity for the linking procedure.

Linking Analysis. Two score scales (e.g., the Lexia RAPID Assessment Reading Comprehension scale and the Lexile scale) can be linked using a linear method when (1) test forms have similar difficulties; and (2) simplicity in conversion tables or equations, in conducting analyses, and in describing procedures are desired (Kolen and Brennan, 2014). In linear equating (Dorans, Moses, and Eignor, 2010), a transformation is chosen such that two sets of scores are considered to be linked if they correspond to the same number of standard deviations above (or below) the mean in some group of examinees (Kolen and Brennan, 2014).

The linking equation between Lexia RAPID Assessment Performance Score and Lexile measure can be written as:

Lexile measure = slope × (Lexia RAPID Assessment score) + intercept

where the slope is the ratio of the standard deviations of the Lexile Linking Test Lexile measures and the Lexia RAPID Assessment Reading Comprehension scores.
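A minimal sketch of this linear linking step, assuming made-up common-person score vectors and the conventional form slope = SD(Lexile) / SD(RAPID); the coefficients shown are illustrative and are not those used to produce RAPID's reported Lexile measures.

```python
import statistics as stats

def linear_link(rapid_scores, lexile_scores):
    """Return (slope, intercept) mapping RAPID Reading Comprehension scores onto the Lexile scale."""
    slope = stats.pstdev(lexile_scores) / stats.pstdev(rapid_scores)
    intercept = stats.mean(lexile_scores) - slope * stats.mean(rapid_scores)
    return slope, intercept

# Hypothetical common-person data (the same students took both measures)
rapid = [420, 470, 500, 515, 540, 575, 600]
lexile = [610, 700, 760, 790, 840, 900, 950]
slope, intercept = linear_link(rapid, lexile)
predicted_lexile = slope * 520 + intercept   # convert a RAPID score of 520 to a Lexile measure
```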


Reported Lexile Reader Measures and Lexile Text Ranges

In RAPID reports, an individual student will receive a Lexile reader measure, which indicates the student's performance on RAPID and how that translates to the Lexile Framework for Reading. In addition, they will receive a corresponding Lexile text range, which is a band that is 50L above and 100L below the student's actual Lexile measure. This range represents the boundaries between the easiest kind of appropriate reading material for the student and the level at which the student will be challenged, yet can still read successfully. This can be used to guide suitable text selection for the student. It should also be noted that while RAPID Performance Scores are not capped by grade level (in other words, students are able to achieve the highest Performance Score, 1000, in any task that they take), reported Lexile reader measures are capped based on grade level. This is to avoid making text recommendations that exceed developmentally appropriate material for a student.
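The reported text range described above is a fixed band around the reader measure. The sketch below shows that arithmetic; the grade-level cap is passed in as a parameter because the actual cap values are not listed in this manual.

```python
def lexile_text_range(reader_measure, grade_cap=None):
    """Return (lower, upper) bounds of the text range: 100L below and 50L above the reader measure."""
    measure = min(reader_measure, grade_cap) if grade_cap is not None else reader_measure
    return measure - 100, measure + 50

print(lexile_text_range(850))          # -> (750, 900)
print(lexile_text_range(1350, 1100))   # reader measure capped at 1100 -> (1000, 1150)
```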


Chapter 4 Content and Item Development

RAPID items were originally written and reviewed by a team of experienced educators with advanced degrees in education, communication sciences and disorders, and educational and school psychology. They designed items to target standards for late elementary, middle, and high school students using developmentally appropriate, educationally relevant vocabulary and text complexity.

Large numbers of items considered easy, moderate, and difficult for each grade level were created for each task. Each item in the RAPID item pool was developed using best practice methods and criteria that have been widely accepted in the measurement community by complying with The Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014).

Target words for the Word Recognition and Vocabulary Knowledge tasks were initially selected on the basis of their printed word frequency (Zeno, Ivens, Millard, & Duvvuri, 1995) and then through pilot work with a small group of students.

An estimate of the range in difficulty of the sentences in the Vocabulary Knowledge and Syntactic Knowledge tasks was obtained using the Flesch-Kincaid grade-level formula.

Passages and items on the Reading Comprehension task were written or chosen to address the Common Core State Standards in English Language Arts (CCSS-ELA) in three strands (Reading Informational Text, Reading Literary Text, and Language). Item writers reviewed publicly available text examples from the Partnership for Assessment of Readiness for College and Careers, the SmarterBalanced Consortium, and the Florida Comprehensive Assessment Test. The range of text complexity of passages was evaluated using a variety of quantitative measures (e.g., Flesch-Kincaid) and the qualitative rating guide from Appendix A of CCSS-ELA.

Passages on the Reading Comprehension task were originally written with the goal of an even distribution between literary and informational texts. The passage and item difficulty was ultimately determined by the norming sample’s performance on the task, so the resulting item bank distribution is 42% literary and 58% informational. Since this assessment aims to address the range of student abilities rather than equally address all standards, the guideline for item creation on the Reading Comprehension task was to make 30% of the items assess vocabulary, and 70% assess explicit and inferential comprehension. The comprehension items for grades three through five were split evenly between explicit and inferential questions, with the percentage favoring inferential questions at the upper grades.


GUIDELINES FOR RETAINING ITEMS

Several criteria were used to evaluate item performance. First, items were evaluated based on Classical Test Theory (CTT) indices. The first step was to identify items that demonstrated strong floor or ceiling effects (i.e., response rates of 95% or greater). Such items are not useful in creating an item bank, as there is little variability in whether students are successful on the item. In addition to evaluating the proportion correct rate, item-total correlations were estimated. Negative values are indicative of poor functioning because they suggest that better-performing students answered the item incorrectly, which is typically an indication of a miskeyed or ambiguous item. Items with correlations < .15 were flagged for removal. Following the descriptive analysis of item performance, difficulty and discrimination values from the Item Response Theory (IRT) analyses were used to further identify items that were poorly functioning. Items were flagged for removal if the item discrimination was negative, or if the item difficulty was greater than +4.0 or less than -4.0. Final item counts are included in Table 2, and the criteria are summarized in the sketch that follows it.

Table 2 Item/Passage Totals by Task

Task                                  Item Count
Word Recognition Task (WRT)           274 items
Vocabulary Knowledge Task (VKT)       374 items
Syntactic Knowledge Task (SKT)        235 items
Reading Comprehension Task (RCT)      140 passages; 1,154 items
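Expressed as code, the retention criteria above reduce to a few threshold checks per item. The sketch below is schematic, with hypothetical field names; the floor condition (5% or fewer correct) is an interpretation of "floor effects" and is not stated explicitly in this manual.

```python
def flag_item(p_correct, item_total_r, irt_a, irt_b):
    """Return the reasons an item would be flagged for removal under the criteria above."""
    flags = []
    if p_correct >= 0.95 or p_correct <= 0.05:
        flags.append("floor/ceiling effect")              # little variability in responses
    if item_total_r < 0.15:
        flags.append("low or negative item-total correlation")
    if irt_a < 0:
        flags.append("negative IRT discrimination")
    if abs(irt_b) > 4.0:
        flags.append("extreme IRT difficulty")
    return flags

# Example: a very easy item with a weak item-total correlation
print(flag_item(p_correct=0.97, item_total_r=0.10, irt_a=0.45, irt_b=-4.3))
```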

Secondary criteria were used in evaluating the retained items, comprising a differential item functioning (DIF) analysis. DIF refers to instances where individuals from different groups with the same level of underlying ability significantly differ in their probability of correctly endorsing an item. Unchecked, items included in a test that demonstrate DIF will produce biased test results. For RAPID, DIF testing was conducted comparing: Black-White students, Hispanic-White students, Black-Hispanic students, students eligible for Free or Reduced Price Lunch (FRL) with students not receiving FRL, and English language learner (ELL) to non-ELL students.

DIF testing was conducted with a multiple indicator multiple cause (MIMIC) analysis in Mplus (Muthén & Muthén, 2008); moreover, a series of four standardized and expected score effect size measures were generated using VisualDF software (Meade, 2010) to quantify various technical aspects of score differentiation between the groups. First, the signed item difference in the sample (SIDS) index was created, which describes the average unstandardized difference in expected scores between the groups. The second effect size calculated was the unsigned item difference in the sample (UIDS). This index can be utilized as supplementary to the SIDS. When the absolute values of the SIDS and UIDS are equivalent, the differential functioning between groups is equivalent; however, when the absolute value of the UIDS is larger than the SIDS, it provides evidence that the item characteristic curves for expected score differences cross, indicating that differences in the expected scores between groups change across the level of the latent ability score. The D-max index is reported as the maximum SIDS value in the sample, and may be interpreted as the greatest difference for any individual in the sample in the expected response. Lastly, an expected score standardized difference (ESSD) was generated, and was computed similarly to a Cohen's (1988) d statistic. As such, it is interpreted as a measure of standard deviation difference between the groups for the expected score response, with values of .2 regarded as small, .5 as medium, and .8 as large. Fewer than 10% of items demonstrated DIF, with most effect sizes below .50. Such items will be studied further over time; however, the lack of differential bias in prediction of risk tempers concerns from the item-level DIF analysis.
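The SIDS, UIDS, D-max, and ESSD indices described above can be computed from each group's expected item scores. The sketch below evaluates them over a grid of matched ability values and uses a pooled standard deviation for the ESSD; it is a simplified illustration, not the VisualDF procedure.

```python
import numpy as np

def dif_effect_sizes(expected_ref, expected_focal, pooled_sd):
    """expected_ref / expected_focal: expected item scores for each group at matched ability levels."""
    diff = expected_focal - expected_ref
    sids = diff.mean()                         # signed item difference in the sample
    uids = np.abs(diff).mean()                 # unsigned item difference in the sample
    d_max = diff[np.argmax(np.abs(diff))]      # largest expected-score difference for any individual
    essd = sids / pooled_sd                    # standardized (Cohen's d-like) difference
    return sids, uids, d_max, essd

theta = np.linspace(-3, 3, 61)
ref = 1 / (1 + np.exp(-1.2 * (theta - 0.0)))     # hypothetical expected scores, reference group
focal = 1 / (1 + np.exp(-1.2 * (theta - 0.3)))   # hypothetical expected scores, focal group
print(dif_effect_sizes(ref, focal, pooled_sd=0.25))
```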


Chapter 5 Psychometric Development

The piloting and validation of RAPID task items occurred from 2010-2015 through the funding provided by two IES grants (see Acknowledgments). Once item writers created items for each task, tasks were piloted with a norming sample of students in grades kindergarten through twelve. RAPID developers then used IRT to evaluate student results. These analyses led to the deletion of certain items and further creation and field trials of more difficult items.

For the purposes of describing our samples, we refer to two types of samples: a norming sample and an alignment sample. The norming sample refers to the set of studies that tested items for the RAPID tasks for the purposes of calibrating and validating the item parameters, and creating the vertical linking and scaling of items and persons across the grade levels. The alignment sample refers to a subsample of individuals who were administered RAPID tasks and Stanford 10 in order to link performance on the RAPID to normative performance on a global reading outcome.

ITEM NORMING SAMPLE

Students in grades kindergarten through twelve from multiple school districts throughout Florida participated in the linking study. Students completed RAPID tasks for their grade level, and their performance informed the estimation and evaluation of item parameters and student performance score distributions for each task and grade level. From this norming sample, a total of 44,780 students across the districts participated in these calibration and validation studies.

Task items and passages were created to span 2nd-3rd grade up to 11th grade-College and Career Ready (Nelson, Perfetti, Liben & Liben, 2012). While eleventh and twelfth grade students were included in the initial sample, there was insufficient variance among the performance score distribution for students in grades ten through twelve to create meaningful separations in grade profiles. Thus, RAPID content was written to span beyond tenth grade; however, the RSP and percentile rank norms for grades ten to twelve are based on the tenth grade sample.

Table 1 provides a breakdown of sample sizes by grade for RAPID tasks. Average demographic information for the sample in grades three through ten was as follows: 41% White, 30% Hispanic, 23% Black, 6% Other; 60% eligible for Free/Reduced Price lunch (FRL); 8% English language learners (ELL). Demographics for the norming sample approximately reflected Florida demographics². Compared to national demographics, norming sample characteristics show somewhat higher percentages of Black students (23% compared to 15%), Hispanic students (30% compared to 22%), and students eligible for free or reduced price lunch (60% versus 48%). Percentages for White students were lower in the norming sample (41% compared to 55%). Highly similar percentages were seen for ELL students (8% compared to 9%)³.

A particular nuance with assessment research is that the collected sample data may not precisely reflect the population of interest. To correct for observed imprecision in how well a sample reflects a population, sample weights were used to reduce bias and compensate for over- or under-representativeness of the sample. Subsequently, analyses were informed by weights constructed by evaluating the proportion of individuals who were sampled across combinations of race/ethnicity, ELL status, and FRL status. This resulted in 16 unique weights applied to the data to account for the four levels of race/ethnicity (White, Black, Hispanic, Other), two levels of FRL status (eligible/not eligible), and two levels of ELL status (ELL/not ELL). In this way, our analyses were able to more precisely reflect the distribution of population demographics. Specific sample weight data used in this study is reported in Appendix A.
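Constructing the 16 cell weights described above is a standard post-stratification step: each race/FRL/ELL cell is weighted by the ratio of its population proportion to its sample proportion. The sketch below uses made-up proportions; the weights actually applied are reported in Appendix A.

```python
from collections import Counter
from itertools import product

RACES = ["White", "Black", "Hispanic", "Other"]
FRL = ["FRL-eligible", "Not eligible"]
ELL = ["ELL", "Not ELL"]

def cell_weights(sample_cells, population_props):
    """sample_cells: one (race, frl, ell) tuple per sampled student.
    population_props: population proportion for each cell.
    Returns weight = population share / sample share for each cell present in the sample."""
    counts = Counter(sample_cells)
    n = len(sample_cells)
    return {cell: population_props[cell] / (counts[cell] / n)
            for cell in product(RACES, FRL, ELL) if counts[cell] > 0}

sample = [("White", "FRL-eligible", "Not ELL")] * 2 + [("Hispanic", "Not eligible", "ELL")]
props = {("White", "FRL-eligible", "Not ELL"): 0.20, ("Hispanic", "Not eligible", "ELL"): 0.05}
print(cell_weights(sample, props))   # weights of roughly 0.30 and 0.15
```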

Table 1 Sample Size by Grade for RAPID Tasks

Grade    Word Recognition    Vocabulary Knowledge    Syntactic Knowledge    Reading Comprehension
3        651                 502                     962                    2,723
4        586                 570                     857                    2,679
5        697                 519                     981                    2,721
6        652                 606                     865                    3,835
7        612                 599                     617                    3,683
8        613                 597                     616                    3,814
9        1,054               813                     1,053                  3,964
10       1,109               574                     869                    3,787
Total    5,974               4,780                   6,820                  27,206

2 Data sources: Race data from 2013-14 Survey 3, Florida Department of Education; Free/Reduced Lunch data from 2013-14 Survey 2 data, Florida Department of Education and Archive Data Core, Florida Center for Reading Research; English Language Learner data from Education Information and Accountability Services, Florida Department of Education and Archive Data Core, Florida Center for Reading Research.

3 Data source: Bitterman, Gray & Goldring (2013)


PERFORMANCE SCORE ALIGNMENT SAMPLE

The performance score alignment sample in grades 3-10 was a subsample of individuals who were administered both RAPID and the Stanford 10. A key consideration in aligning RAPID performance scores to the Stanford 10 was to ensure that the sample data reflected normative performance in the United States at large. By using percentile ranks from the Stanford 10, it was possible to evaluate the distribution of scores in the alignment sample against national norms, and provide adjustments where needed to bring the observed sample in line with national norms. In grade 3, 550 students were subsampled to achieve a Stanford 10 base rate of success at the 50th percentile equal to 50%. The following are the subsample sizes for the remaining grades: grade 4 (N = 500), grade 5 (N = 620), grade 6 (N = 900), grade 7 (N = 800), grade 8 (N = 700), grade 9 (N = 500), grade 10 (N = 600).

VERTICAL SCALING AND DATA COLLECTION DESIGN

An underlying concept for measuring achievement across grades is that items/tests are developed for each grade and are targeted to the skills and knowledge expected of students at that grade. The result is a series of progressively more challenging tests for each grade level. The score scales for each of these tests are specific to that grade and do not allow comparisons across grades. In order to make cross-grade comparisons, the scales need to be linked together to form one scale that spans all of the grades that are of interest. The methodology for creating the cross-grade scales is called vertical scaling, because the tests are arranged in increasing order (i.e., vertically). Vertical scaling is a general class of methodologies for taking the results of tests for a series of grade levels and placing them on a common scale.

In order to link an achievement scale from one grade to the next, a data collection design must be developed and implemented. A common-item, non-equivalent groups design (Kolen & Brennan, 2004) was used for collecting data with the norming sample for calibration and validation studies pertaining to the item parameters. This approach allows for linking multiple test forms via common items. Each task utilized multiple test forms per grade. For each task, a minimum of 20% of the total items within a form were identified as vertical linking items to create a vertical scale. These items served a dual purpose of not only linking forms across grades, but also linking forms within grades to each other. Outcome data consisted of well-known standardized measures of reading comprehension (Stanford 10).

Because the tasks in RAPID were each designed for vertical equating and scaling, two primary frameworks for estimating the item parameters were considered: 1) a multiple-group IRT analysis of all test forms or 2) test characteristic curve (TCC) equating. The latter approach was chosen to place the items on a common scale (using methodology from Stocking and Lord [1983]). The TCC is the expected value of the summed score for a test, or a set of items, as a function of theta (θ). The following figure demonstrates a hypothetical example of TCCs for two forms.

Figure 1. Hypothetical Test Characteristic Curves of Two Test Forms

The graph shows slight gaps between the two TCCs at different theta (θ) levels. The Stocking and Lord methodology attempts to find the linking coefficients that minimize the difference in test characteristic curves between the common items in different groups. More specifically, the Stocking and Lord procedure takes the parameters of the items common to both forms and applies a transformation to them to find two equating coefficients, a (slope) and b (intercept), such that the TCC of one form is numerically as similar as possible to that of the other form. Stocking and Lord (1983) provide an algorithm for identifying a linear adjustment to the new scale, of the form:

Y = aX + b

where X is a score on the initial scale for the new group, Y is the corresponding score on the adjusted scale, and "a" and "b" are slope and intercept parameters, respectively. Vertical scaling uses this approach to analyze the parameters of items repeated in adjacent grades in order to capture the otherwise hidden differences in achievement between grades. Because response probabilities are maintained through a linear transformation, the ability scale can be transformed after calibration to create a more convenient scale for interpretation, usage, and score reporting. For example, each task's vertical scale on RAPID uses μ = 500, σ = 100 in the linear equation to convert IRT theta to scale scores.
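A compact sketch of the two steps just described: estimating Stocking and Lord coefficients by minimizing the squared difference between the common items' test characteristic curves, then converting theta to the 0-1000 reporting scale with μ = 500 and σ = 100. The item parameters and helper names below are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def p2pl(theta, a, b, D=1.7):
    """Probability of a correct response under the 2PL model."""
    return 1 / (1 + np.exp(-D * a * (theta[:, None] - b)))

def tcc(theta, a, b):
    """Test characteristic curve: expected summed score at each theta."""
    return p2pl(theta, a, b).sum(axis=1)

def stocking_lord(a_old, b_old, a_new, b_new, theta=np.linspace(-4, 4, 81)):
    """Find (A, B) that place the new form's common items on the old form's theta scale."""
    def loss(coef):
        A, B = coef
        return np.sum((tcc(theta, a_old, b_old) - tcc(theta, a_new / A, A * b_new + B)) ** 2)
    return minimize(loss, x0=[1.0, 0.0], method="Nelder-Mead").x

def theta_to_scale_score(theta, mean=500.0, sd=100.0):
    """Linear conversion of IRT theta to the 0-1000 vertical scale."""
    return np.clip(mean + sd * np.asarray(theta), 0, 1000)

# Hypothetical common-item parameters from two adjacent-grade calibrations
a_old, b_old = np.array([1.1, 0.8, 1.4]), np.array([-0.5, 0.2, 0.9])
a_new, b_new = np.array([1.0, 0.9, 1.3]), np.array([-0.9, -0.2, 0.5])
A, B = stocking_lord(a_old, b_old, a_new, b_new)
```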


All item analyses were conducted using Bilog-MG3 (Zimowski, Muraki, Mislevy, & Bock, 2002) with a two-parameter logistic (2PL) model. Sample weights according to student demographics were used to inform the item and student parameter scores. Parameters derived from these analyses are used in the look-up tables in the computer-adaptive system.

ITEM RESPONSE THEORY

RAPID data for grades kindergarten through twelve were analyzed using IRT. This approach to data analysis differs from Classical Test Theory (CTT). In CTT, analysis of items involves estimating the difficulty of the item (based on the percentage of respondents correctly answering the item) as well as discrimination (how well individual items relate to overall test performance). While CTT practices are commonplace in assessment development, IRT holds several advantages over CTT. When using CTT, the difficulty of an item depends on the group of individuals on which the data was collected. This means that if a sample has more students who perform at an above-average level, the items will appear easier; if the sample has more below-average performers, the items will appear more difficult. Similarly, the more that students differ in their abilities, the more likely the discrimination of the items will be high; the more that the students are similar in their ability, the lower the discrimination will be. Thus, scores from a CTT approach can be heavily dependent on the makeup of the sample on which the items are tested.

The IRT framework has proven to be a powerful tool for test construction, scaling, equating, and score reporting. The benefits of IRT are such that: 1) item difficulty, item discrimination, and pseudo-guessing parameters are not dependent on the group(s) from which they were initially estimated; 2) scores describing students’ abilities are not related to the difficulty of the test; 3) shorter tests can be created that are more reliable than a longer test; and 4) item statistics and the ability of students are reported on the same scale.

IRT models define a specific statistical relationship between examinees and test items. For dichotomous or multiple-choice items, one common IRT measurement model is the 2PL model, which was used to analyze the RAPID norming data. The model describes how the probability of a correct response on a test item depends on examinee proficiency (also known as ability or theta (θ)) and two characteristics of the test item: difficulty of the item and the ability of the item to discriminate between high and low proficiencies. In this model, the probability of a successful performance on an item can be expressed as a function of one latent trait or person parameter (θ_j) and two item parameters (a_i, b_i):


$$P(X_{ij} = 1 \mid \theta_j, a_i, b_i) = \frac{1}{1 + \exp\left[-D a_i (\theta_j - b_i)\right]}$$

where P_ij(θ) reflects the probability that an examinee, given his/her ability level, answers item i correctly (this can be conceptualized as an S-shaped curve with values between 0 and 1 over the theta or ability scale, as seen in Figure 2), a_i is the slope or discrimination parameter, and b_i is the threshold or difficulty parameter of the item. D represents a scaling factor of 1.7.
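The 2PL equation above transcribes directly into code, which can be used to check the probabilities described below for Figure 2 (an item of average difficulty answered by students of ability -3.0, 0.0, and +3.0). The parameter values are illustrative only.

```python
import math

def p_correct_2pl(theta, a, b, D=1.7):
    """Probability of a correct response under the 2PL model."""
    return 1 / (1 + math.exp(-D * a * (theta - b)))

# An item of average difficulty (b = 0.0) with a discrimination of a = 0.9
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(p_correct_2pl(theta, a=0.9, b=0.0), 3))
# -> approximately 0.01 at theta = -3.0, 0.50 at theta = 0.0, and 0.99 at theta = +3.0
```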

Figure 2 demonstrates a sample item characteristic curve, which describes item properties from IRT. The ability of the individual is represented along the x-axis, denoted by theta. The ability of students and item statistics are reported on the same scale. Thus, the x-axis is a simultaneous representation of student ability and item difficulty. Negative values along the x-axis will indicate that items are easier, while positive values describe harder items. Pertaining to students, negative values describe individuals who perform below average, while positive values identify students who perform above average. A value of zero for both students and items reflects an average level of either ability or difficulty.

The probability of a correct response is represented along the y-axis, which varies across the level of difficulty. Item difficulty is defined as the value on the x-axis at which the probability of correctly endorsing the item is 0.50. As demonstrated for the sample item in Figure 2, the difficulty of this item would be 0.0. Item characteristic curves are graphical representations generated for each item that allow the user to see how the probability of getting the item correct changes for different levels of the x-axis. Students with an ability of -3.0 would have an approximate 0.01 chance of getting the item correct, while students with an ability of 3.0 would have a nearly 99% chance of getting an item correct.


Figure 2 Sample Item Characteristic Curve

Item Discrimination is based on the relationship between how a student responds to an item and his or her subsequent performance on the rest of a test. In IRT, discrimination describes the extent to which an item can differentiate the probability of correctly endorsing an item across the range of ability (i.e., -3.0 to +3.0). Figure 3 provides an example of how discrimination operates in the IRT framework. For all three items presented in Figure 3, the difficulty has been held constant at 0.0, while the discriminations are variable. The dashed line (Item 1) shows an item with strong discrimination, the solid line (Item 2) represents an item with acceptable discrimination, and the dotted line (Item 3) is indicative of an item that does not discriminate. It is observed that for Item 3, regardless of the level of ability for a student, the probability of getting the item right is the same. Both high ability students and low ability students have the same chance of doing well on this item. Item 1 demonstrates that as the x-axis increases, the probability of getting the item correct changes as well. Notice that small changes between -1.0 and +1.0 on the x-axis result in large changes on the y-axis. This indicates that the item discriminates well among students, and that individuals with higher ability have a greater probability of getting the item correct. Item 2 shows that while an increase in ability produces an increase in the probability of a correct response, the increase is not as large as is observed for Item 1, indicating that it is a poorer discriminating item.


Figure 3. Sample Item Characteristic Curves with Varied Discriminations

When employing IRT methodologies, it is pivotal that assumptions such as monotonicity, local independence, and dimensionality have been met in order to ensure valid and reliable examinee score interpretations. Monotonicity asserts that the likelihood of successful performance is a non-decreasing function of a test taker's proficiency. Local independence means that item performance is conditionally independent given an examinee's trait level, whereas the dimensionality of an assessment refers to the number of latent traits required to capture the construct of interest. Checking the model assumptions is an integral part of the validation process. Results of these analyses can be found in the validity section.

IRT AND ITS ROLE IN COMPUTER ADAPTIVE TESTING

In computer adaptive testing (CAT), tailoring item difficulty to match a student's skill level can be accomplished within different methodological frameworks; however, the most commonly employed framework within an educational context is IRT, which is utilized for both adaptive item selection and test scoring.

The process of a computer adaptive test can be conceptualized as an algorithm that automatically moves up or down the item scale to select questions based on a student's answers. For example, a student with poor reading skills may branch to easier items in order to better estimate his or her reading ability. Higher ability students would likely branch to more demanding reading items in order to better ascertain the breadth of their reading skills and their reading ability. In general, when an item is answered correctly, the student is then given a more difficult item. When an item is answered incorrectly, the student is then given an easier item. A big advantage of adaptive assessments is that they dynamically adjust to each student's unique pattern of responses, which pinpoints student achievement levels quickly and efficiently. A brief outline of how a CAT operates is provided in the following section.

Architecturally, a CAT is composed of five components (Wainer, 2000). The first component is related to test content. The remaining four components are psychometric rather than content, and refer to algorithms in the CAT system:

1. Calibrated item bank
2. Starting point
3. Item selection algorithm
4. Scoring algorithm
5. Termination criterion

The first two components are static in nature during CAT operation, whereas algorithms are employed to cycle through components 3, 4, and 5 until the termination criterion is satisfied (see Figure 4).

Figure 4. Flowchart of a CAT Operation: Develop Item Bank & CAT System → Specify Starting Point & Start Test → Select Item & Deliver to Examinee → Score Item, Update Examinee Ability Estimate → Evaluate Termination Criterion (if not satisfied, return to item selection; if satisfied, conclude the test).


As an example, an examinee takes the test on a computer. The computer or application is preloaded with the item bank (which includes psychometric data on each item such as item difficulty and discrimination), and a specific starting point will have been determined for the examinee. An item is selected for this starting point, the first item in the test. After the item is answered, it will be scored and an estimate of examinee ability (θ) is obtained. The termination criterion will then be evaluated; if it is not yet satisfied, another item will be selected (component 3), which the examinee will answer, then the examinee’s score (θ) is updated (component 4), and the termination criterion will be evaluated again until the test termination criterion has been satisfied (component 5).
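The five components and the walkthrough above map onto a small loop. The following skeleton is generic; the injected helpers (select_next_item, estimate_ability, and so on) are hypothetical placeholders, not RAPID's item-selection or scoring algorithms.

```python
def run_cat(item_bank, starting_theta, administer, select_next_item,
            estimate_ability, termination_met):
    """Generic CAT loop: select an item, score it, update theta, and check termination."""
    theta, se = starting_theta, None            # components 1 and 2: item bank and starting point
    responses = []                              # (item, score) pairs in administration order
    while not termination_met(responses, se):   # component 5: termination criterion
        item = select_next_item(item_bank, theta, responses)   # component 3: item selection
        score = administer(item)                               # examinee answers the item
        responses.append((item, score))
        theta, se = estimate_ability(responses)                # component 4: scoring/ability update
    return theta, se, responses
```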


Chapter 6 Reliability

Reliability describes the consistency of test scores across multiple administrations over time, as well as the consistency of test scores across test forms. Reliability is inversely related to the amount of measurement error associated with test scores, so it is an important index of test quality. Clearly, if a person's test score on different occasions changes dramatically according to unintended variations in the testing process, little faith can be put into a particular score obtained on a particular day. For this reason, reliability can be considered a necessary but not sufficient condition for validity.

Because RAPID uses Item Response Theory (IRT) as its method of validation, reliability takes on a different meaning than it does from a Classical Test Theory (CTT) perspective. The biggest difference between the two approaches is their assumption about the measurement error related to the test scores. CTT treats the error variance as being the same for all scores, whereas IRT views the level of error as dependent on the ability of the individual. In IRT, reliability then describes the level of precision of measurement across ability; it may therefore be difficult to summarize the precision of scores in IRT with a single number. As such, reliability estimates for RAPID are provided in the forms of marginal reliability, standard error of measurement, and test-retest reliability estimates.

MARGINAL RELIABILITY

It is possible to determine a generic estimate of reliability, known as marginal reliability (Sireci, Thissen, & Wainer, 1991), with:

$$\rho^2 = \frac{\sigma_\theta^2 - \sigma_{e^*}^2}{\sigma_\theta^2}$$

where σ_θ² is the variance of performance scores for the norming sample and σ_e*² is the mean-squared error. Marginal reliability coefficients for student performance in the spring computer adaptive version of RAPID are reported in Table B1 by grade and assessment period. Values of .80 are typically viewed as acceptable for research purposes while estimates at .90 or greater are acceptable for clinical decision-making (Nunnally & Bernstein, 1994).
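In code, the marginal reliability above is a single expression once the performance-score variance and the mean squared standard error are available. A minimal sketch with made-up values:

```python
import statistics as stats

def marginal_reliability(performance_scores, standard_errors):
    """rho^2 = (variance of performance scores - mean squared error) / variance of performance scores."""
    var_theta = stats.pvariance(performance_scores)
    mean_sq_error = stats.mean(se ** 2 for se in standard_errors)
    return (var_theta - mean_sq_error) / var_theta

scores = [380, 450, 500, 520, 560, 610, 660]   # hypothetical performance scores
ses = [31, 28, 30, 33, 27, 35, 29]             # hypothetical standard errors
print(round(marginal_reliability(scores, ses), 2))   # -> about 0.88
```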

Across grades three through twelve, the marginal reliability was quite high, ranging from .86 for fall Reading Comprehension to .93 for spring Word Recognition and Reading Comprehension. Table B2 shows the marginal reliability coefficients for the Reading Comprehension Task by student subpopulations. Marginal reliability coefficients for the Reading Comprehension Task were above .90 for most subpopulations across grades.


STANDARD ERROR OF MEASUREMENT

The standard error of measurement (SEM; Harvill, 2005) is an estimate of the amount of variation that might be observed in an individual's performance given repeated testing. Only through repeated testing would it be possible to define the individual's true ability, as scores may fluctuate from one particular day of testing to the next. Because it is unreasonable to test a student repeatedly in order to capture his or her true ability, we can instead construct an interval that describes the extent to which the score may fluctuate. The SEM is calculated with:

$$\mathrm{SEM} = \sigma_x \sqrt{1 - \rho}$$

where $\sigma_x$ is the standard deviation associated with the mean for assessment x, and $\rho$ is the marginal reliability for the assessment. Means and SEMs are reported for Word Recognition (Table B3), Vocabulary Knowledge (Table B4), Syntactic Knowledge (Table B5), and Reading Comprehension (Table B6). While estimates are available for the first three 3-12 tasks in fall, winter, and spring, only spring estimates are currently available for the Reading Comprehension task. Table B7 displays the mean performance and SEM for the Reading Comprehension Task for subpopulations of students.
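
A brief sketch of the calculation, using hypothetical values for the standard deviation and marginal reliability, shows how the SEM can be used to place an interval around an observed score.

```python
# SEM sketch with hypothetical sigma_x and rho (not values from the norming tables).
import math

sigma_x = 85.0          # standard deviation of performance scores for a task
rho = 0.90              # marginal reliability for the task
observed_score = 455.0  # one student's performance score

sem = sigma_x * math.sqrt(1.0 - rho)
lower, upper = observed_score - sem, observed_score + sem
print(f"SEM = {sem:.1f}; approximately 68% band: {lower:.1f} to {upper:.1f}")
```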

TEST-RETEST RELIABILITY

The extent to which a sample of students performs consistently on the same assessment across multiple occasions is an indication of test-retest reliability. Reliability was estimated for students participating in the field-testing of RAPID by correlating their performance scores across the three assessment periods. Retest correlations for the Word Recognition, Vocabulary Knowledge, and Syntactic Knowledge tasks (Table B8) were strongest between winter and spring, whereas fall-winter correlations were strongest for Reading Comprehension. Overall, correlations between fall and spring were the lowest. This pattern is expected, as a weaker correlation from the beginning of the year to the end suggests that students were changing differentially over time (i.e., lower-ability students may have grown more over time compared to higher-ability students). Test-retest correlations ranged from .31, for third grade students' fall to spring Word Recognition performance, to .85, for sixth grade students' fall to winter Reading Comprehension performance.


Chapter 7 Validity

Validity refers to the degree to which evidence and theory support the interpretations of test scores as delineated by the proposed uses of tests (AERA, APA, & NCME, 2014). It is important to bear in mind that validity is not an intrinsic property of a test; in fact, the inferences derived from the test scores must be validated, rather than the test itself. More specifically, each intended use of a test requires a distinct validity argument. Test validation is an iterative procedure of collecting evidence from numerous sources to evaluate the soundness of the score interpretations. This evidence is gathered from studies that examine the content of the test as well as studies that examine the scores produced by a test.

Historically, the major forms of validity evidence are known as content validity, criterion-related validity, and construct validity. Content validity evidence involves gathering data from subject matter experts regarding the degree to which the behaviors sampled on the test represent the behaviors the test is designed to measure. Criterion-related validity evidence involves evaluating correlations among test scores and other variables related to the construct measured. Construct validity involves gathering data that shows test scores are indicative of the construct measured.

The most recent version of the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014) reorganized these three traditional forms of validity evidence into five sources of validity evidence: evidence based on (a) test content, (b) response processes, (c) internal structure, (d) relations to other variables, and (e) testing consequences. This chapter addresses several aspects of the test validation process, specifically evidence based on internal structure and on relations to other variables for the grade 3-12 RAPID tasks.

EVIDENCE BASED ON THE INTERNAL STRUCTURE

Evidence based on internal structure refers to statistical analysis of item and sub-score data to investigate the dimensions measured by an assessment. Analysis of the internal structure of a test can provide validity information regarding the degree to which the dimensionality of the test data conforms to the expected dimensionality. Each task in RAPID is designed to be unidimensional—that is, it is designed to measure a general continuum of reading proficiency.

A first step in gathering validity evidence based on the internal structure was to evaluate the dimensionality of item responses on each 3-12 task. An important assumption in IRT is that of unidimensionality, or the notion that a test score only has meaning if its items measure a single dimension. Connected to this assumption is the framework of local item independence, which requires that, for a given level of individual ability, individual responses to a set of items are statistically independent of each other (Hattie, Krakowski, Rogers, & Swaminathan, 1996). McDonald (1979) suggested that a weaker principle of independence should be used, whereby only the covariances must be zero and relationships among higher-order moments do not need to be considered. Stout (1990) extended the logic of weak local independence to argue for "essential unidimensionality" rather than subscribing to more stringent standards. Conceptually, Stout argued that a test is unidimensional if, for a given level of ability, the average covariance over pairs of items on the test is small in magnitude, as opposed to zero.

To test for unidimensionality, a parametric confirmatory factor analysis was run on scores for different forms of each task by grade level. Because a planned missing data design was used, the covariance coverage was necessarily low. A planned missing data design with a large number of items frequently precludes a factor analysis of the full item response matrix when using the weighted least squares multivariate estimator. This estimator is necessary to produce commonly used fit indices for confirmatory factor analysis. Subsequently, the factor analysis was carried out by form and grade within each task. The comparative fit index (CFI), Tucker-Lewis index (TLI), and root mean square error of approximation (RMSEA) were used to evaluate model fit for the Word Recognition, Vocabulary Knowledge, and Syntactic Knowledge tasks. CFI and TLI values of at least .90 are considered acceptable, as are RMSEA values less than .10. Fit statistics for Word Recognition, Vocabulary Knowledge, and Syntactic Knowledge are reported in Table C1, Table C2, and Table C3, respectively. Results demonstrate that item responses across forms and grades converge on an essentially unidimensional construct for the three tasks.
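
As a small illustration of how these cutoffs are applied, the sketch below checks a few form-by-grade results (values taken from Tables C1 and C3) against the criteria named above; it is a reading aid, not part of the analysis pipeline.

```python
# Apply the fit criteria named above: CFI and TLI of at least .90, RMSEA below .10.
fits = [
    {"grade": 3, "form": "A", "cfi": 0.93, "tli": 0.92, "rmsea": 0.042},  # Table C1
    {"grade": 4, "form": "B", "cfi": 0.95, "tli": 0.94, "rmsea": 0.021},  # Table C1
    {"grade": 8, "form": "B", "cfi": 0.77, "tli": 0.74, "rmsea": 0.041},  # Table C3
]

def acceptable(fit, cfi_min=0.90, tli_min=0.90, rmsea_max=0.10):
    return fit["cfi"] >= cfi_min and fit["tli"] >= tli_min and fit["rmsea"] < rmsea_max

for fit in fits:
    verdict = "acceptable" if acceptable(fit) else "below criteria"
    print(f"grade {fit['grade']} form {fit['form']}: {verdict}")
```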

For the Reading Comprehension task, we tested whether a unidimensional model fit better than a testlet model that accounts for items being nested within passages. The two models were compared using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). Model fit comparisons between the unidimensional and testlet models for the Reading Comprehension Task are reported in Table C4. Results from this comparison were mixed: the AIC favored the testlet model, while the BIC and adjusted BIC values were smaller for the unidimensional model. Although the indices provide mixed information, the penalty term is greater in the BIC than in the AIC, making the BIC the more conservative criterion; given the results above, it was deemed more appropriate for model selection. Subsequently, the unidimensional model was retained.
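
The comparison logic can be illustrated with the grade 3 values from Table C4. Because AIC = 2k - 2lnL and BIC = k*ln(n) - 2lnL, BIC's per-parameter penalty (ln n) exceeds AIC's (2) whenever n is greater than about 7, which is why BIC is the more conservative of the two.

```python
# Model comparison sketch using the grade 3 information criteria from Table C4.
models = {
    "unidimensional": {"AIC": 103845, "BIC": 106019},
    "testlet":        {"AIC": 103672, "BIC": 106928},
}

for criterion in ("AIC", "BIC"):
    best = min(models, key=lambda m: models[m][criterion])
    print(f"Grade 3: {criterion} prefers the {best} model")
```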


EVIDENCE BASED ON RELATIONS TO OTHER VARIABLES

Evidence based on relations to other variables refers to traditional forms of criterion-related evidence for validity, such as correlations with external criteria relevant to the attributes measured. Criterion validity describes how well scores on one assessment relate to other theoretically relevant constructs, both concurrently and predictively. Studying the relationships among test scores and other measures of the construct tested aids in understanding the degree to which test scores support their intended uses.

Concurrent Validity

Concurrent validity was evaluated by correlating scores from the tasks with one another. Correlation results are reported in Table C5. Correlations between tasks were moderate for most grades, ranging from .29 for Word Recognition and Syntactic Knowledge in fourth grade to .63 for Reading Comprehension and Syntactic Knowledge in ninth grade.

Predictive Validity

Predictive validity was evaluated by using RAPID tasks to predict later reading comprehension performance on the Stanford 10 (Harcourt, 2003) through a series of linear and logistic regressions. The linear analyses were run two ways. First, a correlation analysis was used to evaluate the strength of the relation between each task's performance scores and scores on the Stanford 10. Second, a multiple regression analysis was run to estimate the total amount of variance that the linear combination of the predictors explained in Stanford 10 reading comprehension performance. Results from the linear regression analyses are reported in Table C6. The total amount of variance explained in Stanford 10 reading comprehension by the four RAPID tasks ranged from 39% in grade ten to 62% in grade three.
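
The multiple regression step can be sketched as follows, with simulated data standing in for the four RAPID task scores and the Stanford 10 outcome; the point is only to show how the total variance explained (R²) is obtained.

```python
# Multiple regression sketch: variance in an outcome explained by four predictors.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500
tasks = rng.normal(500, 85, size=(n, 4))      # simulated WR, VK, SK, RC scores
weights = np.array([0.15, 0.25, 0.25, 0.45])  # arbitrary illustrative weights
outcome = tasks @ weights + rng.normal(0, 60, size=n)

model = LinearRegression().fit(tasks, outcome)
print(f"Total R^2 = {model.score(tasks, outcome):.2f}")
```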

For the logistic regressions, students' performance on the Stanford 10 Reading Comprehension task was coded as '1' for performance at or above the 50th percentile and '0' for scores below this target. This dichotomous variable was then regressed on a combination of Word Recognition, Vocabulary Knowledge, Syntactic Knowledge, and Reading Comprehension scores at each grade level. Further, we evaluated the classification accuracy of scores from RAPID as it pertains to risk status on the Stanford 10. By dichotomizing the combination of screening task scores as '1' for not at-risk for reading difficulties and '0' for at-risk for reading difficulties, students could be classified based on their dichotomized performances on both RAPID and the Stanford 10. As such, students could be identified as not at-risk on the combination of screening tasks and demonstrating grade level performance on the Stanford 10 (i.e., specificity or true-negatives), at-risk on the combination of screening task scores and below grade level

performance on the Stanford 10 (i.e., sensitivity or true-positives), not at-risk based on the combination of screening task scores and not at grade level on the Stanford 10 (i.e., false negative error), or at-risk on the combination of screening task scores and at grade level on the Stanford 10 (i.e., false positive error). Classification of students in these categories allows for the evaluation of cut-points on the combination of screening tasks (i.e., Reading Success Probability [RSP] index) to determine which RSP cut-point maximizes predictive power.
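
The four cells described above, and the statistics computed from them, can be summarized in a short sketch; the counts are hypothetical and are only meant to show how sensitivity, specificity, and predictive power are derived from the 2x2 classification.

```python
# 2x2 classification summary sketch: RAPID risk status crossed with the
# Stanford 10 outcome (hypothetical counts).
true_positives  = 120   # at-risk on RAPID, below grade level on Stanford 10
true_negatives  = 430   # not at-risk on RAPID, at/above grade level on Stanford 10
false_positives = 60    # at-risk on RAPID, at/above grade level on Stanford 10
false_negatives = 40    # not at-risk on RAPID, below grade level on Stanford 10

sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)
ppp = true_positives / (true_positives + false_positives)   # positive predictive power
npp = true_negatives / (true_negatives + false_negatives)   # negative predictive power
occ = (true_positives + true_negatives) / (true_positives + true_negatives
                                            + false_positives + false_negatives)

print(f"SE={sensitivity:.2f} SP={specificity:.2f} PPP={ppp:.2f} NPP={npp:.2f} OCC={occ:.2f}")
```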

The concept of risk can be viewed in many ways. One interpretation is risk as "percent chance" ranging from 0 to 100, with 0 meaning there is no chance that a student will experience difficulty, and 100 indicating that a student will absolutely experience difficulty. When attempting to identify children who are "at-risk" for poor performance on a future measure of reading achievement, this yes/no decision is typically based upon a "cut-point" along a continuum of risk. This future measure of achievement is often a state's high-stakes assessment, which provides a standard score describing the performance of each student. Grade-level cut-points are chosen that determine whether a student has passed or failed the state-wide assessment.

Decisions concerning appropriate cut-points for screening measures are based on the level of correct classification that is desired from the screening assessments. While a variety of statistics may be used to guide such choices (e.g., sensitivity, specificity, positive and negative predictive power; see Schatschneider, Petscher, & Williams, 2008), negative predictive power was used to develop the RAPID cut-point. Negative predictive power is the number of students who end up not developing a problem divided by the number of students identified by the screen as not being at risk (Schatschneider et al., 2008, p. 310). Predictive power is not considered to be a fixed property of screening assessments, since it is known to fluctuate with the proportion of individuals who are at-risk on the selected outcome measure (Streiner, 2003).

As it pertains to RAPID, we evaluated various cut-points on the RSP to identify one that would yield a minimum negative predictive power of .85 for grades three through ten. Results from this analysis (Table C7) showed that an RSP cut-point of .70 could be used to reach the .85 negative predictive power threshold.
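
A simplified sketch of this kind of cut-point search is shown below with simulated RSP values and outcomes; it scans candidate cut-points and reports the negative predictive power at each, mirroring the logic (though not the data) behind Table C7.

```python
# Scan candidate RSP cut-points and report negative predictive power at each.
import numpy as np

rng = np.random.default_rng(1)
rsp = rng.uniform(0, 1, size=2000)         # simulated Reading Success Probability
passed = rng.random(2000) < rsp            # simulated end-of-year outcome (True = on grade level)

def negative_predictive_power(cut):
    not_at_risk = rsp >= cut               # screen says "not at-risk"
    if not_at_risk.sum() == 0:
        return float("nan")
    return passed[not_at_risk].mean()      # proportion of screen-negatives who pass

for cut in np.arange(0.50, 0.95, 0.05):
    npp = negative_predictive_power(cut)
    print(f"cut-point {cut:.2f}: NPP = {npp:.2f}")
```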

In addition to the binary classification, a multi-class classification was used to evaluate a third category in the RSP. This was done primarily to differentiate students who fall in the middle of the score distribution from students at the upper and lower ends of the distribution. The approach borrowed from machine learning: a one-vs.-all method was implemented to determine the third category in the RSP. This methodology creates a binary model for each of the multiple output classes. Each of these


binary models for the individual classes is assessed against its complement (all other classes in the model) as though it were a binary classification problem. Results were evaluated through the application of a confusion matrix, so named because it shows where the classification algorithm confuses one class for another. As in the binary classification, metrics such as positive predictive power, negative predictive power, sensitivity, and specificity were used to evaluate the classification accuracy of the models.
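
A compact sketch of the one-vs.-all setup is shown below using simulated task scores and a generic logistic classifier; the estimator and data are illustrative stand-ins, not the models actually fit for the RSP.

```python
# One-vs.-all sketch: one binary model per RSP category, errors summarized in a
# confusion matrix (simulated data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(2)
scores = rng.normal(500, 85, size=(1500, 4))            # four simulated RAPID task scores
latent = scores.mean(axis=1) + rng.normal(0, 40, 1500)
category = np.digitize(latent, np.quantile(latent, [0.30, 0.70]))  # 0: <=30, 1: 31-69, 2: >=70

model = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(scores, category)
predicted = model.predict(scores)
print(confusion_matrix(category, predicted))            # rows: true class, columns: predicted
```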

Several different cut-points on the RSP were evaluated. Results from these analyses (Table C8) showed that negative predictive power was highest when using the following three categories: ≤30, 31-69, and ≥70.

Additional logistic regression analyses were conducted in which students' performance on the SESAT/Stanford 10 was regressed on RAPID's Reading Comprehension Task. The Reading Comprehension task was chosen because it is the most reliable of all of the tasks and assesses students' proficiency in reading comprehension, the ultimate goal of literacy instruction.

Different analyses were conducted for the 30th percentile and below as well as the 15th percentile and below. Results (Table C9) were evaluated in terms of classification accuracy for each grade level.

Differential Accuracy of Prediction

An additional component of checking the validity of cut-points and scores on RAPID involved testing the differential accuracy of the regression equations across demographic groups. This procedure involved a series of logistic regressions predicting success on the Stanford 10 (i.e., performance at or above the 50th percentile). The independent variables included a variable representing whether students were identified as not at-risk (RSP ≥ .70; coded as '1') or at-risk (RSP < .70; coded as '0') on the combination of RAPID task scores, a variable representing a selected demographic group, and an interaction term between the two variables. A statistically significant interaction term would suggest that differential accuracy in predicting end-of-year performance existed for different groups of individuals based on the risk status determined by RAPID. For the combination of RAPID task scores, differential accuracy was tested separately for Black students, Hispanic students, ELL students, and students eligible for Free/Reduced Price Lunch (FRL).
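
One such model can be sketched as below with simulated data; the variable names are illustrative, and a significant coefficient on the interaction term is what would flag differential accuracy.

```python
# Logistic regression with an interaction term (simulated data, illustrative names).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 1200
df = pd.DataFrame({
    "not_at_risk": rng.integers(0, 2, n),   # 1 = RSP >= .70, 0 = RSP < .70
    "group":       rng.integers(0, 2, n),   # 1 = focal demographic group, 0 = reference
})
logit_p = -0.5 + 2.5 * df["not_at_risk"]    # simulated truth: no group or interaction effect
df["passed"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

model = smf.logit("passed ~ not_at_risk * group", data=df).fit(disp=0)
print(model.summary().tables[1])            # inspect the not_at_risk:group coefficient
```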

When testing for differential accuracy between Black and White students (Table C10), a significant effect for the interaction between the RSP cut-point and minority status existed for students in grade four (p = .005). This finding indicated that for the sample tested in the winter, White students with an RSP above the cut-point had a greater chance of being at or above the 50th percentile on the Stanford 10, compared to Black students above the

cut-point on the RSP. We note that replication will be needed across multiple administrations with a larger sample to evaluate the extent of this phenomenon.

No significant differential accuracy existed between Hispanic and White students (Table C11), ELL and non-ELL students (Table C12), or students who were eligible for FRL and those who were not (Table C13).

The same analyses were conducted with the RSP coded into three categories (≤30, 31-69, ≥70). Tables C14-C17 show the findings of these analyses. No significant differential accuracy was found for any demographic subgroup.

CONSTRUCT VALIDITY

Construct validity describes how well an assessment measures the construct it is intended to measure. Components of construct validity include convergent validity and discriminant validity. The goal of convergent validity is to demonstrate a high association, indicating that the developed measure converges on, or is empirically linked to, the intended construct. The goal of discriminant validity is to demonstrate a low association, indicating that the developed measure is unrelated to a non-intended construct.

Convergent Validity

Convergent validity can be evaluated by testing relations between a developed assessment and another related assessment. After the norming study, additional data were collected in two large school districts in Florida with four elementary schools, three middle schools, and two high schools. A total of 1,825 students in grades three through ten were administered the four tasks in RAPID as well as established, norm-referenced assessments of word reading (Test of Word Reading Efficiency-2; Torgesen, Wagner, & Rashotte, 2012), vocabulary (Peabody Picture Vocabulary Test, Fourth Edition; Dunn & Dunn, 2007), and syntactic knowledge (the Grammaticality Judgment Test of the Comprehensive Assessment of Spoken Language; Carrow-Woolfolk, 2008).

Word Recognition

Word recognition was measured by the RAPID Word Recognition task and compared to two measures of decoding, the Sight Word Efficiency and Phonemic Decoding Efficiency subtests of the Test of Word Reading Efficiency, Second Edition (TOWRE-2; Torgesen et al., 2012). The TOWRE-2 was designed to monitor the progress of students receiving additional instruction for weaknesses in word reading abilities, and it has demonstrated discrimination between low-performing students with language and reading disabilities (Torgesen et al., 2012). On the Sight Word Efficiency subtest (Real Word), students read aloud as many sight words as possible in 45 seconds. On the Phonemic Decoding Efficiency subtest (Non-Word), students read aloud as many non-words as possible in 45 seconds. Correlations between the RAPID Word Recognition task and the TOWRE-2 are listed in Table C18. Correlations between the RAPID Word Recognition task and the TOWRE-2 Real Word subtest demonstrated moderate associations; the average correlation across grade levels was .33, with a range of .24 to .49. Correlations between the RAPID Word Recognition task and the TOWRE-2 Non-Word subtest were moderate as well; the average correlation across grade levels was .38, with a range of .30 to .47.

Vocabulary Knowledge

Students' vocabulary knowledge was measured by the RAPID Vocabulary Knowledge task and the Peabody Picture Vocabulary Test, Fourth Edition (PPVT-4; Dunn & Dunn, 2007). The PPVT-4 is frequently used as a normative measure of receptive vocabulary; it requires students to point to the picture, from a group of four, that best represents a word spoken by the examiner. Correlations between the RAPID Vocabulary Knowledge task and the PPVT-4 are shown in Table C18 and demonstrate moderate associations. The average correlation across grade levels was .52, with a range of .47 to .67.

Syntactic Knowledge

Students' ability to use the structure of sentences to comprehend sentence meaning was measured by the RAPID Syntactic Knowledge task and the Grammaticality Judgment subtest (GJT) of the Comprehensive Assessment of Spoken Language (CASL; Carrow-Woolfolk, 2008). The CASL is most frequently used by speech-language pathologists to determine instructional goals for students with diagnosed weaknesses in language skills. In the GJT, students hear sentences with and without grammatical errors and must indicate whether or not they detected errors. In addition, students are asked to fix any perceived errors in the sentence without changing its meaning. Correlations between the RAPID Syntactic Knowledge task and the GJT of the CASL are shown in Table C18 and were moderate. The average correlation across grade levels was .49, with a range of .37 to .61.

Reading Comprehension

Relations between the RAPID Reading Comprehension task and the Stanford 10 Reading Comprehension subtest are found in Table C6. The Stanford 10 Reading Comprehension subtest is based on research about the contexts and activities that are effective for reading instruction and assessment. The Stanford 10 framework involves three major purposes for reading (literary, informational, and functional) and multiple modes of comprehension


(initial understanding, interpretation, critical analysis, and awareness and usage of reading strategies). The correlations between RAPID Reading Comprehension and the Stanford 10 range from .57 to .74.

A secondary analysis of convergent validity evaluated the extent to which the correlations between RAPID and the PPVT-4, GJT, and TOWRE-2 tasks varied depending on a student's level of ability. Because traditional correlations represent average associations, it is possible that the average does not best characterize relations for students with low, average, and high ability levels. For example, it is plausible that at low levels of the GJT, a stronger correlation exists between the GJT and the RAPID Syntactic Knowledge task than at higher levels of the GJT. Because the GJT is a clinical measure of syntax knowledge, it is designed for students with prior demonstrated weaknesses in this skill. The GJT is not typically administered to students with average or high syntax skills; therefore, reporting the average correlation between scores on the GJT and the RAPID Syntactic Knowledge task could mask a stronger association for students with poor syntax skills. Typical regression models are ill-equipped to test for differential correlations across the range of scores for an outcome variable. Rather, quantile regression (Koenker & Bassett, 1978; Petscher & Logan, 2014; Petscher, Logan, & Zhou, 2013) is suited to estimating the correlation between measures conditional on performance on the outcome. In this manner, we analyzed the extent to which: 1) the correlation between the RAPID Vocabulary Knowledge task and the PPVT-4 varied for students with low, average, and high PPVT-4 scores; 2) the correlation between the RAPID Word Recognition task and the TOWRE-2 Real Word subtest varied for students with low, average, and high TOWRE-2 Real Word subtest scores; 3) the correlation between the RAPID Word Recognition task and the TOWRE-2 Non-Word subtest varied for students with low, average, and high TOWRE-2 Non-Word subtest scores; and 4) the correlation between the RAPID Syntactic Knowledge task and the GJT varied for students with low, average, and high GJT scores.
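
A minimal sketch of this approach, with simulated data, fits a quantile regression of the criterion measure on a RAPID task score at low, middle, and high quantiles of the criterion and compares the slopes; the heteroscedastic error in the simulation is what makes the slopes differ across quantiles.

```python
# Quantile regression sketch: slope of the criterion on a predictor at several quantiles.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 800
rapid_score = rng.normal(500, 85, n)                            # simulated RAPID task score
scale = 5 + 0.05 * (rapid_score - rapid_score.min())            # error grows with the predictor
criterion = 40 + 0.10 * rapid_score + rng.normal(0, 1, n) * scale   # simulated criterion measure

X = sm.add_constant(rapid_score)
for q in (0.10, 0.50, 0.90):
    fit = sm.QuantReg(criterion, X).fit(q=q)
    print(f"quantile {q:.2f}: slope = {fit.params[1]:.3f}")
```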

The quantile correlations by grade level between RAPID Vocabulary Knowledge task and the PPVT-4 are shown in Table C19. In general, the correlations between the two assessments are stronger for students whose performance was lower on PPVT-4. The implication is that lower performance on the PPVT-4 is correlated with low performance on the Vocabulary Knowledge task. At higher levels of the PPVT-4, the correlation is moderate. As shown in Table C19, a similar trend is observed for the RAPID Word Recognition task in its relation to the two TOWRE-2 outcomes, as well as for the Syntactic Knowledge task and GJT scores.


Discriminant Validity

Discriminant validity can be evaluated by correlating scores from a developed assessment with variables that are not related to the construct being measured. Discriminant validity was evaluated by estimating correlations between RAPID tasks and variables that should not be related to measures of reading: sex and birthdate (Table C20). Results indicated that weak associations were generally observed across grade levels.

Growth Reference Charts

Growth charts were created based on RAPID data from the norming sample collected in the Winter Administration Period (Appendix D). These growth charts consist of a series of curves that illustrate the typical pace of performance for students who start at different achievement levels (10th, 25th, 50th, 75th, and 90th percentiles) in each grade. For example, students at the 10th percentile in 3rd grade on the Word Recognition Task had an average observed scale score of 285; students at the 10th percentile in 4th grade obtained an average observed scale score of 302, and so forth. Grades 10-12 were combined due to insufficient variance among the performance score distributions. Growth charts such as these can be used as a reference to identify how students are performing relative to the norming sample in each grade. These charts are tools that can contribute to forming an overall plan for the skill being measured and are intended to be used as a diagnostic reference.
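
The construction of such curves can be sketched simply: compute the selected percentiles of a task's winter scale scores within each grade, then read the series for a given percentile across grades as one curve. The scores below are simulated placeholders, not the norming data.

```python
# Growth reference curve sketch: within-grade percentiles of simulated scale scores.
import numpy as np

rng = np.random.default_rng(5)
grades = range(3, 11)
percentiles = [10, 25, 50, 75, 90]

for grade in grades:
    # Simulated winter scale scores for one task; means rise with grade.
    scores = rng.normal(300 + 30 * (grade - 3), 80, size=600)
    cuts = np.percentile(scores, percentiles)
    formatted = ", ".join(f"P{p}={c:.0f}" for p, c in zip(percentiles, cuts))
    print(f"grade {grade}: {formatted}")
```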

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Bitterman, A., Gray, L., & Goldring, R. (2013). Characteristics of Public and Private Elementary and Secondary Schools in the United States: Results from the 2011-12 Schools and Staffing Survey (NCES 2013-312). Washington, DC: National Center for Education Statistics, U.S. Department of Education. Retrieved July 30, 2015, from http://nces.gov/pubsearch

Cain, K., & Nash, H. M. (2011). The influence of connectives on young readers’ processing and comprehension of text. Journal of Educational Psychology, 103(2), 429-441.

Carrow-Woolfolk, E. (1995; 1999; 2008). Comprehensive Assessment of Spoken Language. Circle Pines, MN: AGS.

Crosson, A. C., Lesaux, N. K. (2013). Connectives: Fitting another piece of the vocabulary instruction puzzle. The Reading Teacher, 67(3), 193-200.

Dorans, N. J., Moses, T. P., & Eignor, D. R. (2010). Principles and practices of test score equating (Research Report No. RR-10-29). Princeton, NJ: Educational Testing Service.

Dunn, L.M., & Dunn, D.M. (2007). Peabody Picture Vocabulary Test-IV. Circle Pines, MN: American Guidance Service.

Foorman, B., Herrera, S., Petscher, Y., Mitchell, A., & Truckenmiller, A. (2015). The structure of oral language and reading and their relation to comprehension in kindergarten through grade 2. Reading and Writing, 28(5), 655–681. http://eric.ed.gov/?id=EJ1057505.

Foorman, B., Koon, S., Petscher, Y., Mitchell, A., & Truckenmiller, A. (2015). Examining General and Specific Factors in the Dimensionality of Oral Language and Reading in 4th-10th Grades. Journal of Educational Psychology. Advance online publication. doi: 10.1037/edu0000026.

Foorman, B. R., Petscher, Y., & Bishop, M. D. (2012). The incremental variance of morphological knowledge to reading comprehension in grades 3-10 beyond prior reading comprehension, spelling, and text reading efficiency. Learning and Individual Differences, 22, 792-798.

Harcourt Brace (2003a). Stanford Achievement Test (10th ed.). San Antonio, TX: Author.

Harvill, L.M. (2005). Standard error of measurement. Educational Measurement: Issues and Practice, 10, 33-41.

Hattie, J., Krakowski, K., Rogers, H.J., & Swaminathan, H. (1996). An assessment of Stout’s index of essential unidimensionality. Applied Psychological Measurement, 20, 1-14.

Henry, M. K. (2010). Unlocking literacy: Effective decoding and spelling instruction. Baltimore, MD: Paul H. Brookes.

Hock, M. F., Brasseur, I. F., Deshler, D. D., Catts, H. W., Marquis, J. G., Mark, C. A., & Stribling, J. W. (2009). What is the reading component skill profile of adolescent struggling readers in urban schools? Learning Disability Quarterly,32(1), 21-38.

Hudson, R. F., Pullen, P. C., Lane, H. B., & Torgesen, J. K. (2009). The complex nature of reading fluency: A multidimensional view. Reading & Writing Quarterly: Overcoming Learning Difficulties, 25(1), 4-32.

Koenker, R., & Bassett, G. (1978). Regression quantiles. Econometrica, 46, 33-50.

Kolen, M. J., & Brennan, R. L. (2004). Test equating methods and practices. New York: Springer-Verlag.

Kolen, M.J., & Brennan, R.L. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). New York: Springer Science + Business Media, LLC.

Meade, A. W. (2010). A taxonomy of effect sizes for the differential functioning of items and scales. Journal of Applied Psychology, 95, 728-743.

McDonald, R.P. (1979). The structural analysis of multivariate data: A sketch of general theory. Multivariate Behavioral Research, 14, 21-38.

Muthén, B., & Muthén, L. (2008). Mplus User’s Guide. Los Angeles, CA: Muthén and Muthén.

Nagy, W. E., & Anderson, R. C. (1984). How many words are there in printed school English? Reading Research Quarterly, 19, 304–330.


Nelson, J., Perfetti, C., Liben, D., & Liben, M. (2012). Measures of text difficulty: Testing their predictive value for grade levels and student performance. Retrieved from http://www.ccsso.org/Documents/2012/Measures%20ofText%20Difficulty_final.2012.pdf

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd Ed.). New York: McGraw-Hill.

Perfetti, C., & Stafura, J. (2014). Word knowledge in a theory of reading comprehension. Scientific Studies of Reading, 18, 22-37. DOI: 10.1080/10888438.2013.827687

Petscher, Y., Logan, J.A.R., & Zhou, C. (2013). Extending conditional means modeling: An introduction to quantile regression (pp. 3-33). In Y. Petscher, C. Schatschneider, & D.L. Compton (Eds.) Applied quantitative analysis in education and social sciences. New York: Routledge.

Petscher, Y., & Logan, J.A.R. (2014). Quantile regression in the study of Developmental Sciences. Child Development, 85, 861-881. Doi: 10.1111/cdev.12190.

Scarborough, H. S. (2001). Connecting early language and literacy to later reading (dis)abilities: Evidence, theory, and practice. In S. Neumann & D. Dickinson (Eds.), Handbook for research in early literacy (pp. 97–110). New York: Guilford.

Schatschneider, C., Petscher, Y., & Williams, K. M. (2008). How to evaluate a screening process: The vocabulary of screening and what educators need to know (pp. 304-317). In L. Justice & C. Vukelich (Eds.), Every moment counts: Achieving excellence in preschool language and literacy instruction. New York: Guilford Press.

Sireci, S. G., Thissen, D., & Wainer, H. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28, 237-247.

Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201-210.

Stout, W.F. (1990). A new item response theory modeling approach and applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293-325.

Streiner, D. L. (2003). Diagnosing tests: Using and misusing diagnostic and screening tests. Journal of Personality Assessment, 81, 209-219.

Torgesen, J. K. (2006). A comprehensive K–3 reading assessment plan: Guidance for school leaders. Portsmouth, NH: RMC Research Corporation, Center on Instruction. Retrieved from http://www.centeroninstruction.org/files/K%2D3%20Reading%2Epdf

Torgesen, J.K., Wagner, R.K., & Rashotte, C.A. (2012). Test of Word Reading Efficiency, 2nd Edition. Austin, TX: Pro-ED, Inc.

Wainer, H. (Ed.) (2000). Computerized adaptive testing: A Primer. Mahwah, NJ: Erlbaum.

Zeno, S. M., Ivens, S. H., Millard, R. T., & Duvvuri, R. (1995). The educator's word frequency guide. Brewster, NY: Touchstone Applied Science Associates, Inc.

Zimowski, M., Muraki, E., Mislevy, R. J., & Bock, R. D. (2002). BILOG-MG 3: Item analysis and test scoring with binary logistic models for multiple groups [Computer program]. Mooresville, IN: Scientific Software.

Appendix A

Table A1: Population values for each grade for each of the sixteen demographic groups

Grade Race FRL ELL 3 4 5 6 7 8 9 10 White Yes Yes 0.00 0.20 0.64 0.25 0.33 0.36 0.17 0.43 White Yes No 18.11 17.52 17.26 17.69 16.80 16.37 15.24 13.48 White No Yes 0.09 0.30 0.09 0.13 0.00 0.07 0.25 0.14 White No No 22.02 23.22 23.61 23.69 25.00 25.95 27.56 29.39 Black Yes Yes 0.18 0.30 0.27 0.19 0.47 0.43 0.25 0.57 Black Yes No 19.75 18.72 18.80 18.88 18.20 17.53 16.49 15.55 Black No Yes 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Black No No 3.00 3.20 3.36 3.75 4.20 4.50 5.83 6.56 Hispanic Yes Yes 6.82 5.51 7.27 7.75 7.07 6.79 2.83 3.85 Hispanic Yes No 16.65 17.42 15.26 14.38 14.53 14.37 16.49 14.41 Hispanic No Yes 0.18 0.20 0.18 1.00 0.40 0.79 0.67 0.86 Hispanic No No 6.73 6.81 6.81 6.06 6.93 7.01 8.33 8.92 Other Yes Yes 0.00 0.30 0.09 0.25 0.53 0.21 0.25 0.36 Other Yes No 3.46 3.21 3.36 3.00 2.60 2.57 2.41 2.07 Other No Yes 0.09 0.10 0.00 0.06 0.07 0.07 0.08 0.00 Other No No 2.91 3.00 3.00 2.94 2.87 2.93 3.16 3.42

Note: Not all race/ethnicity subgroups are represented due to limited information provided when evaluating interactions among race/ethnicity (i.e., White, Black, Hispanic, Other), free/reduced lunch status (eligible or ineligible), and English language learner status (identified or not identified). Students in grades 11 and 12 use the grade 10 distribution of ability scores. FRL = Free/reduced price lunch. ELL = English language learners. Note that Table A1 should be used with Table A2, Table A3, Table A4, and Table A5. Large sample weights reflect subgroups that needed to be weighted more in the analyses; however, a large value does not necessarily indicate gross under-sampling. For example, Table A4 highlights that Hispanic students who are FRL and ELL have large weights in grades 8-10 (e.g., 679, 283, and 385). Note also that Table A1 shows that Hispanic students who are FRL and ELL constitute only 6.79% of the state population in grade 8. Thus, the large sample weight reflects the need to weight the smaller sample by a factor of 679 so that it can adequately reflect the state population at an appropriate level.


Table A2: Sample weight values for Word Recognition Task

Grade Race FRL ELL 3 4 5 6 7 8 9 10 White Yes Yes 0.00 1.18 0.64 0.25 0.33 0.36 1.89 0.43 White Yes No 1.71 1.63 1.60 2.45 2.23 2.45 2.82 3.56 White No Yes 0.09 0.30 0.09 0.13 0.00 0.44 1.32 0.14 White No No 0.52 0.51 0.54 0.55 0.49 0.50 0.59 0.48 Black Yes Yes 0.18 0.30 0.27 0.19 2.94 0.43 2.78 6.33 Black Yes No 0.83 0.84 0.87 0.67 0.84 1.01 0.72 1.00 Black No Yes 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Black No No 0.32 0.38 0.29 0.36 0.41 0.39 0.48 0.60 Hispanic Yes Yes 45.47 16.21 51.93 75.00 70.00 6.79 2.83 10.69 Hispanic Yes No 9.05 14.64 6.63 9.59 12.01 11.98 16.49 11.44 Hispanic No Yes 1.20 1.18 0.18 1.00 0.40 4.94 3.53 3.19 Hispanic No No 2.58 2.66 2.37 2.82 2.83 3.92 2.74 3.96 Other Yes Yes 0.00 0.88 0.64 1.67 1.61 0.64 0.89 0.36 Other Yes No 1.07 1.45 2.92 1.09 1.14 1.31 1.70 2.30 Other No Yes 0.20 0.59 0.00 0.06 0.44 0.21 0.89 0.00 Other No No 0.57 0.53 0.55 0.83 1.17 0.49 0.51 0.92

Note: Not all race/ethnicity subgroups are represented due to limited information provided when evaluating interactions among (i.e., White, Black, Hispanic, Other), free/reduced lunch status (eligible or ineligible), and English language learner (identified or not identified). Students in grades 11 and 12 use the grade 10 distribution of ability scores. FRL = Free/reduced price lunch. ELL = English language learners.


Table A3: Sample weight values for Vocabulary Knowledge Task

Grade Race FRL ELL 3 4 5 6 7 8 9 10 White Yes Yes 0.00 2.00 0.64 0.25 0.33 0.36 0.17 0.43 White Yes No 0.69 0.67 0.66 0.89 0.87 0.72 0.69 0.77 White No Yes 9.00 0.30 0.09 0.13 0.00 0.07 0.25 0.14 White No No 0.84 0.81 0.82 1.06 1.01 0.90 1.01 0.89 Black Yes Yes 0.90 1.67 1.50 0.19 2.76 2.53 0.68 0.57 Black Yes No 1.77 2.01 1.52 1.14 1.00 1.18 1.07 1.06 Black No Yes 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Black No No 1.00 0.96 0.97 0.65 0.72 0.93 1.44 1.08 Hispanic Yes Yes 2.85 10.40 5.39 7.83 6.04 5.18 2.16 3.16 Hispanic Yes No 0.93 0.88 0.89 0.56 0.71 0.71 0.87 1.02 Hispanic No Yes 18.00 1.11 0.95 5.88 2.35 4.65 5.58 0.86 Hispanic No No 1.35 1.55 1.36 1.41 1.66 1.74 1.88 1.31 Other Yes Yes 0.00 3.00 0.47 0.25 0.53 0.21 2.08 0.36 Other Yes No 0.96 1.14 1.34 1.52 1.04 1.53 0.89 0.92 Other No Yes 9.00 0.56 0.00 0.06 0.07 0.07 0.08 0.00 Other No No 0.86 0.71 1.20 1.37 0.95 2.90 0.95 0.85

Note: Not all race/ethnicity subgroups are represented due to limited information provided when evaluating interactions among (i.e., White, Black, Hispanic, Other), free/reduced lunch status (eligible or ineligible), and English language learner (identified or not identified). Students in grades 11 and 12 use the grade 10 distribution of ability scores. FRL = Free/reduced price lunch. ELL = English language learners.


Table A4: Sample weight values for Syntactic Knowledge Task

Grade Race FRL ELL 3 4 5 6 7 8 9 10 White Yes Yes 0.00 1.67 1.00 1.00 1.00 36.00 17.00 43.00 White Yes No 2.39 2.14 2.27 2.31 3.23 14.36 14.65 12.96 White No Yes 0.29 1.00 1.00 1.00 0.00 7.00 25.00 14.00 White No No 0.50 0.47 0.43 0.39 0.37 0.33 0.37 0.38 Black Yes Yes 0.10 0.14 0.33 0.23 2.94 43.00 2.78 57.00 Black Yes No 0.83 0.98 1.15 1.08 1.26 2.12 1.31 1.35 Black No Yes 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Black No No 0.46 0.55 0.51 0.59 0.58 0.73 0.89 0.93 Hispanic Yes Yes 9.34 2.36 5.08 13.36 70.70 679.00 283.00 385.00 Hispanic Yes No 3.27 3.83 3.32 4.61 29.65 29.33 24.98 120.08 Hispanic No Yes 0.43 1.67 1.80 100.00 4.00 79.00 67.00 86.00 Hispanic No No 2.31 3.89 2.67 3.50 4.28 14.31 14.61 38.78 Other Yes Yes 0.00 2.50 0.29 2.08 3.31 1.31 25.00 36.00 Other Yes No 1.23 1.06 2.35 1.52 1.78 1.76 4.23 9.00 Other No Yes 0.17 0.83 0.00 0.50 0.44 0.44 0.89 0.00 Other No No 1.12 0.99 0.89 1.16 1.61 1.39 0.88 2.28

Note: Not all race/ethnicity subgroups are represented due to limited information provided when evaluating interactions among (i.e., White, Black, Hispanic, Other), free/reduced lunch status (eligible or ineligible), and English language learner (identified or not identified). Students in grades 11 and 12 use the grade 10 distribution of ability scores. FRL = Free/reduced price lunch. ELL = English language learners.


Table A5: Sample weight values for Reading Comprehension Task

Grade Race FRL ELL 3 4 5 6 7 8 9 10 White Yes Yes 0.00 0.77 1.16 1.09 1.22 2.00 1.13 1.65 White Yes No 0.91 1.04 1.04 1.32 1.26 1.34 1.10 1.08 White No Yes 0.41 1.58 1.29 0.57 0.00 0.44 1.92 1.08 White No No 0.58 0.53 0.52 0.67 0.71 0.72 0.97 0.96 Black Yes Yes 1.64 2.73 2.45 0.61 0.87 0.61 0.45 1.04 Black Yes No 2.08 2.11 2.06 1.31 1.18 1.16 0.85 0.92 Black No Yes 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Black No No 0.86 0.92 1.09 1.04 1.40 1.16 1.34 1.17 Hispanic Yes Yes 1.93 1.92 1.96 1.41 1.39 1.42 1.04 1.23 Hispanic Yes No 1.83 1.93 2.03 0.91 0.96 0.98 0.94 0.92 Hispanic No Yes 0.33 0.54 0.62 1.23 0.59 1.08 1.56 1.37 Hispanic No No 1.05 1.28 1.39 1.24 1.31 1.17 1.52 1.41 Other Yes Yes 0.00 1.00 0.23 0.96 1.39 0.72 1.39 1.24 Other Yes No 1.11 1.10 0.99 1.15 1.03 1.11 0.78 0.69 Other No Yes 0.24 0.29 0.00 0.46 0.64 0.44 0.80 0.00 Other No No 0.00 0.77 1.16 1.09 1.22 2.00 1.13 1.65

Note: Not all race/ethnicity subgroups are represented due to limited information provided when evaluating interactions among (i.e., White, Black, Hispanic, Other), free/reduced lunch status (eligible or ineligible), and English language learner (identified or not identified). Students in grades 11 and 12 use the grade 10 distribution of ability scores. FRL = Free/reduced price lunch. ELL = English language learners.

Appendix B

Table B1: Marginal Reliability for RAPID Screening Tasks of Vocabulary Knowledge, Word Recognition, Syntactic Knowledge, and Reading Comprehension at the Fall, Winter, and Spring Administrations

Grade | Word Recognition (Fall / Winter / Spring) | Vocabulary Knowledge (Fall / Winter / Spring) | Syntactic Knowledge (Fall / Winter / Spring) | Reading Comprehension (Fall / Winter / Spring)
3 | .73 / .85 / .89 | .84 / .86 / .87 | .85 / .87 / .89 | .85 / .86 / .83
4 | .86 / .84 / .88 | .81 / .83 / .86 | .88 / .87 / .88 | .76 / .85 / .89
5 | .87 / .84 / .90 | .87 / .87 / .88 | .87 / .88 / .90 | .80 / .83 / .90
6 | .86 / .85 / .91 | .85 / .85 / .86 | .88 / .89 / .91 | .84 / .87 / .91
7 | .86 / .86 / .91 | .85 / .85 / .86 | .88 / .89 / .91 | .78 / .83 / .91
8 | .87 / .83 / .92 | .83 / .84 / .84 | .91 / .88 / .92 | .81 / .85 / .92
9 | .88 / .80 / .91 | .85 / .82 / .86 | .91 / .87 / .90 | .67 / .78 / .91
10 | .88 / .78 / .90 | .85 / .81 / .84 | .91 / .87 / .90 | .76 / .82 / .92
All Grades | .92 / .88 / .93 | .91 / .89 / .90 | .93 / .92 / .93 | .86 / .88 / .93

Note: Reliability coefficients for the Fall and Winter Reading Comprehension scores are reflective of fixed item administrations. Spring reliability coefficients for Reading Comprehension are reflective of performance on the CAT version. Marginal reliability coefficients for Vocabulary and Word Recognition are reflective of CAT versions of the assessments.


Table B2: Marginal Reliability Coefficients for RAPID Reading Comprehension Task by Grade and Subgroup

Grade Group Reliability Grade Group Reliability NO Free or Reduced Lunch 0.90 NO Free or Reduced Lunch 0.96 Free or Reduced Lunch 0.90 Free or Reduced Lunch 0.96 Non-ELL 0.93 Non-ELL 0.97 3 7 ELL 0.65 ELL 0.87 No Minority 0.90 No Minority 0.96 Minority 0.91 Minority 0.97 NO Free or Reduced Lunch 0.89 NO Free or Reduced Lunch 0.96 Free or Reduced Lunch 0.89 Free or Reduced Lunch 0.96 Non-ELL 0.92 Non-ELL 0.97 4 8 ELL 0.65 ELL 0.88 No Minority 0.88 No Minority 0.95 Minority 0.90 Minority 0.97 NO Free or Reduced Lunch 0.92 NO Free or Reduced Lunch 0.95 Free or Reduced Lunch 0.94 Free or Reduced Lunch 0.95 Non-ELL 0.95 Non-ELL 0.96 5 9 ELL 0.79 ELL 0.69 No Minority 0.93 No Minority 0.94 Minority 0.94 Minority 0.95 NO Free or Reduced Lunch 0.95 NO Free or Reduced Lunch 0.90 Free or Reduced Lunch 0.95 Free or Reduced Lunch 0.89 Non-ELL 0.97 Non-ELL 0.92 6 10 ELL 0.87 ELL 0.60 No Minority 0.95 No Minority 0.89 Minority 0.96 Minority 0.90


Table B3: Mean Performance Scores and Standard Error of Measurement (SEM) for Word Recognition

Grade | N | Fall Mean | Fall SEM | Winter Mean | Winter SEM | Spring Mean | Spring SEM
3 | 470 | 341.36 | 29.72 | 351.25 | 29.79 | 377.59 | 24.21
4 | 491 | 407.69 | 31.06 | 405.81 | 30.43 | 427.49 | 29.73
5 | 426 | 437.77 | 30.92 | 440.94 | 30.42 | 466.91 | 27.06
6 | 646 | 465.32 | 31.28 | 458.53 | 31.06 | 490.20 | 26.41
7 | 634 | 498.42 | 32.22 | 482.32 | 31.74 | 518.74 | 27.85
8 | 690 | 531.50 | 32.88 | 515.55 | 36.63 | 555.32 | 27.06
9 | 1017 | 543.01 | 33.21 | 543.53 | 43.68 | 567.72 | 29.29
10 | 916 | 574.34 | 33.96 | 558.00 | 47.27 | 591.01 | 32.76

Table B4: Mean Performance Scores and Standard Error of Measurement (SEM) for Vocabulary Knowledge

Grade | N | Fall Mean | Fall SEM | Winter Mean | Winter SEM | Spring Mean | Spring SEM
3 | 466 | 380.28 | 29.30 | 393.07 | 27.98 | 413.82 | 25.91
4 | 486 | 431.77 | 28.42 | 439.80 | 28.63 | 453.59 | 26.85
5 | 423 | 469.14 | 29.17 | 473.85 | 28.12 | 482.07 | 26.89
6 | 639 | 492.40 | 29.23 | 498.09 | 29.17 | 505.10 | 27.05
7 | 632 | 521.95 | 29.24 | 518.13 | 29.34 | 529.92 | 26.97
8 | 681 | 550.11 | 29.60 | 540.88 | 30.88 | 551.98 | 29.40
9 | 1014 | 555.66 | 29.40 | 560.26 | 32.00 | 562.86 | 28.62
10 | 887 | 571.88 | 30.28 | 575.32 | 36.19 | 574.38 | 30.44


Table B5: Mean Performance Scores and Standard Error of Measurement (SEM) for Syntactic Knowledge

Grade | N | Fall Mean | Fall SEM | Winter Mean | Winter SEM | Spring Mean | Spring SEM
3 | 377 | 328.84 | 30.80 | 358.06 | 30.58 | 402.12 | 25.29
4 | 376 | 403.74 | 30.06 | 417.15 | 30.80 | 452.63 | 24.85
5 | 340 | 430.52 | 30.12 | 452.58 | 30.82 | 483.09 | 25.29
6 | 383 | 456.01 | 31.18 | 473.15 | 31.59 | 505.59 | 25.04
7 | 396 | 510.01 | 30.40 | 504.94 | 31.41 | 529.24 | 25.49
8 | 380 | 523.01 | 30.16 | 533.04 | 34.28 | 554.57 | 25.73
9 | 457 | 554.38 | 32.05 | 551.09 | 36.27 | 571.61 | 27.52
10 | 443 | 554.98 | 31.07 | 549.89 | 38.55 | 562.49 | 28.15

Table B6: Mean Performance Scores and Standard Error of Measurement (SEM) for Reading Comprehension

Grade | N | Spring Mean | Spring SEM
3 | 325 | 386.03 | 28.69
4 | 322 | 440.07 | 32.96
5 | 302 | 497.25 | 36.49
6 | 431 | 499.96 | 37.63
7 | 426 | 524.45 | 39.67
8 | 461 | 571.71 | 48.61
9 | 703 | 583.06 | 39.26
10 | 626 | 589.72 | 44.65

Note: Data are provided only for spring because the adaptive version is administered only in the spring.


Table B7: Mean Performance Scores and Standard Error of Measurement (SEM) for Reading Comprehension Task by Grade and Subgroup

Grade Group Mean SEM Grade Group Mean SEM NO Free or Reduced Lunch 409.87 22.59 NO Free or Reduced Lunch 581.90 28.46 Free or Reduced Lunch 358.26 17.04 Free or Reduced Lunch 480.89 23.02 Non-ELL 381.20 17.33 Non-ELL 536.22 23.02 3 7 ELL 334.97 21.18 ELL 419.88 26.83 No Minority 392.51 23.02 No Minority 559.51 30.00 Minority 367.55 17.60 Minority 500.76 23.67 NO Free or Reduced Lunch 483.54 21.22 NO Free or Reduced Lunch 622.65 28.98 Free or Reduced Lunch 434.67 17.33 Free or Reduced Lunch 521.50 24.70 Non-ELL 460.40 16.73 Non-ELL 579.66 23.88 4 8 ELL 393.68 23.86 ELL 448.25 28.28 No Minority 470.64 21.44 No Minority 599.37 30.33 Minority 444.05 18.45 Minority 541.00 25.10 NO Free or Reduced Lunch 512.59 23.67 NO Free or Reduced Lunch 615.00 30.82 Free or Reduced Lunch 446.16 20.74 Free or Reduced Lunch 534.30 25.49 Non-ELL 483.44 19.74 Non-ELL 574.64 24.50 5 9 ELL 396.35 24.07 ELL 461.80 30.33 No Minority 497.68 24.30 No Minority 591.27 31.94 Minority 457.17 21.22 Minority 553.30 26.08 NO Free or Reduced Lunch 549.79 23.67 NO Free or Reduced Lunch 592.87 18.72 Free or Reduced Lunch 478.95 20 Free or Reduced Lunch 566.28 16.75 Non-ELL 521.07 19.23 Non-ELL 583.13 15.48 6 10 ELL 419.68 24.09 ELL 540.15 20.03 No Minority 534.36 24.50 No Minority 591.88 19.50 Minority 491.21 20.50 Minority 570.84 16.43


Table B8: Test-retest correlations for Vocabulary Knowledge, Word Recognition, Syntactic Knowledge, and Reading Comprehension

Grade | Word Recognition (Fall-Winter / Winter-Spring / Fall-Spring) | Vocabulary Knowledge (Fall-Winter / Winter-Spring / Fall-Spring) | Syntactic Knowledge (Fall-Winter / Winter-Spring / Fall-Spring) | Reading Comprehension (Fall-Winter / Winter-Spring / Fall-Spring)
3 | .46 / .51 / .31 | .59 / .61 / .44 | .49 / .55 / .48 | .74 / .66 / .66
4 | .59 / .62 / .45 | .58 / .62 / .51 | .62 / .70 / .56 | .83 / .77 / .71
5 | .63 / .73 / .64 | .75 / .74 / .65 | .68 / .75 / .68 | .83 / .77 / .73
6 | .59 / .65 / .66 | .60 / .72 / .51 | .63 / .69 / .65 | .85 / .80 / .77
7 | .65 / .69 / .73 | .66 / .69 / .54 | .68 / .74 / .69 | .80 / .79 / .73
8 | .66 / .72 / .74 | .63 / .67 / .63 | .66 / .76 / .70 | .81 / .79 / .71
9 | .65 / .68 / .76 | .65 / .64 / .65 | .70 / .73 / .80 | .77 / .72 / .65
10 | .69 / .70 / .80 | .62 / .70 / .64 | .67 / .70 / .72 | .75 / .74 / .66
All Grades | .46 / .51 / .31 | .59 / .61 / .44 | .49 / .55 / .48 | .74 / .66 / .66

Appendix C

Table C1: Fit statistics by grade and form for the Word Recognition Task

RMSEA RMSEA RMSEA Grade Form χ² df p-value RMSEA LB UB p-value CFI TLI 3 A 233.54 152 0.000 0.042 0.031 0.052 0.91 0.93 0.92 B 130.20 104 0.042 0.027 0.006 0.041 1.00 0.96 0.95 4 A 99.27 65 0.004 0.044 0.025 0.061 0.71 0.90 0.87 B 135.26 119 0.146 0.021 0.000 0.036 1.00 0.95 0.94 5 A 173.02 152 0.117 0.020 0.000 0.030 1.00 0.96 0.95 B 81.14 65 0.085 0.027 0.000 0.044 0.99 0.94 0.93 6 A 478.14 377 0.000 0.020 0.014 0.026 1.00 0.93 0.93 B 425.31 350 0.004 0.018 0.011 0.024 1.00 0.94 0.94 7 A 189.75 152 0.020 0.029 0.012 0.041 1.00 0.90 0.89 B 86.31 90 0.590 0.000 0.000 0.028 1.00 1.00 1.00 8 A 179.94 152 0.060 0.025 0.000 0.038 1.00 0.91 0.90 B 154.74 135 0.118 0.022 0.000 0.036 1.00 0.95 0.94 9 A 198.25 152 0.007 0.024 0.013 0.032 1.00 0.96 0.95 B 140.16 152 0.745 0.000 0.000 0.016 1.00 1.00 1.00 10 A 196.33 152 0.009 0.025 0.013 0.034 1.00 0.92 0.91 B 102.48 77 0.028 0.029 0.010 0.040 1.00 0.88 0.86 C 404.31 377 0.159 0.017 0.000 0.029 1.00 0.95 0.94

Note: df = degrees of freedom; RMSEA = root mean square error of approximation; LB = lower bound; UB = upper bound; CFI = comparative fit index; TLI = Tucker-Lewis index.


Table C2: Fit statistics by grade and form for the Vocabulary Knowledge Task

RMSEA RMSEA RMSEA Grade Form χ² df p-value RMSEA LB UB p-value CFI TLI 3 A 202.51 170 0.045 0.020 0.000 0.032 1.00 0.96 0.96 B 175.65 152 0.092 0.019 0.000 0.031 1.00 0.97 0.96 4 A 195.50 189 0.358 0.009 0.000 0.022 1.00 0.99 0.99 B 214.65 189 0.097 0.017 0.000 0.027 1.00 0.97 0.97 5 A 199.62 189 0.284 0.011 0.000 0.024 1.00 0.98 0.98 B 169.92 170 0.487 0.000 0.000 0.022 1.00 1.00 1.00 6 A 385.84 377 0.366 0.006 0.000 0.016 1.00 0.99 0.99 B 441.40 377 0.012 0.017 0.008 0.023 1.00 0.96 0.96 7 A 207.17 189 0.174 0.014 0.000 0.025 1.00 0.95 0.94 B 219.36 189 0.064 0.018 0.000 0.028 1.00 0.98 0.98 8 A 216.55 189 0.083 0.017 0.000 0.027 1.00 0.97 0.97 B 228.64 189 0.026 0.021 0.008 0.029 1.00 0.94 0.93 9 A 215.70 189 0.089 0.014 0.000 0.023 1.00 0.98 0.98 B 225.72 189 0.035 0.017 0.005 0.002 1.00 0.96 0.96 10 A 204.25 189 0.212 0.012 0.000 0.022 1.00 0.98 0.98 B 232.27 170 0.001 0.028 0.018 0.037 1.00 0.89 0.88

Note: df = degrees of freedom; RMSEA = root mean square error of approximation; LB = lower bound; UB = upper bound; CFI = comparative fit index; TLI = Tucker-Lewis index.


Table C3: Fit statistics by grade and form for the Syntactic Knowledge Task

RMSEA RMSEA RMSEA Grade Form χ² df p-value RMSEA LB UB p-value CFI TLI 3 A 189.18 170 0.149 0.011 0.000 0.019 1.00 0.94 0.93 B 198.78 152 0.007 0.018 0.010 0.024 1.00 0.96 0.96 4 A 188.69 135 0.001 0.022 0.014 0.029 1.00 0.90 0.88 B 167.71 152 0.182 0.011 0.000 0.020 1.00 0.97 0.97 5 A 211.22 170 0.017 0.016 0.007 0.022 1.00 0.92 0.91 B 177.81 152 0.075 0.013 0.000 0.021 1.00 0.97 0.96 6 A 205.98 170 0.031 0.160 0.005 0.023 1.00 0.96 0.95 B 293.34 230 0.003 0.018 0.011 0.024 1.00 0.95 0.94 C 231.39 170 0.001 0.020 0.013 0.027 1.00 0.93 0.93 7 A 160.33 170 0.691 0.000 0.000 0.015 1.00 1.00 1.00 B 176.75 170 0.345 0.008 0.000 0.020 1.00 0.98 0.97 8 A 304.36 170 0.000 0.036 0.029 0.042 1.00 0.82 0.80 B 275.77 135 0.000 0.041 0.034 0.048 0.98 0.77 0.74 9 A 184.00 170 0.219 0.009 0.000 0.017 1.00 0.99 0.99 B 221.00 170 0.005 0.017 0.010 0.023 1.00 0.92 0.91 10 A 199.47 170 0.061 0.014 0.000 0.022 1.00 0.93 0.93 B 160.32 135 0.068 0.015 0.000 0.023 1.00 0.88 0.86

Note: df = degrees of freedom; RMSEA = root mean square error of approximation; LB = lower bound; UB = upper bound; CFI = comparative fit index; TLI = Tucker-Lewis index.


Table C4: AIC and BIC values for the unidimensional and testlet models in Reading Comprehension by grade

Grade | Model | AIC | BIC | Adjusted BIC
3 | Unidimensional | 103845 | 106019 | 104851
3 | Testlet | 103672 | 106928 | 105177
4 | Unidimensional | 113842 | 115987 | 114830
4 | Testlet | 113553 | 116765 | 115033
5 | Unidimensional | 101720 | 130349 | 102539
5 | Testlet | 101471 | 104130 | 102700
6 | Unidimensional | 151414 | 153927 | 152649
6 | Testlet | 150809 | 154579 | 152663
7 | Unidimensional | 121206 | 123155 | 122158
7 | Testlet | - | - | -
8 | Unidimensional | 141907 | 144093 | 142981
8 | Testlet | 141541 | 144820 | 143153
9 | Unidimensional | 143848 | 146261 | 145041
9 | Testlet | 143673 | 147293 | 145463
10 | Unidimensional | 122108 | 124454 | 123259
10 | Testlet | 121811 | 125330 | 123538

Note: Grade 7 Testlet model did not converge.


Table C5: Bivariate Associations among RAPID Tasks

Grade Measure Word Recognition Vocabulary Knowledge Syntactic Knowledge Reading Comprehension 3 Word Recognition - Vocabulary Knowledge .37 - Syntactic Knowledge .30 .38 - Reading Comprehension .42 .60 .48 - 4 Word Recognition - Vocabulary Knowledge .30 - Syntactic Knowledge .29 .35 - Reading Comprehension .43 .42 .52 - 5 Word Recognition - Vocabulary Knowledge .37 - Syntactic Knowledge .31 .44 - Reading Comprehension .40 .58 .57 - 6 Word Recognition - Vocabulary Knowledge .36 - Syntactic Knowledge .36 .45 - Reading Comprehension .48 .54 .58 - 7 Word Recognition - Vocabulary Knowledge .38 - Syntactic Knowledge .42 .44 - Reading Comprehension .45 .46 .60 - 8 Word Recognition - Vocabulary Knowledge .40 - Syntactic Knowledge .46 .44 - Reading Comprehension .49 .49 .59 - 9 Word Recognition - Vocabulary Knowledge .53 - Syntactic Knowledge .54 .58 - Reading Comprehension .55 .53 .63 - 10 Word Recognition - Vocabulary Knowledge .51 - Syntactic Knowledge .55 .55 - Reading Comprehension .49 .50 .59 -


Table C6: Bivariate Correlations between RAPID Screening Tasks and Stanford 10. Percent Variance Explained in Stanford 10 by RAPID Vocabulary, Word Recognition, and Reading Comprehension

Grade | Word Recognition | Vocabulary Knowledge | Syntactic Knowledge | Reading Comprehension | Total R²
3 | .43 | .56 | .39 | .74 | .61
4 | .39 | .45 | .47 | .71 | .58
5 | .41 | .57 | .55 | .74 | .61
6 | .46 | .53 | .55 | .71 | .54
7 | .43 | .43 | .56 | .66 | .48
8 | .47 | .46 | .61 | .67 | .51
9 | .55 | .51 | .60 | .60 | .48
10 | .51 | .47 | .60 | .57 | .42

Table C7: Classification Accuracy of the Reading Success Probability (RSP) in Grades 3-12 using .70 Cut-Points

Cut-Point | Grade | SE | SP | PPP | NPP | OCC | Base Rate
.70 | 3 | .85 | .69 | .66 | .87 | .76 | .41
.70 | 4 | .77 | .74 | .59 | .88 | .75 | .32
.70 | 5 | .83 | .76 | .65 | .89 | .78 | .35
.70 | 6 | .92 | .56 | .68 | .87 | .86 | .50
.70 | 7 | .91 | .60 | .61 | .91 | .73 | .40
.70 | 8 | .85 | .67 | .62 | .88 | .74 | .39
.70 | 9 | .76 | .69 | .45 | .90 | .71 | .25
.70 | 10 | .64 | .74 | .49 | .84 | .71 | .28

Note: SE= Sensitivity, SP = Specificity, PPP = Positive Predictive Power, NPP = Negative Predictive Power, OCC = Overall Correct Classification. Students in Grades 11 and 12 are classified according to Grade 10 criteria.

Note: RSP cut-off is .70. RSP scores are based on student performance at the winter administration.


Table C8: Negative Predictive Power of the Reading Success Probability (RSP) in Grades 3-10 using 3 Cut-Points

Grade | RSP ≤30 | RSP 31-69 | RSP ≥70
3 | .76 | .75 | .92
4 | .70 | .70 | .92
5 | .79 | .82 | .85
6 | .76 | .79 | .86
7 | .72 | .77 | .88
8 | .75 | .82 | .80
9 | .79 | .77 | .85
10 | .73 | .70 | .89

Table C9: Classification Accuracy in Grades 3-10 for the Reading Comprehension Task for students at ≤30th and ≤15th percentiles

Percentile Grade FPR FNR Specificity Sensitivity PPP NPP Overall AUC 95% Confidence Accuracy Interval Lower Upper ≤30th 3 .12 .39 .61 .88 .80 .75 .77 .89 .86 .91 ≤30th 4 .13 .47 .53 .88 .73 .75 .74 .84 .80 .88 ≤30th 5 .02 .34 .66 .88 .78 .79 .79 .89 .86 .92 ≤30th 6 .16 .33 .67 .85 .68 .84 .79 .86 .83 .88 ≤30th 7 .12 .46 .54 .88 .76 .73 .74 .83 .80 .86 ≤30th 8 .08 .43 .57 .91 .84 .74 .77 .84 .81 .87 ≤30th 9 .11 .58 .42 .89 .73 .70 .71 .78 .74 .83 ≤30th 10 .11 .56 .44 .89 .71 .74 .73 .79 .75 .83 ≤15th 3 .13 .57 .43 .87 .33 .91 .82 .86 .82 .89 ≤15th 4 .04 .59 .41 .96 .71 .87 .85 .89 .84 .93 ≤15th 5 .05 .51 .49 .95 .72 .88 .86 .89 .86 .93 ≤15th 6 .08 .39 .61 .92 .63 .92 .87 .90 .88 .93 ≤15th 7 .13 .54 .46 .87 .35 .92 .82 .83 80 .86


Percentile Grade FPR FNR Specificity Sensitivity PPP NPP Overall AUC 95% Confidence Accuracy Interval Lower Upper ≤15th 8 .08 .76 .39 .92 .54 .86 .82 .83 .80 .87 ≤15th 9 .09 .79 .21 .91 .25 .89 .82 .76 .71 .81 ≤15th 10 .05 .72 .28 .95 .51 .89 .86 .84 .79 .89


Table C10: Differential Accuracy for RAPID Screening Tasks by Grade: Black-White (BW)

Grade Parameter df Estimate SE χ² p-value 3 Intercept 1 0.15 0.28 0.30 0.580 RSP 1 4.64 1.34 11.86 <0.001 BW 1 -0.60 0.34 3.06 0.080 RSP *BW 1 -2.28 1.42 2.56 0.109 4 Intercept 1 -0.60 0.34 3.05 0.080 RSP 1 2.99 0.48 38.28 <0.001 BW 1 0.45 0.41 1.19 0.274 RSP *BW 1 -1.67 0.60 7.61 0.005 5 Intercept 1 -0.37 0.26 1.99 0.157 RSP 1 3.83 0.63 36.26 <0.001 BW 1 -0.22 0.35 0.41 0.517 RSP *BW 1 -1.43 0.74 3.70 0.054 6 Intercept 1 -0.09 0.20 0.19 0.657 RSP 1 2.82 0.46 36.31 <0.001 BW 1 -0.88 0.32 7.21 0.007 RSP *BW 1 -0.27 0.80 0.11 0.735 7 Intercept 1 -0.39 0.22 3.23 0.071 RSP 1 3.27 0.50 42.28 <0.001 BW 1 -0.00 0.33 0.00 0.985 RSP *BW 1 -0.39 0.78 0.25 0.612 8 Intercept 1 -0.20 0.25 0.67 0.410 RSP 1 2.42 0.43 31.30 <0.001 BW 1 0.11 0.38 0.08 0.771 RSP *BW 1 -1.04 0.70 2.24 0.134 9 Intercept 1 0.24 0.24 0.94 0.329 RSP 1 2.66 0.46 32.48 <0.001 BW 1 -0.42 0.37 1.30 0.253 RSP *BW 1 -0.26 0.64 0.17 0.679 10 Intercept 1 0.55 0.27 4.11 0.042 RSP 1 1.82 0.37 24.04 <0.001 BW 1 -0.79 0.37 4.52 0.033 Note: RSP cut-off is .70. RSP scores are based on student performance RSP *BW 1 0.48 0.56 0.74 0.387 at the winter administration.


Table C11: Differential Accuracy for Screening Tasks by Grade: Hispanic-White (HW)

Grade  Parameter  df  Estimate  SE    χ²     p-value
3      Intercept  1   0.15      0.28  0.30   0.580
       RSP        1   4.64      1.34  11.86  <0.001
       HW         1   -0.79     0.31  6.32   0.011
       RSP*HW     1   -1.70     1.39  1.49   0.222
4      Intercept  1   -0.60     0.34  3.05   0.080
       RSP        1   2.99      0.48  38.28  <0.001
       HW         1   0.19      0.37  0.25   0.610
       RSP*HW     1   -0.46     0.55  0.69   0.405
5      Intercept  1   -0.37     0.26  1.99   0.157
       RSP        1   3.83      0.63  36.26  <0.001
       HW         1   -0.33     0.31  1.11   0.291
       RSP*HW     1   -0.86     0.69  1.53   0.215
6      Intercept  1   -0.09     0.20  0.19   0.657
       RSP        1   2.82      0.46  36.31  <0.001
       HW         1   -0.80     0.25  9.98   <0.001
       RSP*HW     1   -0.08     0.58  0.02   0.886
7      Intercept  1   -0.39     0.22  3.23   0.071
       RSP        1   3.27      0.50  42.28  <0.001
       HW         1   -0.15     0.27  0.30   0.577
       RSP*HW     1   -0.63     0.61  1.08   0.298
8      Intercept  1   -0.20     0.25  0.67   0.410
       RSP        1   2.42      0.43  31.30  <0.001
       HW         1   -0.47     0.31  2.29   0.129
       RSP*HW     1   0.77      0.60  1.60   0.205
9      Intercept  1   0.24      0.24  0.94   0.329
       RSP        1   2.66      0.46  32.48  <0.001
       HW         1   -0.02     0.31  0.00   0.934
       RSP*HW     1   -0.56     0.57  0.97   0.324
10     Intercept  1   0.55      0.27  4.11   0.042
       RSP        1   1.82      0.37  24.04  <0.001
       HW         1   -0.66     0.33  4.01   0.045
       RSP*HW     1   0.53      0.48  1.21   0.270

Note: RSP cut-off is .70. RSP scores are based on student performance at the winter administration.


Table C12: Differential Accuracy for RAPID Screening Tasks by Grade: English Language Learners (ELL)

Grade  Parameter  df  Estimate  SE      χ²      p-value
3      Intercept  1   -0.23     0.11    4.09    0.043
       RSP        1   2.74      0.27    102.81  <0.001
       ELL        1   -1.18     0.30    14.91   <0.001
       RSP*ELL    1   13.93     743.00  0.00    0.985
4      Intercept  1   -0.10     0.14    0.58    0.445
       RSP        1   2.10      0.21    99.65   <0.001
       ELL        1   -0.99     0.29    11.04   <0.001
       RSP*ELL    1   0.22      0.88    0.06    0.802
5      Intercept  1   -0.49     0.14    12.17   0.000
       RSP        1   2.97      0.23    159.44  <0.001
       ELL        1   -0.42     0.26    2.63    0.104
       RSP*ELL    1   -1.00     0.66    2.25    0.133
6      Intercept  1   -0.32     0.12    7.17    0.007
       RSP        1   2.61      0.26    95.81   <0.001
       ELL        1   -1.71     0.32    28.18   <0.001
       RSP*ELL    1   -0.58     0.83    0.50    0.478
7      Intercept  1   -0.13     0.12    1.10    0.293
       RSP        1   2.80      0.28    94.38   <0.001
       ELL        1   -1.51     0.31    23.75   <0.001
       RSP*ELL    1   -0.55     0.70    0.60    0.437
8      Intercept  1   0.04      0.15    0.08    0.772
       RSP        1   2.23      0.27    65.49   <0.001
       ELL        1   -1.66     0.33    25.51   <0.001
       RSP*ELL    1   0.57      1.03    0.30    0.582
9      Intercept  1   0.19      0.14    1.85    0.173
       RSP        1   2.31      0.23    95.15   <0.001
       ELL        1   -0.44     0.35    1.61    0.204
       RSP*ELL    1   -1.66     0.92    3.25    0.071
10     Intercept  1   0.17      0.14    1.40    0.236
       RSP        1   2.21      0.21    105.95  <0.001
       ELL        1   -0.88     0.33    6.75    0.009
       RSP*ELL    1   -1.25     0.70    3.19    0.073

Note: RSP cut-off is .70. RSP scores are based on student performance at the winter administration.


Table C13: Differential Accuracy for Screening Tasks by Grade: Free or Reduced Price Lunch (FRL)

Grade  Parameter  df  Estimate  SE    χ²     p-value
3      Intercept  1   0.83      0.32  6.47   0.011
       RSP        1   3.04      0.85  12.65  <0.001
       FRL        1   -1.46     0.34  17.84  <0.001
       RSP*FRL    1   -0.18     0.90  0.04   0.836
4      Intercept  1   1.04      0.42  6.00   0.014
       RSP        1   1.54      0.54  7.93   0.004
       FRL        1   -1.55     0.44  12.26  <0.001
       RSP*FRL    1   0.73      0.58  1.54   0.214
5      Intercept  1   -0.12     0.35  0.12   0.721
       RSP        1   2.55      0.47  28.86  <0.001
       FRL        1   -0.55     0.37  2.15   0.141
       RSP*FRL    1   0.49      0.53  0.85   0.354
6      Intercept  1   -0.31     0.24  1.65   0.197
       RSP        1   2.61      0.42  38.51  <0.001
       FRL        1   -0.45     0.27  2.84   0.091
       RSP*FRL    1   0.14      0.52  0.07   0.780
7      Intercept  1   0.12      0.23  0.29   0.588
       RSP        1   3.12      0.53  34.83  <0.001
       FRL        1   -0.74     0.27  7.69   0.005
       RSP*FRL    1   -0.50     0.60  0.70   0.401
8      Intercept  1   0.14      0.28  0.26   0.606
       RSP        1   2.25      0.48  21.33  <0.001
       FRL        1   -0.70     0.32  4.72   0.029
       RSP*FRL    1   0.44      0.57  0.60   0.436
9      Intercept  1   0.27      0.22  1.44   0.230
       RSP        1   2.54      0.37  46.89  <0.001
       FRL        1   -0.22     0.27  0.67   0.410
       RSP*FRL    1   -0.44     0.46  0.91   0.338
10     Intercept  1   0.02      0.20  0.01   0.890
       RSP        1   2.63      0.31  70.35  <0.001
       FRL        1   -0.04     0.26  0.02   0.863
       RSP*FRL    1   -0.68     0.40  2.86   0.090

Note: RSP cut-off is .70. RSP scores are based on student performance at the winter administration.


Table C14: Differential Accuracy for RAPID Screening Tasks by Grade: Black-White (BW)

Grade  Parameter         df  Estimate  SE     χ²     p-value
3      Intercept         1   1.01      0.26   15.57  <.001
       RSP (<=30)        1   -1.39     0.31   19.78  <.001
       RSP (31-69)       1   -0.19     0.33   0.32   0.570
       BW                1   -1.27     0.40   10.24  0.001
       BW*RSP (<=30)     1   0.04      0.52   0.00   0.931
       BW*RSP (31-69)    1   -0.02     0.51   0.00   0.965
4      Intercept         1   0.35      0.23   2.35   0.126
       RSP (<=30)        1   -1.73     0.32   29.07  <.001
       RSP (31-69)       1   0.23      0.30   0.62   0.432
       BW                1   3.33      153.0  0.00   0.983
       BW*RSP (<=30)     1   -4.30     153.0  0.00   0.978
       BW*RSP (31-69)    1   -4.72     153.0  0.00   0.976
5      Intercept         1   0.39      0.25   2.44   0.118
       RSP (<=30)        1   -2.43     0.35   47.22  <.001
       RSP (31-69)       1   -0.13     0.31   0.18   0.674
       BW                1   -0.50     0.39   1.62   0.204
       BW*RSP (<=30)     1   0.99      0.52   3.53   0.061
       BW*RSP (31-69)    1   -0.00     0.52   0.00   0.987
6      Intercept         1   0.37      0.20   3.32   0.069
       RSP (<=30)        1   -1.89     0.29   41.17  <.001
       RSP (31-69)       1   0.25      0.28   0.80   0.372
       BW                1   -1.12     0.55   4.09   0.043
       BW*RSP (<=30)     1   -0.69     0.84   0.69   0.406
       BW*RSP (31-69)    1   -0.72     0.66   1.21   0.271
7      Intercept         1   0.45      0.21   4.65   0.031
       RSP (<=30)        1   -1.77     0.29   37.81  <.001
       RSP (31-69)       1   0.10      0.28   0.14   0.71
       BW                1   -0.74     0.37   3.90   0.048
       BW*RSP (<=30)     1   0.20      0.51   0.15   0.694
       BW*RSP (31-69)    1   -0.36     0.50   0.52   0.471
8      Intercept         1   0.16      0.27   0.35   0.556
       RSP (<=30)        1   -1.81     0.41   19.39  <.001
       RSP (31-69)       1   0.18      0.36   0.25   0.615
       BW                1   -0.55     0.39   1.92   0.167
       BW*RSP (<=30)     1   0.98      0.58   2.82   0.093
       BW*RSP (31-69)    1   0.11      0.53   0.05   0.829
9      Intercept         1   0.25      0.31   0.63   0.427
       RSP (<=30)        1   -2.41     0.47   26.20  <.001
       RSP (31-69)       1   0.04      0.38   0.01   0.917
       BW                1   -4.32     77.91  0.00   0.956
       BW*RSP (<=30)     1   -7.73     155.8  0.00   0.961
       BW*RSP (31-69)    1   3.82      77.91  0.00   0.961
10     Intercept         1   0.23      0.26   0.81   0.370
       RSP (<=30)        1   -1.96     0.38   26.25  <.001
       RSP (31-69)       1   -0.00     0.31   0.00   0.976
       BW                1   -0.13     0.39   0.11   0.740
       BW*RSP (<=30)     1   0.55      0.55   1.02   0.313
       BW*RSP (31-69)    1   -0.09     0.48   0.04   0.840


Table C15: Differential Accuracy for Screening Tasks by Grade: Hispanic-White (HW)

Grade  Parameter         df  Estimate  SE    χ²     p-value
3      Intercept         1   1.01      0.26  15.57  <.001
       RSP (<=30)        1   -1.39     0.31  19.78  <.001
       RSP (31-69)       1   -0.19     0.33  0.32   0.570
       HW                1   -1.08     0.35  9.67   0.002
       HW*RSP (<=30)     1   -0.54     0.45  1.47   0.226
       HW*RSP (31-69)    1   0.30      0.44  0.48   0.490
4      Intercept         1   0.35      0.23  2.35   0.126
       RSP (<=30)        1   -1.73     0.32  29.07  <.001
       RSP (31-69)       1   0.23      0.30  0.62   0.432
       HW                1   0.15      0.43  0.12   0.726
       HW*RSP (<=30)     1   -0.07     0.52  0.02   0.890
       HW*RSP (31-69)    1   -1.02     0.51  4.02   0.050
5      Intercept         1   0.39      0.25  2.44   0.118
       RSP (<=30)        1   -2.43     0.35  47.22  <.001
       RSP (31-69)       1   -0.13     0.31  0.18   0.674
       HW                1   -0.36     0.33  1.21   0.272
       HW*RSP (<=30)     1   0.69      0.46  2.27   0.132
       HW*RSP (31-69)    1   -0.16     0.42  0.14   0.704
6      Intercept         1   0.37      0.20  3.32   0.069
       RSP (<=30)        1   -1.89     0.29  41.17  <.001
       RSP (31-69)       1   0.25      0.28  0.80   0.372
       HW                1   -0.16     0.29  0.31   0.576
       HW*RSP (<=30)     1   -0.17     0.39  0.19   0.663
       HW*RSP (31-69)    1   -0.24     0.37  0.43   0.513
7      Intercept         1   0.45      0.21  4.65   0.031
       RSP (<=30)        1   -1.77     0.29  37.81  <.001
       RSP (31-69)       1   0.10      0.28  0.14   0.706
       HW                1   -0.45     0.29  2.37   0.124
       HW*RSP (<=30)     1   -0.05     0.39  0.01   0.905
       HW*RSP (31-69)    1   -0.31     0.38  0.69   0.405
8      Intercept         1   0.16      0.27  0.35   0.556
       RSP (<=30)        1   -1.82     0.41  19.39  <.001
       RSP (31-69)       1   0.18      0.36  0.25   0.615
       HW                1   0.14      0.37  0.13   0.716
       HW*RSP (<=30)     1   -0.62     0.53  1.34   0.248
       HW*RSP (31-69)    1   0.00      0.49  0.00   0.989
9      Intercept         1   0.25      0.31  0.63   0.430
       RSP (<=30)        1   -2.41     0.47  26.20  <.001
       RSP (31-69)       1   0.04      0.38  0.01   0.917
       HW                1   -0.05     0.41  0.01   0.907
       HW*RSP (<=30)     1   0.50      0.60  0.70   0.403
       HW*RSP (31-69)    1   -0.13     0.51  0.07   0.791
10     Intercept         1   0.23      0.26  0.81   0.369
       RSP (<=30)        1   -1.96     0.38  26.25  <.001
       RSP (31-69)       1   -0.00     0.31  0.00   0.976
       HW                1   -0.40     0.34  1.42   0.234
       HW*RSP (<=30)     1   0.54      0.49  1.22   0.269
       HW*RSP (31-69)    1   -0.13     0.42  0.10   0.751


Table C16: Differential Accuracy for Screening Tasks by Grade: English Language Learners (ELL)

Grade  Parameter          df  Estimate  SE     χ²      p-value
3      Intercept          1   0.46      0.15   9.48    0.002
       RSP (<=30)         1   -1.59     0.20   65.19   <.001
       RSP (31-69)        1   -0.17     0.19   0.81    0.368
       ELL                1   -1.94     0.64   9.21    0.002
       ELL*RSP (<=30)     1   0.94      0.74   1.62    0.204
       ELL*RSP (31-69)    1   -0.66     0.89   0.55    0.457
4      Intercept          1   0.32      0.17   3.62    0.057
       RSP (<=30)         1   -1.76     0.22   61.47   <.001
       RSP (31-69)        1   -0.30     0.20   2.25    0.133
       ELL                1   -1.40     1.13   1.53    0.216
       ELL*RSP (<=30)     1   1.09      1.28   0.73    0.394
       ELL*RSP (31-69)    0   NA        NA     NA      NA
5      Intercept          1   0.28      0.14   3.75    0.053
       RSP (<=30)         1   -1.99     0.20   95.25   <.001
       RSP (31-69)        1   -0.12     0.19   0.39    0.530
       ELL                1   -1.55     0.68   5.25    0.022
       ELL*RSP (<=30)     1   1.26      0.86   2.12    0.150
       ELL*RSP (31-69)    0   NA        NA     NA      NA
6      Intercept          1   0.27      0.14   3.83    0.050
       RSP (<=30)         1   -1.95     0.19   101.47  <.001
       RSP (31-69)        1   -0.03     0.17   0.02    0.879
       ELL                1   -10.52    243.5  0.00    0.966
       ELL*RSP (<=30)     1   9.85      243.5  0.00    0.968
       ELL*RSP (31-69)    1   -3.92     337.6  0.00    0.991
7      Intercept          1   0.24      0.13   3.29    0.070
       RSP (<=30)         1   -1.61     0.18   80.87   <.001
       RSP (31-69)        1   -0.16     0.17   0.83    0.363
       ELL                1   2.99      235.3  0.00    0.990
       ELL*RSP (<=30)     1   -4.19     235.3  0.00    0.986
       ELL*RSP (31-69)    1   -5.03     235.3  0.00    0.983
8      Intercept          1   0.31      0.15   4.24    0.040
       RSP (<=30)         1   -1.54     0.21   51.76   <.001
       RSP (31-69)        1   0.11      0.20   0.30    0.581
       ELL                1   3.14      286.1  0.00    0.991
       ELL*RSP (<=30)     1   -5.08     286.1  0.00    0.986
       ELL*RSP (31-69)    1   -4.25     286.1  0.00    0.989
9      Intercept          1   0.04      0.17   0.06    0.808
       RSP (<=30)         1   -2.16     0.27   65.28   <.001
       RSP (31-69)        1   0.05      0.21   0.05    0.819
       ELL                1   -0.60     0.76   0.63    0.429
       ELL*RSP (<=30)     1   0.85      1.13   0.56    0.452
       ELL*RSP (31-69)    0   NA        NA     NA      NA
10     Intercept          1   0.15      0.14   1.12    0.291
       RSP (<=30)         1   -1.49     0.20   53.49   <.001
       RSP (31-69)        1   -0.13     0.18   0.55    0.457
       ELL                1   -0.53     0.75   0.49    0.483
       ELL*RSP (<=30)     1   -0.49     1.09   0.20    0.653
       ELL*RSP (31-69)    0   NA        NA     NA      NA


Table C17: Differential Accuracy for Screening Tasks by Grade: Free or Reduced Price Lunch (FRL)

Grade  Parameter          df  Estimate  SE    χ²     p-value
3      Intercept          1   1.44      0.38  14.33  <.001
       RSP (<=30)         1   -1.78     0.45  15.64  <.001
       RSP (31-69)        1   -0.67     0.48  2.49   0.110
       FRL                1   -1.55     0.42  14.02  <.001
       FRL*RSP (<=30)     1   0.36      0.50  0.52   0.471
       FRL*RSP (31-69)    1   0.62      0.47  1.71   0.190
4      Intercept          1   0.78      0.23  11.24  <.001
       RSP (<=30)         1   -1.53     0.32  22.42  <.001
       RSP (31-69)        1   -0.09     0.31  0.08   0.772
       FRL                1   -0.73     0.36  4.11   0.043
       FRL*RSP (<=30)     1   -0.37     0.46  0.64   0.425
       FRL*RSP (31-69)    1   -0.32     0.44  0.54   0.461
5      Intercept          1   0.52      0.22  5.60   0.018
       RSP (<=30)         1   -1.79     0.33  29.49  <.001
       RSP (31-69)        1   -0.20     0.28  0.53   0.468
       FRL                1   -0.50     0.29  3.03   0.082
       FRL*RSP (<=30)     1   -0.16     0.41  0.15   0.701
       FRL*RSP (31-69)    1   -0.05     0.36  0.02   0.890
6      Intercept          1   -0.09     0.29  0.09   0.768
       RSP (<=30)         1   -2.75     0.51  28.78  <.001
       RSP (31-69)        1   0.34      0.35  0.99   0.321
       FRL                1   0.07      0.33  0.04   0.836
       FRL*RSP (<=30)     1   1.06      0.55  3.72   0.054
       FRL*RSP (31-69)    1   -0.31     0.40  0.59   0.443
7      Intercept          1   0.43      0.20  4.67   0.031
       RSP (<=30)         1   -1.28     0.28  20.62  <.001
       RSP (31-69)        1   -0.43     0.25  2.87   0.090
       FRL                1   -0.42     0.26  2.57   0.109
       FRL*RSP (<=30)     1   -0.61     0.36  2.93   0.087
       FRL*RSP (31-69)    1   0.40      0.34  1.38   0.240
8      Intercept          1   0.37      0.26  2.12   0.146
       RSP (<=30)         1   -1.65     0.39  18.15  <.001
       RSP (31-69)        1   0.44      0.35  1.54   0.215
       FRL                1   -0.29     0.31  0.88   0.347
       FRL*RSP (<=30)     1   -0.24     0.46  0.27   0.600
       FRL*RSP (31-69)    1   -0.38     0.43  0.80   0.372
9      Intercept          1   0.02      0.31  0.00   0.952
       RSP (<=30)         1   -2.42     0.53  21.17  <.001
       RSP (31-69)        1   0.10      0.37  0.07   0.786
       FRL                1   -0.05     0.38  0.02   0.887
       FRL*RSP (<=30)     1   0.46      0.60  0.58   0.447
       FRL*RSP (31-69)    1   -0.06     0.45  0.02   0.886
10     Intercept          1   0.31      0.21  2.14   0.143
       RSP (<=30)         1   -1.69     0.30  31.83  <.001
       RSP (31-69)        1   -0.27     0.26  1.09   0.296
       FRL                1   -0.48     0.28  2.85   0.092
       FRL*RSP (<=30)     1   0.32      0.40  0.64   0.425
       FRL*RSP (31-69)    1   0.38      0.35  1.17   0.279


Table C18: Correlations between RAPID scores and the PPVT-IV, GJT, and TOWRE-2

Grade  N    RAPID Task             PPVT-IV  GJT   TOWRE-2 Real Word  TOWRE-2 Non-Word
3      251  Word Recognition       0.27     0.31  0.42               0.43
            Vocabulary Knowledge   0.47     0.40  0.37               0.29
            Syntactic Knowledge    0.54     0.49  0.34               0.28
4      161  Word Recognition       0.36     0.40  0.45               0.45
            Vocabulary Knowledge   0.56     0.57  0.50               0.44
            Syntactic Knowledge    0.60     0.61  0.35               0.33
5      167  Word Recognition       0.22     0.10  0.24               0.30
            Vocabulary Knowledge   0.61     0.51  0.35               0.39
            Syntactic Knowledge    0.56     0.47  0.33               0.32
6      113  Word Recognition       0.36     0.26  0.49               0.47
            Vocabulary Knowledge   0.62     0.53  0.41               0.44
            Syntactic Knowledge    0.52     0.44  0.20               0.20
7      72   Word Recognition       0.34     0.31  0.46               0.51
            Vocabulary Knowledge   0.58     0.50  0.43               0.33
            Syntactic Knowledge    0.50     0.49  0.30               0.28
8      71   Word Recognition       0.41     0.45  0.28               0.46
            Vocabulary Knowledge   0.50     0.53  0.36               0.45
            Syntactic Knowledge    0.74     0.51  0.33               0.47
9      227  Word Recognition       0.39     0.25  0.35               0.43
            Vocabulary Knowledge   0.65     0.55  0.27               0.29
            Syntactic Knowledge    0.35     0.37  0.25               0.27
10     169  Word Recognition       0.40     0.40  0.28               0.36
            Vocabulary Knowledge   0.67     0.61  0.36               0.44
            Syntactic Knowledge    0.52     0.56  0.34               0.38

Note: PPVT-IV = Peabody Picture Vocabulary Test – Fourth Edition; GJT = Grammaticality Judgment Task; TOWRE-2 = Test of Word Reading Efficiency – Second Edition.


Table C19: Average correlations within ranges of quantiles/percentiles by grade and task

Task                   Outcome             Grade  <40   40-60  >60
Word Recognition       TOWRE-2 Real Word   3      0.47  0.47   0.34
                                           4      0.44  0.41   0.40
                                           5      0.19  0.19   0.16
                                           6      0.54  0.48   0.49
                                           7      0.53  0.45   0.40
                                           8      0.11  0.28   0.38
                                           9      0.31  0.29   0.43
                                           10     0.19  0.31   0.37
Word Recognition       TOWRE-2 Non-Word    3      0.45  0.48   0.35
                                           4      0.50  0.47   0.35
                                           5      0.39  0.31   0.27
                                           6      0.57  0.38   0.36
                                           7      0.67  0.38   0.33
                                           8      0.55  0.37   0.38
                                           9      0.48  0.41   0.29
                                           10     0.52  0.33   0.18
Vocabulary Knowledge   PPVT-IV             3      0.60  0.48   0.40
                                           4      0.60  0.50   0.42
                                           5      0.66  0.67   0.52
                                           6      0.67  0.58   0.54
                                           7      0.66  0.63   0.55
                                           8      0.52  0.34   0.25
                                           9      0.72  0.56   0.51
                                           10     0.72  0.70   0.54
Syntactic Knowledge    GJT                 3      0.44  0.52   0.52
                                           4      0.66  0.58   0.58
                                           5      0.50  0.52   0.40
                                           6      0.50  0.48   0.37
                                           7      0.71  0.41   0.50
                                           8      0.70  0.48   0.30
                                           9      0.39  0.38   0.47
                                           10     0.61  0.55   0.52

Note: <40, 40-60, and >60 refer to the quantile/percentile range of the outcome measure.


Table C20: Correlations between RAPID tasks and birthdate/sex

Grade  Task                    Birthdate  Sex
3      Word Recognition        0.09       0.08
       Vocabulary Knowledge    0.10       0.11
       Syntactic Knowledge     0.04       0.08
       Reading Comprehension   0.11       0.22
4      Word Recognition        0.21       0.03
       Vocabulary Knowledge    0.16       -0.02
       Syntactic Knowledge     0.09       0.04
       Reading Comprehension   0.14       0.14
5      Word Recognition        0.02       -0.01
       Vocabulary Knowledge    0.13       -0.12
       Syntactic Knowledge     0.17       -0.12
       Reading Comprehension   0.23       0.13
6      Word Recognition        0.14       -0.01
       Vocabulary Knowledge    0.26       -0.20
       Syntactic Knowledge     0.23       -0.20
       Reading Comprehension   0.28       -0.20
7      Word Recognition        0.20       0.00
       Vocabulary Knowledge    0.01       -0.12
       Syntactic Knowledge     0.22       0.05
       Reading Comprehension   0.12       -0.06
8      Word Recognition        0.12       -0.13
       Vocabulary Knowledge    0.01       -0.26
       Syntactic Knowledge     0.12       -0.16
       Reading Comprehension   0.09       0.04
9      Word Recognition        0.12       -0.10
       Vocabulary Knowledge    0.15       -0.10
       Syntactic Knowledge     0.18       0.12
       Reading Comprehension   0.12       0.01
10     Word Recognition        0.14       0.02
       Vocabulary Knowledge    0.20       0.04
       Syntactic Knowledge     0.20       0.17
       Reading Comprehension   0.18       0.10

Appendix D

Figure 5. Growth Chart for Word Recognition Task


Figure 6. Growth Chart for Vocabulary Knowledge Task


Figure 7. Growth Chart for Syntactic Knowledge Task


Figure 8. Growth Chart for Reading Comprehension Task

