Supporting Teachers’ Efforts to Use Assessment to Guide Instruction


Algorithm-guided Individualized Instruction page 1

Supporting Online Material

Appendix SA

The Individualizing Student Instruction (ISI) Intervention

The individualizing student instruction (ISI) intervention integrates several areas of research: child by instruction interactions (1-4), evidence-based instruction (5, 6), the “beat-the-odds” literature on effective schools (7-9), and teacher planning and organization (10-13).

Evidence-based instruction incorporates methods of instruction with empirical evidence of efficacy. In the “beat-the-odds” literature, researchers found characteristics common to schools that demonstrated high student achievement that were not in evidence in less effective schools.

These characteristics included a dedicated block of time for teaching reading, teaching reading to smaller groups of children with similar learning needs, and using assessment to guide instruction.

Additionally, effective teachers were masterful planners and organizers so that classroom time was used efficiently.

The ISI intervention comprised two components: (1) the A2i software, which computed recommended amounts and types of instruction for each child based on their assessed vocabulary and reading skills and which included organization and planning features and a catalogue of instructional activities indexed to the dimensions of instruction; and (2) professional development designed to teach teachers how to individualize reading instruction in their classrooms using the A2i recommendations and planning features. Teachers who fully implemented ISI had a dedicated, uninterrupted time for language arts instruction and used multiple student-grouping configurations, including homogeneous reading-skill groups, in order to address the unique needs of the individual students in their classrooms. Based on classroom observations, the content of the literacy instruction was attuned to the skill level of the students in the groups, and the amounts and types of instruction aligned with the A2i individual student recommendations.

There was an observable system in place (e.g., center chart, daily schedule) for organizing students into groups and facilitating transitions from one station or center to another. The teacher followed a daily or weekly lesson plan (e.g., group activity planner or other similar written plan), the classroom was well organized, transitions were efficient, and instruction was well paced.

Students worked independently at literacy-focused centers, with activities designed to meet their learning objectives, while the teachers worked with small groups of students to provide more intensive and scaffolded instruction. Virtually the entire language arts block was spent in meaningful literacy activities.

Assessment to Instruction Software (A2i).

A2i software has five views (Figures 1, S1 & S2): the Classroom View, the Literacy Minutes Manager, the Group Activity Planner, Classroom Set-up and the Core Curriculum Guide. The Classroom View (Figure 1) provides the individual instruction plan for each child for each of the instructional strategies (teacher-managed meaning-focused, teacher-managed code-focused, etc.). The algorithms that compute recommended amounts of teacher-managed code-focused and child-managed meaning-focused instruction are described below. Children are also placed into groups using an algorithm that relies on their most current letter-word reading score. Thus, each time updated scores are entered into A2i, students’ recommended group membership (and amount of each type of instruction) changes. Teachers select the number of groups (typically 3 to 5 groups). They are encouraged to use the algorithm-recommended student group assignments but may make changes in the group membership if they wish.

Each reading group has a recommended mean amount for each type of instruction. This amount changes whenever group membership changes. It is the mean amount for each group that we ask the teachers to target. The teachers then plan their weekly schedule, including the language arts block and other literacy activities throughout the day, using the Literacy Minutes Manager (Figure S1). Teachers are encouraged to set a consistent daily/weekly schedule and classroom routine, which tends to enhance classroom organization and students’ behavior (7, 8, 13). Once the language arts schedule is completed in the Literacy Minutes Manager, the teacher is ready to plan daily instruction using the Group Activity Planner (Figure S2). The teacher first selects the date he or she wishes to plan and selects an instructional block (e.g., teacher-managed code-focused at 10:35 for Group 3). The page then scrolls down to reveal the teacher-managed code-focused activities that have been indexed to the school’s core curriculum (Figure S2 top right). A2i can be used with any evidence-based reading core curriculum or reading activity that can be mapped onto the dimensions of instruction. Sorting and search features permit the teacher to locate and select specific core curriculum activities (e.g., Open Court, Level 1, Unit 5). Activities to be implemented for that day are then checked off. All planned activities turn red in the curriculum index so that the teacher can follow his or her progression through the curriculum. Once all the instruction blocks are planned, the teacher clicks the button to print the lesson plan for the day (Figure S2 bottom). The Classroom Set-up view permits the teacher to select the number of groups, number of adults in the classroom, and the time available. The Curriculum Guide provides a catalogue of all of the instructional activities indexed in A2i (e.g., FCRR center activities, Open Court, Reading Mastery) according to type of instruction (child-managed meaning-focused, etc.).
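
The group-level target described in this section (the mean of the individual recommendations within each reading group) can be sketched as follows. This is only an illustration, not the A2i implementation: the function name, child labels, minute values, and group numbers are all hypothetical, and the grouping itself is taken as given rather than computed.

```python
from collections import defaultdict
from statistics import mean


def group_mean_targets(recommended_minutes, group_of):
    """Average the per-child recommended minutes within each reading group.

    recommended_minutes: dict mapping child -> recommended minutes for one
        type of instruction (hypothetical values here).
    group_of: dict mapping child -> group label (the current assignment).
    """
    by_group = defaultdict(list)
    for child, minutes in recommended_minutes.items():
        by_group[group_of[child]].append(minutes)
    return {group: mean(values) for group, values in by_group.items()}


# Hypothetical children, minutes, and groups, for illustration only.
minutes = {"child_a": 30.0, "child_b": 26.0, "child_c": 14.0, "child_d": 10.0}
groups = {"child_a": 1, "child_b": 1, "child_c": 2, "child_d": 2}
targets = group_mean_targets(minutes, groups)

# Moving one child changes the target of both affected groups.
regrouped = dict(groups, child_c=1)
targets_after_move = group_mean_targets(minutes, regrouped)
```

Recomputing the means after any change in membership mirrors the statement that the group amount changes whenever group membership changes.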

About the Algorithms.

The algorithms are based on HLM models from first grade student and classroom data presented in the Beyond the Reading Wars paper (1).

Setting the A2i Target Outcome (TO). For the A2i algorithms to work, the target spring reading outcome (i.e., grade level and at least nine months of progress) is set based on students’ initial letter-word reading score. The minimum target reading outcome for this study was set at a grade equivalent of 2.1. The grade equivalent metric is based on a 9-month school year and represents an estimate of the performance that an average student at a specific time in the school year is expected to achieve. Thus a grade equivalent of 1.5 represents the typical performance of a first grader in January and 2.1 represents the typical performance of a second grader in late September. Grade equivalent was selected as the metric for A2i because it is a meaningful way to portray students’ skill levels and, based on teacher feedback, more useful for teachers. For students reading above grade level, the target outcome equals their grade equivalent (GE) plus 0.9, which would be the minimum amount of expected reading skill growth in one school year (e.g., initial GE 1.5 + .9 = 2.4). In this way, adequate yearly growth is anticipated for all students, with more than a year’s growth expected for students beginning school with lower reading scores.
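
One way to read the target-outcome rule is as the larger of the 2.1 floor and the initial grade equivalent plus 0.9. The sketch below encodes that reading. The text does not state the exact cut-off for "above grade level", so treating the rule as a simple maximum is an assumption, and the function and constant names are illustrative.

```python
MIN_TARGET_GE = 2.1  # minimum target outcome stated in the text
YEARLY_GROWTH = 0.9  # nine months of growth in grade-equivalent units


def target_outcome(initial_lw_ge):
    """Spring letter-word target outcome (TO) as a grade equivalent.

    Assumption: TO is the larger of the 2.1 floor and initial GE + 0.9.
    """
    return max(MIN_TARGET_GE, initial_lw_ge + YEARLY_GROWTH)


# The worked example from the text: initial GE 1.5 gives a target of 2.4.
example_above = target_outcome(1.5)
# A hypothetical child starting at GE 0.5 is held to the 2.1 minimum,
# which implies more than a year's growth.
example_below = target_outcome(0.5)
```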

The teacher-managed code-focused and child-managed meaning-focused amounts required to achieve each child’s target outcome are computed using fall letter-word reading and vocabulary scores. There are separate equations for the amount of teacher-managed code-focused and child-managed meaning-focused instruction. SPSS syntax to compute recommended amounts is provided below:

Teacher-managed code-focused Amount (TMCF):

TMCFa = ((TO - (.2* lw_ge))/(.05 + (.05 * lw_ge)))+ 13.

TMCF_Recommended = (TMCFa - (.82 * M)).

M is the current month of the school year starting in September. Thus, M would equal 0 in September, 1 in October, 2 in November, and so on. TO is the target spring letter-word reading outcome (described previously). Lw_ge is the most recent letter-word recognition grade equivalent score. TMCFa is the intermediary amount term. TMCF_Recommended is the amount in minutes recommended in the Classroom view of A2i. The recommended amounts change each month and each time new scores are entered and depend on the child’s target outcome (TO) and current letter-word reading score.
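
For readers without SPSS, the TMCF syntax can be transcribed into Python as a direct term-by-term translation. The function name and the example child (TO = 2.1, letter-word GE = 1.0) are hypothetical and serve only to show the arithmetic.

```python
def tmcf_recommended(to, lw_ge, month):
    """Recommended teacher-managed code-focused minutes.

    to: target spring letter-word outcome (grade equivalent).
    lw_ge: most recent letter-word grade equivalent score.
    month: M, months since September (September = 0, October = 1, ...).
    """
    # Intermediary amount term (TMCFa in the SPSS syntax).
    tmcf_a = (to - 0.2 * lw_ge) / (0.05 + 0.05 * lw_ge) + 13
    # The recommendation falls by 0.82 minutes for each month elapsed.
    return tmcf_a - 0.82 * month


# Hypothetical child: target 2.1, current letter-word GE 1.0.
september = tmcf_recommended(2.1, 1.0, 0)  # (2.1 - 0.2) / 0.1 + 13 = 32
february = tmcf_recommended(2.1, 1.0, 5)   # 32 - 0.82 * 5 = 27.9
```

For a fixed target, a higher letter-word score lowers the recommended amount, because the score shrinks the numerator and enlarges the denominator.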

Child-managed meaning-focused Amount (CMMF):

CMMFa = ((3.76 - TO + (1.4 * voc_ae))/(.30)) - 14.

CMMFsl = 10-(.24*CMMFa).

CMMF_Recommended = CMMFa + .5*(CMMFsl * M).

Voc_ae is the most recent WJ picture vocabulary age equivalent score in years. CMMFa is the intermediary amount term and CMMFsl is the intermediary slope term. CMMF_Recommended is the amount recommended in the Classroom view of A2i. The amount recommended will vary depending on each child’s target outcome (TO) and vocabulary score.
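
The CMMF syntax translates the same way. This is a direct transcription of the three SPSS lines. The function name and the example child (TO = 2.1, vocabulary age equivalent = 6.0 years) are hypothetical.

```python
def cmmf_recommended(to, voc_ae, month):
    """Recommended child-managed meaning-focused amount.

    to: target spring letter-word outcome (grade equivalent).
    voc_ae: most recent picture vocabulary age equivalent, in years.
    month: M, months since September (September = 0).
    """
    cmmf_a = (3.76 - to + 1.4 * voc_ae) / 0.30 - 14  # intermediary amount
    cmmf_sl = 10 - 0.24 * cmmf_a                     # intermediary slope
    return cmmf_a + 0.5 * (cmmf_sl * month)


# Hypothetical child: target 2.1, vocabulary age equivalent 6.0 years.
september = cmmf_recommended(2.1, 6.0, 0)  # about 19.53
november = cmmf_recommended(2.1, 6.0, 2)   # about 24.85
```

In this example the slope term is positive, so the recommended amount rises over the months, whereas the TMCF recommendation falls by a fixed 0.82 per month.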

Teacher-managed meaning-focused Amount (TMMF):

TMMF_Recommended = 16 - M.

TMMF_Recommended is the amount, in minutes, presented in the classroom view. Note that the amount decreases each month, based on observations in the Beyond the Reading Wars paper (1).

Child-managed code-focused Amount (CMCF):

CMCF_Recommended = 15.

CMCF_Recommended is the amount, in minutes, presented in the classroom view. The amount is the same each month.
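
The two fixed rules can be tabulated over the school year. The sketch assumes a September through May year (M = 0 to 8), which matches the 9-month school year described earlier. The month labels and function names are illustrative.

```python
SCHOOL_MONTHS = ["Sep", "Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May"]


def tmmf_recommended(month):
    """Teacher-managed meaning-focused minutes: 16 minus the month index M."""
    return 16 - month


def cmcf_recommended(month):
    """Child-managed code-focused minutes: constant across months."""
    return 15


# (month label, TMMF minutes, CMCF minutes) for each month of the year.
schedule = [
    (name, tmmf_recommended(m), cmcf_recommended(m))
    for m, name in enumerate(SCHOOL_MONTHS)
]
```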

Professional Development Provided

Professional development topics included: 1) using assessment to guide instruction; 2) planning for effective instruction using A2i; 3) organizing classrooms using small groups based on learning goals; 4) implementing effective reading instruction; and 5) using research to inform instruction. During this first year, professional development efforts focused primarily on topics 1, 2, 3 and 4. Teachers participated in two workshops, in spring and fall 2005. Researchers (n = 6), assigned to schools, met with individual teachers at the schools about every other week to teach them how to individualize instruction and use A2i during planning time and to act as participant observers in the classroom. Additionally, teachers met after school in monthly collaborative professional development groups. Scheduling was carefully monitored so that all teachers received the same amount of time with the researcher responsible for their training.

Algorithms for teacher-managed code-focused and child-managed meaning-focused instruction were available. Because we did not, at the time, have research to support first grade algorithms for teacher-managed meaning-focused and child-managed code-focused instruction, A2i recommendations showed the mean amounts teachers provided in the Beyond the Reading Wars study (1).

Several results and design features reduce the possibility that the treatment effect was the result of the professional development alone (i.e., a Hawthorne effect) and not the A2i recommendations and planning tools. Although all the teachers in the treatment group received the same professional development protocol and amount of training, the degree to which they individualized instruction and how much they used the software varied. Moreover, the more teachers used A2i from September to May, the greater was their students’ passage comprehension skill growth (see Table S6).

The control teachers were considered to be actively participating in the study. We conducted three video-taped classroom observations (fall, winter, and spring) in both treatment and control classrooms. All teachers, not just the treatment teachers, received the assessment results for the tests we administered. We identified children who seemed to be falling behind, based on our assessments in January 2006, and these children were brought to the teachers’ attention (both treatment and control teachers). Additionally, the control teachers received introductory information on individualizing instruction using A2i and the purpose of the study during a meeting at their schools prior to the beginning of the study.

The district was also participating in reading reform initiatives including Reading First and vocabulary interventions. Reading First is a federally funded initiative designed to improve student achievement at historically low performing schools. Schools receive funding for professional development, use an evidence-based core reading curriculum, and conduct student progress monitoring assessments. Three of the five control schools and two of the treatment schools were participating in Reading First. Moreover, Reading First-like practices were mandated throughout the district and the district had just instituted a vocabulary intervention.

Teachers (both treatment and control) were required to provide a dedicated language arts block lasting at least 90 minutes, of which 45 minutes had to be small-group instruction, and to use an evidence-based core reading curriculum. Additionally, throughout the district, schools were required to provide a school-based reading coach (i.e., a reading specialist) and to assess students’ reading skills four times per year using DIBELS (a progress monitoring assessment tool) and at the end of the year using the Stanford Achievement Tests-10 (SAT-10), a nationally normed standardized test of reading comprehension. Based on our observations of the teachers’ classrooms (both treatment and control), almost all of them were utilizing a dedicated language arts block with varying amounts of small group instruction.

There were six researchers who provided professional development. Two of the researchers had BAs in early childhood and elementary education and were working on their master’s degrees, one in reading and one in early childhood education; two of the researchers had BAs in psychology and were working on their master’s degrees, one in community psychology and one in African American studies; and two of the researchers had advanced degrees and experience teaching.

Appendix SB

Methods

Procedures for random assignment and participant selection

This study utilized a cluster randomized design with a wait-list control group. This means that all schools and teachers received the A2i software and training but the teachers in the control group schools had to wait one school year to receive the software and training. Thus the experiment was conducted in the first year of a two-year study. Schools were matched and paired based on Reading First status, percentage of students eligible for free and reduced lunch (FARL, i.e., poverty status), third grade mean Florida Comprehensive Assessment Test (FCAT) reading score (a state-mandated test) and first grade mean SAT10 reading comprehension score (14) (see Table S3). One member of each school pair was randomly assigned to the treatment condition. All first grade teachers at the schools were invited to participate. Twenty-five of the teachers in the control schools and twenty-four teachers in the treatment schools agreed to participate, which represented over 80% of the teachers in the ten schools. After the onset of the study but prior to the January assessment, two teachers at treatment schools left the study, one for personal reasons and the other because he was teaching second graders during the language arts time. Their students’ scores are not included in these analyses. Parental consent was obtained for 78% of the students in the participating teachers’ classrooms. Children for whom consent was not obtained participated in the instruction but were not assessed or included in the reported results.

In general, although assignment was random, control group students began the year with stronger vocabulary and reading skills. Please see Tables S2 and S3 for scores and descriptive information for children, teachers and schools. Teacher characteristics were not significantly different for treatment and control groups except that there were more African American teachers in the treatment group [t(45) = 2.08, p = .043]. Descriptive information was obtained through parent and teacher questionnaires and through the State of Florida Progress Monitoring Reporting Network (http://www.fcrr.org/pmrn/).

Measuring Student Achievement

Students’ language and literacy skills were assessed in August 2005, January 2006, and May 2006 using a battery of language and literacy assessments. Woodcock Johnson Tests of Achievement-III (15) were chosen because they are psychometrically strong and highly predictive of performance on state and federal achievement tests, such as the NAEP (16). The Letter-Word Identification test assesses letter-word reading skills by asking children to recognize and name increasingly unfamiliar letters and words. The Picture Vocabulary test asks children to name pictures of increasingly unfamiliar objects. In the Passage Comprehension assessment, students read sentences and passages of increasing complexity and are asked to supply the missing word. For example, “The duck is swimming in the ___.”

We present results for the Passage Comprehension and Letter-Word Identification tests. Students’ end of the year passage comprehension score is of particular interest as an outcome because it requires students to decode and understand what they are reading and necessitates use of vocabulary and morphosyntactic knowledge. Letter-Word Identification and Picture Vocabulary subtest results are used by the A2i algorithms to compute recommended amounts and types of instruction.

The treatment group teachers first gained access to assessment information and algorithm recommendations provided by A2i software in September 2005. The control group teachers were provided written reports of the assessment results for their students. A2i recommendations for each child were revised using the January results. Again, January scores were provided to the teachers in the control group. Final scores were also shared.

Accumulating evidence strongly indicates that children’s socioeconomic status (SES) is closely tied to their academic achievement (17), which is why SES was one of the school matching variables prior to random assignment to condition. We used students’ eligibility for the free or reduced price school lunch (FARL) program as a proxy for their SES. Children who are eligible come from families with incomes at or below 130 percent of the poverty level for free lunch, or 185 percent for reduced-price lunch. 130 percent of the poverty level is considered $26,000 for a family of four; 185 percent is $37,000 (http://www.fns.usda.gov/cnd/lunch/AboutLunch/NSLPFactSheet.pdf). We created two dummy coded variables (18) where children were coded 1 if they were eligible for free or reduced price lunch and 0 if they were not. Only 5% of the children were eligible for reduced price lunch compared to 54% eligible for free lunch. The remaining children were not eligible or did not apply for the program. Data were missing for 146 children. For these children, the school-wide percentage as a proportion was used (e.g., if school-wide, 96% of children were eligible for FARL, then the child was coded .96 for eligibility for free lunch, making the assumption that the probability that they were eligible for FARL was .96 and that it was unlikely they were eligible only for the reduced price lunch plan).
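
The FARL coding rule, including the handling of missing data, can be sketched as below. The status labels and function name are hypothetical. Coding the reduced-price variable as 0 for children with missing data is an assumption drawn from the statement that such children were unlikely to be eligible only for reduced-price lunch.

```python
def code_farl(status, school_farl_pct):
    """Return (free_lunch, reduced_lunch) codes for one child.

    status: "free", "reduced", "none", or None when data are missing
        (labels are illustrative).
    school_farl_pct: school-wide percentage of children eligible for FARL.
    """
    if status is None:
        # Missing data: use the school-wide proportion for free lunch.
        # Assumption: the reduced-price variable is set to 0 in this case.
        return (school_farl_pct / 100.0, 0.0)
    free = 1.0 if status == "free" else 0.0
    reduced = 1.0 if status == "reduced" else 0.0
    return (free, reduced)


known_child = code_farl("free", 96)    # dummy coded from known status
missing_child = code_farl(None, 96)    # the .96 example from the text
```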

Assessing Teachers’ Fidelity to the Intervention -- Individualizing Students’ Instruction in the Classroom.

A2i software automatically records the teachers’ use of the software. This includes the number of times they sign on, how long they stay online each visit, and which components of A2i they utilize and how. The total amount of time (in minutes) teachers used A2i from September 1 through May 31 was included in the models. One school used Reading Mastery as their core curriculum, which was not fully indexed and available in A2i until November 2005, and so the five teachers at this school did not use the Group Activity Planner until then. They did, however, utilize the student recommendations and recommended groups components.

Classrooms were observed bi-weekly from December 2005 to February 2006 with researchers as participant observers. Timed field notes of teacher and student activities as well as general observations of instruction were completed during every observation. On February 17, 2006, prior to release of the January assessment results, each researcher was asked to assess the teachers’ implementation from low fidelity (score of 1) to high fidelity (score of 5) (see Table S4). High fidelity included consistent use of small groups based on learning objectives, amount of time on instruction generally in line with A2i recommendations, individualized instruction based on student assessments, and classroom management systems that supported individualizing student instruction. A teacher with a rating of fair may have used small groups, but generally all of the groups were doing the same activity at the same time. Low fidelity indicated that the teacher was using whole class instruction and no small groups, and did not individualize student instruction based on A2i recommendations. All control teachers received a score of 0, no intervention.

Appendix SC

Results

Do children who receive algorithm-guided individualized instruction demonstrate stronger reading skill growth?

Hierarchical Linear Modeling (19, 20) was used because children were nested in classrooms. Failing to take into account the nested structure of the data may lead to incorrect standard errors. The hierarchical linear models considered are multiple linear regression models in which the residual is decomposed into a within-class residual and a between-class residual. The regression coefficients can be interpreted as for ordinary linear multiple regression models. Descriptive statistics are provided in Table S3. An exemplar model is presented in equation S1 and HLM results for reading comprehension are provided in Tables S5-S7. Results for letter-word recognition are provided in Table S8. Note that these results control for students’ fall letter-word, reading comprehension, and vocabulary scores, gender, free or reduced lunch status and the percentage of children eligible for FARL school-wide (classroom level). FARL was coded as two separate variables with 1 representing children with either free or reduced lunch status and 0 representing all other children.

Level-1 Model (S1)

Yij (Predicted Spring Passage Comprehension W score) = β0j + β1j*(gender ij) + β2j*(FARL[two variables] ij) + β3j*(Fall Letter-word Identification W score ij) + β4j*(Fall Vocabulary W score ij) + β5j*(Fall Passage Comprehension W score ij) + rij

Level-2 Model

β0j = γ00 + γ01*(Treatment = 1 j) + γ02*(School % FARL = 1 j) + u0j

β1j = γ10

β2j = γ20

β3j = γ30

β4j = γ40

β5j = γ50
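
Substituting the Level-2 equations into the Level-1 equation gives the single combined (mixed) form of model S1. The predictor names below are shorthand labels for the variables listed in the model, and the FARL term stands for the two dummy variables, each with its own coefficient. The normality assumptions on the two residuals are the standard ones for this type of model and are stated here as an assumption, since the text does not spell them out.

```latex
\[
\begin{aligned}
Y_{ij} ={}& \gamma_{00}
  + \gamma_{01}\,\mathrm{Treatment}_{j}
  + \gamma_{02}\,\mathrm{SchoolFARL}_{j} \\
 &+ \gamma_{10}\,\mathrm{Gender}_{ij}
  + \gamma_{20}\,\mathrm{FARL}_{ij}
  + \gamma_{30}\,\mathrm{LetterWord}_{ij} \\
 &+ \gamma_{40}\,\mathrm{Vocabulary}_{ij}
  + \gamma_{50}\,\mathrm{PassageComp}_{ij}
  + u_{0j} + r_{ij},
\end{aligned}
\qquad
u_{0j} \sim N(0, \tau), \quad r_{ij} \sim N(0, \sigma^{2})
\]
```

Only the intercept varies across classrooms (through the between-classroom residual), so the child-level slopes are fixed.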

The same model was used for amount of A2i use and fidelity score, which were entered instead of the dichotomous variable, treatment (Treatment = 1, Control = 0). All continuous variables, except A2i use and fidelity score, were centered at their grand mean. Although schools were randomly assigned to treatment and control conditions, using covariates increases power (21, 22), so we included students’ gender and FARL status. We also included fall letter-word identification, passage comprehension, and vocabulary W scores at the child level. The treatment variable, or a fidelity measure such as the total time A2i was used (in minutes) from September through May (Control = 0), was entered at the teacher level. Yij is the predicted spring passage comprehension score for child i in classroom j. γ00 represents the fitted mean spring passage comprehension score for the sample holding all other variables constant and γ01 represents the treatment effect. If γ01 is significantly different from zero (p < .05), then a difference this large between children in treatment and control classrooms would be expected less than 5% of the time if there were no true treatment effect. HLM (Version 6.03) software was used to create both graphs (see Figures 2 & S3). For both Figure 2 and Figure S3, the control group scores are presented as points at the fitted mean (A2i use and fidelity scores equal 0), whereas the treatment group scores are represented by lines because fitted mean scores vary depending on minutes of A2i use or fidelity score. Red points and lines represent fitted passage comprehension W scores for first-graders who started first grade with above average vocabulary scores (modeled at the 75th percentile of the sample, W score = 486, age equivalent = 8.0). The blue points and blue lines represent fitted scores for children who started first grade with below average vocabulary W scores (modeled at the 25th percentile of the sample, W score = 474, age equivalent = 6.0).

End of year results (May 2006) revealed that children in the treatment group made significantly greater gains overall on the passage comprehension test compared to children in the control classrooms, controlling for fall status, gender, and FARL status (see Table S5). The fitted mean difference between students in the treatment and control classrooms was 2.63 (Treatment Effect d = .25), which translates into a two month difference in grade equivalents.

The proportion of variance explained by the final model was computed by subtracting the total variance in the full model (see Table S5) from the total variance in the unconditional model (without child or classroom level predictors) and dividing by the total variance in the unconditional model. The total variance in the unconditional model was 260.29 (u0 or τ = 36.76 [between classroom variance], r or σ2 = 223.53 [within classroom variance]). The final model explained 57% of the variance in children’s spring passage comprehension scores.
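
The variance-explained computation can be sketched as follows, using the unconditional variance components reported in the text. The full-model total variance is reported in Table S5. The value used here (112.0) is a hypothetical stand-in chosen only to show the arithmetic, and the function names are illustrative.

```python
def total_variance(between, within):
    """Total variance = between-classroom (tau) + within-classroom (sigma^2)."""
    return between + within


def proportion_explained(unconditional_total, full_total):
    """Share of unconditional variance accounted for by the full model."""
    return (unconditional_total - full_total) / unconditional_total


# Unconditional model components as reported: tau = 36.76, sigma^2 = 223.53.
unconditional = total_variance(between=36.76, within=223.53)

# Share of the unconditional variance that lies between classrooms.
between_share = 36.76 / unconditional

# Hypothetical full-model total variance, for illustration only.
example = proportion_explained(unconditional, 112.0)
```

The between-classroom share (about 14% here) indicates how much of the outcome variance is attributable to classroom membership, which is the reason for using a multilevel model.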

An additional model, which included only the treatment teachers, revealed that the more time teachers spent using A2i, the greater was their students’ passage comprehension score growth overall (i.e., a dose-response effect). Results are provided in Table S6.

Additional models, with all teachers included, revealed a significant student fall vocabulary by A2i use interaction (see Table S7 & Figures 2 and S3). In general, students with lower fall vocabulary scores ended the school year with lower reading comprehension scores. However, the more teachers used A2i or the higher their rubric score, the greater were their students’ reading comprehension skills by spring, and this effect was greater for students with lower initial vocabulary scores. Children who began first grade with lower vocabulary scores ended the year with reading comprehension scores that were comparable to those of students with strong initial vocabulary scores, but only in classrooms where the teacher used the A2i software to a greater extent (Table S7 top & Figure 2) or implemented individualized instruction with high fidelity (see Table S7 bottom & Figure S3).

When interaction terms are included in models, the size of the treatment effect depends on teacher fidelity (e.g., A2i use) and student initial status (i.e., fall vocabulary). By comparing the total variance explained in the final model with and without the fidelity of treatment variable, the amount of variance explained by the treatment and the interaction term can be computed.

Fifteen percent of the total variance in spring passage comprehension scores was explained by the amount of time teachers spent using A2i and the interaction term; 43% of the variance was explained by child fall reading and vocabulary scores, gender, and FARL. The total amount of variance in spring reading comprehension scores explained by the final model was 58%.

We found highly similar results with letter-word identification as the outcome (Table S8). There was a vocabulary-by-A2i use interaction that was significant and replicated the passage comprehension results. Altogether, these results indicate that using A2i, supported by professional development, increased the extent to which teachers individualized student instruction, which, in turn, led to stronger student reading scores.

References

1. C. M. Connor, F. J. Morrison, E. L. Katch, Scientific Studies of Reading 8, 305 (2004).
2. C. M. Connor, F. J. Morrison, J. N. Petrella, Journal of Educational Psychology 96, 682 (2004).
3. C. M. Connor, F. J. Morrison, L. Slominski, Journal of Educational Psychology 98, 665 (2006).
4. C. M. Connor, F. J. Morrison, P. Underwood, Scientific Studies of Reading (in press).
5. K. Rayner, B. R. Foorman, C. A. Perfetti, D. Pesetsky, M. S. Seidenberg, Psychological Science in the Public Interest 2, 31 (2001).
6. NRP, “National Reading Panel report: Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction” Tech. Report No. NIH Pub. No. 00-4769 (U.S. Department of Health and Human Services, Public Health Service, National Institutes of Health, National Institute of Child Health and Human Development, 2000).
7. B. M. Taylor, D. P. Pearson, K. Clark, S. Walpole, The Elementary School Journal 101, 121 (2000).
8. R. Wharton-McDonald, M. Pressley, J. M. Hampston, The Elementary School Journal 99, 101 (1998).
9. M. Pressley et al., Scientific Studies of Reading 5, 35 (2001).
10. H. Borko, J. Niles, in Educators' handbook: A research perspective, V. Richardson-Koehler, Ed. (Longman, New York, 1987), pp. 167-187.
11. L. S. Fuchs, D. Fuchs, N. Phillips, The Elementary School Journal 94, 331 (1994).
12. C. E. Cameron, C. M. Connor, F. J. Morrison, Journal of School Psychology 43, 61 (2005).
13. J. E. Brophy, T. L. Good, in Handbook of research on teaching, M. C. Wittrock, Ed. (Macmillan, New York, 1986), pp. 328-375.
14. Harcourt Educational Measurement, Stanford achievement test 10 (SAT10) (Harcourt, Orlando, FL, 2003).
15. N. Mather, R. W. Woodcock, Woodcock Johnson III tests of achievement: Examiner's manual (Riverside, Itasca, IL, 2001).
16. NAEP, “The nation's report card” (National Center for Education Statistics, 2005).
17. F. J. Morrison, H. J. Bachman, C. M. Connor, Improving literacy in America: Guidelines from research (Yale University Press, New Haven, CT, 2005).
18. T. D. Cook, D. T. Campbell, Quasi-experimentation design and analysis issues for field settings (Houghton Mifflin Co., Boston, 1979).
19. S. W. Raudenbush, A. Bryk, Y. F. Cheong, R. Congdon, M. du Toit, HLM6: Hierarchical linear and nonlinear modeling (Scientific Software International, Lincolnwood, IL, 2004).
20. S. W. Raudenbush, A. S. Bryk, Hierarchical linear models: Applications and data analysis methods. J. de Leeuw, Ed., Advanced quantitative techniques in the social sciences (Sage Publications, Thousand Oaks, CA, ed. 2, 2002).
21. A. Venter, S. E. Maxwell, E. Bolig, Psychological Methods 7, 194 (2002).
22. S. W. Raudenbush, Psychological Methods 2, 173 (1997).

Table S1

Selected Language Arts Instruction Activity within the Dimensions of Instruction

Columns: Teacher Managed (TM), Child Managed (CM). Items are listed in the order in which they appear across the two columns.

Code-focused (CF): • Alphabet Activity • Spelling • Letter Sight-Sound • Independent repeated reading of words • Initial Consonant Stripping • Word Segmentation • Phonics activities • Phonics activities • Phonological awareness activities • Phonological Awareness activities • Non-word reading activities • Sight word reading • Computer activities code-focused

Meaning-focused (MF): • Vocabulary • Student Read Aloud, Individual • Teacher Read Aloud • Buddy reading • Student Read Aloud, Choral • Sustained Silent Reading • Group Writing, Writing Instruction, Model Writing • Reading Comprehension worksheets • Student Individual Writing • Listening Comprehension • Discussion • Repeated reading of text • Timed reading

Table S2

Description of Schools

| School | Treatment School? | Reading First? | Total number of first grade classrooms | Core Curriculum | Percentage of children eligible for Free and Reduced Lunch, indicating family meets federal poverty level criteria |
| --- | --- | --- | --- | --- | --- |
| A | No | Yes | 3 | Reading Mastery | 93 |
| B | Yes | Yes | 6 | Open Court | 96 |
| C | No | Yes | 6 | Open Court | 88 |
| D | Yes | Yes | 5 | Reading Mastery | 82 |
| E | No | Yes | 5 | Open Court | 57 |
| F | Yes | No | 4 | Open Court | 69 |
| G | Yes | No | 5 | Open Court | 67 |
| H | No | No | 7 | Open Court | 37 |
| I | No | No | 6 | Open Court | 24 |
| J | Yes | No | 5 | Open Court | 29 |

Table S3

Descriptive Statistics by Treatment and Control Group as well as means for variables used in HLM analyses

| Student Assessments | Treatment M (SD) | Control M (SD) | Total M (SD) HLM descriptives |
| --- | --- | --- | --- |
| Fall WJ Letter-Word Reading W Score | 404 (28) | 413 (23) | 409 (31) |
| Fall WJ Letter-Word Reading Grade Equivalent | 1.4 (.65) | 1.6 (.76) | |
| Spring WJ Letter-Word Reading W Score | 451 (24.18) | 455 (26.66) | 439 (27) |
| Spring WJ Letter-Word Reading Grade Equivalent | 2.5 (.73) | 2.6 (.87) | |
| Fall WJ Picture Vocabulary W Score | 476 (10) | 481 (10) | 479 (11) |
| Fall Picture Vocabulary Standard Score (norm mean = 100, SD = 15) | 99 (11) | 103 (16) | |
| Spring WJ Picture Vocabulary W Score | 481 (9.50) | 485 (11.04) | 484 (11) |
| Spring Picture Vocabulary Standard Score | 100 (9.50) | 104 (11.52) | |
| Fall Passage Comprehension W score | 447 (20) | 452 (22) | 450 (22) |
| Fall Passage Comprehension Grade Equivalent | 1.4 (.60) | 1.6 (.76) | |
| Spring Passage Comprehension W score | 464 (15) | 467 (16) | 466 (15) |
| Spring Passage Comprehension Grade Equivalent | 2.0 (.65) | 2.1 (.73) | |
| Percentage of Students eligible for Free Lunch | 52% | 34% | 54% |
| Percentage of Students eligible for Reduced Lunch | 5% | 5% | 5% |
| Percentage of students receiving services for limited English proficiency* | 1% | 1% | |
| Percentage of Children who are African American* | 74% | 38% | 54% |
| Percentage of Children who are White* | 17% | 52% | 37% |

| Teacher** and School Variables | Treatment M (SD) | Control M (SD) | Total M (SD) |
| --- | --- | --- | --- |
| Years of teaching experience | 12.4 (11.58) | 9.27 (7.83) | 10.67 (9.72) |
| Years teaching first grade | 4.3 (5.62) | 5.30 (5.18) | 4.85 (5.35) |
| Number of teachers with Masters degree | 8 | 4 | |
| Number of Teachers who are African American | 11 | 6 | |
| Mean percentage of students on FARL school-wide | 74.26 | 53.1 | 62.82 (24.52) |

*Data not available for 18% of the children in the sample.
**Data are missing for one treatment teacher.

Table S4

Descriptions of teachers’ fidelity in individualizing their students’ instruction, based on classroom observations.

Fidelity Description

0 Control Group Teachers (regardless of what was observed in the classroom)

1 Poor fidelity. The teacher is not individualizing instruction at all. Most of the instruction is whole class without attention to the individual needs of the students. The classroom is not well organized, transitions are long, and instruction is not well paced for either higher or lower performing students. For two (9%) teachers, fidelity was described as poor.

2 Low fidelity. The teacher is using primarily whole class instruction. When small groups are used, they are not always focused on literacy. The classroom has adequate organization, transitions are reasonable, and instruction, while not individualized, is adequately paced. For six (27%) teachers, fidelity was described as low.

3 Fair fidelity. The teacher is using small groups. However, the children in the small groups are generally receiving highly similar amounts and types of instruction (i.e., not differentiated). The teacher has attended to the grouping recommendations in A2i but grouping is based more on convenience rather than student learning objectives, skill, or ability level. The classroom has adequate organization, transitions during small group time are reasonable, and instruction, while not fully individualized, is adequately paced. For five (23%) teachers, fidelity was described as fair.

4 Moderate fidelity. The teacher is using small groups and there is some evidence that instruction is individualized and there is an attempt to meet the A2i recommended time targets for each group. The teacher is grouping children based on learning objectives, skill, and/or ability rather than convenience. The classroom has adequate organization, transitions are reasonable, and instruction is adequately paced. For four (18%) teachers, fidelity was described as moderate.

5 High fidelity. The teacher uses small groups, and there is good evidence that the instruction is individualized and instruction amounts align with A2i recommended amounts and types. The number of groups is based on effective group size and reflects A2i recommendations. The teacher groups children based on learning objectives, skill, and/or ability rather than convenience. There is an observable system in place for organizing students into groups and transitioning from one station to another. The teacher uses a lesson plan (e.g. group activity planner). For five (23%) teachers, fidelity was described as high.

Table S5

Results of Hierarchical Linear Modeling demonstrating significant differences in treatment and control students’ achievement on the spring WJ Passage Comprehension Test, controlling for students’ free and reduced lunch status (i.e., meeting federal definition of poverty), and fall achievement scores. For the Treatment variable, treatment = 1, control = 0.

| Fixed Effects | Coefficient | Standard Error | t-ratio (df) | p-value |
| --- | --- | --- | --- | --- |
| Mean Spring WJ Passage Comprehension W score (intercept) | 466.04 | 1.07 | 433.45 (44) | < .001 |
| Treatment Effect (Classroom level) | 2.63 | 1.15 | 2.28 (44) | .028 |
| Percentage of students eligible for FARL school-wide | -.04 | .027 | -1.47 (44) | .148 |
| Gender (1 = girl, 0 = boy) | 0.62 | 0.81 | 0.69 (533) | .488 |
| Free Lunch Status (1 = eligible, 0 = not) | -2.79 | 1.33 | -2.096 (533) | .036 |
| Reduced Lunch Status (1 = eligible, 0 = not) | -5.29 | 1.43 | -3.69 (533) | < .001 |
| Fall WJ Letter-word W score | .14 | .02 | 5.60 (535) | < .001 |
| Fall WJ Vocabulary W score | .16 | .05 | 3.38 (535) | .001 |
| Fall WJ Passage Comprehension W score | .31 | .04 | 7.10 (535) | < .001 |

| Random Effects | Standard Deviation | Variance | Chi-Square (df) | p-value |
| --- | --- | --- | --- | --- |
| Between classroom residual | 1.81 | 4.36 | 63.55 (44) | .028 |
| Within classroom residual | 10.33 | 106.87 | | |

Deviance = 4088.2
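As a reading aid for the fixed-effects tables, the t-ratio in each row is the coefficient divided by its standard error. The following is a minimal sketch that recomputes a few t-ratios from the rounded values printed in Table S5. The function names and the tolerance of 0.02 are illustrative assumptions, not part of the original analysis, and rows whose coefficient and standard error are printed to only two decimals (for example .14 and .02) will not reproduce the reported t-ratio from the rounded values.

```python
# Rows copied from Table S5: (label, coefficient, standard error, reported t-ratio).
TABLE_S5_ROWS = [
    ("Treatment effect (classroom level)", 2.63, 1.15, 2.28),
    ("Free lunch status", -2.79, 1.33, -2.096),
    ("Reduced lunch status", -5.29, 1.43, -3.69),
]


def t_ratio(coefficient: float, standard_error: float) -> float:
    """Return the coefficient divided by its standard error."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return coefficient / standard_error


def check_rows(rows, tolerance: float = 0.02):
    """Recompute each t-ratio and flag whether it is within `tolerance`
    (an assumed allowance for rounding) of the reported value."""
    results = []
    for label, coefficient, standard_error, reported in rows:
        recomputed = t_ratio(coefficient, standard_error)
        matches = abs(recomputed - reported) <= tolerance
        results.append((label, recomputed, matches))
    return results


for label, recomputed, matches in check_rows(TABLE_S5_ROWS):
    print(f"{label}: recomputed t = {recomputed:.3f}, matches reported = {matches}")
```

The same check can be applied to Tables S6 through S8 by swapping in their rows.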

Table S6

Results of Hierarchical Linear Modeling demonstrating significant association between the number of minutes teachers used A2i and their students’ achievement on the spring WJ Passage Comprehension Test, controlling for students’ gender, free and reduced lunch status (i.e., meeting federal definition of poverty), and fall achievement scores.

| Fixed Effects | Coefficient | Standard Error | t-ratio (df) | p-value |
| --- | --- | --- | --- | --- |
| Mean Spring WJ Passage Comprehension W score (intercept) | 466.18 | 1.44 | 329.47 (20) | < .001 |
| A2i minutes used (Classroom level) | .02 | .007 | 2.97 (20) | .008 |
| Gender (1 = girl, 0 = boy) | .27 | 1.15 | .23 (219) | .815 |
| Free Lunch Status (1 = eligible, 0 = not) | -3.37 | 1.91 | -1.76 (219) | .079 |
| Reduced Lunch Status (1 = eligible, 0 = not) | -1.67 | 2.16 | -.77 (219) | .442 |
| Fall WJ Letter-word W score | .19 | .04 | 4.87 (219) | < .001 |
| Fall WJ Vocabulary W score | .12 | .08 | 1.47 (219) | .138 |
| Fall WJ Passage Comprehension W score | .22 | .04 | 5.33 (219) | < .001 |

| Random Effects | Standard Deviation | Variance | Chi-Square (df) | p-value |
| --- | --- | --- | --- | --- |
| Between classroom residual | 1.81 | 4.36 | 63.55 (20) | .028 |
| Within classroom residual | 10.33 | 106.87 | | |

Table S7 Results of Hierarchical Linear Modeling demonstrating differences in student achievement on the spring WJ Passage Comprehension Test between control and treatment groups when teachers’ use of A2i (total minutes from September to May) varies, controlling for students’ free and reduced lunch status and fall achievement (top) and when treatment teacher rubric score varies (bottom).

| Fixed Effects | Coefficient | Standard Error | t-ratio (df) | p-value |
| --- | --- | --- | --- | --- |
| Mean Spring WJ Passage Comprehension W score (intercept) | 465.80 | 1.03 | 451.65 (44) | < .001 |
| Treatment = A2i minutes used, Control = 0 | 0.014 | .004 | 3.44 (44) | .002 |
| Percentage of students eligible for FARL school-wide | -0.03 | .025 | -1.07 (44) | .289 |
| Gender (1 = girl, 0 = boy) | 0.59 | 0.82 | 0.72 (532) | .472 |
| Free Lunch Status (1 = eligible, 0 = not) | -2.66 | 1.28 | -2.09 (532) | .037 |
| Reduced Lunch Status (1 = eligible, 0 = not) | -5.20 | 1.41 | -3.68 (532) | < .001 |
| Fall WJ Letter-word W score | 0.15 | .03 | 5.79 (532) | < .001 |
| Fall WJ Vocabulary W score | 0.23 | .05 | 4.48 (532) | < .001 |
| A2i X Fall Vocabulary W score interaction | -0.001 | .0002 | -2.77 (532) | .006 |
| Fall WJ Passage Comprehension W score | 0.30 | .04 | 6.83 (532) | < .001 |

| Random Effects | Standard Deviation | Variance | Chi-Square (df) | p-value |
| --- | --- | --- | --- | --- |
| Between classroom residual | 1.70 | 2.88 | 56.84 (44) | 0.093 |
| Within classroom residual | 10.30 | 106.06 | | |

Deviance = 4105.3

Table S7 (bottom)

| Fixed Effects | Coefficient | Standard Error | t-ratio (df) | p-value |
| --- | --- | --- | --- | --- |
| Mean Spring WJ Passage Comprehension W score (intercept) | 466.22 | 1.06 | 440.28 (44) | < .001 |
| Teacher fidelity (1 = low to 5 = high), Control = 0 | 0.53 | .33 | 1.60 (44) | .117 |
| Percentage of students eligible for FARL school-wide | -0.03 | .03 | -1.14 (44) | .262 |
| Gender (1 = girl, 0 = boy) | 0.53 | 0.80 | 0.66 (532) | .507 |
| Free Lunch Status (1 = eligible, 0 = not) | -2.67 | 1.27 | -2.11 (532) | .035 |
| Reduced Lunch Status (1 = eligible, 0 = not) | -5.20 | 1.42 | -3.66 (532) | < .001 |
| Fall WJ Letter-word W score | 0.15 | .03 | 5.70 (532) | < .001 |
| Fall WJ Vocabulary W score | 0.22 | .05 | 4.42 (532) | < .001 |
| Fidelity Score X Fall Vocabulary W score interaction | -0.04 | .02 | -2.17 (532) | .030 |
| Fall WJ Passage Comprehension W score | 0.31 | .04 | 7.00 (532) | < .001 |

| Random Effects | Standard Deviation | Variance | Chi-Square (df) | p-value |
| --- | --- | --- | --- | --- |
| Between classroom residual | 2.34 | 4.48 | 68.56 (44) | 0.010 |
| Within classroom residual | 10.29 | 105.93 | | |

Deviance = 4096.2

Table S8

Results of Hierarchical Linear Modeling demonstrating differences in student achievement on the spring WJ Letter-Word Identification Test between control and treatment groups when teachers’ use of A2i (total minutes from September to May) varies, controlling for students’ free and reduced lunch status and fall achievement.

| Fixed Effects | Coefficient | Standard Error | t-ratio (df) | p-value |
| --- | --- | --- | --- | --- |
| Mean Spring WJ Letter-word identification W score (intercept) | 454.56 | 1.67 | 271.13 (44) | < .001 |
| Teacher fidelity, A2i minutes used | 0.02 | .005 | 3.47 (44) | .001 |
| Percentage of students eligible for FARL school-wide | -.03 | .04 | -0.96 (44) | .341 |
| Gender (1 = girl, 0 = boy) | -2.39 | 1.08 | -2.21 (532) | .027 |
| Free Lunch Status (1 = eligible, 0 = not) | -1.56 | 2.03 | -0.77 (532) | .443 |
| Free and Reduced Lunch Status (1 = eligible, 0 = not) | -1.70 | 2.71 | -0.628 (532) | .530 |
| Fall WJ Letter-word W score | 0.41 | .04 | 10.86 (532) | < .001 |
| Fall WJ Vocabulary W score | 0.06 | .07 | .76 (532) | .450 |
| A2i X Fall Vocabulary W score interaction | -0.001 | .0004 | -2.72 (532) | .007 |
| Fall WJ Passage Comprehension W score | 0.40 | .06 | 6.27 (532) | < .001 |

| Random Effects | Standard Deviation | Variance | Chi-Square (df) | p-value |
| --- | --- | --- | --- | --- |
| Between classroom residual | 3.32 | 11.02 | 69.20 (44) | .009 |
| Within classroom residual | 14.42 | 208.00 | | |

Deviance = 4472.22


Figure S1. Classroom View (left). The colored bars represent the amount of instruction recommended by the algorithms based on students’ vocabulary and letter-word recognition scores. Children’s names, when clicked, provide the scores and a graph depicting actual scores compared to targets. Another algorithm assigns children to groups based on their letter-word identification score. The teacher selects the number of groups and may change group membership. Literacy Minutes Manager (right). This is where teachers scheduled their literacy instruction. Although teachers could schedule literacy throughout the day, they were encouraged to use, at a minimum, a dedicated 90-minute block of time for reading. The school district mandated that a 90-minute block of time be dedicated to reading instruction.
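The caption states only that children are grouped by letter-word identification score and that the teacher chooses the number of groups; it does not give the grouping rule. The sketch below illustrates one simple way such a rule could work, assuming children are ranked by score and split into contiguous groups of near-equal size. The function name, the student names, and the equal-size rule are hypothetical and should not be read as the A2i algorithm itself.

```python
from typing import Dict, List


def assign_groups(scores: Dict[str, float], n_groups: int) -> List[List[str]]:
    """Split students into `n_groups` contiguous groups after ranking by score.

    Assumption: groups are as equal in size as possible, with any extra
    students placed in the lowest-scoring groups. Ties are broken by name.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    ranked = sorted(scores, key=lambda name: (scores[name], name))
    if not ranked:
        return []
    n_groups = min(n_groups, len(ranked))  # never create empty groups
    base, extra = divmod(len(ranked), n_groups)
    groups: List[List[str]] = []
    start = 0
    for index in range(n_groups):
        size = base + (1 if index < extra else 0)
        groups.append(ranked[start:start + size])
        start += size
    return groups


# Hypothetical fall letter-word W scores for six students.
example_scores = {"Ana": 390, "Ben": 412, "Cal": 405, "Dee": 431, "Eli": 398, "Fay": 420}
for number, members in enumerate(assign_groups(example_scores, 3), start=1):
    print(f"Group {number}: {', '.join(members)}")
```

Because the teacher may change group membership, the returned lists are best treated as a starting suggestion that can be edited afterwards.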

Figure S2. The daily planning features of A2i, including the Group Activity Planner group and calendar (top left), indexed reading activities (top right), and printable lesson plan (bottom).

[Figure S3 plot: Spring Passage Comprehension W Score (y-axis, 460.0 to 475.0) versus Rubric Score, 1 to 5 (x-axis, 0 to 5.00). Legend: WJ_VOC_W = -5.070, WJ_VOC_W = 7.930. Points at 0 mark the Control Group.]

Figure S3. Vocabulary-by-Fidelity Score Interaction. The points represent the fitted means for first graders in the control group, whereas the lines represent fitted mean scores for children in the treatment group, which varied by fidelity score. The red point and line represent fitted mean scores for students who began first grade with fall vocabulary scores at the 75th percentile of the sample (W = 486, Age Equivalent = 8 years), whereas the blue point and line represent fitted scores for first graders with lower fall vocabulary scores at the 25th percentile of the sample (W = 474, Age Equivalent = 6 years). Graph created using HLM version 6.03.
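The fitted values plotted in Figure S3 can be approximated from the coefficients in Table S7 (bottom). The sketch below rests on two assumptions that the supplement does not state outright: the fall vocabulary predictor is centered, so that the legend values WJ_VOC_W = -5.070 and WJ_VOC_W = 7.930 can be plugged in directly, and all other covariates are held at zero. The function names are illustrative, and results use the rounded coefficients as printed.

```python
# Coefficients copied from Table S7 (bottom).
INTERCEPT = 466.22
FIDELITY = 0.53                # per rubric point; control classrooms coded 0
FALL_VOCABULARY = 0.22         # per W-score point
FIDELITY_X_VOCABULARY = -0.04  # interaction term

# Fall vocabulary values from the Figure S3 plot legend (assumed to be centered).
LOWER_VOCABULARY = -5.070
HIGHER_VOCABULARY = 7.930


def fitted_score(fidelity: float, vocabulary: float) -> float:
    """Fitted spring Passage Comprehension W score, other covariates held at 0 (assumption)."""
    return (
        INTERCEPT
        + FIDELITY * fidelity
        + FALL_VOCABULARY * vocabulary
        + FIDELITY_X_VOCABULARY * fidelity * vocabulary
    )


def fidelity_slope(vocabulary: float) -> float:
    """Change in fitted score per rubric point at a given vocabulary level."""
    return FIDELITY + FIDELITY_X_VOCABULARY * vocabulary


for label, vocabulary in (("lower", LOWER_VOCABULARY), ("higher", HIGHER_VOCABULARY)):
    scores = [round(fitted_score(rubric, vocabulary), 2) for rubric in range(0, 6)]
    print(f"{label} vocabulary: slope = {fidelity_slope(vocabulary):.3f}, fitted = {scores}")
```

Under these assumptions the fitted score rises by roughly 0.73 W points per rubric point for the lower-vocabulary profile and by roughly 0.21 for the higher-vocabulary profile, which is the pattern the negative interaction coefficient implies.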
