The National Academies National Research Council National Science Resources Center

Math/Science Partnerships Workshop Assessment of Student Learning

February 2, 2004

The Keck Center 500 Fifth Street, NW, Room 100 Washington, DC

Proceedings by:

CASET Associates, Ltd. 10201 Lee Highway, Suite 160
Fairfax, Virginia 22030 (703) 352-0091

TABLE OF CONTENTS

Page

Connecting Cognition and Assessment 2

James Pellegrino

Equity in Assessment 60

William Trent

Classroom Assessment of Learning: What Does It Mean for MSPs? 96

Lorrie Shepard

Implications for MSPs of Large-Scale Assessments 133

Marge Petit

P R O C E E D I N G S [8:30 a.m.]

Agenda Item: Connecting Cognition and Assessment

MR. LABOV: Welcome back. We hope you had an enjoyable evening last night and dinner. We have a lot of things planned for today, including both plenary sessions and breakout sessions later this afternoon.

Because Mel is not here, I have asked a couple of other people from our steering committees to do the facilitating. This morning, the facilitator will be Dr. Gollub from Haverford College.

Jerry is a member of the National Academy of Sciences, a physicist at Haverford, and was the co-chair for a report that is actually in your CD on learning and understanding, the advanced study report on AP and IB.

So, let me turn this over to Jerry.

DR. GOLLUB: Well, this morning we are going to be fortunate to have -- to be able to hear from Jim Pellegrino from the University of Illinois at Chicago. Jim Pellegrino has perhaps done as much as or more than any other single individual to further the agenda that we are concerned with here. He has chaired a number of National Academy studies and is an expert in cognitive psychology and education, the relationship between thinking and learning, and the delivery of educational programs.

So, Jim, why don't you begin and tell us about connecting cognition with assessment.

Agenda Item: Connecting Cognition and Assessment

DR. PELLEGRINO: Good morning. It is a pleasure for me to be here. I don't know if you saw outside, but there is a handout that has most of my slides from today, in case you haven't picked one up.

Like other presenters, I didn't get them to the NRC sufficiently ahead of time and, in fact, I made a few adjustments last night. My job, I think, this morning is to do a couple of things. One is to try to make some connections between some of the NRC reports that I have been involved with and others have been involved with, having to do with the area of cognition and assessment.

What I am going to try to do is connect some of my comments today to what you heard yesterday. I thought that Lorrie and Andy did a terrific job in setting up sets of issues and what I am going to hope to do is connect to some of that, as well as depending on time, preview some things and set the stage for some things that are going to happen later today in other sessions, both, perhaps, plenary sessions, as well as in some of the breakouts.

What I am going to try to do, although I am here to talk about particularly NRC reports, I am going to say a few things about a couple of other things. One is "No Child Left Behind." Another is brief comments about another report and then focus most of my remarks on ideas that come from How People Learn and Knowing What Students Know and trying to tie them together.

By the way, I should mention that I think that the CD-ROMs that the NRC has put together are one of the most incredible intellectual resources that anybody could possibly have. When I got my copies of these things, I just went ecstatic; to have all of this material on a CD-ROM was just phenomenal. By the way, if you have ever tried to go to the NAP website and download anything, that is like pure torture. I understand why they do it. On the other hand, you have on CD-ROM two of the most incredible sets of intellectual resources that one could ever imagine having.

Let me start with -- let me set my remarks today in the context of the larger issues facing states and MSPs.

The problem that is occurring right now has to do with how states are going to set up their assessment systems and implement them in the light of the "No Child Left Behind" legislation, which includes a lot of different provisions.

I think the critical issues and, hopefully, what we will talk about today is some ways to think about how they get addressed, are things having to do with what should be the targets for assessment, relative to things like content standards, how those targets might be apportioned across perhaps different elements of a comprehensive assessment system and, most importantly, how assessment that occurs both at the large-scale level and at the district and school level can be designed in such a way so that it supports learning and instruction, as well as accountability.

This is really an incredible conceptual and operational design challenge. I am not going to say that we have the answers but there are some ways to think about this. So, what I want to do in the course of my remarks this morning is really cover five things. First, I want to say a little bit about the NCLB requirements and some critical issues.

Then I am going to briefly mention a piece that offers some advice for policy makers, largely to set up where I am going to spend most of my time today, which is a conceptual basis for working through critical issues of assessment design and use and then end with some comments about what I think is needed for further progress and then some ideas about ways in which MSPs and other groups working on this might be able to learn some more.

I will return back to this overview so we can keep track of where we are. Well, we all know about "No Child Left Behind" or, as most of us refer to it, "No Child Left Untested." I just want to remind you about some of the key provisions because although we didn't hear about much of this yesterday, we do know that facing us, particularly in the 2005-2006 academic year, is comprehensive assessment in reading and mathematics, covering grades 3 through 8.

Behind that are coming assessments in science, not in terms of every child in every grade but in terms of grade bands; that is two years behind, and associated with this is the idea of adequate yearly progress and the idea of establishing proficiency levels, as well as the reporting.

Now, the provisions of No Child Left Behind have created an incredible press on states and I am sure you are all feeling this in the context of your own projects, particularly with respect to mathematics assessment. Jay also mentioned something yesterday, that the NRC does have a committee that is working with NSF support on thinking about the issues of science assessment and, hopefully, because of the time lag there, perhaps we will be able to offer to states and to partnerships, such as yours, some advice as to ways to think about science assessment that supports learning and instruction and at the same time is in conformance with the requirements under No Child Left Behind.

The mathematics assessment area, particularly at the large-scale level, is one we are going to need to talk about and think about a lot harder because of the press there. I ran across this cartoon awhile ago and it seems like for many of us this is the state of the world where No Child Left Behind is leading us.

It is perhaps an unfortunate depiction where we are more concerned about test scores than we are about actual performance, in this case on the basketball court, last night on the football field, or in your case in the classroom. But it seems to me that we have to do everything we can to avoid this sort of situation as a sort of outcome.

So, what are some of the key issues? Well, one issue is what gets assessed. We started to touch on this a little yesterday, but it is the issue of what standards and at what level of granularity. Then the other question is how does usable information get derived? Here we need to think hard about who needs what kind of information, in what form, by when, and for what purpose.

I will come back to the issue of type of information and purpose later. Then another issue is whether systems of assessment are necessary and whether they are desirable or feasible. In other words, can we do it all in one fashion or do we really need to think about this in a more systemic way that runs from what the state is doing down to the district and the classroom and then figure out how we apportion the elements of an assessment system to meet the needs of different users? And then also how we make sense of data from multiple assessments.

So, NCLB in some ways opens up a set of issues, which are important ones for us to consider almost independent of the requirements, the legal requirements of what has to get implemented because it raises some very important issues about how to think about assessment as a way to support instruction.

Now, this is not an NRC report, but I do want to mention a report that was issued prior to NCLB by a group headed by Jim Popham, which tried to anticipate what was going to happen and tried to develop a guide for policy makers in terms of the whole idea that we could actually build assessments that had the dual goal of supporting instruction and accountability. It is not my job to talk about that today, but the commission that produced this report came out with a set of nine requirements for assessment, which together define a coherent way to think about things.

I have highlighted a few because they point out a set of recurrent issues that we all have to deal with. One requirement was the idea that states need to prioritize their content standards. We heard this yesterday: a mile wide, inch deep, in terms of standards. You can't assess everything, particularly if you are going to assess it at the state level.

Another issue is unambiguously describing those standards and then other ones having to do with providing assessments at the classroom level, not just at the state level and making sure that curricular breadth is attended to and so that you don't unduly focus -- I mean, Lorrie presented some very nice evidence yesterday on the history of what happens with assessments depends on what is on the state test and how it can narrow instruction, it can narrow curricular breadth.

Well, I should mention that this model put forward by this commission is actually a model that we are trying to develop in some detail for the area of science assessment as a part of the NRC committee that is working on making a set of recommendations for how states should think about science assessment. So, hopefully, by later this year, you will have something that the NRC will be putting out that will provide a set of different approaches, including one that comes -- that is derived from this kind of approach.

But I want to use this as a springboard to sort of highlight the issues that need further guidance. Again, I want to start with standards. What is the conceptual basis for making decisions among standards? If you are going to select and prioritize standards, on what basis do we do this?

This is one that troubles me greatly. How do we specify the meaning of the standards? What is assessable and how? What is the process of translating standards into assessment practices, both for large-scale tests, as well as for classroom assessment purposes? Then how do the system pieces come together, because in essence you have to think about this as a system?

Now, I am presenting lots of questions up front because I think that we need to understand that these are major questions that need to occupy our time and our thinking. I am not proposing that we have the answers to all of these, but we have some ways to think about this.

That is what I want to share with you and others on this program today will also share that with you.

So, what is a conceptual basis for working through some of the critical issues of assessment design and use? Well, I want to argue -- and you might say I am slightly biased, having spent three years of my life in numerous academy committee meetings with 16 other very smart people trying to figure out issues about assessment -- that ideas coming out of the Knowing What Students Know report provide us with some significant conceptual guidance, as well as some practical suggestions.

I am going to take a minute, just to remind you as to why the NRC with NSF support, by the way, put this committee together and what was the committee's objective?

Basically, for a variety of the reasons that Lorrie pointed out yesterday, there was a sense that what we are currently doing or have been doing in terms of assessment practices in the United States is woefully out of touch with what we know theoretically in cognitive psychology, educational psychology, curriculum and instruction, and measurement.

The issue was could we, in fact, explore what it is that the current knowledge base implies about how we should think about assessment. So, the committee translated this into an objective to establish a theoretical foundation for the design and use of new kinds of assessments that will help all students learn and succeed in school. The whole idea here was how do you think about assessment as a facilitator of the teaching and learning process, not just a way to audit what is going on.

The other thing is the committee felt that what we need are assessments that make as clear as possible for students, teachers, and other education stakeholders the nature of students' accomplishments and the progress of their learning. This is the kind of assessment that we need. It is not the kind of assessment that we have largely had operational in terms of both standardized tests, as well as classroom assessment practices.

So, I will say a few words about the report and then I am going to sort of launch into some of the conceptual details and examples. The report from our perspective -- and, again, this is a very biased one -- proposes a vision of educational assessment that is based on contemporary theoretical and empirical knowledge about how people learn and what are effective and appropriate ways to measure that learning.

What we try to do in the report, in addition to reviewing this theory base, is describe what is implied in terms of an improved approach to assessment design and use, and we offer some examples. I should say -- and I will return to some of those examples in my presentation today -- there are a lot of promising examples, particularly at the level of classroom assessment, instructional programs, and ones that include applications of technology.

These examples are very useful in the context of things like MSP projects, et cetera, because they help us see what it would mean to put some of these pieces together and also how to begin to implement them to support teaching and learning.

The other thing the report does -- as all good NRC reports do -- is provide directions for future research, development, policy, and practice. It would be disingenuous to say we have all the answers. In fact, what we have defined are a lot more questions than answers, which dictate a lot of research that is needed at the level of actual practical implementation as well. So, the MSPs and the work that you are doing provide a tremendous opportunity to push the field forward because of the opportunities that you have in terms of what you are trying to do, in terms of change at the level of classrooms and schools and districts.

For those of you who have looked at the report, it is divided up into four sections; if we had days, we could go through all the details. I am not going to bore you with all the details and I am not going to go through it from start to finish. In fact, what I am going to do is start at the end. I am going to start with some of the implications and recommendations and then work backwards from there to illustrate some of the key conceptual ideas and, hopefully, here I will make a good bit of contact with some of the ideas that Lorrie presented yesterday and some of the things that you are going to hear later today.

So, what was one of the key recommendations that we made in this report? Well, one of them was that policy makers -- and here is a case where those of you working in states have an opportunity to influence people. Policy makers have to recognize the limitations of current assessments and we have to do everything in our power to marshal an argument that encourages them to support the development of new systems of multiple assessments that are ultimately going to improve their ability to make decisions about education programs and the allocation of resources.

Ostensibly, the reason why we do assessment is to gather information so that we can use that information to make intelligent decisions about future actions. The argument that we make is that the kinds of assessments we have right now are not the ones that provide the information that is useful at multiple levels of the system. Now, in addition to this -- and I think Bill Trent will reinforce this -- important decisions should never be based on a single test score.

I mean, this has been said over and over and over again. It has been said in the standards on testing. It has been said in high stakes. It has been said in numerous places and, yet, that message doesn't seem to have gotten across. The other thing is that systems should measure growth in achievement over time. What we are really interested in is not static snapshots; we are interested in tracking growth because, in fact, when we are talking about learning, we are talking about change over time. If we are really interested in what is happening in schools, we should have assessments that actually measure growth in achievement over time.

I hate to tell you this, but the actual standardized tests we have, although they might have scale scores that imply something like growth over time, that is a myth. Okay? Those tests are never designed on a conceptual basis. It is all, excuse me, a trick of psychometrics. I can say that because I am not a psychometrician. I am a cognitive psychologist. But some of my best friends are psychometricians.

The other thing is that -- and this is, again, which is something that Lorrie made a strong point of yesterday and I want to emphasize as well -- the emphasis needs to be shifted from the whole idea of assessment of learning to an increased importance of assessment for learning. The predominant model that we have is the sort of summative assessment of learning rather than assessment for learning.

I want to reinforce this and just reinforce some of the points that Lorrie made yesterday. This simple little diagram captures some of the ideas that I am talking about. One is that, obviously, what is really important is what happens over here in the classroom in terms of the teaching and learning process and, ostensibly, we believe that what happens there should show up on high-stakes summative tests.

Now, one issue that we need to talk about -- and maybe others will talk about -- is the extent to which these high-stakes summative tests are actually sensitive to instruction. Depending on how they are constructed, they may or may not be. The other thing that people are trying to think about is the whole idea that one of the problems with these kinds of measures is that they provide very delayed and indirect feedback to the classrooms.

In fact, many of us would argue that the kind of feedback that they can provide is not very helpful to support the teaching and learning process. So, rather than just thinking about everything being driven by these kinds of assessments, we want to focus more over here. This is where the action is.

This is where the rubber meets the road. This is where the information can be used on an ongoing basis. So, I can't emphasize enough the importance of classroom-level assessment and the whole idea of formative assessment. Why focus on this? This is a repetition of some of the points that Lorrie made yesterday. It ties in also with theories of learning.

While instruction is going on, teachers need to know whether what they are doing is working. Okay? In order for them to make adjustments in their instruction, they have to have ways to figure out what students are currently understanding so they can figure out how to adapt. That is a particularly key feature of this and although this seems to be a principle of learning that goes back a long way, we oftentimes forget about this.

Students need feedback.

They need a certain kind of feedback to be able to monitor their own learning success and to know how to improve. Lorrie mentioned the work of Paul Black and Dylan Wiliam about the evidence of effect size in terms of the use of formative assessment. What is needed for it to work? Lorrie also mentioned Sadler's three elements. You need a clear view of where you are going, the learning goals. You need information about the present state of the learner and you need actions to close the gap.
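Sadler's three elements -- a clear learning goal, information about the learner's present state, and an action to close the gap -- can be sketched as a simple feedback loop. This is only an illustrative sketch: the skill names, proficiency scale, and thresholds below are invented for the example, not taken from the talk or the reports.

```python
# A minimal sketch of Sadler's three elements of formative assessment:
# a clear goal, evidence about the learner's present state, and an
# action chosen to close the gap. Skill names and numbers are invented.

GOALS = {"counting_on": 0.8, "fact_retrieval": 0.8}  # target proficiency


def next_action(present_state):
    """Choose the skill with the largest gap to its goal, or report success."""
    gaps = {skill: target - present_state.get(skill, 0.0)
            for skill, target in GOALS.items()}
    skill, gap = max(gaps.items(), key=lambda item: item[1])
    if gap <= 0:
        return "goals met; extend to new material"
    return f"targeted practice on {skill}"


print(next_action({"counting_on": 0.9, "fact_retrieval": 0.4}))
# prints: targeted practice on fact_retrieval
```

The point of the sketch is only the structure of the loop: without an explicit goal and an explicit estimate of the present state, there is no principled way to choose the next instructional action.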

Now, that is easy to say as a general model, but the real challenge -- and this is the one that I want to focus in on -- is actually knowing what it is that students know. This part, which has to do with information about the present state of the learner -- what we need are conceptually rich systems that aren't just about assessment, but that actually link together three things: curriculum, instruction, and assessment.

We have heard a lot about alignment, standards, et cetera. The alignment that has to go on is the alignment among curriculum, instruction, and assessment. I am going to argue that that alignment needs to be driven by conceptual theories of what it means to know and learn things.

So, that may be a long preamble in some respects to trying to answer this question: How do we begin the process of designing and implementing these kinds of assessments? Well, it has to start with understanding what assessment is. I think, although it may seem simplistic, one of the contributions and key ideas of the Knowing What Students Know report is to realize that assessment is fundamentally a process of reasoning from evidence.

We can never truly know what is inside a kid's head. We can't sink a probe in there and sort of figure out exactly what they know and how they know it. We are always trying to make inferences about what they know from evidence. There are three critical elements that need to be closely articulated and thought about in terms of how they work together.

Assessment has to start with some kind of a model or a theory of how students represent knowledge and develop competence in a domain, in an area of mathematics, in an area of science. What does it mean to know things? How does that learning progress? Then connected to that are observations, tasks, or situations that allow one to observe students' performance.

It is these tasks, these observations, that are absolutely critical because, if they are well thought out and well designed, they provide us a potential window on what it is that kids know.

The third part, though, is interpretation. It doesn't do any good if you have a rich theory of knowledge and learning and/or you have rich tasks if you have no principled way to make inferences from the observations back to that underlying conceptual theory. One of the arguments that we make is that problems in assessment can be traced to a couple of different types. One is the nature of the theory of cognition.
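To make "reasoning from evidence" concrete, here is a minimal sketch of one possible interpretation model: a Bayesian update of our belief that a student has mastered a concept, given observed task responses. The slip and guess probabilities are invented for illustration; nothing in this sketch comes from the report itself.

```python
def update_mastery(prior, correct, slip=0.1, guess=0.2):
    """One Bayesian update of P(mastery) from a single observed response.

    slip  = P(wrong | mastery); guess = P(right | no mastery).
    Both parameter values are illustrative assumptions.
    """
    if correct:
        like_mastery = 1 - slip   # P(correct | mastery)
        like_none = guess         # P(correct | no mastery)
    else:
        like_mastery = slip
        like_none = 1 - guess
    evidence_m = prior * like_mastery
    return evidence_m / (evidence_m + (1 - prior) * like_none)


belief = 0.5                      # start agnostic about the student
for response in [True, True, False, True]:
    belief = update_mastery(belief, response)
# belief has risen well above 0.5; the one wrong answer tempers it
```

The particular model matters less than the structure it exhibits: observations never reveal knowledge directly, so an explicit interpretation model is what carries us from the evidence back to a claim about the underlying cognition.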

Most standardized tests that have been developed in this country of academic achievement have their origins in the early part of the 20th century in aptitude testing.

They are based on a theory of measurement and a theory of mind, which is largely an associationist, behaviorist model, which is antithetical to current understandings about the nature of knowing and understanding.

So, what we have is observations that are based on impoverished cognitive theories. Now, with those impoverished cognitive theories and those impoverished observations, it doesn't matter how sophisticated your statistics or your measurement model is. The most sophisticated statistical model, psychometric model, isn't going to get you very rich inferences about what students know, because the theory underlying the observations, as well as the observations themselves, is impoverished.

So, one of the things we need to realize and think deeply about -- and this applies not only to large-scale tests; it applies to what happens in the classroom as well -- is that these three elements of the assessment triangle have to be coordinated. We need to think deeply about what that means. I want to argue that where we start is from the cognition part. Okay? That is where advances in theory and research provide guidance to us.

Now, the report also hits on two important themes that I want to emphasize before launching off into what is cognition. One theme is what is variable and the other is what is constant. Now, what do I mean by what is variable?

What is variable in assessment is the purpose and context for assessment use. This is a mistake we make continuously in education.

We think that one kind of assessment can fit multiple needs and, in fact, we need to differentiate assessment that is intended for formative purposes at the classroom level, assessment intended for summative purposes, and assessment that often is intended for program evaluation purposes. It turns out that the design issues for each of these are different in terms of the constraints, the trade-offs, and issues. One of the mistakes that people make is to think that large-scale standardized tests can meet the needs of supporting classroom instruction, because they were never designed to be diagnostic.

So, this issue of recognizing different purposes and contexts and designing appropriately is a very important one. Now, what is constant, however, is that it doesn't matter what kind of assessment we are talking about: the principles underlying any assessment activity are the same; that is, there needs to be a close connection among cognition, observation, and interpretation.

What can change is the grain size, the scale, et cetera. So, in the Knowing What Students Know report, we argue that there really are some major scientific foundations for thinking about and pursuing quality educational assessment. One part of this is advances in the sciences of thinking and learning. The advances in the sciences of thinking and learning over the last 30 or 40 years provide the knowledge base that underlies the cognition vertex. This is the knowledge base that informs us about what kinds of observations are important and sensible to make.

If we really care about students understanding mathematics rather than simply doing procedures, we can consult this knowledge base to tell us what that means and what kinds of tasks or observations will help reveal that. The other thing is that the advances in measurement and statistical modeling are tremendous assets at the interpretation vertex, because many of the kinds of inferences we want to make, and many of the types of models that we have about the nature of learning and understanding, are far more complex and intricate than the kinds of models that we have worked with in the past. To make sense of complex data sets, we oftentimes need sophisticated statistical techniques and technology support. That is not to say that teachers and school administrators need to understand psychometric modeling, but we need to use these tools to help develop better, more intelligent assessments, because they will allow us to provide information that supports the kinds of inferences about students' knowledge that we want to make.

So, this is where How People Learn comes in. The How People Learn report or reports, which were ultimately put together into the expanded edition, provide a kind of summarization of what it is that we currently understand about the nature of knowing and learning and understanding.

What I want to do is to lay out a little bit of what are those ideas, some key ideas -- Lorrie already talked about some of them -- and what their implications are for assessment and then take you through some examples.

People often ask me, you are a cognitive psychologist. So, what is cognition? Well, that is not an easy question to answer partly because there are a couple of challenges and these are challenges that have practical consequences with respect to assessment and instruction.

First of all, there is the challenge of what we call articulating multiple explanations of thought and behavior.

We really want to understand this, but we have to understand that thought and behavior range from micro processes to macro processes.

Cognitive theory spans the gamut from what is going on in terms of perceiving stimuli and moment to moment thinking to more macro processes, things like extended problem solving. The time period over which our theories try to capture and describe behavior and learning can vary tremendously. We can talk about learning that goes on in a short period of time, in a few minutes, or we can talk about learning that unfolds over an entire semester or over multiple semesters.

So, one is this challenge of figuring out what level of explanation we are looking for, what we are capturing in a particular body of research. The other is what I call multiple levels of explanation. This is the issue of how we focus the explanation. Now, in the cognitive literature there is an argument, so to speak, about what is the appropriate level of explanation. Much of the cognitive literature and much of what you find in the How People Learn report is focused on what we call cognitive accounts of individual processes and knowledge representations. That is, what is in individual children's heads? Okay?

But there is another perspective that has emerged over the last decade or so. Lorrie alluded to this when she mentioned Vygotsky's work and issues of the zone of proximal development. These are the so-called situated or sociocultural accounts of collective processes and distributed knowledge representation: the idea that knowledge isn't just in the minds of individuals, but knowledge is also in the contexts that are shared among individuals; that is, it is part of the practices that we engage in, and that is an important aspect of what we need to think about in terms of the nature of knowing, as well as the nature of assessment.

So, let me see if I can illustrate this for you, because this may sound a little bit abstract. At the cognitive level of analysis, it turns out that we know a lot which has implications for assessment, and it comes from research that has looked extensively at the nature of competence and the development of competence in particular curricular domains. One of the things that cognitive and developmental psychology did over the last 30 years is move out of the laboratory. It moved from what I will call arbitrary toy tasks to studying learning in real domains of instruction.

There are three different ways in which we can talk about this cognitive level of analysis. One is how do we characterize performance? Another is how do we characterize development? And another is how do we characterize knowledge? These are associated with things like task analysis, what we call trajectories of learning and forms of representation.

All of these things are key parts of cognitive theories. Here is an example for something very simple, arithmetic. This is a problem that is very familiar to many of you. It is a simple word problem: Melissa had six pencils. Henry gave her 14 more. How many pencils does Melissa have now?

From the point of view of task analysis, what cognitive research does is to ask what the foundations for competent performance are. What is it that kids need to know to be able to actually deal with problems of this type? We have theories that talk about knowledge of concepts like the cardinality of sets and knowledge of strategies like counting and joining sets.

So, this is one level of analysis, to ask, okay, what is it that kids need to know to be able to do this?

Now, another level of analysis is to focus on this issue of trajectories. Okay? That is, how do kids over time come to approach problems like this? In the literature, we know that there are a variety of trajectories that in this case go from direct modeling, that is, representing sets with counters and joining them and counting all, to representing sets of numbers and doing things like counting on from the first number or counting on from the larger.

It actually turns out that we know a lot about kids' understanding of number and counting and the fact that over time they get much more sophisticated. So, if we give them a problem like six plus four, initially they may need to represent both sets and actually physically count them, and then later on they may represent the first number mentally and then go six, seven, eight, nine, ten.

Okay? Then later on they get even more sophisticated in terms of their thinking about this, and they set their counter to the larger number and count on the smaller amount.

So, if we give them a problem like four plus eight, they won't set their initial counter to four and then count on eight. They will set their initial mental counter to eight and go nine, ten, eleven, twelve. Okay?

Eventually, they get to the point where their facts are in memory. They are no longer counting. They have the -- if we actually look, we can see children over the early grades developing this kind of mental representation, this kind of trajectory. These are important things to understand in many different areas.
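The progression of addition strategies just described -- counting all, counting on from the first number, and counting on from the larger -- can be sketched as a small model. These function names are illustrative only, assuming a simple unit-counting process:

```python
def count_all(a, b):
    """Earliest strategy: represent both sets and count every item."""
    total = 0
    for _ in range(a):
        total += 1
    for _ in range(b):
        total += 1
    return total

def count_on_from_first(a, b):
    """Later strategy: hold the first number mentally, count on the second."""
    counter = a
    for _ in range(b):
        counter += 1
    return counter

def count_on_from_larger(a, b):
    """Most sophisticated counting strategy: set the mental counter to the
    larger addend and count on the smaller amount."""
    counter, rest = max(a, b), min(a, b)
    for _ in range(rest):
        counter += 1
    return counter
```

All three yield the same sum, but the counting effort differs: for four plus eight, counting all takes twelve counts while counting on from the larger takes only four, which is why the later strategies mark a real developmental gain.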

I think I have time to give you an example of a case where not understanding this can have devastating consequences. When my son was seven years old, he was in elementary school in California; it was our first experience with the California public schools, post-Reagan. What happened was he came home one day with a fact sheet: what he had done in school that day was they gave him all of the individual digits and he had done all of the single-digit additions and all the subtractions.

He had all his additions right and he had almost all of his subtractions wrong. Written on the paper was, "Christopher needs to memorize his subtraction facts." Okay? Well, I looked at Christopher's paper, and because I knew something about how kids' understanding of addition and subtraction develops, I realized that with virtually every subtraction problem, he was off by one, except for the ones like eight minus seven, which he got right. So, I said, Christopher, tell me, what is six take away three. He said six, five, four. There is my answer. Take away three.

He was doing what is called counting down. He had a very efficient -- he had an understanding of subtraction as a decrementing process, and he just had a procedural flaw. I did a little diagnosis, a little practice, got him to sort of say: Christopher, you don't say the first number. You start and put your finger up when you say the second number.

So, I said, Christopher, what is six take away four. Okay. What is it? Five, four, three, two.

Practiced that routine. Off he went to school, feeling like he could now solve all subtraction problems, knowing -- and I knew damn well that eventually he would learn his subtraction facts.
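Christopher's procedural flaw is a classic off-by-one error: he announced the starting number itself as his first count. A minimal sketch of the buggy routine and the corrected one, with illustrative names and assuming whole-number inputs:

```python
def subtract_buggy(a, b):
    """Christopher's flawed count-down: he said the starting number as the
    first count ("six, five, four" for 6 - 3), so answers came out one high."""
    answer = a
    for _ in range(b):
        answer = a      # announce the current number as a count...
        a -= 1          # ...then decrement
    return answer

def subtract_fixed(a, b):
    """Corrected routine: don't say the first number; decrement first, then
    count ("five, four, three" for 6 - 3)."""
    for _ in range(b):
        a -= 1
    return a

# subtract_buggy(6, 3) returns 4, off by one; subtract_fixed(6, 3) returns 3,
# and subtract_fixed(6, 4) traces "five, four, three, two" to reach 2.
```

The conceptual understanding -- subtraction as repeated decrementing -- was sound in both versions; only the boundary of the loop was wrong, which is exactly what a diagnosis at the level of strategy, rather than of right and wrong answers, can reveal.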

The point was that unless we understand where kids are and can diagnose some of this, then we end up not being able to respond to the needs of kids, and there was no way that Christopher was going to memorize the subtraction facts that afternoon, no matter how many behaviorist, you know, principles I wanted to apply that day.

Now, let me turn to a different level of analysis, the sociocultural level. Here the argument is that -- and, again, there are very rabid forms of what I will call the situative view, and then there are more moderate views. I take the view that the cognitive level of analysis and the sociocultural are complementary. They are different levels of description, and they get us to focus on different issues having to do with learning, instruction, and assessment. But from a sociocultural point of view, the most critical implications for assessment are derived from the study of the nature of practice and the forms of participation in communities.

Here, when we try to characterize performance, development, and knowledge, we are talking about different things. Here, characterizing performance has to do with characterizing what the communal practices are. Okay? When we are talking about development, we are talking about characterizing trajectories of participation in those practices, and when we are characterizing knowledge, we are talking about forms of mediated activity.

Now, I don't expect you to understand all of the details of this, and I am not sure that many of us actually truly understand some of this at the depth that some of the sociocultural theorists would like us to. But let me give you an example. Let's take a sociocultural look at arithmetic. From the point of view of practice, the practice is accounting. It turns out that arithmetic is the new math of the 15th century.

Where did it come from? It was learning to participate in a guild. It was the apprenticeship model of development in the counting house, and the knowledge that was developed, that we know as current arithmetic -- okay, the algorithms are the algorithms of the counting houses, which we teach today. So, it is important to understand our practices and from whence they came. When people talk about this from a sociocultural perspective, about the community of practice with respect to mathematical practices, we can talk about things like number theory and important ideas like conjectures and the commutative properties of numbers.

In terms of trajectories of participation, we can talk about trajectories of participation in things like argument, going from cases to generalization to proof, represented in things like these simple little equations, and then representations that allow students in the context of a classroom to participate in the practices of mathematics, which is part of what many of us want in terms of understanding. Not that we want to turn them into mathematicians or number theorists, but important things to learn have to do with number theory.

So, let me give you then some important generalizations about performance and the implications for assessment. One set of generalizations that we can find in How People Learn has to do with the nature of competence and expertise. It has to do with the fact that performance develops in communities that value certain forms of knowledge and activity, like modeling in science.

The nature of the community and the practices that are established in a classroom or a school, whether with respect to mathematics or science, are extremely important. We also know that knowledge gets tuned to very specific patterns of activity, like solving certain kinds of problems. The classes of problems we give are critical in terms of what gets represented.

A couple of other things that we understand: there are no magic levers. There is no silver bullet with respect to developing competence and expertise. It takes practice, it takes disciplined inquiry, and it takes multiple contextualized experiences. With respect to assessment, it means that assessment has to be designed to capture the complexity of competent performance, ranging all the way from mental processes to participation in forms of practice.

If we want kids to be able to explain things, we have to engage them in processes that allow them to do that. They have to come to understand what a good explanation is. They can't just all of a sudden be given a test and be told, explain your answer. I am having this same problem with my undergraduates in my cognition and memory class.

What do you mean you want us to explain our answers? Nobody has ever asked us to explain our answers before. That is not fair. Well, it may not be fair, but it is the only way I can figure out whether you actually understand what you are talking about.

Some generalizations about development. Not all children learn in the same way or follow the same path to competence. You can't turn a novice into an expert instantly. You have to go through a change process, and conceptual change is not a simple linear, uniform progression. You don't move directly from erroneous to optimal solutions.

Christopher was not going to move immediately from erroneous to an optimal solution in retrieving his number facts. He had to go through a stage of actually continuing with his procedure until he had enough repetitions that he had a long-term memory representation of what the true number fact was; then he could retrieve it. Intermediate forms of knowledge may not represent explicit forms, and that is okay, but simple relationships may not hold. Then, in terms of participation, there is a thing called legitimate peripheral participation.

Participation often starts at the edges and becomes progressively more aligned with core disciplinary practices. What do we do in our graduate programs when we bring students in? We start them in an apprenticeship model in which they have to come to learn to think like a scientist. They don't come in thinking like a scientist.

They come in thinking like an undergraduate. They have to learn how to do science or do mathematics or do physics.

So, from the point of view of developing assessment, we have to identify specific strategies and forms of activity with respect to the role they play in these developmental trajectories. For example, counting on your fingers is fine at grade one. It is probably not fine at grade three. But the point is that we need to understand that and be able to assess where a student is on this kind of a trajectory. Now, some generalizations about knowledge, in particular, disciplinary knowledge.

It is organized in particular ensembles that facilitate its use. One of the things that we have come to learn about competence and expertise is that it is not just that experts know a lot. It is how they know it and how that knowledge is organized into things like schemata that allow them to rapidly apply it.

Again, we can't just give them the schemes. They have to develop with time and practice. The important thing is to recognize that those schemes exist. Now, the other thing we have to recognize is that this knowledge includes and is amplified by processes of mental self-regulation -- the so-called bad term of metacognition that nobody seems to understand.

What we are talking about when we talk about metacognition is the capacity for individuals to spontaneously evaluate both their own knowledge and its limits. What do we know about good learners? We know that when they are reading a text, they are constantly asking themselves questions about am I understanding this. They are engaging in a process of self-interrogation. That is what we mean by metacognition. But the important thing we also have to understand about metacognition is that it can be domain specific. The metacognitive strategies you need in solving mathematics problems are not necessarily the same as the metacognitive strategies you need in terms of reading a science text or reading a history text.

So, part of metacognition is domain specific, and disciplinary knowledge is developed in communities that foster identity and interest. So, there is a whole host of implications for assessment in terms of what kind of knowledge -- I won't go into this, but I want to turn to why cognitive models of content knowledge are absolutely critical. What these kinds of models and theories, of which we have many, tell us is what the important aspects of knowledge are that we should be assessing.

I believe that it is this kind of knowledge that gives deeper meaning and specificity to standards. When we say a child needs to understand numbers, what do we mean by that? What does it mean to understand numbers at the first-grade level versus the fifth-grade level? These kinds of theories and empirical data give meaning and specificity to standards. They give us strong clues as to how this kind of knowledge can be assessed.

That is, they begin to point us in the direction about what kinds of observations, what should be assessed at points that are proximal or distal to instruction. What should be assessed in the classroom versus what might be reasonable to assess at a district or a state level. And the Holy Grail, okay, they can lead to assessments that yield more instructionally useful information, both within and across levels and contexts of assessment.

That is the whole goal. It is not just about assessing kids for the sake of assessing kids. It is assessing them for the purposes of supporting instruction and learning. They can also guide the development of systems of assessments. In Knowing What Students Know, we talk about three properties: comprehensive, coherent, and continuous. Lorrie mentioned the coherency point. I am not going to dwell on that today.

So, from the perspective of what you have to do, what we have to do, what does this imply for assessment design? It implies that assessment design should always be based upon some model of student learning, together with a clear sense of the kinds of inferences about student competence that you want to make for the desired context of use. The three things go together. Okay? That is, what you want to assess at the classroom level for formative purposes is not necessarily the same as what you want to assess at the district level for perhaps benchmark or summative purposes.

But in both cases, you want to start from some underlying model of what it means to know and learn, and then you figure out what the appropriate assessments are that fit that particular context of use. The reason I would argue you want to start with the student model is that it suggests the most important things to assess. What is really important? What do we want to make inferences about, and what kinds of tasks are going to give us the evidence?

Now, one of the dilemmas that we face is that oftentimes we start from the task level rather than the theory level. We start with, oh, that is an interesting task. You went through an exercise yesterday of trying to make sense out of tasks. It is not the easiest thing to do. What do these tasks really get at? What are they connected to in terms of an underlying conceptualization of what it means to know something?

Is this appropriate to give to a kid at this grade level or not? I would argue that if you just keep it at the task level, you are never going to address the real issues about what we should be assessing that is going to support instruction. This is not to deny that good tasks are important and hard to design -- they are, and they can be learning experiences -- but you also need to start from an understanding of what it is you want kids to know and learn in the first place.

Now, there are a variety of aspects of student models that are important. I am not going to bore you with the list, but I do want to point out a valuable information source and then give you at least one example to take home as an illustration of how this can work. The Adding It Up report is a very nice compendium of what we know about children's mathematics learning.

There is a great deal of information in there that can support our thinking about how to develop quality instruction and assessment. It is based upon theory and empirical research, and it is one of the things that is on -- I think it is on the learning CD-ROM, not the assessment CD-ROM. So, if I had more time, I would run you through a set of examples.

This is not just an abstract exercise. There are several very nice examples of thinking this through in the areas of mathematics and science, ranging all the way from the early development of number sense up through physics inquiry and physics instruction at the high school and college level. You will probably hear some of this later today in some of the breakouts, but I want to give you at least one example of how you put these pieces together and why I argue the way I do for the importance of this sort of cognitive underpinning of instruction and assessment.

It comes from the work of Robbie Case and Sharon Griffin on the development of number sense. It turns out, if we look at the development of the concept of number, Case has argued that, in fact, the foundation of number has to do with kids developing what he calls central conceptual structures. They are laid out here in terms of a progression: the ability to verbally count number words, to count with one-to-one correspondence, to recognize quantity and set size, and to mentally simulate sensorimotor counting so that, in fact, you don't have to put the fingers up there. You can count in your head.

Then you graft these structures onto the system of the whole number line. This is a graphical depiction of part of Case's theory, which has been elaborately worked out in terms of empirical data. Well, this is wonderful cognitive and developmental psychology. What does it have to do with instruction?

Well, before I do that, I want to point out something about why we should worry about children developing a good concept of number. It turns out, as depicted in this graph, that low SES (socioeconomic status) kids often lag substantially behind in the conceptual development of number, and it has tremendous consequences for math achievement, which increase with age as depicted in this diagram.

So, what Case and Griffin did was they developed an integration of curriculum, instruction, and assessment in a program called Number Worlds. Number Worlds is grounded in this kind of theoretical base of how children's conceptual understanding of number progresses in terms of these central conceptual structures.

The actual instructional program includes a range of activities that allow teachers working with kids to make sure that each of these critical understandings gets put into place. Now, where does assessment come in?

Assessment comes in in the sense that integral to this whole thing is the Number Knowledge Test, which is designed to assess where a kid is in terms of conceptual understanding.

Now, I am going to give you -- I am not going to show the assessment. I want to show you what happens when you put the pieces together. This is an evaluation of the Number Worlds program that comes from a three-year longitudinal study covering grades K through 2 -- and this has been replicated -- in which treatment and control groups from low-income, high-risk urban communities were compared, along with a normative comparison group from a middle-income magnet school.

These are the data that I want you to look at. If we look at the pre-K level over here, we see that if we assess children's conceptual understanding of number, both the treatment and the control children, prior to kindergarten, are well below their middle-income counterparts, which is typical. You can also see that over the course of kindergarten, first, and second grade, the control group kids, who were not exposed to the Number Worlds program, continue to show that lag behind the normative group.

However, the children who were taught early on using the Number Worlds curriculum and all the tools that are there show that by the end of kindergarten, they have caught up with the normative group, even though they are significantly disadvantaged, and they maintain that or even begin to outpace it, because the conceptual structures were put in place, facilitated in part by an integration of instructional and assessment practices grounded in a theory about what it means to understand number.

Certainly, the argument is that if you can't understand number, you are not going to be able to develop the kinds of procedures. Now, there are other examples as well. Perhaps you will hear about the Facets program in physics from Jim Minstrell. My point is, if you want more information about these, I will be glad to provide it, and others will as well.

There are examples. They are applicable to math and science instruction and they represent attempts to tie these ideas together, not just about assessment, but weaving assessment and curriculum and instruction together, tied to underlying conceptual theories about what it means to know something and learn something.

So, a few more minutes here and I will be done.

What is needed for further progress? I can't emphasize enough that one of the things that is going to be needed is a better balance and coordination between large-scale and classroom assessment practices. To the extent that these things remain out of sync with each other, we are not going to be making much progress. My greatest fear is that the large-scale assessments that are going to get set in place under "No Child Left Behind" in mathematics are actually going to exacerbate the problem.

So, we are going to have to fight hard to make sure that doesn't happen. We need instructional and assessment materials that incorporate the knowledge that we already have. We have a lot of information about domain-specific knowledge trajectories. We are not using what we know already. If there is anything here, we need to beat up on textbook publishers or whoever to get this information into the hands of teachers.

We heard a little bit of this yesterday. I can't emphasize enough the issue of teacher education, preservice and in-service. What we are talking about here -- and Lorrie talked about it as well, and others -- is a change in the way people think about their pedagogy. We are talking about a change in the culture of learning in the classroom and the role of assessment as a key part of the teaching and learning process, as a facilitator, not just an auditor. It is very difficult to get people to make this kind of pedagogical shift regarding assessment.

If you get a chance, read some of Paul Black and Dylan Wiliam's work over in England in terms of the information that they have about working with teachers and what it takes to actually bring about this kind of transformation. Another thing I would argue for is technology supports. Many of the things -- and I think you are going to hear about this from Ellen, who will be covering this later today.

Many of the practices, particularly powerful formative assessment practices are enabled by technology.

In my packet of slides, there is a variety of examples. Also, if you want, I don't have it on there, but there is a set of URLs I can give you that have to do with places to go to find out more about some of the tools: the Diagnoser software that Jim Minstrell has developed, tools that have been developed by Ron Stevens, a whole host of cutting-edge work that brings together cognitive theory and technology so that people can actually implement some of these assessment practices at the classroom instructional level.

I don't have time to talk about that. So, one last point. Where and how to learn more? In the How People Learn report and in the Knowing What Students Know report -- and you won't find this slide in there either. I inserted it last night. We argued that part of what we need to make progress is to develop this cumulative knowledge base on teaching, learning, and testing.

I would argue that the NRC reports, the ones that you have, are part of that compendium. They are part of that knowledge base. Activities like this are a way to begin sharing that knowledge base, and, of course, we need that knowledge base to have an impact on multiple arenas that have an influence on educational practice, one of which is the area of teacher education and professional development. Another one is, obviously, education policy.

Now, workshops like this and reports on CD-ROMs are a great resource, but you have to go back and you have to do something. Okay? You have to do professional development and things like that. One of the things that I realized a long time ago was that it is great to have people read these reports, but in my own courses, what happens is when I have them read the report, it is not enough.

So, what we have developed -- and you can talk to me about this -- is a set of tools that allows one to go beyond this, a set of instructional materials that can be put together by individuals to meet their needs on issues of learning, instruction, and assessment. It is a very flexible system. It tries to capitalize on the knowledge base that is there. It has a range of resources, including case materials and interviews. It is an easy way for people to begin to develop a resource base that they can assemble for the purposes of developing courses, professional development activities, et cetera. It was designed expressly for that purpose. We are actually using it in concert with work we are doing with the Chicago public schools in the area of professional development.

This is a bit of our rogue's gallery. It turns out that if you want to understand some of these assessments, you can read until you are blue in the face, but you know what? You have got to talk to the people who actually do some of this stuff, and we have a series of interviews as part of the materials that include some of the best minds talking about what they really had in mind, like when they developed balanced assessment in mathematics, or Paul Black discussing what formative assessment really is. These are short. They have all been edited down. This is not 30 minutes of a talking head. The point is that we have tried to create resources that can support getting this knowledge base into practice, into the hands of people like yourselves -- not that we have the answers, but we have a resource to help you take some next steps.

So, my last slide is just to leave you with a question, and hopefully it is a question that we can collectively provide a better answer to. That is, we know that more tests are coming. Okay? It is inevitable. If we don't do something intelligent about this through actions of groups like yourselves, they will not help children learn, but the option is truly there to do a much better job.

So, I will stop there and take questions.

[Applause.]

AUDIENCE: Can you give us the website?

MR. LABOV: Jim, if you could put that back and actually what I am going to do is turn that off for a second. Then we are going to leave it on for the break so that you can copy them down then.

I just wanted to point out a couple of other reports that are on your CD-ROMs that you should know about that are related to exactly the kinds of things that Jim was talking about. So, let me just switch screens for a moment.

There can be other people asking questions while I am doing that.

AUDIENCE: [Comment off microphone.]

MR. LABOV: No, it is not. It is a list that I have put together. It might relate also -- I don't know, Ellen, you have similar things for later?

Okay. Let me find -- where did that slide go? Let me find the slide, too. I will get it for you.

MS. GARTON: Besides the ones that were on the list, the last one that you took us through that is your website -- what is the address of that?

DR. PELLEGRINO: That you are going to have to get from me because it is password protected. But I will be happy to talk with folks about that.

I am going to make an impolitic comment, okay.

We actually proposed this under the MSP RETA program to support the MSPs, but need I say more? But it was actually designed with the idea that it would support practice and we are working with the Chicago public schools on its comprehensive math/science initiative, trying to utilize these resources so you can take it to the next level.

MR. LABOV: If I may for just a minute, we have the URLs on the right screen. On the left screen, I just wanted to remind you of some other reports that are available on the Focusing on Assessment of Learning CD-ROM that you have received. These are specifically for our colleagues who are in higher education.

One group is called "Evaluating and Improving Undergraduate Teaching in Science, Technology, Engineering, and Mathematics," and if you read through this, what you will find is that many of the things that Jim has talked about, as far as the multiple measures that Lorrie and Jim and Andrew talked about with formative assessment and the use of assessments for learning, are included in that. And it is for higher education, in the context of higher education.

Just to remind you, this is what the report looks like, the cover. Then there is a second report, "Learning and Understanding: Improving Advanced Study of Mathematics and Science in U.S. High Schools." While it focuses on AP and IB at the high school level, remember that these are supposed to be emulating -- they are precollege courses, and there is a great amount of information generated. As I mentioned to you, I was the co-chair for this report, which focuses on higher education and the integration and articulation between high school and college.

So, for people particularly in higher education, I would also suggest that you read this. Take a look, and then on the other CD that we sent you subsequently, the compendium report on student learning, there are just two things that I wanted to remind you about. Jim already did: Helping Children Learn Mathematics is on this CD-ROM, and for your colleagues who are dealing with issues in reading and how children learn to read, Starting Out Right: A Guide to Promoting Children's Reading Success also covers many of the issues that are in How People Learn, Knowing What Students Know, and these sorts of things.

So, there is a whole body of reports that the NRC has been developing that are built on these themes and build upon each other. We hope that these two compendium CD-ROMs really will serve you in many different ways, and your colleagues, who are not necessarily here and who may not even be in science and mathematics.

DR. PELLEGRINO: I should add also that it is from resources like that that we can draw examples, cases, et cetera, to help reinforce many of the principles and issues. That was part of what we were trying to do, because you can't just send people off to read 30 reports.

So, we have to have a way to cull from them the nuggets and the examples and then go beyond that, so that people can actually figure out what Number Worlds is about or what Facets is about or whatever, because there is a lot of useful information out there that could be put into play right now in math and science classrooms that just isn't, because it is not clear to people how they are going to figure out all the things that are there and how to use that information.

MR. LABOV: One other point I forgot to mention on the assessment of learning: if you look toward the bottom, "Myths and Tradeoffs: The Role of Tests in Undergraduate Admissions" addresses another critical issue that people face in higher education when they think about who is coming to their classrooms and what they know and may not know.

AUDIENCE: [Comment off microphone.]

DR. PELLEGRINO: I think one way to think about it has to do with what the practices are. If you look at the science standards and the kinds of things that we want kids to know and be able to do, including communicating science -- if you don't have practices in a classroom that actually work at the issue of scientific explanation, of what a model is, and of how to engage students in that kind of discourse and bring them along into it, well, that is part of the sociocultural practices that we want to engender. So, one way to think about it is: what is the nature of the discourse community that we are establishing in the mathematics classroom, in the science classroom, and to what extent do we design, over the grades and over experiences, a way for children to acquire that kind of understanding?

It is not like we want to turn them -- we want to take a third grader and have them be like a university instructor in terms of the discourse of science. We want to start getting them into certain kinds of discourse and thinking structures that are promoted through interactions in the classroom. If we are going to do that, we also have to monitor that and assess for that in terms of how kids are doing that and the culture of the classroom.

It also relates to some things that Lorrie was talking about yesterday, which has to do with the culture of assessment practices. You know, is assessment seen as a punitive kind of thing, or is assessment seen as an opportunity for people to share their thinking and to make their thinking visible so that everybody can learn more effectively?

The kind of discourse that we sometimes have at the university level, where we sort of puzzle about things is a little bit of the kind of discourse we would like to have children participate in and you can't do quality formative assessment unless you establish a set of cultural norms that promote that.

AUDIENCE: You mentioned at one point on one of your slides the way in which disciplinary knowledge is developed in communities and contributes to -- creates -- new identity, and that is very consistent with one of my favorite profound statements about physics: physics is a social science.

DR. GOLLUB: Jim, could I ask a little bit about this course development website that you mentioned. Is this for any kind of course or just specifically for courses in science and math education or what? What type of courses are supported there?

DR. PELLEGRINO: The tool is actually designed so that it could be done for any kind of course. The whole issue of it is we were trying to develop something I would call -- let's say, put it somewhere in between the empty-shell kind of course tools like WebCT and Blackboard and things like that, where it is up to you to put the content in and the tool is sort of agnostic about that, and what I might call digital libraries that have everything in them but in which you have to figure out how to assemble it.

The point is it is a tool that allows you to develop what we call sessions and courses by putting together resources. Now, you can put the resources in, and we have put resources in. You assemble together a set of resources and it forces the instructor to think about, well, what are you going to do with that resource? Okay?

Then you build sessions and so you can work from existing ones or you can build new ones. It allows one to put in their own content. It was a way in which we could start to build over time some intelligent materials that were reusable and that were customizable for different people's purposes.

DR. GOLLUB: Is this available or did you say this is one that is not available?

DR. PELLEGRINO: Well, it is available. It is a proprietary thing right now in the sense that we have developed it and we are very happy to share it with folks and work with folks, but right now we just can't put it out there, partly because there are a whole other set of issues, including the issue of permission on articles and things like that. I mean, the things that we own in there are some of the videos and other things we have created in briefings. There are other things in there like materials.

So, you get into this issue for educational purposes versus if you are sort of putting it out on the web.

I would probably go to jail tomorrow if that happened or something.

DR. GOLLUB: Could I suggest that the last couple of minutes be focused on questions having to do with applying these fascinating ideas to situations of interest to the members of the program here? Would anyone like to focus on that?

MS. CLELAND: I am Donna Cleland. I am with the MSP from Philadelphia, the Philadelphia area.

At the point in time where we are, we are just sort of starting and we are looking for some kind of a tool that would serve as a good science assessment as a baseline in the districts that we are going to be serving. Do you have any advice as to what exists now that might be used for that purpose?

DR. PELLEGRINO: Well, my first question would be what aspects of science are you looking to assess? I mean, it is not dodging the question. It is sort of saying what do you want to know. I mean, it goes back to the issue of what is the context of use and what kind of inferences -- for what purpose? Do you want this to help the classroom teachers or do you want to help at the district level? I think asking -- refining the question that way leads one to, well, here is a possibility.

I wouldn't say there is one universal tool out there. There are a variety of things that can be brought to bear, depending on what aspects of science and what grain size that you are interested in doing this. But it is a conversation we can have about where to start.

MR. ERICKSON: Clark Erickson from Minnesota Department of Education.

I have a question relating to the observation part of the triangle and technology. Students have rapidly evolving abilities in technology. Technology is just changing so dramatically and so fast. I am wondering how we can take advantage of some of those abilities that students have in the observation section.

DR. PELLEGRINO: Well, the fact that kids are so attuned to using technology means that it is not an impediment for them to do things in the domain of technology, and so what technology opens up for us is ways to collect far more complicated kinds of observations. Then, if we marry those together with effective data routines, we are able to do things like give them complex problems to solve and then track what their actions are and whether their search of a problem space looks like that of a novice or a more competent learner.

So, there are things like that in terms of tools and tasks that have been developed, which capitalize on the fact that it is no big deal. It is almost like game playing in some cases. Some of the kinds of exercises that Ron Stevens has developed in the IMMEX project are a little bit like playing sleuth in a science mystery and then making all sorts of decisions, and then tracking where kids are, and then you can actually use that to discuss effective versus less effective search strategies, problem-solving strategies. So, there is tremendous opportunity there.

Another one I should mention is a program that addresses one of the biggest problems: if you want kids to process expository texts and learn how to figure out what is in the text, how to summarize it, and how to learn from it, they need practice doing it. Well, teachers can't evaluate tons of summaries. But there are tools out there, like Summary Street and others, that have actually been implemented in middle school and high school classrooms and that allow kids to get practice and feedback on how to summarize expository texts, and there is evidence that shows that they actually get better on standardized tests and actually learn content.

So, it is not a problem. It has got to be connected up, though. It is not technology for technology's sake. All of these things are connected to some underlying theory of the nature of the cognitive process, as well as how to mine the data that you collect in the technology tool.

AUDIENCE: -- this morning, developing essentially measures for classes that could be used to actually guide instruction at a classroom level. I am just thinking, to follow up that question that was raised by the Pennsylvania person, by the time the district develops the battery of those classroom assessments, the project will already be over. I think the concern here is what are realistic expectations within the MSP project to move in that direction in a way that can be used for purposes of evaluation.

I am concerned about the practical part of it.

DR. PELLEGRINO: Well, I think there are two things that can be done. First of all, there are things out there that one can actually begin to adapt for use. But the other part of it that can't be ignored is that there is a whole issue of assessment literacy that has to be addressed, and that is getting teachers to think in these ways about their own practice.

You can build this in some ways from a bottom-up process, as well as providing good resources at a top-down level. It is not like everybody has to reinvent the wheel or invent something new and, in fact, I think one of the things that should be an emphasis in the MSPs is finding ways across MSPs to harness the strength across them so that some tools get developed that can be used across the projects.

I understand the dilemma and the time factor.

DR. GOLLUB: I would like just to make one quick comment. I think some of us are often overwhelmed by the complexity of developing tools for assessments. My own experience is that there is a lot of information you can get from very simple free-form assessments from students.

You ask them simple questions and let them write prose and if you have a modest number of people, you know, 20 or 30, you can get a phenomenal amount of information in that way without developing multiple choice tools.

Thank you. We have now --

MR. MABLY: Thank you very much.

Here we are 30 years later and I am trying to think -- oh, I should say Colin Mably, representing New Jersey Department of Education.

You alluded just a minute ago to a problem with your website, et cetera. So, I want to introduce a political question: what advice do you have for us enthusiasts for everything you said, and enthusiasts for teachers who are really professional, in dealing with the people who want certainty? Because I think what we can describe about the former is that it is an induction into uncertainty.

DR. PELLEGRINO: Well, I think your question is really a provocative one. Yes, people want certainty, and the only answer I can give them is to go back to the fact that basically assessment is an inference process. Like all inference processes, it depends on the quality of the data, and it is an imperfect act. There is no certainty out there. I mean, we are not certain what the national debt is, or the gross national product, or anything. So, why should we have any more certainty with respect to a kid's academic achievement?

The problem I see is the other side of this, is that they believe that what they are getting from a single test score has real true meaning. It is a precise estimator. That is the problem.

They actually have that belief that they can put trust in it in ways that are not at all commensurate with what it actually represents. How we get that message through to policy makers I am not sure. I think one of the great disservices that psychology has done to the world of education is that it has actually sold the belief in measurement and assessment. As I said, much of the genesis of this goes back to early aptitude assessment and the fact that we are still in many cases operating with a technology that was relatively atheoretical.

DR. GOLLUB: Okay. At this point let's thank Jim for a fantastic presentation.

[Applause.]

We will reconvene in about 12 minutes at 10:15.

[Brief recess.]

DR. GOLLUB: Let me first mention that later in the morning, at the end of the morning, everyone is going to be given a yellow card and on this card, you will be asked to put two things. One is an example of something that you have learned that you feel you can really use.

So, I am telling you this now so that you might be on the alert for such wonderful insight.

The second thing you will be asked to write is an important question or a burning question that has not yet been answered that you would like to have some help on.

So, please be on the alert for both of these things, useful ideas and burning questions that you might be able to share with us to use tomorrow toward the end of the program.

At this point, I would like to introduce Bill Trent, who is professor of both education policy studies and sociology at the University of Illinois at Champaign.

It looks like Illinois wins at this workshop.

I am not sure what the difference is between educational policy and education policy studies. So, that means you are an expert in the study of educational policy, I guess. Is that right?

DR. TRENT: That is another hour or so, though.

DR. GOLLUB: Okay. We will skip it for now.

But Bill has important things to share with us concerning access and equity in education, issues with which most of us are very concerned and issues where we need to do better. So, I am going to invite Bill at this point to lecture to us and engage us in a discussion of -- I forget what your title is -- equity in assessment.

Agenda Item: Equity in Assessment

DR. TRENT: Thank you very much.

I have to tell Jim that this is exactly what I have always wanted to do: to be able to come on right behind his erudite, profound, and stellar presentations. Next time, I am going to make sure that doesn't happen to me.

Usually we have some banter back and forth about which one of us is at the real U of I. But we won't do that now.

I come here to do this presentation on the heels of having spent a workshop with a planning and implementation committee that has responsibility for monitoring and seeing to the full implementation of a consent decree in a small town in the Midwest. So, some of the things I will be talking about are, indeed, quite fresh in my mind and, hopefully, they will enable me to say some things that will be useful to you as I try to introduce this topic.

I am going to start by walking through a series of pieces of research and data that try to make clear the nature of the challenge that we face on issues of access and equity. Hopefully, this will come through fairly clearly. The focus of this workshop on assessment is timely and occurs at a critical point as the results from the "No Child Left Behind" required assessments become clearer. Combined with the results from graduation tests and state standards-based assessments, the resulting discussions have necessitated a renewed focus on the roles of socioeconomic status, ethnicity, and school factors in shaping academic preparation, access, and performance.

The persisting inequalities require a new level of attention focused on issues of equity. The main point I want to address in this presentation is pretty straightforward. It is about the constraints on opportunities to learn for students of color and poor students, and those are substantial. As a consequence, our assessment tools and practices are called into question, especially when they are the basis for high-stakes decisions about students' placement in the school setting.

After I walk through a few of these, I am going to turn to the recommendations and try to discuss a few of the recommendations from three reports: the high-stakes report and the two from the Committee on Educational Excellence and Testing Equity, one on understanding dropouts, the other on testing English language learners. I will give you the titles and references to those. All three are on your learning and assessments CD-ROM.

A point of focus. A key assumption in the conceptualization and design of assessments is that all students, by and large, have received a substantial dose of treatment; that is, the instruction and learning experience. This is a necessary assumption and one with which we can all basically agree for purposes of construction, design, and conceptualization. When, however, we address the equity question, it tells us immediately that the assumption is untenable for a substantial proportion of our students, and for a number of reasons.

This is my picture of the pipeline. The reason I developed this was to try to get clear for myself some ideas about different points in the pipeline where there are serious challenges. I start at the point of differential access and participation by race, ethnicity, and SES in the preschool years. We know that a gap exists then.

But upon entry into school, we have the variety of state and national assessments. We have the ways in which race, class, and gender differentiation occur in those early school years and the variety of ability grouping and tracking and gateway courses that intervene in ways that shape the learning opportunities and experiences of our students. These early experiences, and oftentimes repeated experiences, cascade throughout the educational careers of young people, and they eventuate in substantially reduced numbers of African American males in particular, of African American students more generally, of Latino students, of Southeast Asian students, of Native American students as we get toward the end of that pipeline.

Many of us in higher education already see the consequences of those earlier processes substantially reducing the numbers of students of color we see at our level of the higher education -- of the education pipeline.

Several factors influence the opportunity to learn and thereby impact assessment. A great deal of my experience has been with segregated schooling, working on problems of school segregation. Disparate school quality, teacher expectations, parent resources, and community resources are factors that belong on that list.

Next month -- well, I guess it is next month -- we will celebrate the 50th anniversary of the Brown decision.

In 1978, a divided court ruled to preserve the use of race in college admissions. In that decision, Justice Harry Blackmun wrote a frequently quoted phrase, but one which is challenging to operationalize; that is, in order to get beyond racism, we must first take account of race. There is no other way.

Just this past June, seven months ago, eight months ago, we were able to sustain our ability to use race for the purposes of admissions as long as it was used in a way appropriate with the letter and spirit of our constitution. But mainly in there was this notion of compelling interest and, thankfully, we do still have guidelines under which we can, in fact, use race in order to address the disparities that appear to be strong correlates.

Segregated schooling is increasing. Hyper-segregated, high-minority and high-poverty schools are especially harmful to student learning and attainment, and I am always cautious about how I say that because I don't want it to sound like blaming the victim. So, I am going to try to make clear what I mean by that with some of the examples we have. African American and Latino students are often in schools where the quality of courses available, the quality of the curriculum, indeed, is limited.

African American and Latino students are often in schools where they are overrepresented in special education, suspensions, and expulsions, but underrepresented in gifted education. And, of course, race and ethnicity, gender, and poverty status at the student level, along with school-level factors like school size, overall racial composition, and teacher credentials and expectations, are consequential for educational outcomes.

There are important correlates of school racial composition that implicate equity: high SES and high race concentrations, differential college-going rates, differential teacher expectations, differential access to quality teachers, differential access to high-quality curriculum. It is a bit repetitive.

From the Office for Civil Rights (OCR) data, this is actually a part of another study that was sponsored by the NRC. It is one of the few pieces of data we have that gives us a national picture, a breakdown by race and ethnicity of participation in schooling, particularly for our public schools.

From the 1999-2000 data, the significant point is that we see the extent to which African American students are in schools with high concentrations of other African American students and in schools with a high concentration of other low-income students. The same holds true for Latino students in substantial ways: more often in schools with a high percentage of other Latino students and more frequently in schools with high numbers of students with free and reduced lunch.

One of the correlates of those high concentrations is that we see real disparities over the course of a decade, across two different cohorts of students: the 1979-1980 graduates from the High School and Beyond study and those from the National Education Longitudinal Study just about a decade later. There is roughly a 10 percent gap in the college-going rate between schools that are less than 10 percent minority and schools that are more than 75 percent minority, and that is persistent over decades.

I was an expert witness in the Missouri v. Jenkins case, and in that case the city, the district, had actually done a survey of its teachers, and we found five items that referenced teacher efficacy, the sense in which teachers felt that they could make a difference in the schools in which they worked. What was striking to me was that at the early school years level, we couldn't find very much of an effect of teacher expectations; that is, for the most part, students of color were fairly safe up through grade four, five, six.

The middle school years are where you begin to see the difference. Low teacher expectations are highly associated with high concentrations of minority students, and in those contexts you get a real gap in test performance. That is a real critical issue. Now, I am not saying that those teachers necessarily felt inefficacious because of the students' race and class. It could very well have been other aspects of that school site having to do with being underresourced, understaffed, poorly equipped, or overcrowded.

All of those things make a difference, but we cannot rule out the potential for race and ethnicity to be a considerable part of that. We have found in different school districts that one resource that children have differential access to, one indicator of quality, would be mean years of teacher experience. We interviewed principals, and principals told us that if they had relatively mature staffs, they could, in their estimation, produce an effective school, an effective learning context.

The disparity in teacher seniority, teacher experience, maps pretty closely with the racial composition of the school. We have high turnover rates in schools that are high minority with high concentrations of poor students. We have more stability in schools that have high concentrations of majority students. And you can see from this table that at the elementary level, middle level, and high school level, while not in all instances statistically significant, you do see the consistent pattern of disparity year after year, 1996 through 1998, from 1995 through 1997, for these data.

Teacher experience is distributed that way.

Teachers with advanced degrees are distributed in the same way. Generally, what we hear about are the instances in which teachers who are teaching out of credential, especially in math and science, are more often found in schools that are high minority, high poverty. It is very difficult to retain teachers with those credentials in those schools. This table yields similar kinds of evidence.

From 1995 through 1997, the years for which we had data in this instance, in racially identifiable white schools you have a higher percentage of teachers with higher credentials. The same for the pattern in schools that are in compliance, that is, that meet the racial balance requirement, and differently so for schools that are racially identifiable and, in this instance, black.

Another indicator of school quality references the curriculum. We found data, in looking at the Gratz case for the University of Michigan, that showed that over 71 percent of the African American students in the Detroit metropolitan area were in schools that had three or fewer AP courses. Fifty-plus percent of the Latino students were in schools that had three or fewer AP courses. The lawsuit in California is about the maldistribution of AP courses, such that students could be getting 4.0s but would not be able to compete for entry into Berkeley or the UC system.

The Michigan formula that was considered to offer an advantage to African American students in this instance clearly offered no advantage to black and Latino students. In fact, it was doubly compounded because the quality of the school, as referenced by the richness of the curriculum, was an added factor in the admissions formula.

At a school district like San Francisco, you have real disparities between schools, again closely associated with the racial composition of the schools, such that a Lowell has 21 or more AP courses, 16 or more honors courses in 1996, and as many as 24 AP courses in the year 2000. It has a school percent black of about 13 percent. Only one other school in the year 2000 had double-digit numbers of AP courses, and none of the schools with substantial numbers of AP courses in that district are schools that offer students of color a particularly rich curriculum.

Correlates of school racial composition and differential access to high-quality teachers and curriculum are associated with low and/or poor performance on a variety of assessments. This is a typical test, an end-of-the-year test; this is the percentage of students testing at low mastery, and this is the math test. All students in schools that are racially identifiable black perform less well than their counterparts in either racially identifiable majority schools or racially imbalanced schools.

Again, it is not about the kids, but it is oftentimes a lot about our decisions about how we allocate resources. These patterns are persistent across years.

That is the case for math, the percentage of students who test at low mastery. It is also the case in reading, and we all know the centrality of reading for math and/or science performance. Patterns are consistent.

Higher numbers of students test at the level of low mastery, and in substantial ways black students test substantially higher than their counterparts, whether they are in racially balanced schools or predominantly black schools.

We used the national data to try to construct an indicator, across the nation's public schools, of the odds of a minority student being in gifted and talented courses, as opposed to being in special education or having been suspended or expelled. These are from the OCR data for the year 1999 to 2000. The odds ratios tell the story pretty clearly.

Now, the issues here are about opportunities for instruction and the quality of instruction depending on which segment of the curriculum you are actually in. So, we have here again the racial composition of the school and the associated odds of being either in gifted and talented programs or in special education and some of the numbers are very depressing. When you look at the rates of black or Latino students being expelled, being suspended, that means you are outside of the school and have no opportunity for the instruction that is embedded in the test.

We also looked at it across high poverty, low poverty context, same pattern.

DR. GOLLUB: [Comment off microphone.]

DR. TRENT: It is interesting. That is a very good question. Thank you for asking it. Thank you for stopping me there. Because what it shows is that schools that approach some form of racial balance, if you look through the 6 to 25 and the 26 to 50 percent ranges, you see that students in those ranges have the highest likelihood of being expelled, along with the 1 to 5 percent. We found this in small towns throughout the Midwest, unfortunately, in places where there were oftentimes either low percentages or moderate percentages of students of color. The overrepresentation either in special education or in suspensions or expulsions was excessive.

In the town in which I live, in a school district that has 30 percent African American students, more than 66 percent of the suspensions and expulsions were African Americans. I don't think that is atypical across the country, unfortunately, but it takes a tremendous toll on the instructional time lost.

In looking at attendance patterns at particular high schools in San Francisco, we found attendance rates as low as 66 percent. So, what is it about schools and the way we are operating them that tells students to only go 66, 65, 67, 70 percent of the time? Of course there are associated family problems, maybe any number of challenges of that sort, but those are very, very difficult numbers.

For students that are missing 30 percent of the instructional time at school, what is it that our assessments tell us? What can they tell us?

MR. KRAMER: The odds of a black person against the odds of a nonblack person, is that what the ratio is here? In a 1 percent black school, the odds of a black person are 63 percent as high as the odds of a nonblack person -- is that what that is saying?

DR. TRENT: No, no, no, no, no, no, no. No.

This is an odds ratio that has been -- a ratio of less than 1 suggests that the odds of being in a gifted and talented program are substantially less.

MR. KRAMER: Of the nonblack person?

DR. TRENT: Right.

MR. KRAMER: So, what this is saying is that black kids in a 76 percent black school -- just so I understand, in a 76 percent black school, are the odds of a black kid being gifted and talented and a white kid in that same school being gifted and talented --

DR. TRENT: Almost equal, almost balanced.

MR. KRAMER: -- a black kid is much less likely to be in the gifted and talented program. Similarly, in mostly black schools, black kids versus white kids are equally likely to be suspended, but in a mixed school, black kids are much more likely to be suspended. That is what this is saying.

DR. TRENT: That is exactly what it says. I should make that clear, but that is precisely what it says, and that is part of the challenge that we are dealing with.

That is what compounds our challenge.
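[The odds-ratio interpretation worked out in the exchange above can be sketched in a few lines. This is an illustrative computation only; the counts and the `odds_ratio` helper below are hypothetical and are not taken from the OCR data presented in the talk.]

```python
# Odds ratio: (odds of placement for group A) / (odds of placement for group B).
# A ratio near 1 means roughly equal odds for the two groups;
# a ratio below 1 means group A's odds of placement are lower.

def odds_ratio(a_in, a_out, b_in, b_out):
    """Odds of being 'in' (e.g., a gifted program) for group A relative to group B."""
    return (a_in / a_out) / (b_in / b_out)

# Hypothetical school: 5 of 100 black students and 10 of 100 nonblack
# students are in the gifted program.
ratio = odds_ratio(5, 95, 10, 90)
print(round(ratio, 3))  # 0.474 -- black students' odds are less than half

# When placement rates match, the ratio is 1 (the "almost balanced" case above).
print(odds_ratio(10, 90, 10, 90))  # 1.0
```

[This is the sense in which a ratio "of less than 1" in the slides signals underrepresentation, and a ratio near 1 signals the near-balance Dr. Trent describes.]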

There it is again, the similar distribution, but this time broken out by high poverty and low poverty context of the school.

A specific example of over- and underrepresentation, to try to clarify your question -- I can't see your name badge, but here is an example that tries to clarify that for you. The middle line, the solid line, represents the enrollment percentage for African American students. The heavy dashed line is special education. The small dotted line is gifted and talented. That is for African American students.

This is for Latino students. This is for white students. This is for the district overall, the enrollment percentage for the district across all schools. This is for white students: the overrepresentation in gifted and talented, yes, and, again, that is data that runs from 1998 through the year 2000.

AUDIENCE: Could you go back to the previous slide?

DR. TRENT: The graphic? So, in a school district where you have an average enrollment right around 20 percent African American in 1998, you had twice as many represented in special education. By the year 2000, you have approximately 17 or 18 percent. You still have twice as many in special education.

These conditions are just some of those that challenge useful and effective assessment of student learning. Following are recommendations from the National Research Council reports that can help guide good assessment practices in the face of these challenges. "What teachers teach and what students learn vary widely by track, with those in lower tracks receiving far less than a world-class curriculum. If world-class standards were suddenly adopted, student failure would be unacceptably high."

That is a quote from Bob Linn. The recommendation: accountability for educational outcomes should be a shared responsibility across all relevant parties. High standards cannot be established and maintained merely by imposing them on students. In other words, we have to be thoughtful about these standards, how we implement them, and how we operationalize them.

If parents, educators, public officials, and others who share responsibility for educational outcomes are to discharge their responsibilities effectively, they should have access to information about the nature and interpretation of tests and test scores. I think this point has been made over and over again. Colleges of education like my own, I really strongly feel, need to do a far better job of preparing classroom teachers to be effective in the development and use of assessments. I don't think we do nearly enough in the preparation of teachers to use effectively the assessment tools and understandings and insights we have.

Such information should be made available to the public, should be incorporated into teacher education and into educational programs for principals, administrators, public officials, and others. A test may be appropriately used to lead curriculum reform, but it should not also be used to make high-stakes decisions. Many of the placements we saw in those slides are the consequence of assessments.

Sometimes single indicators place students in those contexts, and that is inappropriate. Tests should not be used to make high-stakes decisions about individual students until test users can show that the test measures what they have been taught.

There is a substantial amount of evidence, I think, in these examples to suggest and clearly show that these students -- many of the students that we are concerned about have not actually been exposed sufficiently to the material.

Test users should avoid simple either/or options when high-stakes and other indicators show that students are doing poorly, in favor of strategies combining early intervention and effective remediation of learning programs. I think nowhere is this more important than in math and science.

Let's take a typical example, and I won't name the school district. The school district has had to work exceptionally hard to increase the numbers of students participating in level 3 math and science. These are the top tier science and math courses. But nowhere earlier in the curriculum have they systematically introduced courses that are designed to attract and prepare students early on, to identify students early on for participation in instructional experiences that would eventually lead them toward a level 3 experience.

Right now they are beginning to talk about eighth graders. That is late. School districts that are getting it done better are introducing these experiences as early as the fourth grade, in order to have a chance of identifying the students and beginning to prepare them with interventions and remedial or compensatory work that has a better possibility of having them gain the necessary confidence, as well as the real ability, to successfully compete for access to and success in those level 3 courses.

But that is a long-term planning process that school officials need to engage in. High-stakes decisions, such as tracking, promotion, and graduation, should not automatically be made on the basis of a single test score.

Jim said this earlier and we really want to hammer on this: such decisions should be buttressed by other relevant information about the student's knowledge and skills, such as grades, teacher recommendations, and extenuating circumstances.

In general, large-scale assessments should not be used to make high-stakes decisions about students who are less than eight years old or enrolled below grade three.

Each of these recommendations in different ways reinforces the idea that we have to be much more attuned to a richer variety of ways in which we come to understand the students and what they are capable of and appreciate the extent to which practices that we employ in fact frustrate, if not attenuate, their learning opportunities.

Many of our assessment programs do not contain a well-designed evaluation component. Policy makers need to monitor both the intended and unintended consequences of high-stakes assessments on all students, but especially on significant subgroups of students, including minority students, English language learners, and students with disabilities.

I thought I had edited this out, this last sentence, but I will have to say it without it being a complete sentence here. As tracking is currently practiced, low-track classes are typically characterized by an exclusive focus on basic skills, low expectations, and the least qualified teachers.

Students assigned to low-track classes are worse off than they would be in other places. This form of tracking should be eliminated. The way that last sentence should read is as follows: Neither test scores nor other information should ever be used to recommend students for an inefficacious treatment. We know full well how many of these treatments do not, in fact, even begin to produce the improvements in learning that the placement nominally is intended to achieve.

We have to stop those practices. Since tracking decisions are basically placement decisions, tests and other information used for this purpose have to meet professional test standards regarding placement. We have additional recommendations regarding promotion and retention. Many people argue that, well, we should retain them and hold them back.

It is inappropriate to retain students and give them the same treatment that they just received, or to hold them over the summers and give them the same treatment that they had just received. We need to find and employ techniques and methods that do not just retain the students but, in fact, are designed to address and better serve the students' learning needs.

In many ways that means we have to be more sophisticated about our assessment instruments. In many of our meetings, my colleagues oftentimes heard me, as a nonpsychometrician and as a noncognitive scientist, argue for diagnostic tests. We need assessments that help us understand not just what it is that children know, but how they know it. We need to understand not just that the student can't get the formula right or doesn't get the formula right, but where in that process of trying to execute that formula the student has a flaw.

Those kinds of diagnostic instruments would be more useful. They would make it easier to construct the kind of instructional experience that the students need. So, we would not only need better instruments in the sense of instruments that are designed to fit the curriculum; we need instruments that really do help us improve instruction, assessment instruments that would do that.

In higher education, we were just talking about this the other day in relation to the Brown decision.

Brown had a tremendous impact on higher education. I think it was three years ago now, the Knight decision in Alabama was a decision about higher education in that state, and it is almost a decision that says, if you like school desegregation so much, we are going to give it to you.

One of the stipulations in there was that the historically black campuses had to use a cut score of 15 on the ACT. That would have effectively reduced the enrollment at many of those HBCUs (historically black colleges and universities) on the part of African American students by as much as 25 to 30 percent because the schools they came from did not prepare them to score 15 on the ACT.

So, in effect the courts were saying, yes, you will desegregate, but here are the new rules of the game.

Cut scores are very, very hurtful in instances when used in that fashion, when used in an uninformed way, and when used without other complementary information. One additional piece of information from work that I am involved in now: for the last three years, the Gates Foundation has awarded the Gates Millennium Scholarship. It will do so for the next -- well, up through the year 2019.

Every year about a thousand students, students of color, who are high-need students, receive that award.

Every one of those students has to have a 3.2 GPA. Test scores are not used in the selection of the students. We are using an algorithm of noncognitive measures in addition to the GPA and other supportive information in the student's application to make those selection decisions.

Those students -- this noncognitive work is work that has been developed over time. Some of you may have seen The New York Times article about two weeks ago: work that has been under development since the late seventies by Bill Sedlacek at the University of Maryland.

Sedlacek, Brooks, and Tracey started this work back in the late seventies, when the University of Maryland was one of the 19 -- one of the schools in the Adams case, the Adams litigation, as one of the states that had previously operated dual systems of higher education.

We can be much more sophisticated in higher education in identifying and selecting students who can benefit from and grow in the educational context that we provide. So, if a cut score is to be employed on a test used in making a promotion decision, the quality of the standard-setting process should be documented and evaluated, including the qualifications of the judges employed, the method or methods employed, and the degree of consensus reached. In other words, there is a substantial data responsibility that we should engage in order to have this done effectively.

Let me quickly get to just a couple of the recommendations because you will have this available to you. A couple of the recommendations on students with disabilities: More research is needed to enable students with disabilities to participate in large-scale assessments in ways that provide valid information. This goal significantly challenges current knowledge and technology of measurement and test design and the infrastructure needed to achieve broad-based participation.

In higher education, I was working in the chancellor's office at the time when this began to come through, the issue of accommodations. None of that had ever been budgeted -- what kind and range of accommodations, what were useful and appropriate accommodations. We just hadn't planned with that kind of foresight.

I don't work there anymore in that office, so I have no more need to know exactly what it might cost, but I suspect it is fairly substantial in terms of how it is driving university budgets now in order to make effective accommodations. There are other issues about accommodations. When do you introduce the accommodations?

Is it fair to place a student in an assessment and offer them an accommodation with which they have little familiarity?

So, how might a calculator assist a student who hasn't had the opportunity to practice with it sufficiently?

So, these are real issues that we can pay very close attention to, and a growing body of evidence, I think, is becoming available on them. The needs of students with disabilities should be considered throughout the test development process.

I want to get to just a couple of recommendations on students who are English language learners. Some of the work of the Committee on Educational Excellence and Testing Equity, especially the publication on testing English language learners that was led by Kenji Hakuta, is just an excellent resource.

Systematic research is needed that investigates the impact of specific accommodations on the test performance of English language learners and other students. Accommodations should be investigated to see whether they reduce construct-irrelevant sources of variance for English language learners without disadvantaging other students who do not receive those accommodations.

Development and implementation of alternative measures, such as primary language assessments, should be accompanied by information regarding the validity, reliability, and comparability of scores on those primary language assessments.

Placement decisions based on tests should incorporate information about educational accomplishment, particularly literacy skills in the primary language.

Certification tests should be designed to reflect state or local deliberations and decisions about the role of English language proficiency in the construct to be assessed. We repeatedly in these different volumes call for these broad forms of participation and involvement in the standard-setting process, as well as in the development process for these assessment instruments.

Again, I recommend and refer you to those instruments. I will repeat again the italicized portion here: Neither test scores nor other information should ever be used to assign children to inefficacious treatments. I am particularly concerned about this because of the high rates of placement of African American males in BD, behavioral disorder, and other special education categories, especially those that are very difficult for students to extricate themselves from.

The disparate disciplinary treatments that students receive in school are very consequential for students of color, again, particularly males. I think these are processes and practices that we can, in fact, correct, change, alter, and reduce the negative consequences, but it does take systematic research. It takes intentional behavior on our part, but it also takes school districts doing what I would consider an equity audit on each of these issues.

An equity audit needs to tell us more about our constituents, more about the circumstances in which they live, more about the students and their competencies, more about the disparities across our districts. This is the reason for the disaggregation requirement in the "No Child Left Behind" legislation. It is very important that we not look at that as a punishment, but that we look at it as a way of enabling ourselves to get a handle on some of the patterns of ill distribution that we see in terms of participation, especially in math and science classes in our schools.

I graduated 120 miles from here from an all-segregated school system and had the good fortune of having exceptional teachers. I had exceptional teachers primarily because of the discrimination that they experienced. I actually went to undergraduate school as a math major and took great pride in doing so. Came back from a three-day road trip and changed my mind in my junior year.

But the point of that is to say that there are many, many young people with the requisite skills and competencies. We have to make it possible for them to have the access to the opportunities to learn much like I did, but, hopefully, under better circumstances of quality and educational provision.

Thank you.

[Applause.]

MR. LABOV: Before we have questions, let me just tell you that we will get Bill's presentation and also the URL list from Jim, and we will try to have those to you either by lunch or immediately after lunch. Bill just came in this morning and we didn't have a chance to get it.

MR. LANGENBERG: Don Langenberg.

Bill, as you know, NCLB requires the reporting and use of assessment outcomes disaggregated by racial and ethnic identity, economic status, et cetera. There are many critics who say they wish that hadn't happened and they wish it would go away. Some, I suppose, wish it would go away because they really don't want the disparity in their own situations to be apparent. But some that I would say have good objectives say that it merely complicates the job of reducing the disparities, because it tends to reinforce learned helplessness in people who believe that the disparities, the gaps, are inevitable.

There is nothing that could be done about it and so we don't have to try to do anything about it.

Do you have any advice about how to think about this?

DR. TRENT: Yes, I do, and, in fact, NRC does, too. I have started reading through, with the intention of making good use of it, a recent publication by The Academies -- and it is not on one of the CDs -- called From Neurons to Neighborhoods, for those of you who want an intense read that will help you set aside arguments about the naturalness of any disparity along race and ethnic lines. I think you will be well-informed going through that. These are some of the top scientists, brain researchers included in there.

I mean, this is fairly sophisticated stuff. I would strongly urge people who haven't come to grips with that issue to take a good look at that book. It talks about several kinds of development, the intersection of nature and nurture, and the interdependence of nature and nurture in order for both emotional and cognitive development to proceed along good lines.

It also talks about the complications and conflicts of poverty. So, that is one sort of scientific thing that people could do, could read in order to begin to inform themselves about that.

This issue of believing that phrase about all students is an important one and we can't dismiss it because we know we have a lot of people who don't believe.

I took the position that, as folks used to say when we were -- you tell me what the standards are and where the bar is set. It is my job to get over it or to assist the kids in getting over it. The teachers I had could do that.

So, on the one hand, as Kati Haycock argues -- and I am not necessarily convinced of it -- we need the standards, because it is unfair to put students through 12 or however many years of schooling and have them come out inadequate. So, we need to understand where the bar needs to be set. But if we are going to do that, we have to do it with awareness that we have an absolute responsibility to make sure that people have what they need in order to be able to meet the bar. We are not doing that. We are simply not doing it.

We are not providing the resources people need to do that.

I don't know how to answer the question about whether or not people will fully engage in good instructional practices if they are not convinced that they will make a difference. But I do understand the trigger that is in "No Child Left Behind" having to do with schools that make progress: they can't just make progress on the average; they have to make progress in the disaggregated categories.

My experience with school districts is that school districts have usually used that as a way of exempting themselves from responsibility, to explain away the low performance of their schools. As long as "No Child Left Behind" does not allow that, then I think it is a very important piece of leverage to have. Politically, it may be one of the only kinds of leverage that we have that drives educational resources toward those students in those disaggregated categories.

So, I think of it for that reason. How we get people to effectively teach and address those categories, how we make use of smart materials with students in those categories, I think still challenges us, but I think we can figure out how to do that, and we need the resources to be able to do it.

MR. KRAMER: A follow-up on that, a follow-up question. I am Steve Kramer. I am with the Greater Philadelphia MSP.

I don't know, and I am wondering if you have any data on this. My fear with the disaggregated standards of "No Child Left Behind" is the sense that you are going to be judged as a teacher on whether you have a hundred percent of your kids meeting the standard by 2013, and what that does is, if you choose to go, say, to a high-poverty inner-city school, it looks much more difficult than if you choose to go to the school near where I live, where everybody's parents are doctors and scientists.

I am fearing that you are going to have a flight, even more than you have had now, of everybody applying to the cushy schools where it is easy to meet the standards, and then they get to choose the best teachers and cream them off, and you get increasingly low-quality teachers in the most high-need schools because of the pressures of NCLB. That is my fear.

I am wondering if there are any data or sense of that?

DR. TRENT: We already have a lot of data -- and I am going to call on Andy in a minute here -- we already have a lot of data that show that before "No Child Left Behind" went into effect, we already had that. If anything, "No Child Left Behind" might accelerate it a bit, but we don't know how much. Now, the other part, however, is that it is also a part of that legislation that specific kinds of resource supplements are supposed to come with it. I mean, it is not that we are just going to beat up teachers; there are supposed to be some supports provided to those teachers: professional development, additional education resources. There are even scholars and researchers now calling for more supplemental education. There are a variety of things that can be done.

The resource issue, I think, is an important piece of that.

The other concern I have is that -- and this is not to attack anyone, but I get real troubled by the ostrich approach to administration: if we can't see it, it doesn't exist. For years, we never saw this gap, not until it got to the, again, well, why aren't there more kids showing up for fairs who compete well on the SAT or the ACT? We just never saw it. Or if we saw it, we assumed it was cultural deficit. We assumed it was oppositional dispositions.

We were able to assume ourselves away from effectively engaging a substantial pool of talent. The Gates Scholarship Program stands up and says, wow, there are a whole lot of kids who are right on the verge of poverty. All of these kids who have received the Gates Millennium Scholarship have to qualify for Pell. So, we can find a thousand kids every year who are African American, Latino, Native American, and some Southeast Asian kids, all of whom qualify for Pell, all of whom have a 3.2 or better, and we don't have a clue about what their test scores are.

Some probably test pretty well and they are going to the most elite institutions we have in the country.

DR. GOLLUB: I know Andy wants to make a comment and I would like to make a closing comment and then we will have to move on.

Andy.

MR. PORTER: Andy Porter.

I am speaking in complete support of what Bill just said, but it is tempting with something like NCLB -- there are some parts of it that are just not workable, like all the kids are going to be proficient in 2013 or 2014 or whatever it is, but we shouldn't mix that up with this idea of disaggregating the data and looking at it. To me, try to keep those separate in your mind. I am hoping that new legislation will change NCLB. It will keep the good parts and fix the bad parts. That is what my hope is.

But I just want to point out that people who study school effects and what makes schools effective have found and replicated over and over again that schools that do track student achievement and look at disaggregated student achievement do better in terms of overall level of achievement and in terms of the achievement gap. This literature has been around for a very long time. So, that would be in support of this.

The second comment I want to make is that when performance assessment came around -- and I was a big enthusiast of this -- there were several of my friends who thought that the achievement gap would actually be narrower on the performance assessments. What we basically found was that it was larger, but that is important, because what it says to me anyway is that poor kids and students of color are not getting the kind of instruction that allows them to do well on these performance assessments. They are not getting the kind of instruction that allows them to reason, generalize, and the like.

So, I am with you, Bill. Let's not hide those differences. Let's expose those differences and then let's attack them.

DR. TRENT: I don't think we could begin to unpackage the causal aspects of it until we actually know what the parameters are. I mean, it is like anything else.

And I like the monitoring. I try to do it with Weight Watchers, but it doesn't work.

DR. GOLLUB: I would like to take the prerogative of the chair to ask the last question here. In the reports you cited, the suggestion was made that -- or it seemed a possible consequence that if we use assessments to place people into programs that are improved, then we will be okay, that the problem is the programs aren't any good.

So, the kids remain disadvantaged.

But couldn't you question that assumption? Isn't it possibly just inherently not going to work in the current situation? For example, because peer relationships and peer learning are so important and have a real impact, that would be one reason; and we know that you will not have an equitable distribution of teachers, of teacher talents, in these programs into which the less able students are tracked.

I mean, this is a question I am asking. Is it ever really legitimate at the present time to have separately tracked programs for those who are not as well prepared?

DR. TRENT: That is a challenging question. I may be overly optimistic. I do think that -- and I think there has to be a short-term and a long-term way of thinking about this. I think we have to take advantage of interventions like the one Jim identified with the math intervention at the early grade levels, because it shows evidence of being able to work toward closing the gap.

So, I do think that while we are doing research that informs us about what kinds of new interventions have potential, I think we can introduce those interventions in smart ways and make progress and use those in classrooms where they can have a substantial effect and we can place students in those kinds of treatments.

I do think we really do need to focus in on how we make some of these bad forms of treatment go away. I mean, we have to do that. Any good organization should want to get rid of those kinds of practices that don't help it. If you are a chief executive in a high school or on a college campus, it is important to know which of your clients are not benefiting from your treatment and whether that means you want to change your treatment. You have got to find out how to change your treatment.

So, I think in the long run you have got to figure out how to change that treatment and how to make it more efficacious. In the short term, you have got to figure out what to do instead when you don't have a more effective treatment to put the students in.

Then, finally, in those instances where we have some interventions that we feel show merit, we can use them.

DR. GOLLUB: Thank you. I appreciate your helping us to avoid oversimplification.

We have to move on, but I think -- Bill, are you going to be around for a while, participating in later efforts this afternoon?

DR. TRENT: Yes, I will.

DR. GOLLUB: The breakouts and so on. So, there will be lots of chances to interact with you and I hope there will be a lot of informal questions and discussions with Bill.

So, thank you very much.

DR. TRENT: Thank you.

[Applause.]

DR. GOLLUB: So we have Lorrie Shepard. Are you presenting right now? Okay. She is going to talk about classroom assessment of learning, what does it mean for the MSPs, and Lorrie Shepard, who spoke to us yesterday, is going to lead this discussion.

Lorrie.

Agenda Item: Classroom Assessment of Learning: What Does it Mean for MSPs?

DR. SHEPARD: Actually, what I thought would be helpful is this: I took copious notes during Jim's presentation, a few more during Bill's presentation, and during the break I typed up an outline of some things that I would like to revisit. We do this tag team often. We are all members, by the way, of the NRC's Board on Testing and Assessment.

So, we have had these conversations and, yet, I always learn something each time we are in front of an audience.

I would like to distinguish here between the idealized system that "Knowing What Students Know" lays out and practical strategies for trying to respond to the conceptualization that is offered in the absence of fully worked out systems. So, those are the two big ideas and then there are lots of bullets under the practical ideas.

So, I agree with everything that Jim said, but on an occasion when I was asked to write a critical response, I said, well, that is great, but it will take 50 years.

So, what are you going to do in the meantime?

It is already worked out in some areas, and number understanding is probably the most elaborated version, but in most cases we don't have this model that explains student learning that we can then implement fully into an assessment system. The case I would like to make is that to do that we need a curriculum. The way to get the kind of coherent connectedness that Andy showed in his arrow diagram, and that I referred to and Jim referred to, would be to have a curriculum that models student learning, that has a conception of the subject matter, and then actually has an understanding of how it would be enacted. That is what we are lacking in most instances.

Let me contrast that idea with what I think we do have, for example, even with national assessment, of which I am a great defender and protector. National assessment is built to be a monitoring device, not a curriculum-specific assessment. In fact, in one of the committees that I served on for the National Academy of Education, we said that it was good, when you are monitoring and you are monitoring across multiple curricula, that you build in a comprehensive assessment.

In that case, we actually had a different meaning for comprehensiveness than is in "Knowing What Students Know."

A comprehensive assessment for monitoring purposes has to be the union of all possible curricula. Otherwise, you can't measure differences between curricula. You can't measure the kinds of shifts, from physical sciences to biology, that NAEP (National Assessment of Educational Progress) was sensitive to over the decades of changes in fads in science curricula. That comprehensiveness, however, is dangerous if you try to take that assessment and turn it into a device to monitor student learning. It is how we get a mile wide and an inch deep.

It is one of the critiques, in fact, of the advanced placement tests, because their developers at ETS had the same idea, that they would be comprehensive with respect to all the different ways that high school teachers could teach these advanced topics. And they even say, if you go read one of their booklets -- and those of you who teach AP, I am sure you have -- that students should expect to get only 60 percent of the content correct, on some of those things not because they are not able, but because they should not have been presented with all of that content.

There are things about monitoring devices that are quite different from the substance of a summative test that would be used for classroom and external accountability assessment the way "Knowing What Students Know" proposes it. Let me just try to describe what I would do if I were building the idealized system.

I would not take a whole bunch of comprehensive assessments, one for each of several years, and then imagine that those things, when I drew the psychometric line between them, modeled student learning. Instead, I would build a curriculum and I would test it with real children to see how they get from here to there.

Be careful, because you are all living in systems that claim to measure growth. Jim mentioned this. And all they are doing is drawing the equi-percentile line from one slice in time, a scoop of items, to the next slice in time, another scoop of items, and they have no idea how kids get from here to there and sort of stay at the 75th percentile or jump around erratically.
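[In rough terms, the equi-percentile linking Dr. Shepard describes is a purely distributional mapping. A minimal Python sketch follows; the score distributions and function are made up for illustration and do not represent any state's actual system:]

```python
import random

def percentile_rank(scores, x):
    """Fraction of scores at or below x."""
    return sum(s <= x for s in scores) / len(scores)

def equipercentile_link(scores_y1, scores_y2, x):
    """Map a year-1 score to the year-2 score holding the same
    percentile rank -- the 'equi-percentile line' between two
    slices in time. It encodes nothing about how any individual
    student actually got from one slice to the next."""
    p = percentile_rank(scores_y1, x)
    ys = sorted(scores_y2)
    idx = min(int(p * len(ys)), len(ys) - 1)
    return ys[idx]

# Hypothetical score distributions for two test administrations
random.seed(1)
year1 = [random.gauss(500, 100) for _ in range(10_000)]
year2 = [random.gauss(520, 100) for _ in range(10_000)]

linked = equipercentile_link(year1, year2, 500.0)  # lands near year 2's median
```

[The mapping says only where a score sits in each year's distribution; that is exactly the objection being made -- the line between the two scoops of items carries no model of learning.]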

So, it is true that psychometricians have done horrible misdeeds. One time, speaking to a bunch of young scholars -- we were supposed to talk about our identities as researchers, because they were learning how you get inducted into educational research, and they were all socioculturalists -- I had to stand up and explain, as an old-timer, what my identity was. I said, I am a psychometrician and I am sorry. I didn't have to say anything else.

It is a detriment to curriculum and learning that we use the large-scale assessments that we have. I am just trying to help you understand how different it would have to be to actually do what "Knowing What Students Know" says we should do: to have a coherent set of assessments, such that the large-scale assessment tracks learning and reports in aggregate form how students are doing over time on the same learning continua that are also available in real time to be used in classrooms. That is quite different from almost everything that is out there. The people who are trying to build these things include the Vanderbilt group that Jim worked with previously, and they include Mark Wilson's developments in science. I think this is important because, if I would ask you to do anything, I would suggest that in your respective states the only prayer is that people will take advantage of the fact that science is presently outside the NCLB frame of reference.

I could imagine a state undertaking to build a thoughtful curriculum-specific science assessment by the year it is due. There is not a hope in mathematics. In mathematics people are doing crazy things: they have got a fourth grade math test and an eighth grade math test and they are putting norm-referenced tests in between, or they are just taking those existing tests and filling in something that approximates the same standards referencing. But, like I said, they have not a clue as to how the kids get from one step to the next to the next. They are in-filling by amortizing the difference -- you know, someone learned extrapolation and interpolation, and they are doing content interpolation. That is not the same thing as studying 30 kids in the context of good instruction and seeing what they do.

Another thing that psychometricians do that is horrible is reify bad practices. There are actually vertically scaled things -- actually, in "Knowing What Students Know" there is one horrible example from KeyMath that shows that a typical tenth grade item is a hard division problem. That is what I mean by reifying bad practices. If you just scale what is, you are not figuring out what it means, in the context of good instruction, to make progress. So, we are way short -- I am still on bullet 1 -- we are way short of doing what "Knowing What Students Know" proposes to do.

So, switching now to the big bold heading 2, practical strategies, or what I call clinical approximation. In this world where it is chaos and you can't really trust the underlying models of the existing large-scale assessment scheme, what should you do to work at the level of classroom practice?

Yesterday I mentioned briefly and I will just reiterate this Venn diagram exercise because it will be different in every state how much you have to work against the conception of knowing and understanding in mathematics and science that is represented by the large-scale assessment versus to what extent you can capitalize on it.

So, you need to know the domain that you are interested in based on good standards and you might also have bad standards in your state.

So, you have to start with some conception of the domain about what you really want students to know and then yesterday, I think, when I was waving my arms, I implied that there were two co-equal circles that we would overlap.

I think better of that now. I think that the best way to think about where the assessment fits in that domain is as a smaller circle. It may be wholly within the domain, but it is very likely to be a subset, and the important thing is that some part of the circle representing the assessment could fall outside the domain. That part is usually test-specific demand characteristics that students have to know -- things you wouldn't do if you really just cared about their developing proficiencies in the discipline. So, it is helpful to acknowledge that.

It is very important, both conceptually and politically, that you come to a clear understanding of that area of the larger circle that is not in the assessment. I would put up a big version of this on the wall and I would paste examples on the wall of the kinds of student work that exemplify those things that aren't in the assessment. We have worked with school boards, for example, where we had students write essays and we pasted those up to show the difference between that kind of work and what was in the state writing assessments so that the school board members could come to understand: here is part of our curriculum that is not in the assessment.

Here is how even then the teachers can take it further, because this is informative to their instructional practice. Here is how what we do teach about the rubrics in the state assessment links to our real ambition for them in the larger domain, so that teachers are thinking about generalization of skills from what is assessed to the full domain. It is a useful conceptual exercise. It is also the case, picking up on what Jim said this morning, that we need to develop some strategies for developing learning progressions, for understandings of growth. When researchers work on it, they have the opportunity over time to do what I said -- follow 30 students, as Robbie Case's research did, and actually see with enough students what is typical.

Here is what real teachers can do now. They can collect portfolios -- not student portfolios. They can collect their own portfolio, by units of instruction, that models student growth. Who was speaking by example from North Carolina? They have a great literacy example that goes from kindergarten through second grade that is an instance of what I am speaking of here.

What you need are samples of student work that model a trajectory. Here is where kids typically start.

Here is how they take the next step and the next step and the next step. That can be for scientific inquiry. That can be for mathematical explanations. It can be for understanding of number.

For as many important conceptual continua as you have, you should collect these things and they become the basis of team discussions among third grade teachers, for example. Once you have got that down for a couple of these, you can start doing variations from them because it is not true that every kid makes those same steps and then what a sophisticated teacher wants to know is what are the different ways kids can do this.

What you put in the folder is a portfolio of kids' progress and you begin to describe what is the normal trajectory and then how kids can depart from that. I think that our literacy colleagues have done a lot better job of modeling this kind of thing and then studying it in ways that really engage teachers. So, they have come up with running records. They have come up with graded texts.

They have come up with strategies that work in practice, that make assessment accessible in real time instead of producing scores on formal assessments at the end of a chapter.

We know from the professional development literature, to the next point here, that teachers need time to do these things and they need the opportunity to try out some of the strategies that have been spoken of in the context of their own practice. I mentioned this yesterday.

I will repeat. We know from the metacognitive literature that kids can't just see a bunch of criteria on the wall and then use them not only to grade their work but to make their work better; they don't know what the criteria mean.

In fact, learning in a discipline, we can argue, is learning what the features of good work mean and look like, and you can't do that until you get feedback grounded in your own work. So, a feature of good writing is that the paper follows a line of reasoning and argument. You can't just tell kids that. You have to show them examples of it and then you have to show it to them in their own work.

Then they still won't get it. So then you have to say, well, why don't you just tell me why you believe what you believe. Then they say it out loud and they argue with you and they take a position and they argue some more and then you write down -- we do this a lot for kids in kindergarten, but it works really well with sixth graders.

You write down what they just said. Then you show that that is a line of argument and then for the very first time, they can do it in their own work.

But if you just keep posting the criteria, they are not going to get it. Teachers need that same opportunity to ground all these features that people like us make in their own work. You can't just wave your arms and say social constructivism, sociocultural theory and have people change their practice.

In fact, it is very frustrating because they often corrupt the practice. They often translate what you just said and make it behavioristic. An example that I often use about scaffolding, supporting kids, making instruction simpler -- I apologize if you have heard this before, I use it a lot -- years ago, when my husband was talking to our seven-year-old daughter, he asked her a scary question. He asked her -- I don't know what the context was for this to come up -- well, how many seconds do you think there are in a year? You know, you could just see panic and what all kids do, which is I don't want to do that. I don't want to play your game, Dad. You know, I am busy on something else.

Kids want not to engage when the questions are too scary. So, what he did was say, well, how many seconds do you think there are in a day. Then she answered it and she went all the way to a year, just right there. Now, what a behaviorally oriented teacher would have done would have been to say what is 60 times 60.

The difference is that the thinking was left in what my husband did -- naturally, he is a physicist; I don't know how he was smart enough to do this -- and the thinking is taken out in the breaking down that a behaviorally oriented teacher does. That is what I mean by saying this can be subverted if you don't really understand it.
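[Editor's illustration: the scaffolded arithmetic in the anecdote works out as below. The sketch simply writes down the build-up from the smaller question to the scary one, assuming a 365-day year; none of the code is from the transcript.]

```python
# Scaffolding that keeps the thinking in: answer the smaller
# question first, then build up to the scary one.
seconds_per_minute = 60
minutes_per_hour = 60
hours_per_day = 24
days_per_year = 365  # ignoring leap years

seconds_per_day = seconds_per_minute * minutes_per_hour * hours_per_day
seconds_per_year = seconds_per_day * days_per_year

print(seconds_per_day)   # 86400
print(seconds_per_year)  # 31536000
```

The behaviorist alternative the speaker criticizes would start from "what is 60 times 60" -- the same numbers, with the reasoning about days and years stripped out.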

We are not just talking about the kids really understanding. We are talking about the teachers really understanding.

I link in my mind these arguments about what it takes to make professional development real with curriculum-embedded assessments, because I have found that helps teachers get this kind of experience. I have made a little link between those two ideas. I am thinking vividly of a teacher who, on behalf of her team of teachers, sat me down one time in one of our joint research meetings and delivered a speech to me about how angry they were with me for having withheld a whole bunch of these rich materials.

Our reasoning for doing that was that we did not believe -- so, our beliefs were instrumental here -- we did not believe in scripting lessons for teachers. So, our idea was to work with teachers all year, every week, to help them develop assessments in their classrooms. This was about Christmas time and they were really fed up, because they had found out that someplace else we were working with a Marilyn Burns curriculum unit and they wanted to know why we had withheld these units from them.

They really persuaded me, and what we proceeded to do after Christmas that year was deliver intact curriculum units. As the teacher had said -- and I can see her shaking her finger at me -- someone else has figured it all out and it all makes sense. So, it has coherence. It has the curriculum design that you are trying to tell us about, which we have never heard of and which is against all of our instincts, and it has assessments linked to it that we also don't know how to develop.

Then what happened, after a three-week unit on multiplication that had been canned and handed over -- where what we did was provide supports about what was supposed to be going on: you know, why are you asking this question? What is the underlying concept we are hoping the kids are going to get? -- was that we noticed the teachers in the school could generalize to geometry and measurement, which, by the way, they had never taught. It was also true in this project that they were better at trying these inventive techniques in content that they hadn't taught before; in number sense, where they had experienced curriculum, they had a lot of trouble giving up those old things.

But they generalized beautifully the things that we had talked about in measurement and geometry from the canned curriculum unit that I had been afraid to give them.

So there are ways that you can support teacher learning, using models like what we talk about in supporting student learning.

I have put technology next because that was another criticism I raised when challenging "Knowing What Students Know." We do not want technology with sophisticated models that hides those understandings from teachers. It is fine if you want to do all this work, because teachers cannot be expected to do this themselves.

But then you have to have a strategy for introducing it in classrooms that lets them have enough time to see what is going on in that technological module so that they can generalize it. So, just as when you hauled in the Marilyn Burns unit and they used it as a canned thing and you supported their understanding of what was going on, you also have to give them opportunities to appreciate what is going on with the technology, or it will be something they send kids away to do, and not a bit of that richness will come into their understanding in the classroom.

So, technology can actually make things worse if it does not support teachers getting smarter about these issues. All of that builds to the point I am making here about your needing models of teacher learning that are just as rich and elaborated as models of student learning. You have got to have a plan that makes that visible to them.

Let's give them some credit for their own metacognition. Here is where we are trying to go. So, you do give the speech that I gave yesterday about all of this, but then you don't expect the speech to be sufficient. Then you try out some things very concretely that allow them to challenge their own understanding.

Relevant to Bill's talk, if they hold very stratified ideas about who can do what, then one of the things you have to do is have a curriculum that challenges that. You have to have some tasks such that, when you support kids in doing them, kids they didn't think were capable of this reasoning can suddenly do it. Now, that is really tough. That is why you have to have thought through your selection of tasks that help teachers do this.

Yesterday, one of the comments from the audience was about student work. I had the same experience.

Teachers get lots smarter as soon as you engage them in talking about student work, especially if you model for them what we are looking for in student work and even teach them what we mean by evidence of students extending their thinking or sharing their thinking. You can do this with student work, or you can even do it with transcripts or segments of videotapes of their lessons.

These are all things to mediate teacher interaction. So, imagine teams of teachers coming together in a safe environment and looking at excerpts of their teaching that you helped them select, where suddenly the kids are, in quotes, making their thinking visible. If that is one of the things you want to work on -- if classroom discourse is one of the things you want to work on -- then why not videotape it and model it in their own experience?

I also use clinical interviews, the next bullet, as an instance of how teachers can get smart alongside student work. Clinical interviews help teachers get smart about their students' thinking, and here it does help to show videotapes of what this looks like when there is kind of an aha: what it looks like when you finally understand what students are understanding, what kinds of questions you have to ask, what it looks like when you expand a student's knowledge and when you have to back up. Here, again, I use Marilyn Burns's videotape, where you have to have an implicit understanding of how students build their understanding of number so that when you ask them a problem that is too difficult, you know how to ask them an easier problem.

You know how to back up without taking the thinking out. If teachers are not doing these things naturally, then you have to have a curriculum that helps them do it, and you also have to answer their very practical frustrations with people like me telling them this is all possible. They say, you know, I am already tired. How could I possibly do what you are talking about? So, you have to have very safe instances.

So, it may be you interview one kid or interview three children that you think are at different places in their learning in this unit that you are working on. You have to have concrete things they can do in their own practice that are practical in the context of an already crowded day. I also think that studies of classroom discourse -- that is what I alluded to with video tape segments. That literature is very rich for doing some of the things that Jim was talking about.

Most teachers are still doing the traditional question-response-evaluation interaction, where the teacher is at the center and it is one-on-one with each student. If what we are trying to do is get kids into the habit of talking with each other to come to understand a phenomenon, then we have to see what that looks like, and we have to see how we actually support it and get it started in our own classroom. It is very hard to start these new norms if teachers don't have experience with it and students don't have experience with it.

Their learning will typically be to put kids in cooperative groups and then have the kids solve the problems individually and then compare answers. So, you need to know that that is where a lot of people start. How do you disrupt that? How do you start rewarding the kids for talking about their problem solutions? Well, you start making it an expectation that they put their answers on overheads and take turns explaining how they got their answers to the rest of the class, and you make it a goal that you will use that regularly -- to talk about how you changed your teaching based on what you heard yesterday from three different kids presenting.

They have to start honoring that in the course of teaching, as well as creating opportunities for kids to do this. I think it is just a practical matter of thinking about how it should work and giving teachers real concrete suggestions. Yesterday I also mentioned, and I will end here, that you could plan specific interventions, like we want to try explicitly to improve feedback. I think Dylan Wiliam is here and he might even talk about some of the things where they have introduced formative assessment with teachers and asked: what would happen in your class if you started focusing on student learning, if you started asking every day what the students learned today, or what feedback I could give that would help kids get better? What would that look like? Can I figure it out for three kids, and does that help me address the whole class in a different way, including showing specific kids' solutions, first anonymously and then eventually, when it is part of the classroom norm and not scary anymore?

When I saw that Jacob did it this way, here is how I changed what I was planning for us to do. Self-assessment is another technique that you can introduce specifically to support learning and then debrief with teams of teachers about whether it helps and what you need to do to make it work practically in classrooms.

So, those are some thoughts in trying to translate, I think, what Jim was talking about this morning. I will take questions. I didn't mean to take even that much time.

MS. APAZA: I am June Apaza from the MSP in Rapid City, South Dakota. It would be very helpful for me if you would draw the Venn diagram exercise so I can see what it looks like.

DR. SHEPARD: It is not that impressive actually, but here is what one group of teachers did where they decided that this was what the standards in our state stood for and then they just drew a circle that looks like this that is the state assessment and then they tried to elaborate what goes here. What are the things that are not in this circle?

We actually paste student work up here, or draw strings and post it on a bulletin board, to say: when we are doing this kind of student work, we think we are honoring the standards, but we do not believe that the state assessment will fully capture that. I would think that in some states this is a tiny circle, and in other states they have made an effort and it is more representative of the domain of the standards, but it is still not the entire thing.

You can put dispositions in this larger circle, as well as different skills and uses of skills that aren't captured here.

MS. BUNT: Nancy Bunt, Southwest Pennsylvania MSP. Our teachers are being pressured to make sure that they are teaching what is going to be tested on the assessments. By drawing that Venn diagram, my concern is that they will devalue what is outside that circle, and under pressure they will decide not to deal with what is outside the circle and to focus only on what is inside the state assessment.

I liked your message from yesterday better -- that by learning for deeper understanding, you will be able to accomplish what is on the assessment as well -- and I am worried about how you move them toward that.

DR. SHEPARD: Let's see, Nancy. You said that yesterday and I was nervous then, and now I am even more nervous. Because here is the really honest truth from the teaching-the-test literature: I cannot promise teachers that if they teach for deep understanding, they will absolutely beat the teaching-the-test strategy, because I just showed you data showing you can make test scores go up on the test in ways that don't generalize. Now, I don't think that when you teach for understanding, the kids are going to suddenly be helpless.

Yes, it is true, but I cannot tell you that they will always beat the kids, especially with a very narrow assessment. So, the truth of what I am arguing from the data differs in different states. The narrower the test, the better you will do, if you only care about the scores, by just drilling on the test. I cannot tell you that I can beat you if I don't drill on the test, because I have got data showing that you can raise scores and it does not generalize.

So, I want to stay even on the narrow test and my commitment is to the students. My commitment is that then they will be able to do the next grade's curriculum. Then they will be able to do a million other things that tap this, not this.

MS. BUNT: I guess if you look at the whole state assessment, all the way up through -- and I am looking at the upper grades, where we are off the charts, not being able to get anybody there -- what that is saying is that if you are really focusing your time on developing deep understanding of the standards, a commitment to that whole circle and to more of it, then you are going to get more kids there, and it is focusing on getting the kids to that level.

DR. SHEPARD: Absolutely. And this is not just an affirmation, you know. We have to worry about what I call the Field of Dreams rhetoric that goes along with raising standards: just set them out there and somehow, miraculously, by your high expectations, kids are going to get there. I think you have to set those high standards that Bill was talking about, and then I think you have to ensure opportunity, by means of a thinking curriculum, every single step of the way.

It is not a sequentialist argument the way the behaviorists talk about it. It is a set of opportunities starting early and continuing to think about the meaning of what you are teaching. So, we are in agreement about what the real teaching has to look like and you have to commit to the real learning goals and know where the test fits.

So, for example, when we talk about teaching the test, here is how this sort of rhetoric translates into practice. Many teachers -- and it is worse in urban school systems -- start in September imitating the test, and they use formats that look exactly like the test. Then, when things start getting closer and closer to the test, they go to the cafeteria and don't talk and they have testlike conditions, and that is how they learn mathematics.

That has an oppressive psychological effect, as well as instilling a terrible sort of drill attitude about the tasks that they are undertaking. Those are the kids whose scores may go up, but they can't do the question if you ask it even a slightly different way.

So, what you commit to professionally -- and I think teachers are willing to commit to this -- is giving kids examples.

Again, the literacy community is ahead of us on how to teach to the test in a way that is professionally defensible -- and I am sorry, I can't remember her name to cite her; it is in a Heinemann publication. They talk about the test as a genre. When we are good writers, we think about audience and we think about different purposes for writing, and taking tests is one of the many genres that we need to become knowledgeable about.

So, yes, they prepare the kids. The kids have at least one practice session so that bubbling and all that stuff isn't bizarre, but we don't distort the curriculum and give over to that day in and day out. That is what we have to do in mathematics and science as well. What does it look like when people ask you questions?

You can then do sort of prior knowledge checks by, in December, asking kids in a test format something you know you covered in September and brainstorming with them about what they know about it. So, we can do our knowledge-activation techniques and respect what we have been developing in our classroom culture -- using our knowledge to solve these things -- and you can point out that the test has some special demand characteristics. Sometimes in tests they ask us this way; what do you know about it?

So that kids get sophisticated at diagnosing why the test asks this way and acts this way, and how it is connected to their real learning, which is the big picture.

DR. GOLLUB: Lorrie, there is a question over here and could others who want to be recognized, please raise your hands and Lorrie will recognize you in a moment.

MR. LANGENBERG: Don Langenberg, with what I hope will be a couple of relevant sound bites.

Several years ago, our K-16 partnership in Maryland brought together some university faculty and some high school faculty, and they jointly developed a C paper. They defined what a paper worth a C looks like in high school or college. While they were at it, they had to say a lot about A papers, B papers, D papers, and F papers, but they all found that mutually useful, and that C paper is being used fairly widely in lots of places.

The other sound bite goes back to a memory I have of a conversation in the auditorium of The Academies building perhaps 25 years ago. The topic was the use of calculators, then relatively new, in the teaching of mathematics. I remember a distinguished professor of mathematics, from Harvard I think, saying it is more important to know when to multiply than it is to know how to multiply. I still haven't decided how I feel about that. So, any advice would be welcome.

MR. HAMOS: Jim Hamos. I am actually one of the NSF program officers.

I am going to try to make the following comments not as a program officer, but related to the Venn diagram and especially -- because I am not sure this is the model.

The model that states have gone through -- and I was once in one of those states; well, it was a model ten years ago and it has come back most recently -- is this: state standards exist, and state assessments year by year move around those standards to try to cover them over a given amount of time. You can't test everything in the same year.

It seems to me -- I am not even sure -- I mean, if that were in real time, that little dot would be moving around, so teachers are trying to cover everything. That is one problem they have. It seems to me that what you are saying, though, is that the blue area out there is actually a dream of schools having rich curriculum and instruction, and that their belief is that the work they do sometimes ties to -- relates to -- state standards, and the assessments will pull that out in some way.

So, I am hearing you saying that when teachers get together, they talk about all the things they do in schools and the opportunities, and they try to link it back, as their leadership will tell them to do -- you know, day by day, what are you doing related to your standards that will be assessed. Within MSPs, then, MSP communities are very interested. They are all over the place.

There are ones that talk about inquiry instruction, and certainly in The National Academies there has been a whole conversation around that. There are ones in places where the partnerships are reduced to pacing charts: day by day, kids walk in and are told, this is the standard you are working on.

There are other ones in here that are talking about NSF curricula, however they have been created over the last few years. So, the models that exist in this room, I would argue, are all over the place.

So, number one, I would like to get your -- I am not sure -- perhaps a little reflection on that.

DR. SHEPARD: Let me pull out a couple of big ideas because you said quite a number of things there.

One is I think we could draw a picture like this for any given grade level and what is implied by the standards and the state curriculum guides if they exist for that grade. The assessment will still be a subset of that.

So, I am trying not to think about the particularities of a given grade level. I think you still have this problem of the one versus the other in any grade where we have set that aside.

Truly, I think if you are at the right level of aggregation, all the standards do track across grades.

They should track across grades. We don't suddenly stop doing geometry in a particular grade level.

So, imagine this then looking at a grade level.

Another thing that I haven't clarified is that sometimes the teachers do not have the expanded view. Sometimes their understanding of the subject matter is as much narrower than the powerful curricular ambitions in the standards as the tests are a narrower version. Then you have a different task, but you could still use this diagram to show the kinds of powerful reasoning tasks. I agree with Jim that you don't just stick with the tasks, but sometimes they do, in a frame, show what we need to be talking about.

In fact, you can use those as an intervention. When you bring in a task that is very different from how teachers are typically teaching, they will quickly say my kids can't do this and when you try it with their kids, they learn how their kids are reasoning from their curriculum to that task. So, those can be good intervention strategies. I just have found it a useful heuristic.

In most cases, teachers avow that their curriculum is bigger and richer than the state assessment, but I agree, you probably have to go both ways and think about that. What is the overlap and where are the gaps in what we are teaching?

MS. McCOY: Thank you. My name is Theresa McCoy. I am with one of the New Jersey MSPs.

I am very interested in your idea of the trajectory of teacher knowledge, teacher learning about how to use assessments to inform instruction. I wonder if you have any examples of how that trajectory differs for new teachers, experienced teachers, and teachers with deep understanding of their subjects. Could you speak a little bit about that?

DR. SHEPARD: I may resort to stereotypes to answer your question because I do think there are very large differences between elementary teachers and secondary teachers in this respect, and the old adage is that elementary teachers teach children and secondary teachers teach subject matter. So, for people who have deep content knowledge at any level, the trick is figuring out how to make this accessible to kids. They tend not to be good reflectors on their own learning, and even if they were, they do not have a model of entry for a student who isn't just as facile at it as they were.

So, what you need to have for them are good examples of kids who didn't have a clue and what the steps were for that kid to have the opportunity to get it. So, you kind of have to model a student trajectory for that kind of teacher so that you can show them different ways in -- especially for kids from very different backgrounds, because otherwise you are just going to reify the fact that certain kids can't get it.

For elementary teachers my experience is that their own lack of comfort with the scariness of mathematics or science is what you are working against. For them, I think you need safe opportunities to learn the content.

That means workshops where they can work with scientists and mathematicians, basically relearning content that you think they should already know. But they often know it in very rigid ways. So, they are doing it very procedurally, and they are scared to death that if the kids could just step one tiny step outside that channel, they feel like, you know, they don't know what to do.

So, it is very different for different groups of teachers. That is a big divide, but obviously you have to know the prior knowledge of your teachers, think of them as learners, and also model for them what progress might look like. Just try to generalize the model you used for student learning. How would you do that with teachers?

MR. SMITH: David Smith from Philadelphia MSP.

We have mentioned sort of belatedly yesterday the role of higher education and I just wanted to highlight that again and bring that back into our context because that is the locus of teacher preparation. I think that, you know, if we look in the context of this diagram, there is another circle that doesn't necessarily fully overlap the standards that has to do with disciplinary experts' view of the content and how they prepare students in higher education in the context of that view.

DR. SHEPARD: I agree. And what are the interventions? Well, it probably means I think doing some innovative things like what if you actually use some well- designed fourth grade curricula to work with science professors about what it looks like when fourth graders are learning this material and then does that apply any difference in how you would teach freshmen and sophomores about this because it really hooks you to try this.

If you present examples about real understanding of concepts -- oh, at dinner we were talking about the nitrogen cycle and how could it possibly be that someone would come to college and not understand the nitrogen cycle. Well, because they have been told it. They have been told it at least three times, but what would it look like? What kinds of experiences would you provide to middle school students so that they could understand it, and then how would it be different if students like that came into your college classrooms?

Just to repeat what I said yesterday, remember you scientists prepared the teachers that delivered that instruction. So, I think hands-on experiences with that phenomenon with real curriculum and real kids' work is also effective at the college level.

MS. CLELAND: Donna Cleland with the Philadelphia project.

It just sort of goes back to the Venn diagram that you have there. What we found with our State of Pennsylvania and the piloting tests that they have given out so far is that that green circle there is really -- it is very factoid in nature. So, it is not at all assessing the processes of science inquiry. So, the blue circle out there represents a very important segment of things that are actually in our standards, but they are very difficult to measure in a multiple choice format. So, the measurement of them is not occurring.

So, I mean, what I would like to find out more about is, how does one measure those kinds of skills? If we want to change pedagogy so that people are teaching in an inquiry-centered way and they are encouraging this thought, this development of thinking within their students, then how is it that we can have a tool on which children who have been taught in that kind of way do better?

So, the teaching to the test then becomes teaching toward what it is that we truly value.

DR. SHEPARD: Right. Back to Nancy's point, the better the test represents the broad learning goals, the less corruption there is. So, when I warned you that sometimes teaching to the test can beat the thinking curriculum, that is in the narrow instance where the test is just factoid or just algorithmic and you get this opportunity to raise the scores falsely. The richer the external test is as a representation of the full processes and content categories that you want to represent, the more that good teaching leads to proficiency on the test and the less distortion you will get, either distortion of the curriculum or distortion or inflation of student learning.

So, that is why the character of the external assessment is so important.

DR. GOLLUB: Time for two more questions.

MS. MAUZY-MELITZ: Debra Mauzy-Melitz. This is a comment on the last question and a couple of thoughts.

Could we change the Venn diagram to a series of circles within circles, just to incorporate Andy's concepts, the inner circle as the higher cognitive learning? The outer circles are the factoids, and then start looking at it as a pie, so that, you know, you have to teach this much factoid to equal the learning concept in the inner circle.

DR. SHEPARD: Then the state standards would have a different meaning, though, but we could say centrally what our big concepts are and make the facts peripheral to that. We do have some difficulties coming up with assessments that do what we want them to do without a curriculum. So, I will go back to my first point. We can build tests that measure facts. We can build tests that measure processes. When people have tried to do this without an agreed-upon curriculum, however, I think they make the mistake of building what I would characterize as very aptitude-like tests. This is another problem with NAEP, in fact: if you can't assume that all the kids have read the same book or studied in depth the same particular content, we tend to build very generic tests, and that is misleading the curriculum.

It means that we are insensitive to kids' learning in specific areas, because I think the ideal assessment would be one that tests knowledge of facts in the context of processes, of inquiry processes in the case of mathematics, because the real knowledge is not just here is a fact or here is a process absent content. It is, as in the sophisticated models of expertise that are in How People Learn: experts, when they see a problem, know what kind of problem it is, they know what particular knowledge is required to answer that problem, and then they have a good sort of problem space representation of it.

We find it very hard to make tests that assess that combination because of fairness criteria that say it is unfair to ask a particular question that taps too deeply into something specific if all the kids haven't had the opportunity to learn it. So, that is why I find that the issues of fairness that Bill brings up are very different depending on whether I am trying to gather large-scale assessment data about what kids actually know versus making a decision about a child staying in third grade. To protect the child from being erroneously retained because of a curriculum that didn't give them the opportunity, we then make these tests that are more generic but aren't very good at measuring the outcomes that we truly value.

So, it is another one of the ill side effects of building tests that we could use for every possible purpose; whereas, I would try to make a distinction between those so that I could end up with richer assessments.

DR. PELLEGRINO: I just want to make one last comment. The issues that Lorrie has been trying to talk about in these questions are ones that are not easily resolvable, particularly when we end up with -- and this is where "No Child Left Behind" is driving us -- if you really care about your state standards and assessing them adequately, you can't possibly hope to do it in a 90-minute census test that everybody gets. This is the practical limit. What has happened is even in states that have had what I call more thoughtful assessment programs, like in Maryland, the MSPAP, that is a program where you are trying to get an adequate feel of how the state is doing.

You are not trying -- so, you end up with a matrix sampling model rather than a census test.

The realities are there are constraints that limit just what you can assess and how well you can assess it, and so long as the model is that the state assessment somehow or other is supposed to represent everything, inevitably it represents nothing terribly well. That means it is not a particularly good index, especially for high-stakes purposes.

I don't know how many times we have to say this.

It is an issue of design and engineering and you end up with a suboptimal design and there is nothing you can do about it. On the other hand, what Lorrie is talking about is hopefully states will recognize what they can assess well on a state assessment and what should be left to be assessed -- what standards should be left to assess closer to the point of instruction so that you have a more systematic and balanced system of assessment so that you don't leave things out.

DR. GOLLUB: Thank you, Jim.

At this point, I think we should close the session. Let's thank Lorrie again.

[Applause.]

[Whereupon, at 12:18 p.m., the meeting was recessed to reconvene at 1:00 p.m., the same day.]

A F T E R N O O N  S E S S I O N [1:00 p.m.]

MR. LABOV: Welcome back. Facilitating this afternoon's session will be another member of our steering committee, and I would like to introduce to you Mary Colvard. Mary was a long-time teacher, for 31 years, who retired a couple of years ago and is now working in New York State with the State Department of Education in a number of areas, particularly in biology education. She was a high school teacher of biology.

She has numerous awards and I won't go into those. You can read them in her biographical sketch that we provided for you. But I want to tell you that I met Mary several years ago when I was the study director for our Committee on Undergraduate Science Education, and we made a practice of having teachers on that committee to be able to provide that kind of perspective and the wisdom of practice that many people in higher education simply don't know about. Mary's input, perspective, and knowledge were just invaluable to the kind of work that we did there.

So, it was a real pleasure for me to be able to have her serve on this committee because she really does bridge knowledge and expertise in both K-12 and higher education.

So, Mary, we welcome you here and thank you very much.

[Applause.]

Agenda Item: Implications for MSPs of Large-Scale Assessments

DR. COLVARD: Other teachers in the audience, I am so glad you are here because I -- just my own little editorial comment, it is so important to have a true partnership. So, it needs to be represented by all facets of the MSP perspective.

One of the things that is important to us on the steering committee is to get your feedback. So, the yellow cards that we gave you earlier, if you haven't, would you take a few minutes after Marge is done and fill these out and make sure that we get them. I would like to use them in my little formative assessment session to show you how you can quickly look through lots of information and get an idea of what is going on and what questions and discomforts there are and what happiness there also is.

I would like to introduce Marge Petit. I asked her what I should say about her and she said, well, they can read it, but my name is Marge. She also said that she is going to bring you the perspective of an implementer and a developer. So, when you are looking at the role of the MSPs and what impact the large-scale assessments may have on your project, she can bring that to you.

DR. PETIT: I am going to try something a little bit different. I watched Lorrie and Andy yesterday struggle behind that podium, and I actually made a PowerPoint where things kind of slide in. So, I thought maybe it would be a little problematic if I didn't exactly see when things were sliding in and what was going on. So, bear with me on that.

The bottom line is I have been an implementer for the better part of my 35 years in education. Like some of you, I have taught middle school math and science primarily for 23 years and then worked in our state (audio unclear) -- anyone else? No one in the SSI. Oh, my goodness.

Then I actually worked in state policy. So, Jim can kick me around where Jim goes. I was a policy maker for four years, and for the past three years I have been providing technical assistance to states in assessment and accountability. My heart is in classroom assessment, but my work, both in the technical assistance I provide and in the real world of implementation -- where I worked in policy and in the SSI -- is in large scale as well. I mean, it is part of our lives.

So, I am going to give you a perspective from an implementer, an implementer who has learned after ten hard years how to really avail myself of the research that is out there. I actually don't know if in 1989, when I sat on the first Math Portfolio Committee in Vermont, the research that I now know was really available or not, because we were making it up as we went.

What I do know now is that in almost all my work I read stuff daily from Lorrie or Jim or -- I mean, you can just kind of go around -- and use Andy's work and Norm Webb's stuff as part of the regular way you start to think about things. This has been for me just wonderful, to have the top people in the country here on assessment, but it is also powerful for all of you to get this information as you are starting to think about these things.

So, that is my pontification and then we are going to kind of move on. At dinner last night, I kind of had a lesson about let's make sure we all know what we are talking about. So, I am just going to give some quick definitional stuff. The higher education individuals at the table kept saying all this definition stuff going around, we are not quite sure what someone is talking about. So, I just want to make sure we are on the same page. I am not going to spend a lot of time on these definitions.

First of all, when we are talking about large-scale assessment and the implications -- I hope, by the way, you saw on the first slide that the topic I was given was the implications of large-scale assessment for MSPs. I hope you notice I changed it to some implications, because the topic is huge. The topic is huge, as you could tell from the background you got this morning.

Last night we were talking about, you know, what does all this mean and all this definitional stuff. So, I am just going to give six definitions. When we are talking of large-scale generally -- and this is actually true of higher education; it doesn't matter if it is an SAT test, whatever it is -- you are talking about an assessment, a test, that is delivered to a large population of individuals, higher education students, adults, whoever they are, under standardized conditions.

Now, principally, standardized conditions in a lot of our minds go back to our own experience, a timed test. That is not what standardized conditions necessarily mean. Standardized conditions can mean the test is not timed. As a matter of fact, in the standards-based world, that is what you would like. You want to know what kids know, not what they don't know.

It also doesn't dictate a type of question.

In our world, at least -- when I say our world, I am actually thinking of people around Lorrie's age, so I am talking about that world; if anyone is in their twenties, you have had a little bit different experience -- for the most part all of us took multiple choice tests, correct, with the exception of maybe -- I remember, back when I was getting into college, they flirted with writing essays. So, I do remember doing that, but for the most part a standardized assessment doesn't mean one kind of item type.

It can mean a variety of things and we are going to talk about that.

Standards, I hope everyone understands, are the articulation of the expectations that are valued, and in this country every state kind of decides for themselves. It used to be every school district -- no, no, no, no. I will take it back. Every school decided for themselves.

Okay? Every classroom decided for themselves. So, when we are moving to this idea of people coming to consensus about what is valued, it would make sense, particularly with kids who -- there are so many children who live in this mobile society where they move in one year from place to place, and I don't want to spend a lot of time on that. But the idea is that these are a set of expectations that people -- for the most part, about half the states in the country, actually less -- I think there are only 14 or 15 -- have actually articulated those standards grade to grade.

Most have had grade groups -- so many states kind of were in the grade grouping arena, very general standards, as Andy pointed out. You could almost put anything inside most of them, and that is what all the reviews said. I am going to go past that next definition and go to grade-level expectations, because that is a term being used in the field that is tied to "No Child Left Behind," and that basically is: let's take those state standards to grade groupings and say what that means at third grade, fourth grade, fifth grade, sixth grade, seventh.

So, if I am talking about grade-level expectations, that is what I am talking about. The last thing is grade-level assessments, which for many states, almost all the states here, with the exception of Michigan -- has Michigan just had grade-level assessments, or -- okay. I don't remember right now. I thought I had all that, but I don't now. But for the most part, states have not been administering grade-level assessments, because all you have had was grade cluster standards. So, the game is nil.

So, we have got the general lingo. We have got large-scale assessments. The one I stuck in the middle, standards-based large-scale assessments, just means large-scale assessments that align -- and I am putting "align" in quotes, because we have already learned today that, if you look at any of the literature that has come out, very few state assessments really align with their state standards. There is literature out on that.

So, that is a standards-based assessment. Now we are going to see if the system works.

What is the primary goal of MSP? My understanding is the primary goal is to improve student learning in mathematics and science. Okay? I think we can all agree on that. What are the -- by the way, we all have the same -- I thought that was just a Vermont school house.

Here are the givens. All schools within the MSP are responsible for improving student learning in mathematics and science as it relates to your state standards and grade-level expectations. All schools are responsible for improving student learning in relation to state tests. That is the first time that kind of reared its head this morning, right before lunch.

By the way, you know, what am I really responsible for here? These are givens. All schools are responsible for making adequate yearly progress, and Jim mentioned that this morning. If you are not familiar with that, it basically says over time -- I have got a certain 13 years or 12 years -- I have got to get all my kids proficient, and I have to make certain progress every year, and if I don't make certain progress, I get identified as a low-performing school and, you know, it kind of goes on.
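[As a rough illustration of the "certain progress every year" idea, the following sketch computes evenly spaced annual proficiency targets under a simple linear model. The starting proficiency rate and the 12-year horizon are invented assumptions for illustration, not actual state figures, and real states set targets in several different ways.]

```python
# Toy sketch of linear annual proficiency targets, in the spirit of the
# adequate-yearly-progress requirement described above. The starting
# percentage (40%) and the 12-year horizon are invented for illustration.

def ayp_targets(start_pct, years):
    """Evenly spaced annual targets rising from start_pct to 100%."""
    step = (100.0 - start_pct) / years
    return [round(start_pct + step * y, 1) for y in range(1, years + 1)]

# A school starting at 40% proficient with 12 years to reach 100%:
print(ayp_targets(40.0, 12))
# -> [45.0, 50.0, 55.0, 60.0, 65.0, 70.0, 75.0, 80.0, 85.0, 90.0, 95.0, 100.0]
```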

There is a lot of pressure on schools. Everybody agrees. The state-level assessment is currently the basis for adequate yearly progress. I suspect one measure of the effectiveness at NSF and the Education Department will probably -- if they haven't already, I am fairly certain they are going to say one measure of your effectiveness is increasing the number of kids who become proficient. Agreed? Okay.

So, if these are givens, then what are the implications for MSPs? Now, remember, some implications for MSPs. Okay? Yes?

AUDIENCE: [Comment off microphone.]

DR. PETIT: That is wonderful. I did not know that. That is wonderful. Excellent. So, you don't have to make it -- I will repeat it, Nancy, and see if I got it.

Okay?

Their MSP was asked to set their own targets.

So, they don't have to -- the MSP doesn't necessarily have to meet the AYP -- love this, MSP AYP -- the adequate yearly progress. They have to meet the goals that they have set. Is that true of the other MSPs? That is terrific. However, it doesn't deny the fact that the schools in your MSPs have to make adequate yearly progress.

So, that doesn't go away even though -- and it is wonderful, Nancy, that you are able to do that, but the reality is the schools within it -- and that is a tension in the system. That is real.

So, what are the implications? What are some implications? One is the quality of grade-level assessment will impact your work. The quality of grade-level expectations will impact your work. You get what is happening here? Oh, and by the way, large-scale assessment cannot provide all the information that is needed to improve student learning in math and science. Those are some pretty big implications. Oh, you missed that. I have this clever thing where I shade out part of it. You will see it again.

So, what I thought I would do today is a couple of things. We are going to look at those three things from a couple of perspectives. For the first one, about the large-scale assessments, I am going to tell you what is happening in real time in states across the country, and I am going to let you know about it because if you are not paying attention, you should be. Okay?

Now, remember, in my current job -- in the organization I work in, we work with about 14 or 15 different states, everywhere from Alaska to New England to Louisiana. So, we are very familiar with what is kind of going on, the policy shifts and the real world. Remember, I spent four years in policy. So, I can tell you what that world is like a little bit.

So, I am going to talk about these topics. I am not going to talk about them here. The second to last one I was going to talk about, but it is probably a two- or three-day session and maybe needs to be done some other time: standard setting, its implications, and the design of the assessments. The last one, implementation costs, just keep that in the back of your mind because it will impact everything we talk about.

So, the next slide is new. You will recognize it from Andy. So, the first thing -- I wasn't going to talk about alignment at all because I figured Andy had done that and Andy actually did do it, except for one thing.

Wouldn't it be wonderful if we thought about alignment by design, as we were building the grade-level expectations?

As the assessments were built, people would say what do I really want here? What do I mean by line slope? Do I want them just to memorize something about it? Do I want them to be doing conjecture and at what grade?

We are going to come back to that. You don't actually have this slide because this was 6:00 this morning, but we are going to come back to this a little bit later on. Now, here are some big deal things that are going on in your states, and a lot of people thought all this would be settling out, okay, that it would all be resolved by the time we got to these meetings. But I can tell you from e-mails that I have received this past week, these things are still flying around and undecided.

So, you have this basic idea of a set of grade-level expectations. We assume that there is a curriculum around which that set of expectations is built. Correct? That is kind of our underlying assumption.

However, when you get past this -- oh, no, I do have to tell you about something. It is a little thing about "No Child Left Behind," and that little thing is that before the start of the school year following the assessment, the Department of Education has to inform parents that their school was identified as low performing and that they have access to choice. They can go to other schools. You have to do it before the school year.

Now, I don't know precisely where I got that. I think probably you can go closer to September, and the reality is going to be even harder. However, that has put a new pressure on the quality of the assessment, and you as leaders, I urge you to pay attention to it. So, let me just show you. Here we go.

If you assess early, you have enough time to get assessments back. You can do rich assessments. You can have extended response questions. However, kids don't have the time to learn the full curriculum. Now, that has kind of been going on for awhile. Most of our states assess in March or April, some even in February, and everyone kind of lets that slide. Now, with the high accountability, there is a little more pressure.

What if we assess late? Let's give the kids the full curriculum, but I have to get results back before September. So, what does that mean for the kinds of questions I can include on the assessment? Well, is there any constructed response -- that is, questions in which the kids explain their reasoning in any way or solve complex problems? If there are any, they are going to be pretty limited. For the most part, there are discussions in some states about, well -- and they are policy discussions. I actually got an e-mail from a colleague, a math colleague in another part of the country, this week that said, help, help, where is the research? Where is the research? Because the legislature is getting pressure from the schools to have a full curriculum. They want to administer the assessment late in the year.

Then the state department is saying, well, the only way we could have the turnaround if we did it that late is multiple choice, and the legislature says, so what? Do multiple choice. So, please pay attention. It is a really quick thing.

Here are some things we have seen going on, just so you know some strategies. On the left-hand side we have opportunity to learn and item type. So, test early, but redefine the content to be assessed, considering the time of year. When I was working on this last week -- poor Janet got two or three versions of it -- I put out a call to a lot of states to ask what is actually going on, and I think Nevada was a state that said, if I got this right, that they are kind of reshifting what would be expected by the time the assessment goes on. But they still value this other part of the assessment.

Test in early fall. There are some states that do that already. Indiana is one. Actually, the tristate New England group -- if you don't know about this, these are states in New England who have decided to meet the requirements of "No Child Left Behind" together, which is a remarkable feat, very exciting. It is Vermont, New Hampshire, and Rhode Island. So, they have decided to go that route. We can talk a long time philosophically about the ups and downs of it, but they have kind of gotten comfortable with it.

Late April, early May, and have a very tight return schedule and just keep your fingers crossed -- some states are doing that. Another thing that actually came out of some South Carolina research: they were testing late in the year, including multiple choice and constructed response, and they were identifying kids who might need summer school based on the multiple choice questions. So, there are different things going on to try to deal with this issue, but it is a real one.

The alternatives that do not maintain opportunity to learn or the mix of item types are: test early but don't adjust the content demand, or test late and do not include constructed response questions. Now, I don't know if I have the whole set. I suspect I don't. We have 50 states. I suspect there are many more combinations of responses than what is up there, but the big idea here is that it does matter when the assessment is administered, and it matters most what the item type is.

I think yesterday and again today, Lorrie made a pretty good argument about the importance of not keeping to one type of item, which is one of the reasons why I actually got Lorrie's data, just to remind you of yesterday. I will give you an example. I have visited some Texas schools; they have done a lot of hard work, and many of them believe that their accountability has really made a difference. They particularly believe that it has made a difference for their ESL (English as a Second Language) population, because they were paying attention to what was going on with that population like they hadn't been prior to that.

But here is what I found in some elementary schools in Texas, and we have actually heard -- my friend, Beth, said, oh, yes, I heard about this. I call it the Sharon Wells effect. Sharon Wells was a teacher who, with other teachers, built a mathematics curriculum around multiple choice questions. So, if you asked what the curriculum was in the school, the teachers would hand you a notebook this thick with multiple choice questions, and you would say, no, no, no, no, curriculum. Yes, yes, yes, curriculum.

So, item type really matters, depending upon what you value for mathematics and science. I think that is what Lorrie was struggling with up here, because we all want to believe what Lorrie said, that if we really teach the understanding, then it will generalize no matter what happens on the assessment. Unless something is wrong, my understanding is there is not any research that supports that at this time. Is that correct or not? That if you teach for deep understanding, they will do well on the state assessment, regardless. This is research. I wanted to clarify that because I thought at one point this morning -- so, this whole piece that Lorrie was talking about this morning, you need to pay attention to the kinds of items, to what you are actually assessing.

So, let me give you an example of another way that item types matter.

Yes?

AUDIENCE: What was the point of the previous slide?

DR. PETIT: Oh, okay. The point of the previous slide -- which one? That one? I brought that up again because we are talking about item types, and it links to my experience in Texas, where actually there is a curriculum -- I don't know exactly how widespread it is, but I think it is fairly widespread in elementary schools -- built of multiple choice questions. The assessment in Texas was built on multiple choice questions.

So, it is complicated: how much more math do the kids actually know, as opposed to how much more savvy they are about taking multiple choice questions? What is the relationship between those two?

That was my point. I don't know the answer to that. I am posing the question.

AUDIENCE: Eighth grade NAEP over those years?

DR. PETIT: Actually, that was Lorrie's slide. I did not include it, but Maryland actually performed better and had a similar trajectory over those years. Do I remember that correctly?

DR. SHEPARD: When you see the pair of slides, you see the believability of the Maryland results: it is less of a rise, but it is corroborated by NAEP; whereas the Texas rise is dramatic but not corroborated by NAEP. In fact, on NAEP, the Texas rise is no greater than the Maryland --

AUDIENCE: The Maryland assessment is more open-ended.

DR. PETIT: That is correct. I mean, that is my point, and so there are a lot of inferences in there. I just strung a string of inferences out, with unanswered questions, but with some data behind it. Also, just as a point, item types also matter depending upon, in this case, the mathematics you actually want to get at. So, on the left-hand side is the goal, or the expectation: we want kids who can generalize patterns and represent them symbolically. So, you give the kids a pattern and you ask them to write a rule symbolically.

I want to contrast that with this item, which would be -- you like my little target. Everything is a little off target. Could be some kids would solve that by generalizing rules. More than likely, kids will use substitution, so if multiple choice questions are the route, for certain mathematics -- and the same will be true of science -- you might not get there.

So, that is another reason why item types matter, and these aren't even complicated items. If you are thinking of more cognitively involved questions, where kids have to do proofs or reasoning or they have to solve nonroutine problems, then you are dealing with a whole different --

We are flipping between these ideas. So, let's see if we can keep it together. Where the testing time is is going to matter. It is going to matter for the breadth of curriculum you can really teach and learn and assess and it is going to matter about the potential item types that will be on that assessment.

I am just going to go to another thing, because here is another little thing that I have seen happen in some states, and actually Lorrie doesn't know it, but she really helped me during the summer as we were dealing with one state. Reliability is really related to the length of the test -- that is, the number of items, the number of points.

The stronger -- the more questions you can ask, the better the inference, up to a limit. You get to a certain point and, depending upon what your framework is, you can't go any further.

So, I arbitrarily put plus points with the items, depending upon how many things you are assessing.

Now, in one state the original design of these tests was to be three hours, and the state board says, no, we are not going to have a test that long; you are going to have to give us a design that is only one hour long. So, think about that. You need about 50 questions. You need about 50 points. One hour long sounds like 50 multiple choice questions. So, you can kind of begin to see how these things impact.
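
The tradeoff Dr. Petit describes -- more items give a better inference, but only up to a limit -- is usually quantified with the Spearman-Brown prophecy formula from classical test theory. The formula is standard psychometrics, not something from the talk itself, and the reliability values below are made-up illustrations; a minimal sketch:

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Project a test's reliability when its length is multiplied by length_factor.

    Standard Spearman-Brown prophecy formula; reliability should be in (0, 1).
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Illustration with a hypothetical reliability of 0.60 for a 25-item test:
base = 0.60
for factor, items in [(1, 25), (2, 50), (4, 100), (8, 200)]:
    print(items, "items ->", round(spearman_brown(base, factor), 3))
# Doubling the test helps a lot at first, but the gains shrink --
# "the better the inference, up to a limit."
```

Running the loop shows reliability climbing from 0.60 toward 1.0 with sharply diminishing returns, which is why cutting a three-hour design down to one hour is not a free decision.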

So, the message is pay attention because these things are going on in states right now as policy makers are making decisions, as there are cost implications.

There are all sorts of complicated, very, very heated discussions that are going on in your states that will impact your work.

This article -- actually I had the article here. Did any of you see the Science "Lite" article in Ed Week this past week? You ought to take a look at this. This was a summary, but I am sure there is more if you go to the website, which I did not put up there. Doug Carnine and others had actually done a 30-state study of the science assessments that are presently out there. They looked at standards and, I think, at items; 22 of the assessments had actually released items. I am sure the study has more detail, but they talk about three general things they saw.

These are generalized. It doesn't mean the assessment in your state exactly, although I did hear Donna talk about, if I am right, the low-level questions that they saw in these field trials, in Pennsylvania maybe. I don't remember what that was.

So, here we see the things they brought up. Science "Lite": at high school, you are assessing kids on middle school content; at middle school, you are assessing them on elementary content; and at elementary school, I am not sure. But they said that was something you need to pay attention to. I am hoping that the group that is working hard at NRC to put this next report together will provide guidance to deal with some of these things -- items that don't require science.

This is a famous thing to do in both science and social studies, in particular. Tell them what the content is to be fair and have them answer a question that you have to be able to read to do. You don't have to know science.

So, that is an example up there. You actually don't have this particular slide either. Maybe we can get additional slides if people are interested, but I encourage you to go to last week's Ed Week, because the article is in there.

Items that have content problems -- not having good distractors, having a whole bunch of different technical flaws -- that is something you also need to pay attention to. Now, this report is going to come out from NRC. The hope is that it is going to impact science assessments as they are being developed. This is a couple years out. Many states already have science in place. Many states are trying to adjust to the new grade-level expectations, or however they are doing it, but you need, again, to take a look at this stuff.

This is something I have personally become very interested in because I am worried about it. This is a personal thing -- professional; it is not personal really. It is professional. Universal design in assessment is a wonderful idea. Here is what it means. Martha Thurlow and others out of the University of Minnesota have been working on this idea, which came from the world of architecture, where you would design buildings so that there was access for everyone.

We all know the results of that work. There are very few buildings now that we enter where there is not access for people with almost any type of handicap. This work that is now impacting assessment comes out of that idea, saying: can we build assessments that provide access to almost all kids? We know there will be a population we cannot reach; generally, in the rhetoric, we are talking about 98 percent of the kids.

So, can you build an assessment to do that? Now, the goal is wonderful. The criteria that they use I am not going over right now, but you should look this up on the NCEO (National Center on Educational Outcomes) website, just so you are aware of this. It is about improving questions. It is about doing the right thing with questions. However, what is happening is two things. One, they don't have examples in your discipline. As a matter of fact, the work that Carl Lager and I are doing provides the first examples. There is actually an NCEO set of examples of how you can take items that have flaws in them and fix them while conserving, as we call it, the math construct. Then you can actually look at the items, or design the items, and really make explicit decisions about what goes in.

I implore you to pay attention. I have looked across a lot of items in a lot of states and I am seeing this play out in very funny ways. In some places, people are thinking, well, just take the context out and it will be accessible for everyone. Well, unfortunately, in mathematics, if you take the context out of a lot of good mathematics -- you need the context to hang the math on, so that a kid can give a good explanation. The context is why you use the math.

So, there are things like that going on. One place I was at, I looked at an item, a typical grid question, where you had to shade one-fourth of it. We all know this. Okay. So, the question was: shade one-fourth. Now, on the surface you say, okay, shade one-fourth -- do I shade the number one-fourth or what? The truth is that you could get away with that. You could get to field trials and most kids will shade one-fourth of the grid, but mathematically you cannot conserve the construct. What are you shading? One-fourth of the whole. So, you need to say what you are shading.

So, that is a very simplistic example. I don't have time to go into it, but, again, I encourage you to pay attention to what is happening. I know how it is playing out in math. I can't imagine how it is going to play out in science. I think you kind of get the point.

So, please take a look and pay attention to that.

So, those are my messages. You can see these are some implications, but those are some pretty important things that are going on. The time for testing has implications for item type. The amount of time the test takes has implications for item type. The kind of item type has implications for the level of cognitive demand that you can get at. Okay?

So, those are just some of the issues. There are a lot more out there.

This next point -- now we are back to still another implication -- is about the quality of the grade-level expectations. The foundation of any assessment or accountability system is student performance on the framework. We all know that, right? I was really thrilled when I heard -- I was with Tim Kurtz one day and he said this inspirational thing. He said, Marge, just make sure that this is the case. We were working with this -- he said, if a school is identified in New Hampshire under "No Child Left Behind," I want to be sure that the response in the schools is to teach good mathematics. Okay?

So, when you are thinking about the grade-level expectations, this is no small thing that is going on. By the way, if you have not looked on your website or if you are not aware of the progress in your state -- how many of you are actually involved in the work that is going on now in grade-level expectations? Just a couple of you. Pay attention. This is on the website in Pennsylvania: as part of "No Child Left Behind," the Pennsylvania Department of Education is creating a system of clarified standards and eligible content for every grade to be assessed.

These are the grade-level expectations. That is going on now and you could positively impact it. The message here is that you are in a position, because you are a math/science partnership, to have a very positive impact on all these things.

Here are some questions. You need to know about those grade-level expectations. Do they support and promote quality instruction in math and science? Are they coherent? Actually, this next one is directly related to this thing here. You can see those.

Now, I am back to Andy -- I went too fast --

AUDIENCE: If a school is doing badly on those, are they then more likely to be identified as needing improvement? If they are doing well on those tests, are they less likely to be identified as needing improvement? Or what we are going to do is -- and then we will ignore that.

DR. PETIT: Can I keep going and hope I can answer that because the latter is not where you should go?

AUDIENCE: I am talking about the government here not --

DR. PETIT: So, let's try to play this out, because we have actually struggled with this idea quite a bit, because it is a reality. It is not so much that they are not easily assessed; maybe they shouldn't be assessed on a large scale. A kid actually doing a scientific design -- that shouldn't be on a large scale. You have to do some research. You have to go in after the research. Then you have to make a hypothesis. You have to test. You have to collect some data. You have to set up an experimental design. It is not that it is not easily assessed. It shouldn't be assessed on a large scale.

I want to play with Andy's grid for a moment. Development of grade-level expectations: what if you thought about this alignment-by-design idea and you took Andy's little grid, and I just took one line and said, for eighth grade, the line -- slopes and intercepts -- I want to make sure kids can perform procedures, communicate understandings, and solve nonroutine problems.

So, then I did what I call a 6:00 a.m. GLE -- any of you ever done a 6:00 a.m. anything? It means I wrote it this morning. Based on that: students will demonstrate conceptual understanding of linear relationships with a constant rate of change by determining the slope of a line (perform procedures); by explaining the meaning of slope and intercepts in context, from a table, graph, or situation (communicating understanding); and by solving routine and nonroutine problems involving the relationships among the rate of change, slope, and intercept.

Not bad for 6:00 a.m. Okay? So, the question to Andy is: would that be the kind of expectation that would be clear, as you went through your process, about what we are looking for? Now, this is hard work, and we have actually worked with some states doing this. We actually use webs. We actually found it easier doing this than the work I had been doing before. After yesterday -- seeing Andy's domains there -- when I went to do this, it actually made it easier. I thought there were some things missing and I could tell Andy what was missing that was bothering me yesterday. But it actually helped me to kind of clarify it.

He just said, yes, line, slope, and intercept are important. Oh, and by the way, line, slope, and intercept are important in seventh grade, too. I don't think at seventh grade we need to solve nonroutine problems. So, you can see how this idea of building this strand across works: kids developing the concepts, what you assess at certain grades.

So, it is not just about what is the content. It is the interaction of the content with the cognitive demand.
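
The procedural piece of the slope-and-intercept expectation Dr. Petit sketched can itself be illustrated concretely. The sketch below is hypothetical -- the function name and the sample table are invented for illustration, not taken from the talk: given a table of values, a student performing the procedure would check that the rate of change is constant, then recover the slope and intercept.

```python
def slope_and_intercept(table):
    """Given [(x, y), ...] pairs from a linear relationship, return (slope, intercept).

    Raises ValueError if the rate of change is not constant, i.e. the
    relationship in the table is not linear.
    """
    (x0, y0), (x1, y1) = table[0], table[1]
    slope = (y1 - y0) / (x1 - x0)
    # Conceptual understanding: a line has ONE rate of change; verify it.
    for (xa, ya), (xb, yb) in zip(table, table[1:]):
        if (yb - ya) / (xb - xa) != slope:
            raise ValueError("rate of change is not constant")
    return slope, y0 - slope * x0

# Hypothetical eighth-grade table for y = 2x + 3:
print(slope_and_intercept([(0, 3), (1, 5), (2, 7), (3, 9)]))  # (2.0, 3.0)
```

The check for a constant rate of change is the conceptual-understanding half of the expectation; the arithmetic is the procedural half, which is exactly the interaction of content with cognitive demand.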

So, now, how does this apply to you? You are working locally with curriculum. You have an opportunity. Someone said, how can I -- you know, I only have three more years left. How can I make a difference? Leave a legacy behind of a really firm foundation, because what I know from my work -- the math portfolio 13 years ago, working in SSI, working in the department -- is that there are lots of things I am no longer involved in, and I am thrilled to see what is still going on.

So, here is one thing you can do locally in your district as you are trying to build curriculum: have teachers think about this idea of the interaction of the content with the cognitive demand. Build it right into your grade-level expectations and your local curriculum, and that is a solid legacy. That is some real action that you can do.

I have got my diagram. I have had this for months. It is the same issue that there are some things that are more appropriately -- that you can assess on a large-scale assessment and there is a broader amount of stuff -- and sometimes, by the way, it is not even in the standards. It is something that you locally value, by the way, that becomes part of your specification for local classroom assessment. I actually believe it is a larger set than just those things that are not easily assessable.

We are going to work through an exercise in a minute.

I think there are two characterizations, two characteristics. If you think about the test specification ones -- and I am going to call them test specifications because that means it is on the test. Okay? I am going to think about things that must be assessable in an on-demand, large-scale setting. That is a given. I cannot do an experimental design -- a real one, I can't do the real thing, right -- in a large-scale assessment.

The second thing is it has got to be a prioritized set. We have had a lot of talk about that. Jim Popham talks about a prioritized set, and we have had a lot of discussion about that, and we are actually going to try it. We are actually going to try an exercise in a minute. I am nervous, but we are going to do it.

On the right-hand side, what is in that local curriculum? Well, the local curriculum set -- which, by the way, also includes whatever is on the state assessment -- has to include content that is not easily assessed, like experimental design. It includes foundational skills as they develop across the grades. So, will they have to assess everything? Every year, I can't do it. I don't have enough test space. I somehow have to prioritize. I have to get somehow to big ideas, so that if I can get to the big ideas, like proportionality in middle school, and I am not doing some smaller things, by the mere nature of a big idea, hopefully it will carry the rest through.

So, I gave some examples. Here is one example on the test specification. I can give a representation. They can solve problems. They can formulate conclusions. They can justify conclusions. They can do all those things. On a large-scale assessment, you cannot do a statistical study. You can't do it. You can pretend at it. You can ask questions about the design of someone's experimental study and where the flaws were, and that is a good thing to ask. You can't do it.

So, my set out here includes those kinds of things, plus developing ideas -- concepts that go across grades. So, there was a lot of discussion -- actually, I don't remember if it was Mike or someone else who said, you know, U.S. education is really great because kids know, if I didn't learn it the first time, it is just going to come around again, right? We have found this over and over again because we actually don't prioritize. We are afraid to.

As a matter of fact, with the groups that I have worked with over the past year, when we have experimented with prioritizing -- I had a lot of discussions with Jim Popham and I said, Jim, what is your idea on actually how to make this happen? He said, well, go for the big ideas. I said, well, that is one way. So, you had some ideas.

But we have actually evolved a couple of strategies that I am going to share with you and maybe you can take home and use, an actual tool.

There are two things here. First of all, I have to have enough space in my test. So, let's go back to my 50-question test, right? That is 50 questions. Now, I know one state that has 152 objectives in a grade. Fifty questions. So, are you going to do it? I don't think so.

Someone suggested this morning that this thing kind of moves around. That is a different idea. That is a different set. That is just too much.

The second thing, which is really important: a colleague of mine, Scott (unclear audio), who actually Lorrie knows really well, challenged me at one point as we have been working. He said, well, Marge, this is great that you are prioritizing for the test, but you need to make sure everyone could teach, and kids could learn, what is in there.

So, here is the deal. Here is the idea behind prioritization. This is really important because -- well, it just is. Is the group that developed the expectations the same group that makes the sampling decisions when the test is developed? So, here you are. You are sitting in a group.

You have been developing all this. You are thinking, oh, this a really important concept and then it goes to the test developer and they kind of slap some things around and, by golly, the thing you thought most important never shows up on that test.

So, often, if not always, content teams put more in than can possibly be learned in a year or reasonably assessed, because they are so afraid of not having everything in. This is from experience. This is from experience when I was part of developing Vermont's standards: oh, my God, we can't leave that out. This is my experience these years as I have worked with a number of states as we have tried to work through this prioritization.

Here is another thing I do. If content teams developing the (audio unclear) GLEs understood how sampling decisions were made, they would be willing to prioritize.

Some GLEs are more appropriately assessed at the school and classroom level. By the way, you can do this prioritization without narrowing the curriculum. You can do it.

We developed a set of questions. We kind of went over them with Jim Popham. They are okay. The first one has got to be asked if I am going to prioritize and make choices. By the way, you have to have two pieces of paper. One is the presentation that I am doing, that I am hoping you have, and the second is a piece of paper called Forced Choices.

Did that get passed out? Did people get that or is it still out on the table outside?

So, we developed this series of questions. We included more questions as we learned more about the strategy. So, if you didn't pick it up off the table this morning, you didn't get it. So, they are going to bring it in. So, here are the questions and I think you do have this in your overhead. As a matter of fact, I know you do.

The first one: is the concept or skill part of a big idea, like proportionality? The second: is success on the concept or skill in a given grade essential for success in mathematics in subsequent grades? So, do I need to understand exponents and roots before I get to eighth grade? You better believe I better understand them, because by eighth grade I am doing some complicated formulas with volume and surface area.

I am also doing some work around the Pythagorean theorem. So, I don't want to wait till eighth grade to teach that. I don't want to wait till eighth grade to know if kids understand that.

The next one: should the concept or skill be assessed at an earlier grade because success there is important for success at the given grade? Now, a real tendency is for groups to say, well, let's see, at fourth grade I had better leave this in; I had better leave it in fifth grade just in case I didn't get it; and I had better leave it in sixth grade. As soon as you start dragging everything along, what happens to the number of expectations? It increases. As it increases, the information you get is not going to be as targeted.

So, you can kind of read through these or I can keep talking about them, either way. While they are passing this out, I guess I should keep talking about them.

By the way, is the concept or skill already assessed adequately? Give it a rest. We don't need to do it anymore.

The next one, should it be done at a later grade?

And the next one: is it subsumed? This is the famous one. In eighth grade you have a question that is like three to the third, okay? So, you have a multiple choice: you know, what is its value, or what does it mean? Believe me, if kids can't do three to the third, they are not going to be able to do a large portion of an eighth grade test, because that should be a pretty fluent thing by the time they get to eighth grade.

Is the concept or skill essential for understanding concepts in other disciplines at other grades? So, I am going to give you the big idea and tell you how you can actually do this, because you are not going to be able to do it in the five minutes we have left. Okay.

If you open that Forced Choices activity and you look at the list of concepts, you can see -- and this actually happened in one state I worked in this year -- that they put up a list of, I think, 23 or 25 different things at eighth grade, in number alone, that they felt should be in the grade-level expectations.

I said okay. By the way, you have algebra and geometry and statistics and probability and, by the way, you have already told me that algebra and geometry should be more heavily assessed. So, now I am going to tell you: you have ten items. Remember I had the 50 items. You have ten to assess that list of 23 or 24. Which ones do you want in?

Now, you shouldn't make arbitrary decisions. The decision should be made on the questions that you have.

So, just to give you an example -- and I have already beaten exponents quite a bit, so you get where this is coming from -- when the group did it and they were asked how many items, they said none. Why? And it wasn't arbitrary.

They didn't just say slash, slash. They had to give a rationale -- the 3 and the 6, if you went numerically through those 1 through 8 questions. It gets applied at grade when finding area and volume, and applying the Pythagorean theorem crosses to other content strands and, boy, it is better understood when assessed earlier.
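
The Forced Choices exercise can be thought of as a screening pass over the objective list. Everything in this sketch is hypothetical -- the concepts, the yes/no answers, and the keep/drop rule are invented to illustrate how answering the screening questions drives the item budget, not a reproduction of the actual handout:

```python
# Hypothetical answers to three of the Forced Choices screening questions
# for a few eighth-grade number concepts (illustrative, not real data).
concepts = {
    "proportionality":     {"big_idea": True,  "subsumed": False, "assess_earlier": False},
    "percent increase":    {"big_idea": True,  "subsumed": False, "assess_earlier": False},
    "exponents and roots": {"big_idea": False, "subsumed": True,  "assess_earlier": True},
    "pythagorean theorem": {"big_idea": False, "subsumed": True,  "assess_earlier": False},
}

ITEM_BUDGET = 10  # ten items to cover the whole list, as in the talk

def prioritize(concepts, budget):
    """Keep concepts that are big ideas and neither subsumed nor better assessed earlier."""
    kept = [name for name, q in concepts.items()
            if q["big_idea"] and not q["subsumed"] and not q["assess_earlier"]]
    return kept[:budget]

print(prioritize(concepts, ITEM_BUDGET))
```

The point of the tool is that a concept like exponents drops out with a rationale (subsumed in other items, better assessed earlier), not with an arbitrary slash.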

That is how you can use this for thinking about your test case. Now think about time to teach and learn.

The way we have done it is the same thing, except we have all the content strands now. We have all the GLEs that they have. Now we are saying: you have 150 instructional days. Now, I know that is not true. But maybe it is. Think about the way your school is run, the interruptions that you have, other things, other assessments you are going to do, your own tests.

So, let's think 150 instructional days. I want you to think about where the emphasis in this curriculum is, and now I want you to distribute those 150 days and tell me where you want to put your emphasis. That is also a pretty horrible exercise, and it actually works pretty well as people have gone through it, to try to help them prioritize.
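
The distribution exercise is, at bottom, a budget check: propose an emphasis per strand and verify it fits the available instructional time. The strands and numbers below are hypothetical placeholders for what a group might propose, not figures from the talk:

```python
INSTRUCTIONAL_DAYS = 150

# A hypothetical group's proposed emphasis, in days per content strand.
proposal = {
    "number and proportionality": 55,
    "algebra (linear relationships)": 45,
    "geometry and measurement": 30,
    "data and probability": 20,
}

total = sum(proposal.values())
assert total <= INSTRUCTIONAL_DAYS, f"over budget by {total - INSTRUCTIONAL_DAYS} days"
for strand, days in proposal.items():
    print(f"{strand}: {days} days ({days / INSTRUCTIONAL_DAYS:.0%})")
```

Forcing the numbers to sum within the budget is what makes the exercise "horrible" in a useful way: every day added to one strand must come out of another.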

By the way, in that long list of things, just as an FYI: in the group that did this first exercise -- and the first time I did it, because I actually did this, I couldn't figure out how to get them to understand that they couldn't do all of it, and it didn't make sense -- when they went through it, it was ah-ha. What got left in was more proportion: things around percent, understanding percent increase and decrease, in eighth grade, period.

Everything else was either assessed in a different way through another content strand or assessed some other way.

I am out of time. Here is a quote from Lorrie that I also didn't put in last night: large-scale assessments are not valuable for providing detailed information about individual student learning. You know about this problem; too late and too little information. We cannot pretend, and everyone keeps wanting large-scale assessment to do more than it can do. It cannot do all those things. Can't do it.

So, what you need to think about is how you would have a balanced system. I was going to do this as an exercise. I was kind of dreaming that we had enough time.

But if you think about a system that has -- I have a large-scale state assessment. It is part of what I have to do. I might have a large-scale local assessment. So, I have something I value. So, maybe I use balanced assessment. Maybe I use some phase-developed kind of thing that is shared with schools.

I have classroom summative assessments and I have formative assessments -- which are in part about grading, in part about getting information -- which kind of go along with -- Dylan and I had an interesting discussion, which I don't have time to talk about right here. But there are different levels of formative. If you asked yourself what the uses are in your MSP and what the level of influence on your MSP is, I think this would be a really important exercise. We will not try this as a group, but I suspect that if you think about what you are actually going to be focusing on, and what you are focusing on now, you can shift to the level of influence it should have, to have assessment for learning.

Actually, I was going to do an advertisement. I actually do a thing in this next session on classroom assessment, where we are doing a thing on assessing -- real formative assessment. You are welcome to come, but I do want to do an advertisement for this little book here, which you actually have on the CD, which is "Assessment in Support of Instruction." It is kind of a distillation of a lot of the findings in everything from "Adding It Up" to "Knowing What Students Know" to "How People Learn." You can go across them -- there are, what, Lorrie, 12, 13, 14 documents that were used to come to that distillation?

It talks about the characteristics of what a comprehensive, coherent, balanced assessment should be.

So, with no further ado, here are the questions you need to know: What do I know? What do I need to find out? What actions should I take? These are the things you need to think about. They are actually on -- and I really encourage you, if you want to leave a legacy when your MSP is gone, to really find out and get involved in what is happening in the development of your state-level assessments and the grade-level expectations.

That was fast, furious and too much.

[Applause.]

DR. COLVARD: We probably have time for one or two questions. I sense that you just can't leave without a couple anyway.

So, questions?

AUDIENCE: [Comment off microphone.]

DR. PETIT: Sure I will. If you truly prioritize, being very careful and intentional about it, it is not this arbitrary, well, let's not teach that this year. If you go through and you say, well, proportionality is the big idea in middle school, what are the related contents? Let's make sure they are in. So, it is everything in number from percent to ratios to proportions. Then I go to algebra and its linear relationships and linear equations. Then I go down to -- and I know it is about histograms and I know it is about -- I am going to brain-drain right now.

But you can keep going down. You make sure the big ideas are in there. Okay? Then you go to this next prioritized set and the next questions were not about eliminating things that would never get taught. They are about things that could happen earlier, that could happen later, that are subsumed other places. So, it is not like you are taking stuff out. You are just moving things around and making sure you are putting your emphasis on the big ideas.

So, that is a hypothesis -- that you can prioritize in that way -- but it has to be intentional, and whether we have the right eight or nine questions remains to be seen. But it seems to have worked pretty effectively with the groups that we have used it with.

Did I answer or did I avoid?

AUDIENCE: You did as well as you could.

MS. BUNT: Nancy Bunt, Southwest Pennsylvania.

I wanted to clarify. We are involved in the state-level assessment. They actually had a day -- three days at the beginning of January, the second week -- where they invited math clubs from across the state to come, and they used a very similar tool to what Jim was talking about before, in terms of cognitive demands and the level of things. And we gave feedback; whether the feedback is heard is another question.

But I wanted to ask, in terms of prioritization -- we have developed a mathematics curriculum framework where we have used the knowledge web. You have got your big idea. You have got the essential learning and the key components and the building blocks, which we have offered to the state as a framework with the grade-level guides. Our state only had grade bands rather than grade levels. We did it grade level by grade level. Really, the Japanese aren't teaching any less than we are teaching. They are teaching it in a coherent way. We do need to let go of some things at some grade levels and go on.

DR. PETIT: One thing I didn't mention is that when people do this, they actually spend some time reading research. So, we bring research in, so it is not an arbitrary thing. They get into cognitive research in a way that teachers haven't been into it before, and they actually appreciate it.

I was in one state last week and they said, you know, we have done all this stuff before and we have never actually read what the research said about what kids should be learning. So, it is not an arbitrary process. So, hopefully, it will result in a good coherent bridge.

DR. COLVARD: Okay. The agenda says that we would convene in our breakout groups at 2:15. How about we convene at 2:20, to give you a few more minutes for a break.

Then at the conclusion of the breakout sessions, there will be another small break and we will meet back in here at 4:00. At that point you will have an opportunity to interact as MSP groups and to talk with any of the facilitators. Everybody will be available, except for Lorrie.

MR. LABOV: Just very quickly, for those people who are part of the feedback panel, if you are wondering about evaluation, we haven't said much about it, but we have a model where we have a professional evaluator who is working with us to help us improve these workshops. There is a representative from almost every MSP who is part of that feedback panel. The panel will be meeting at 5 o'clock in the Collaboration Room on the 11th floor, and we will feed you for that.

It is really important that you be there if you are on this panel, because we really do need the feedback. I can tell you that a great deal of what we have changed here was based upon the feedback from our last workshop in July, and it made a profound difference, I think, in helping us to know what it is that you need.

I really do want to thank all of the people who are willing to give their time to this process to help us.

One other thing is that Diane Ebert-May, who is going to be talking about concept mapping, asked me to just get a general sense of people, even though you may not be going.

How many of you are actually familiar with the idea of concept mapping? Okay. An awful lot. How many of you use concept maps as teaching tools or assessment tools? Pretty good.

If a concept mapping tool is easily accessible on the web, which she will make available to you, would that influence your decision to use them? Would you use them if they were available? Yes.

Thank you very much.

[Breakout groups.]