National Committee on Vital and Health Statistics

The National Academies National Research Council National Science Resources Center

Math/Science Partnerships Workshop Assessment of Student Learning

February 1, 2004

The Keck Center 500 Fifth Street, NW, Room 100 Washington, DC

Proceedings by:

CASET Associates, Ltd.

10201 Lee Highway, Suite 160 Fairfax, Virginia 22030 (703) 352-0091

TABLE OF CONTENTS

Opening Remarks

Dr. George
Dr. Orland
Dr. VanderPutten
Dr. Labov
Dr. Shuler

Overview of Workshop

Dr. George

Assessment as a Primary Means for Promoting Student Learning

Dr. Shepard

What Assessment Issues are MSPs Currently Confronting? Panel: Two MSP Teams Discuss Assessment Decisions

Dr. Williams and Dr. Poland
Dr. Bunt and Dr. Kelly
Dr. Kestner

An Assessment Exercise

Dr. Porter

Debriefing of Assessment Exercise and Participant Discussion

Dr. Porter
Dr. Shepard
MSP Team Panel

P R O C E E D I N G S [1:05 p.m.]

Agenda Item: Opening Remarks - Dr. George

DR. GEORGE: I’m Mel George, I’m the chair of the study committee the NRC appointed to self-organize the intellectual content of this workshop and it’s a pleasure to welcome you here. We have a lot of people still in the process of arriving but I will start on time and we’ll try to keep us on time; as Lee Allen said, half of success is showing up and the group that showed up, there’s no need to wait for the others.

So my job is simply to welcome you briefly, and then at the end of the introduction session I’ll say some more about the overview of the workshop; but first of all I’m going to introduce the other people who are sitting up here to address you briefly. And I will start with Martin Orland, who is the director of the Center for Education, the National Research Council, which is the parent body of the study group that has put this together.

Agenda Item: Opening Remarks - Dr. Orland

DR. ORLAND: Thanks, Mel. Well on behalf of the Center for Education I want to welcome you to this event. I also want to say just a couple of words from your sponsor here and talk to you a bit about what we do here at the Center for Education at the National Academies and why we think we have some unique value in this very critical time in the area of math and science education and education reform.

A little background on the Academies: as the nation’s premier nongovernmental organization concerned with maintenance and enhancement of science, the National Academies takes seriously the motto of being the advisors to the nation. If you see our brochures and other materials, that’s what we say: advisors to the nation on science, engineering, and medicine. Now this includes education, and the work of our Center for Education under the National Academies reflects the values and perspectives of the Academies in two distinct ways and I want to mention both of those.

First and most fundamentally we at the center aim to provide a sound scientific basis for important educational decisions. That is, we believe in the concept of a science of education and more specifically that applications of principles of scientific inquiry and results from these kinds of investigations can contribute to better education decision making and ultimately improve teaching and learning.

Second we at the Center for Education have a special obligation to provide this kind of guidance and advice in the areas of science and mathematics education. Our education system is the wellspring through which the nation’s future scientists, engineers, and mathematicians will emerge, and having an adequate and equitable supply of these individuals is critical to our country’s future, as is having a general citizenry with enough basic background in science and math to make informed decisions and function effectively in a society that places an ever-increasing premium on scientific literacy.

So we believe that the Academies can bring a great deal of scientific knowledge to bear on specific questions of interest to educators and educator decision makers. And these include the areas that we’re going to focus on over the next two days--that is, educational assessment and accountability.

In your materials is a CD containing 14 reports from recent years that contain particularly useful information on this topic. They cover the landscape from a variety of angles that we think are relevant to your real world challenges, such as integrating classroom and large-scale assessments, relating or aligning assessments with standards, and ensuring that information from assessments is an integral part of, rather than apart from, the core business of teaching and learning.

But it’s not enough to have great wisdom in books on the shelf or even on a CD. We believe strongly that if this wisdom is to lead to actual improvements in policy and practice we need mechanisms for sharing critical insights and ideas with those responsible for implementing programs in the real world. So that is why we, with the generous support of the National Science Foundation, have created this series of workshops and why over the next two days our goal will be to foster direct communications and interactions between research experts and program implementers that unpack some of the critical knowledge about assessment and accountability contained in that CD.

Working with our sister unit, the National Science Resources Center, we hope that our efforts will facilitate the development of both common and unique insights that you can use to make your partnership more effective. This is one of four topic areas that are covered by the Academies in MSP workshops. Others focus on issues of cognition and learning, enhancing the quality of math and science teaching, teacher education and professional development, and developing challenging courses of study; and I believe there are registration forms available if you’re interested in any of those, we will be glad to accommodate.

So again, let me welcome you to this workshop. We trust the next two days will be both enjoyable and productive for all of us. Thank you.

DR. GEORGE: Thanks, Martin. In addition to a stay-at-home parent in the Center for Education, we have a working mother of this group, Elizabeth VanderPutten, and so we’re very pleased to welcome her to say a few words about this workshop and workshop series from the perspective of funders of the MSP program.

Agenda Item: Opening Remarks - Dr. VanderPutten

DR. VANDERPUTTEN: Thank you. On behalf of the Math and Science Partnership Program, which is conducted by the National Science Foundation and by the Department of Education, we’re really pleased to welcome you.

Fundamentally we view the MSP program as a research and development effort. We expect that the projects, the partnerships that we fund, will be built on the literature, the extant knowledge, the great work that has been done by several NSF projects, the Department of Education projects, and other research efforts. But we also hope that it will contribute to that body of knowledge.

These series of workshops are particularly exciting because I think they are trying to take some of the most quoted literature, How People Learn, Adding It Up, Knowing What Students Know, and doing the very, very hard work of figuring out what does this mean for practice. We sometimes say teachers should read literature and research and apply it to practice; to me that’s almost silly. It is such extraordinarily difficult work that it is great to bring together the experts, you, the experts that the Academy has brought together, to work on this issue, how do we take research and put it in such a way that we can use it.

So I look forward to participating and hearing all of the talks and hearing the insights that the various partnerships who are here and are going to present, and thank you all for coming.

DR. GEORGE: And thank you Elizabeth for being here and blessing our beginning. Next is Jay Labov. Jay is an old friend and colleague.

Agenda Item: Opening Remarks - Dr. Labov

DR. LABOV: Some important logistical information first. If you don’t have a wireless attachment to your computer, we do have wireless in here that’s supposed to be set up but we also have a couple PC stations that you can use to check email and do other sorts of things.

Also, even though you can’t do it today, tomorrow and Tuesday your badges will entitle you to a 25 percent discount on any of the reports that are located in the bookstore. As you came in the door, you may have seen it just off to the side; it’s closed today, but will be open on Monday and Tuesday and you can go in there and shop for anything that you’d like.

What I want to do is give you an overview of this place. There are many new faces here, and I’d like to give you an idea of how these workshops fit into the context of the National Research Council and how we structured this and the committee and to point out the incredible work that our committee is doing.

Before I do that, there is one other thing.

There are a number of people who’ve reported to us that they had been unable to open the CD-ROMs that we sent you; particularly if you’re a Macintosh user, although there are other people using Macs who said they’ve had no problems.

So for the people who have not been able to open them, we have a sign-up sheet out at the reception area. If you put your name down there and tell us whether you would like a new CD-ROM which we’ll check out and be sure it works on that, or if you prefer just a hard copy of the things that are on that CD-ROM for the briefing materials, we’ll be sure to get that to you within the next week or so. And we apologize for any inconvenience; we really do think it’s going to work for everybody.

Let me tell you a little bit about the National Research Council and how all of this was put together for those of you who may not be familiar with what we do here. The National Research Council was founded in 1916 during World War I and is part of the National Academies. The National Academies consist of the National Academy of Sciences, which was founded in 1863 under the Lincoln Administration, the National Academy of Engineering, and the Institute of Medicine. Those three are honorific societies and they include some of the most prestigious scientists, researchers, engineers and medical personnel in the world. For example, most of the people in this year’s crop of Nobel Laureates are also members of the National Academy of Sciences. So we have some of the nation’s preeminent scientists, engineers, people in medicine, and we also have many members from overseas, which I’ll show you in just a minute.

We help the agencies of the federal government ordinarily, and Congress and the states, to understand some of the major issues in science and technology, and we also have a special mission to advise government on applications of science and engineering policy. What is it that the evidence now shows and what is lacking and how do we then take that information and translate it into usable government policy? So this is one of the major things that we do. We also advise the government on policy for science, engineering, and health care, so in other words for the science enterprise itself on ways to advance that.

Now, as I mentioned to you, the National Academies consist of four organizations. It began with the National Academy of Sciences, which was incorporated during the Lincoln Administration through an act of Congress in 1863, and the charter said that the Academy shall, whenever called upon by any department of government, investigate, examine, experiment, and report on any subject of science or art; art at that time was equivalent to technology. However, here’s the kicker, and this is what has caused the Academy to do the work that it does, it’s both the challenge that we face and also the beauty of this place: that the Academy shall receive no compensation whatever for any services to the government of the United States, and this is the way that we’ve been operating for the past 140 years.

What does this mean? What it means is that we have a lot of members here who are advising the government as part of the committees we put together. Here what we see are the number of members of the National Academy of Sciences, National Academy of Engineering, and the Institute of Medicine as of last July, and so you can see that there are many, many people who are working with us. But the National Research Council is doing most of the work, and the members of the Academies and the Institute of Medicine will work with us.

The National Research Council consists of the operating and research arm of the National Academies where most of the reports that you received, for example, and possibly 250 other reports every year on everything that you can think of related to science, technology, or medicine, are published. And the National Research Council consists of committees that do work with membership from the members of the Academy as well as other preeminent experts throughout the country. Very often these committees are highly interdisciplinary and they look at a project or a problem holistically. And the important thing, given our charter, is that all people who serve on National Academy committees do so pro bono. So all the members of the steering committee, for example, who helped put this together over the course of two years were doing so without any compensation whatever. They are doing this because they believe in the process; what we do is to compensate them only for travel, for lodging, we give them what we hope are really good meals and a lot of cookies while they’re here. And they actually do incredible work because our committees will typically range in time commitments anywhere from six months up to two years, sometimes more than that, and people are doing a tremendous amount of service to the nation and to you as a result of this structure and the fact that we can’t compensate people.

So as a result of this, we are not part of the federal government, we are completely independent, we are a private nonprofit corporation and we have a number of very specific rules in our charter and are also recognized by the government who, for example, exempts us from the Federal Advisory Committee Act, which enables us to be independent completely in development. We strive for balance in all the committees, we have complete control over who goes onto those committees, and we are constantly looking for evidence of bias and conflict of interest, which has to be reported and understood before any committee does its work. And the idea is that we are going to be looking at any issue that the government asks us to look at or that we propose objectively. We ask people, despite their biases based on their particular expertise or discipline, to hang up those biases at the door and to look at what the evidence says. So what we focus on is the evidence, or the lack of evidence in certain cases, that needs to inform policy and future research. It’s incredibly important that we have this autonomy from the government as we do our work.

Our unique strengths are the stature of the Academies’ memberships; again, many Nobel Laureates and preeminent scientists, engineers, physicians, people from the medical professions, behavioral and social sciences are members of the Academies. The ability to get the very best people to serve and as you sit here today and for the rest of this time if you meet the people who are on the steering committee I think you’ll see that that’s absolutely true.

The pro bono nature of community service and the fact that we look for conflict of interest so that there can be no evidence that people are trying to gain financially is very important.

We have a special relationship with the government; we are not allowed, for example, to apply for RFPs that are competitive. We have a separate agreement with the National Science Foundation and the Department of Education and many of the other federal agencies, master agreements to be able to work with them directly and not compete with others. We have quality assurance and control procedures; as I mentioned the government does not have any say on a member of our committee. They can certainly recommend people but it’s ultimately the governing board here and the president of the National Academy of Sciences that makes the final decision.

As we go through preparing our reports we are allowed, whereas most other organizations are not, to go into executive session--the government sponsors are not allowed to be present as recommendations are formulated.

We have an extensive report review process that will involve many of the people who should be on those committees serving as reviewers, and this is completely an internal review process, very, very diligent and dedicated process, and typically the sponsors receive reports that are going to be released to the public maybe one to three days ahead of the time that they’re released to the public. So they don’t always know what the reports are going to say, most often they don’t.

Sometimes they’re not happy with what the reports say but, because of the process, because of the pro bono nature of service to the country, those people are able to hold up these reports as the best available evidence at the time for making various kinds of policy decisions. So we’ve maintained fiercely our reputation for independence and objectivity.

Here are various methods of operation. I’m not going to go into them; most of the things that we do are consensus studies, but you’ll see that we have a number of other convening activities and various other kinds of operational programs such as fellowships and associates, research surveys, and these sorts of things.

Just to give you an idea of the scope of this place, these are data through 2000, and the upper line in magenta represents the total number of people who have served as volunteers on committees in any one year, and the lower line in blue shows the typical number of committees that are working at the National Academies at any one time in any given year. It’s enormous. And the staff to maintain all this, we have about 1,100 people who work at the National Academies to support the structure of the committee process, along with many other service organizations like the National Academies Press, and as I said we publish about 250 full reports every year; that’s about one report every working day. It’s a very large enterprise.

We would encourage you to visit us at the National Academies website (nationalacademies.org) so you can get a lot of information; go to any place that’s in the National Academies that you need to go. I’d also recommend to you the National Academies Press website www.NAP.edu where we have all of our books online available for reading without cost and a wonderful search engine to help you find topics and reports that you need. If you want to download them you can actually do that but you have to do it page by page, and that’s not really --

On the other hand the CD-ROMs that we’re giving you contain some of the reports that we think are most important for you as MSP grantees to know about and so you can read about them. So the two CDs that you should have received, one on the assessment of student learning and the other one we decided to send all of you, which we’d given to the other group on learning, is one that deals with cognition and learning also because there is particularly one report that’s on here that’s not on the other one, it’s called Adding It Up, How Students Learn Mathematics, that we thought was important because there are also some sets and issues in there.

As you go through these CDs what you’ll see is there are many topics that overlap with each other and are interrelated, such as assessing and learning, and we wanted to be sure that people understood that these things cannot be considered separately. And many of the reports actually consider all of them together.

We do have other workshops coming up this year in 2005 and we have a form outside that you can read about that.

Now very quickly, the structure of the workshop is to obtain information from you about your background in dealing with assessment and what your issues are; it’s a pretty broad spectrum, people who are in here, as far as your knowledge and understanding assessment issues. And so what we’re tending to do is to make the plenary sessions more general, and for some of you this may be some review but we understand that because there are many people for whom it will not be review, but the breakout sessions that we’re having will provide you with a great deal of additional depth in a particular area related to assessment. And we have worked with our speakers and facilitators prior to this to be sure that they remember a few things. One is that we’re going to be asking them to cross reference their material with things that are being said by other folks. We had a conference call so that every one basically knows what everybody else is going to be talking about and so there will be some of the cross referencing. We’ve also asked them to reference the reports on the CD-ROMs that you received even though we may be focusing, for example, on what students know, there are many other reports on those CD-ROMs and our speakers have been asked to refer to those where appropriate. And also take into account the responses from you on the surveys that we gave you asking you what your issues are, how much you know about things, and the kinds of things that you’d like to have here. So we’re trying to tailor this as much as possible to your needs.

Some of the sessions will be plenary, others will be interactive, and we are going to record everything here because it’s all going to be on workshop proceedings that will be available on the web for MSP and for everybody else. So I will ask you, I will probably be nagging you, if you forget, that you need to speak into a microphone; we’ll bring one around to you if necessary and we have them on each corner, and also you need to indicate who you are, your name, and institutional affiliation when you do speak.

I want to thank a couple people, a number of people who have agreed to be in a focus group here that will help us to evaluate this workshop and help us make future workshops better; they’re going to be meeting a couple times during this process and we appreciate the extra work that they’re doing. Mel will be thanking the staff, too, but I also want to let you know that they have done yeoman’s work in putting all of this together, you will get to meet them, Janet Garton, who’s sitting in the back, and Terry Holmer who’s probably around the other side, and Linda Depew is helping us today, I thank them very, very much.

So I will now turn this back to Mel, and if you have any questions don’t hesitate to ask me. Thanks.

DR. GEORGE: Thanks, Jay. We want to hear a brief word from the co-principal investigator; there is a partner to the Center for Education in this effort and that’s the National Science Resources Center, which is a joint offspring organization of the National Academy of Sciences and the Smithsonian. Sally Goetz Shuler, who has worked very closely with us to help put this together will say a word. Sally?

Agenda Item: Opening Remarks - Dr. Shuler

DR. SHULER: I’m just going to speak from here, I want to be cognizant of the time and I know we don’t want to get behind. But I think what’s important for you to know is that we’re an organization that was started in 1985 by the National Academy of Sciences and the Smithsonian Institution, and our mission is a rather ambitious one--it’s to improve the learning and teaching of science in the nation’s school districts. So I was one of its cofounders 18 years ago; I’m now in the position of directing the center and we’re in the process of reorganizing some of our activities.

But we’ll tell you that our philosophy and our goals and our values all embrace those of the Academy but in addition the Smithsonian, and I think what is critical for you to keep in mind as you think about our center is that we’ve been eating, drinking, and sleeping partnerships since we were founded in 1985. I have people on my staff who are from the Academy and the Smithsonian, I run on two different fiscal years, I work directly with the Division on Behavioral and Social Sciences with Martin Orland, and I report to the Under Secretary for Science at the Smithsonian. Our goal as an organization is to leverage the prestige, credibility, research, and best practices of both of those institutions to change the way that science is taught in the nation’s school districts.

We’ve developed a theory of action and this workshop you will see reflects not only that bottom bar, in which we believe in the districts with whom we’re working and currently we’re working with 700 school districts that represent 20 percent of the children in the country. We add an additional 50 to 75 school districts on an ongoing basis, the vast majority of those are not MSPs so you might say that we represent the scale-up side of this formula and that we believe that your work with the MSPs will inform our work as the way to translate and transfer what you’re learning through the MSPs to the broader field that do not have this opportunity.

One of the hallmarks of our work is in the process of building an infrastructure. We’ve been developing with the field expertise around five critical domains that we call a reform model, that we’re learning the core principles of reform, not only in the United States but we’re learning it’s also applicable throughout the world.

And then finally we can share with you as you begin to work some of the rubrics that we’ve been organizing in partnership with some of the academic institutions that are researching our work and evaluating it, the stages that people undergo as they proceed to implement.

So I’m going to stop there and turn it over to Mel. You have by the way a packet of information about us.

Agenda Item: Overview of Workshop - Dr. George

DR. GEORGE: Thank you, Sally, and I want to take just a couple of minutes and say a little bit more about the overview of the workshop so we understand where the steering committee was coming from.

When we first started talking about the series of workshops that we wanted to put together to carry out the charge that came to us in the grant from the National Science Foundation, it was very clear that assessment had to be one of the topics we covered, and I want to tell you though that despite the pervasive nature of this topic in today’s America, the committee has come at this from the perspective that what we are primarily interested in is not to increase assessment as such but rather to increase learning. And that has always been the basis on which we have tried to think. That’s why our very first workshop in the series is based on How People Learn, and you will hear frequent references I hope to things that come out of How People Learn as we talk about assessment because we take the view that the ultimate purpose of assessment has to be to help people learn and so assessment has to be integrated into what we know about how people learn. So it’s very appropriate that we will start in just a minute with Lorrie Shepard’s first presentation, Assessment as a Primary Means for Promoting Student Learning, and that’s certainly an appropriate way to begin given our predilection.

I also want you to know that the steering committee has kept in mind your partnerships; as Sally mentioned, the role of the MSPs consists both of school people and higher education people, and we’ve asked our presenters to try to keep in mind that assessment is actually something that occurs in all educational levels and to try to help make connections wherever possible between the schools and higher education and we hope that they will continue to point out those kinds of things.

The steering committee has met several times in person and by conference call; and I’d like to ask the members of the steering committee who are here, who have contributed their expertise and experience to the design of these workshops and who are incidentally a joy to work with, to please stand and have you thank them for their pro bono work. Many of them are presenters in this workshop, so you will get to hear them in more detail later but those who aren’t will have an opportunity to contribute at the last session. So could I ask my colleagues on the steering committee to stand and let’s thank them.

[Applause.]

Speaking of pro bono work and -- on the other side of the room, I couldn’t help thinking, Jay, of the Peanuts cartoon that was in yesterday’s paper in which Snoopy is on one side of the tree and Charlie Brown on the other and they’re lying there and Charlie Brown says I wonder when it was in history that dogs first became man’s best friend. And Snoopy thinks to himself, probably soon after the invention of cookies.

Our overall objective in this workshop is to help you do the important job that is contained in the expectations of the grant that was funded by the National Science Foundation for the Department of Education for your work. We are part of the technical assistance part of the MSP program, so our entire objective is not to hear ourselves orate but to help you do a better job. That means it’s really important for you to speak up.

Presenters expect to hear from you the things that are on your mind. We appreciate very much your having filled out the surveys, which were very general, but if you don’t say, “I don’t see what that has to do with me, help me,” you’re likely not to be able to, so we really encourage you to feel that you are fully participants and we welcome very much your input.

The whole notion of why we are here and why we are doing this is certainly pointed to at the very end in Mark Kaufman’s presentation, Planning for Change in Assessment. How can we use together what we discussed here as we leave today to help us do better jobs of assessing to improve learning when we go back home?

So please do speak up, in particular I want to add my thanks to the feedback panel which will help us redesign in any way what’s necessary the next time we do this workshop on assessment. And is Patty Bourexis here, our evaluator who will be working with the feedback panel?

I wanted to introduce her but apparently she is not here yet, but she will be.

So let me close by thanking Janet and Jay and Terry. If they would come in here please, I really want you to thank them for what they have done; the only task in which they failed is to get the lights on out in the lobby.

We have all learned something that will be an important part of our lives and that is that we’re not all in control of everything. For Janet Garton in the back of the room, where is Terry? --

PARTICIPANT: She is working somewhere.

[Applause.]

DR. GEORGE: Thank you very much and I’ll thank the introductory speakers here and now I’d like to turn to our first plenary session. Lorrie Shepard is professor of research and evaluation methodology and Dean of the School of Education at the University of Colorado, Boulder. I’m not going to read her brief resume which is in your materials; I am going to work under the assumption that all of you can read and will do so. I’d not met Lorrie before, but I was delighted to learn that one of her research interests is on the use and misuse of tests, and most recently she has been focusing exactly on the topic of her talk today, Assessment as a Primary Means for Promoting Student Learning. So will you welcome Lorrie, please.

[Applause.]

Agenda Item: Assessment as a Primary Means for Promoting Student Learning - Dr. Shepard

DR. SHEPARD: I’ve not tried this so we’ll have an experiment with the first slide. I’m very, very pleased to be here, it’s really an honor and I want to thank the steering committee especially for the invitation.

DR. LABOV: This is the National Academy of Science, not of technology. I should also tell you the reason the lights are off out there is that this place is so computer driven that the lights over the weekend are controlled by a computer and we can’t change it.

DR. SHEPARD: I was befuddled when I saw that the clock didn’t tell the right time, so you can imagine it’s just downhill from there.

The purpose of my talk is to provide an overview for the remainder of the workshop and also to focus on classroom assessment but in parallel the changes that needed to occur in large-scale assessment. My focus will be primarily on classroom assessment but you have to think of the two as being coherent, which will be a theme. I’m also hoping to anticipate and set the stage for other presenters, so I’ll try to refer to some of the things you can expect from other players at the workshop, and try to make explicit the connection of things that I will say as a matter of principle, but they come from research findings that are quite extensive in some instances, so I’ll try to make it clear what the basis is for the assertions that I make.

To support learning, classroom assessment has to change in two fundamentally important ways. We have to change the character of the tasks and assignments that students undertake that will be then the form, if you will, of what we look at to make our judgments about their knowledge and understandings, and then we also have to change profoundly how assessment is used in classrooms.

There are traditions in American classrooms that are very well established; the older the children are the more established in their expectations about being evaluated, and those normative understandings have an effect on how assessment influences learning.

Knowing What Students Know asks for, excuse me, I’m going to have to just take a second, I can’t see the screen from here. So let me make an adjustment.

In parallel to the changes in classroom assessment, there has to be a change that corresponds to how the substance of science and mathematics, each of the disciplines is represented in the large-scale assessments.

These things have to happen in a way that represents or embodies the assessment task in the same way. Knowing What Students Know uses the term coherence to talk about that kind of alignment for shared representation of what it is that is important to know, and that will be very important in essence for all of us to be on the same page in what students understand they are aspiring to and what teachers are teaching for.

In this country assessment reform began in the late 1980s and into the early 1990s in response to evidence of the negative effects of high-stakes testing.

Teacher survey data showed declines in the teaching of science and social studies. Teacher survey data and curriculum studies showed close parallels between test formats and instructional materials. Test score gains as a consequence were shown not to generalize to independent measures of the same achievement. Research on teaching-to-the-test effects of this type can be tied back to evidence from the psychological literature on children’s inability to transfer: they can learn it one way and not be able to do it another way if they’re taught in a very mechanical or rote way, and this seems to be exacerbated by certain assessment formats. You probably know of Phil Sadler’s films, A Private Universe and Minds of Our Own, the films about after the MIT graduation when someone couldn’t make a circle to make the circuit even with prompting from the interviewer, couldn’t light a bulb with a battery and a single wire.

To give you a sense of the research base for this, I have several slides. This report, issued in 1992, is not an NRC report but it did come from the then-operative Office of Technology Assessment that reported to the Congress. And according to the research summarized, it’s a very nice history if you’re interested in the history of testing in the U.S.; and when they talk about the ’80s, they report the literature that I’m referring to that shows that a pervasive effect is that test score inflation is a consequence when there’s too much teaching to the test and all the curriculum distortion that I alluded to. It’s important here to realize that it’s not just that science and social studies get left out when there’s too much teaching to a narrowly construed test, but that the way that even the things that are tested, the way that mathematics is conceived, it is only thought of in the format of the test, it is altered in a way that affects learning. And I’ll also cite then the Lake Wobegon effect, where everyone was above average, likely evidence of inflation.

In that same era George Madaus and colleagues from Boston College published this NSF-funded study. I won’t go into all of its details except that it also showed us that these phenomena were worse in urban centers and most affected children who needed schools providing them a rich curriculum and instead were getting a very dumbed-down curriculum. This is a useful reference work.

I can’t believe it, I don’t have control over my own slides but you are being very patient not to comment if you couldn’t see anything.

This is one study that I like to cite because it was done for another purpose; actually Marilyn Koczor, who did this study, was an avowed behaviorist, so she actually thought it was a good thing that came out of this study, and I offer it to you as evidence of the fact, so there is an interpretive lens here that I worry about.

In this particular randomized experiment, students were taught to learn to translate from Arabic to Roman Numerals, or from Roman to Arabic, in one group or the other, and they were taught always the same way so one group was taught in one order, the other group was taught in the other order. They never mixed what they were taught until the posttest in this randomized experiment when they then further subdivided the two groups again randomly in half, each of them; and for one group in each of the sets they reversed the order on the posttest. So the groups that had the difficult generalization to make were those that had been learning one way and had to do it the other way. And the effect sizes from this experiment where each was compared with its own half, did they do it the same way or did you do it the other way than you had learned it, were more than a standard deviation. And it was done by different ability groups, but we never in education see effect sizes on the order of a standard deviation and yet this was a profound effect and it replicated across all the different strata of the study design. So that is what we think is going on when we see test score trajectories that we don’t believe. So when we say that there can be test score inflation as a consequence of too much teaching to tests, to me this is a laboratory study that shows you the character of what’s going on when students are not taught for any kind of conceptual understanding but are just taught by rote, in that case to make certain substitutions.
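[Editorial illustration, not part of the spoken remarks.] The "effect size" referred to here is usually a standardized mean difference: the gap between two group means divided by a pooled standard deviation. The sketch below assumes that definition (commonly called Cohen's d); the transcript does not say which formula the study used, and the scores are invented for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical posttest scores (NOT data from the study described above).
tested_same_direction = [14, 16, 18]      # tested the way they were taught
tested_reverse_direction = [11, 13, 15]   # tested in the reversed direction

print(standardized_mean_difference(tested_same_direction, tested_reverse_direction))
# prints 1.5
```

With these made-up numbers the means differ by 3 points and the pooled standard deviation is 2, so the difference is 1.5 standard deviations, which is the scale on which "more than a standard deviation" is being reported.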

This is a modern day and at one point sort of politically heated finding from a study by the RAND Corporation, I’m actually comparing here a similar study that was done by my colleague, Bob Linn, where what you see here is over the period 1994 to 2001 trends in Maryland, the bottom line, and Texas, the top and steeply rising line, a report of the test score gains on the respective state assessments. And what you see here in the next slide is the corresponding years on the national assessment for the two states. And this is a different kind of evidence if you’re thinking about the research literatures again, to support the point that we don’t think that the Texas steeply rising gains generalize in the way that the Maryland test score gains did appear, they were obviously more consistent with what we saw. There’s a whole story about that, about what we think is the difference in the tests or whatever, but these are the kinds of studies, it’s not just one, that make us worry about kids not really being able to do what it appears they can do on certain kinds of assessments.

And the last slide of this type, and this you might have difficulty seeing but you have it in your hands, is just to show you a snapshot out of a study that I was a part of along with Dan Koretz, Bob Linn, Steve Dunbar, this particular paper was done by Bobbi Flexer who did the math part of the larger study where in school districts around the country where there were high-stakes tests we redesigned item by item parallel tests tailored to the same objectives and then equated the test, because open-ended tests are more difficult than closed-form tests, and then evaluated the effects of those test differences in the high-stakes district.

And what you see here is that in the equating district, that’s the E district in the graph at the bottom, there’s naturally a drop off from how those students did on the problem on the left versus how those same students did on the problem on the right. But the point of this study’s findings across all items was illustrated here by the drop in the B district, which was a high-stakes district, and that loss, that relatively greater loss from left to right, is the teaching-to-the-test effect. I said that really fast so you can go back and puzzle over it; I don’t want to make too much of this but I want to show you that there is a body of work that tells us that some children some of the time in districts that taught only to the problem types on the left could not do the problem type on the right, and more kids can do that when neither test was taught.

So still talking about the era of changing assessments, because this has been going on now for a decade, reformers talked about performance assessments, direct assessments, authentic assessments, as the counter to this phenomenon, and that’s been something that I would say people have been working on hard in this country for a while; I mean it’s not very easy to do or we would have made this reform so complete we wouldn’t need to read about it. But these ideas of creating more performance-based assessments were in reaction to the teaching-to-the-test literature that I summarized, and these words are intended to convey the idea that assessments need to capture the real learning that we intend if we want to avoid the distorting effects on instruction.

So the single most important shared characteristic between large-scale and classroom assessments should be their alignment with curriculum standards. Here I do not mean, and this is a little jab at commercial test publishers, but it’s an important one if you’re sort of out there in the marketplace I’d like to caution you. Not everyone means the same thing when they say alignment, and Andy Porter is going to talk in much more detail about this topic, but be worried about the commercial test publishers who just do an exercise of creating a matrix of usually the process by the content categories in mathematics let’s say, and then if their items all fit in some cell in that matrix they say they have an aligned test. And I think we need some different language then and I suggest something at the bottom there, maybe the word embodiment, to represent the fuller representation, the fuller connection between what it is you want kids to know and how you engage them in tasks day after day and also in a summary way test that for the extra known -- test. And if it is an impoverished representation, we will continue to replicate the distortion that I’ve outlined in my first few minutes.

So just a sense of what is this embodiment for authenticity or more direct performance-based assessments look like, and I love this example from Pat Thompson at Vanderbilt, this is one I show over and over and over again and it’s an example of a problem type that you can use as an instructional activity, or you could use it as an assessment device, and kids will learn and kids will think when they do this kind of task. Here’s another, and it comes in two parts. Kathy Comfort was part of the team at the Department of Education in California that developed this task about earthquakes, it’s hands on, the kids have to shift those two tectonic plates nailed in hard board and figure out what’s going to happen when the San Andreas Fault does its thing. And figure out why in the next instance real physical features in California have certain characteristics that they share but are this much different in where they’re located in California.

A key and consistent feature of these kinds of problem types is that kids have to say what the answer is and they have to explain their thinking. If I could pick one thing that’s happened to assessment as a consequence of the reforms that happened starting in the early 1990s it’s that we ask children to explain their thinking more not just to get points from us but so that we can understand what they’re thinking and that they can get practice in that kind of deep intuitive understanding.

And here I think maybe Marge Petit had something to do with this one, this is Carmaliticus, these are phony creatures that have some shared characteristics and some different characteristics, and the kids have to come up with a phylogenetic tree and explain in some detail -- that go with this. And here’s one student’s paste up, these kids are old enough so it’s safe to use scissors in the assignment and they cut up those creatures and arrange them in ways that have to defend --, a great example of the kinds of problem type that I’m talking about.

So an important point that I said that I want to reiterate, notice that good assessment tasks are interchangeable with good instructional tasks. You can pick any of those that I’ve shown by illustration and use them as a teaching activity, or you could have had a related teaching activity and you could have used these as the culminating assessment where you checked in on kids’ knowledge but in parentheses, be careful here, because this has to be said because when behaviorists did this they didn’t remember this rule. The task should not be the exact same one because you do want to know if kids can transfer, this is an important point and it goes back to sort of what I said. Marilyn Koczor had a different conclusion as a behaviorist when she saw that the Arabic to Roman didn’t necessarily teach you the reverse direction, she said well, we’ll do this as two separate teaching tasks then, and what we say, and I think this is what you come to understand from the cognitive literature on transfer, is that only when students have a principled understanding can they transfer and there is a different way that you teach if you want to support that kind of knowledge generalization.

We also have evidence that teaching to problem types such as these improved learning, and I have just a couple slides about that. This is a problem from the Maryland assessment that we borrowed from Maryland with their permission and they let us administer it to third graders in a Colorado project where we worked with teachers for a couple of years introducing these kinds of performance assessments in classrooms where kids had only done prose form text problems before and to me it’s obvious, but in the pretest in this district the kids would not do something like write those answers below the table, that was not an instruction that they do that but obviously the space was provided and kids after we had spent a year in their classroom doing lots of such problem types would construct a pattern the way you see it here and they could give explanations.

So these students are asked as third graders to explain the pattern they see in the table, and I’ve given you two different student answers. We had the same number and some examples, 6 + 6 = 12, and then the next student said if you put one scoop it will make two, and then if you have three scoops it will make six, so every scoop can do, you have to double the number. And this is production that in this working-class neighborhood was not there before we started asking kids to explain. So in some ways it’s pretty dramatic that you can teach kids to explain and do it for a year they can explain their reasoning.

At a much different level this is a problem from the New Standards Project which Scott Marion, whom some of you know, just finished his dissertation at the University of Colorado, Boulder, and used a whole bunch of different tests, AP tests, SAT 1, SAT 2, PaceSetter tasks, and Open-Ended Tasks like this, and administered them to AP students and to PaceSetter students. Now anyone who knows those two tests knows that the PaceSetter was developed by ETS to be used with a much more heterogeneous and average population and taught in open-ended and reform-based ways, so the PaceSetter students are much less “able” than the AP highly select group, and yet what I’ve strung across the bottom are the effect sizes, differences between AP students and PaceSetter students on several different measures. Scott actually had more. I’ve just shown a couple of them there for you.

So on the SAT we’re not surprised that there’s a 0.64 standard deviation difference between the PaceSetter and the AP students, we’re not surprised indeed on the calculus test, which the PaceSetter students had not even been exposed to, they’re all juniors by the way. There’s a standard deviation and a half essentially difference between the two. But on problem types like the one I’m showing you about coming up with a way, a thinking way instead of an algorithmic way to talk about the number of shoe laces on shoes of different sizes, the difference between those two groups of students is only 0.21 standard deviation units. So I offer that as evidence that how you teach can make a difference in the amount of difference between students that have typically or historically been in very different strata.

So now I’m switching to the topic I said in slide number two, I’m switching to the idea that in addition to the reforms I’m showing you there with the character of tasks, we also need to think about the processes, the tone, the purpose, the culture if you will of how assessment is used in classrooms.

So back in the late ’80s and early 1990s when in this country assessment reformers were thinking about changing the substance of assessment, I would say that our colleagues in other English-speaking countries were tackling the same problem but they approached it in a very different way, focusing more on this interaction or how assessment is used in classrooms, and trying to come up with a deeper understanding of how formative assessment might be useful to support learning.

So I quote here particularly from Crooks, who wrote a really important Review of Educational Research review on motivation and cognitive effects; he was among the first rapprochement between those two groups because it didn’t use to be that cognitivists and motivationists ever talked to each other, it’s useful when they do. And Sadler, this is Roy Sadler from Australia actually, not Phil Sadler from Harvard, whose model of formative assessment has become very important and it has both a cognitive feedback and a motivational or affective taking responsibility component to it.

The sort of slogan that’s been taken up by these reformers in other English-speaking countries has been to distinguish between assessment of learning and assessment for learning, which is really the purpose of this workshop, how would you change assessment so that it can be for the purpose of learning? And that’s been sort of almost at the T-shirt level, the message that they have worked very hard to get out in all of the countries in Great Britain.

They also engage the affective aspect of it; Sadler, for example, said that the long-term exposure of students to defective patterns of formative assessment has developed embedded coping responses that will take ingenuity and patience and time on the part of evaluators to reverse. Perrenoud in Switzerland actually says that every teacher who wants to practice formative assessment must reconstruct the teaching contract, and he takes this up with high school students who have some very bad habits by the time you’re trying to change their minds about the purpose of assessment.

The research literature has made an enormous contribution to this effort; they were self-conscious about it, and the Nuffield Foundation sponsored the undertaking of this massive review that Paul Black and Dylan Wiliam did and I’m just citing several of the main points from their 1998 review. If some of you have seen the little black box that was the popular brochure that they distributed, because they were trying to get inside the black box of what actually goes on in classrooms that supports learning.

They define the formative assessment as occurring when evidence is actually used to adapt the teaching work that is done to meet student needs. Formative assessment experiments, which they summarize in that review, created effect size improvements in learning on the order of 0.4 standard deviations to 0.7 standard deviations. They point out that that is larger than typical interventions.

I myself am a little skeptical of the 0.7, frankly because I think they put everything into the meta-analyses that they did, and they sometimes put in things that I would characterize as behaviorist, such that the outcome measures would have been an awful lot like the pretest, that looked an awful lot like the practice materials, in which case they’re sort of capitalizing in that claim on some of the kind of practices that I was disparaging in the first part of the talk. So I would sort through it and pick the studies, I’ll show you a couple shortly, that require the generalization that we want, and still there are impressive effect sizes.
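[Editorial illustration, not part of the spoken remarks.] One common way to read effect sizes such as 0.4 or 0.7 is to ask where a student who started at the comparison group's median would land after a shift of that many standard deviations. The sketch below assumes scores are normally distributed with the same spread in both groups, which is a simplifying assumption and not something the transcript or the review asserts; the function name is hypothetical.

```python
from statistics import NormalDist


def percentile_after_shift(effect_size, start_percentile=50.0):
    """Percentile in the comparison distribution after moving up by
    `effect_size` standard deviations, assuming normally distributed scores."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    start_z = standard_normal.inv_cdf(start_percentile / 100.0)
    return 100.0 * standard_normal.cdf(start_z + effect_size)


for d in (0.4, 0.7):
    print(d, round(percentile_after_shift(d), 1))
# prints:
# 0.4 65.5
# 0.7 75.8
```

Under that assumption, a gain of 0.4 standard deviations moves a median student to roughly the 66th percentile of the comparison group, and 0.7 to roughly the 76th.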

The formative assessment model, it’s not surprising that Paul Black along with Mike Atkin were a part of this NRC report that summarizes for you, you have this in your materials I believe in a compendium, Sadler’s formative assessment model and it’s pretty simple. Figure out where you want to go, figure out where you are, and decide what you have to do to get there. In fact it’s not quite that simple, especially since there are issues about how to engage kids in doing that, but that is certainly the framework that you start with.

One thing I would like to do to elaborate that model is to suggest that the formative assessment model outlined there is very like what you know from Vygotsky, how many know Vygotsky? Okay, we have to do a little bit more work. If you don’t know Vygotsky you don’t know the ZPD. So Vygotsky is a long-dead Soviet psychologist who’s foundational to social cognitive theory, you don’t know that either but it is I think just a theory that takes us further than cognitive psychology in helping us understand how social supports help the internal cognitive development. So it really engages the development of children’s intellectual abilities by accounting for the social supports and then also allowing for the development in the brain or in the head, etc., and this zone of proximal development is a strategic way of looking at how that support is most effective or how it actually happens; and it is this imaginary space that we’re all working in whenever we respond to a child who wants to help with the dishes but we’re afraid they’re going to break them, or whenever we sort of without even stopping correct language, so this model for learning does not have to be formal instruction but we’re always supporting them from what they can do independently to what they can do with assistance until they can do that thing independently. And then ZPD just keeps moving all across the development, it’s just meaning that the nearness to what they can do next with our support.

And if you think about it, you’ve never heard of this before, you think about your own child raising practices I think you can think about how you do this, how you support kids to be able to do independently something that’s out of reach. And Jerome Bruner developed what’s called the term scaffolding to describe that kind of instructional support, and I’m offering to even those of you who know Vygotsky and know the zone of proximal development and scaffolding, think about how that formative assessment paradigm is like scaffolding, because you’re working right in the region of what you don’t even know what the kids know, but as you work with them you figure out what they know and what they don’t know, you build on their strengths and you expand or have them reach so that they can finally do something that they hadn’t been able to do before.

This is a set of bullets that refers to specific assessment strategies within that zone of proximal development that correspond very closely to bullets that come out of How People Learn and that Jim Pellegrino will undoubtedly have a chance to elaborate.

And that’s right where, I want to go back, I can’t see that. Let’s just go back and leave the points up there because Jay and I discovered that another technological thing that the Academy had was some kind of barrier to a file that I’d like to hand out but was way, way, way too dense, and as a consequence I had to put my talk into seven sections and one of them didn’t make it.

Now I don’t know how it made it to your printed handouts but it did not make it to the overheads. So if you haven’t been looking at your handouts you might want to look at your handout because I’m going to elaborate on several of these points and just tell you a little about each since I think it will be probably later in the workshop I’m going to be relatively brief.

Prior knowledge is something that you know from How People Learn is an important contextualization of any new learning, that the way learning occurs is to connect new learning to old understandings to reinterpret what the new material means in light of existing understandings, and that’s usually captured under the topic of prior knowledge.

And there is an important assessment element to making that connection and supporting students to learn it and to be explicit that that’s what needs to happen. So classroom practices should include assessments of students’ relevant knowledge and experience, not only to inform teaching but also to get kids in the habit of thinking, when they see something new and challenging, what do I already know that could help me solve this problem? Or how is this like what we’ve already done before? Something that adept learners do and kids who are sort of frightened or ill at ease with challenging curriculums don’t do, but it’s an example of how when that becomes the normative mode of interaction in the classroom you can get many more kids to be able to engage challenging material.

On the next slide, I’ve shown you just some more particulars. It’s interesting to me that where good teachers do this they frequently do not note it as assessment, so especially in the literacy literature there’s a lot about prior knowledge techniques that is done as a healthy part of the start of instruction but not where the kids think they’re being assessed. So a question I would ask is well, couldn’t you note that you were assessing them on that, beginning with cultural shift, to encourage kids to think of assessment as a thing you do to help learn, not to catch them in what they don’t know but to make it visible and shared in the classroom what you’re working on and how what they know or don’t know is going to help you help them learn. So prior knowledge assessment is one occasion to try that, and at the bottom of that slide I cite Luis Moll and his colleagues, this anticipates Bill Trent’s talk, when you talk about equity issues, Luis Moll’s funds of knowledge is a strategy that he and his colleagues use to help teachers formally elicit from kids what they already know, from carpentry, from the kitchen, from things that their parents do at work, that are relevant to the problems that they’re going to undertake in class -- again, as a way of creating context and also using as a resource things that kids already know.

The next slide is about feedback; this was hugely reviewed in the Black and Wiliam 1998 review and we have quite a bit of evidence that feedback supports learning, although not always; there are many studies in which the feedback was not helpful, and so you have to go deeper in that literature to see what really makes a difference. And it makes a difference interestingly enough when you go into that Sadler model and the feedback is about particular features of work that can be improved to help you get there. So if you think about always having in mind where you’re trying to go and then using feedback to specifically improve, that’s what that literature shows us how to do.

And the next slide I picked as an example; this is not only my effort to show you the research base but I pick it because it kind of shows you how you would get started in a practical way. Like if you bought this argument that you had to change the character of how assessment is done in classrooms, what would you do? Well, I don’t think you would install a huge system for formative assessment. I think you would start by having teachers think of some specific things they could do and here is a study where all they did was help teachers learn how to mark papers in such a way as the advisors, not just good, great, wrong, but here’s how you could improve this particular thing. That’s the intervention. And they saw significant improvements in learning and more importantly they did some gap closing between boys and girls, so the girls who had historically been doing less well than boys benefited the most and that’s something that I cite from Black and Wiliam in general; what we see is that low-performing students are helped the most by the kind of formative assessment and strategies I’m describing.

Let’s skip quickly over the slide that shows student work. What I’m alluding to here is that all of these practices benefit from the display of thinking, they benefit from making it customary for kids to have to explain, and for the way we interact with students to be about some either verbal or product-based display of what they understand. Not so we can take away points but so that we can engage them with their thinking, often around some artifact, and use that as a way of helping them improve.

The next slide is about transfer and I think a lot of what I’ve been saying about knowledge generalization, which is what we want to support, connects to the idea of the kinds of tasks we use and also to making it a part of classroom practice to extend, so that a student is comfortable doing it this way, I say, well, what about this, and I ask them another way. And I have. Now we’re ready to actually get back into the examples that are on the screen.

I have a couple of examples for you; the first one not quite on the screen yet are a whole bunch of things from Assessing Mathematical Understanding. I think Marilyn Burns had some things to do with this 1989 where there are a whole bunch of ways of looking at one half. I think the very first time I ever taught, I noticed that kids could do addition, this was in a remedial math class, could do addition if addition problems were separate from subtraction problems and could do subtraction but when they just had to pay attention to the sign they couldn’t do any of them.

I think it was the first time a million years ago that I realized that if you want a kid to be able to do Arabic to Roman, and Roman to Arabic, and answer questions about how the numbers are put together intact, you would do all of those things and you would keep extending and you would connect. You wouldn’t do decimals one week and fractions another week, and then be surprised if they didn’t understand how they were equivalent or not. And so teaching for transfer, and assessing for transfer, should be a normal part of instructional practice. And that is really a change from how most instruction is organized, and even, I’ll submit, because of the long effects of behaviorism, even from what issues of fairness are considered to be.

There are contracts, implicit contracts with students about what’s fair to have on the test, that argue against surprise. And so what you are negotiating, it’s very different from classroom to classroom, so some teachers establish an understanding of transfer and generalization that is different from what other teachers negotiate with their students. And in the class where it’s been narrowly negotiated there will be less transfer and knowledge generalization and there will be less deep understanding, so my point in trying to make this cultural shift is to make it more common that you teach and assess the transfer.

And I’ve just shown you two things, this is from an MSEB publication, that just shows you two problems that would be a good example of parallel tasks, where as soon as the kids can do it one way you change the pattern slightly so that they would have to think about what do I already know that helps to do this and what do I have to do differently.

Another one of these things that comes from the cognitive literature is the use of explicit criteria; this is part of the psychology underlying rubrics. In measurement literature rubrics are just a method to handle open-ended problems, they didn’t have a how-people-learn underpinning. But if you connect it to the idea that internalizing the features of good work is part of learning how to produce that good work, it’s not just to debate with the teacher about scoring, it’s part of understanding the features of a well-written paper. And if we’re trying to model kids toward excellence, you’re trying to help them internalize those criteria, then please don’t make it just about grading.

So assessment is a strategy that helps with that internalization and there’s also then a literature that helps us see how when kids engage in self-assessment, not to save the teacher from grading, but to actually see what those criteria on the culture on the wall mean in the context of their own work, and also in trying to make their work better, then they come to understand the criteria and that’s the same thing as coming to learn these discipline area perspectives. And there are quite a number of studies, and I show you one on the next slide by White and Frederiksen, this is also quoted in Knowing What Students Know, where that was the intervention.

So in the control group kids just had more time talking about the quality of the materials that they had looked at, and the treated group, to study the effects of self-assessment, graded their own work and had to defend why they gave themselves a certain score in terms of their work, and they learned more. So on the outcome measures they outperformed the control group and the only difference was they had practiced evaluating their own work. And the best part of this, because there were quite a number of different outcome measures in this, is that there were, this is a science class, fewer duplicate reports submitted in the class that practiced self-assessment. Now how many science teachers know what that’s about?

And lastly in my sort of running through on my fingers the different aspects of what we know from the literature is formative assessment has to also be used to evaluate teaching so as to improve teaching, so as to target it, modify it, adjust it when we see what kids know and understand, but also to convey to students this sense that being evaluated is what we do to do our work well, it’s not threatening, it’s not embarrassing, it’s part of changing this cultural expectation.

I have just a few more slides and I think about five minutes, so let me go a little bit quickly through this last section, but I do want to highlight for you that there’s a motivational aspect to this as well as the cognitive literature that is so foundational to how people, to Knowing What Students Know and How People Learn.

There is a concern that our current grading practices are threatening to the formative assessment model that I’ve outlined. If tests diverge from valued learning goals they can make the same mistake as those external accountability tests and they often do. And when they do students focus on the graded portion of the curriculum. And they do so in narrow ways. The use of grades as rewards and punishments also has a negative effect and can undermine intrinsic motivation to learn, and in the next two slides what I’ve done is give you from Deborah Stipek’s review, in the Handbook of Educational Psychology, a rundown of the features of extrinsically motivated students and intrinsically motivated students on the next slides, and what I’d like you to realize since some of us think oh, there are intrinsically motivated students and extrinsically motivated students, as if that was some innate characteristic they carried around with them or had learned from their parents, the important thing about this literature is to understand that these things can be experimentally induced with very subtle interventions. And so if they can be experimentally induced in the lab, what do you think a teacher’s intervention for a year does to shape or encourage either more extrinsic motivation or more intrinsic motivation. So it is in our hands to think about this and do something about it.

Extrinsically motivated students work to please the teacher, to get the grades, to look good, and if you’re as old as I am you might remember John Holt telling us about how kids pretend to know and hide what they don’t know in such a stance. They focus on what Jean Lave calls the exchange value of learning. Performance-oriented students also in these studies pick easy tasks and are less likely to persist once they encounter difficulty; these are in experimental studies that are very well controlled, we can see the effect of having made them into an extrinsically motivated student. And girls are overrepresented in this category.

In contrast, intrinsically motivated students attribute success to their own efforts, work to become competent because that’s what feels good, they want to become, master something, they want to be good at it, they focus on the use value of learning and learning-oriented students are more engaged in school work, use more self-regulation, and develop deeper understanding of subject matter. So motivational aspects are equally important.

In closing then, to be mutually supportive, formative and summative assessments have to be conceptually aligned, they should represent important learning goals using the same broad range of tasks and problem types to tap students’ understandings. Summative assessments should not be repeats of earlier formative tasks but should require students to use their knowledge in ways that generalize and extend what came before. Summative assessments should be thought of as milestones on the same learning continua that undergird formative assessments.

On the next slide I have too long a list to read of the things from the cognitive literature and from the motivational literature that we need to do to improve student learning. It turns out that things like being specific about features of the task that need to be improved, which comes from the cognitive literature, is also a finding in the motivational literature, that that’s less damaging to ego involvement than when you do normative sorts of comparisons. So there are wonderful ways that these two literatures can be made congruent.

Lastly, and this is just a speech and it’s from the paper that I provided in your background materials, our goals should be to create a learning culture where students share an expectation with their teacher that finding out what makes sense and what doesn’t is a joint project, a worthwhile thing to do and essential to taking the next steps in learning. Trying to have that be the feel of why we ask questions and why we ask students to tell us what their current understanding is. To do this we have to make assessment more useful and at the same time change the social meaning of evaluation in classrooms.

Thank you.

[Applause.]

DR. GEORGE: Thank you for a wonderful and very appropriate beginning. I am going to point out to you that you have many opportunities to interact with Lorrie during the rest of the program, she has another session tomorrow morning, she has a breakout session tomorrow afternoon, she’ll be on the panel sessions this afternoon between 5:00 and 5:45, but on the theory that there may be two burning questions I would ask her if she’d be willing to take two quick burning questions. Yes, Don, and you must identify yourself. And is there another one who could come up to the other microphone so we can save a minute?

DR. LANGENBERG: I’m Don Langenberg from the University System of Maryland, I come from another field, far field, so I hope you’ll forgive my elementary question. I’m still trying to learn the lingo. Is there a distinction between assessment and evaluation? Or are they the same thing?

DR. SHEPARD: I have made a distinction between assessment and evaluation a million times to my science colleagues on campus and they can’t remember it, no, it’s jargon right, that there would be a particular, there are several different distinctions that people make.

Historically in the measurement literature people talk about measuring as if it would just be the quantitative summary and then to evaluate you would have to bring in some additional set of criteria about expectations, so there was a difference between measurement and evaluation because it was the valuing or interpretation of the basic measurement data that distinguished measurement and evaluation. Some people still observe that difference between assessment and evaluation. But notice when I start talking about and how are we going to get there and qualitative judgments about particular features, an essay for example, I’ve already begun to --

The other distinction that I tried to make with my colleagues on campus was another distinction that exists in educational research between assessment of student learning and evaluations of educational programs, and those are just habits, those didn’t have a definitional importance and I notice that most scientists use them interchangeably so I just try to figure out from the context whether they’re trying to judge the effectiveness of a program; and by the way you use student assessments in your judgment of the program, but you do many more things in an evaluation of a program. For example, you do care whether high-achieving and low-achieving students are each benefiting, so that’s more than just measuring the outcomes of student learning. You care whether students who are engaged at the computer are only boys. So program evaluation is broader than evaluation of student learning.

But I just think we have to be careful with context as to what we’re talking about, as if we could always rely on those words, because too many people from too many areas are coming together, so let’s just think of all those types and not be too hung up on the specific word.

DR. GEORGE: One other question? If not I’m going to thank Lorrie again for a wonderful talk --

[Applause.]

And I have unilaterally without consulting anybody decided that no normal human being can sit here until quarter of five without getting up so I am going to declare a six minute break, and we will start promptly in six minutes.

[Brief break.]

DR. GEORGE: I am certainly aware that there’s still people out there doing things and that’s okay but we’ll keep going so we don’t interfere with the program too much.

Your agenda now asks what assessment issues MSPs are currently confronting. And we have two MSP teams and one other person up here to respond to that question and also in a way lead into Andy Porter’s presentation, which is next on the agenda. The two teams are from first of all Stark County, Ohio, Deborah Poland and Wendy Williams, who are, respectively, the county science coach and the county math coach if my information is correct. And they will be followed by the Southwest Pennsylvania MSP represented by Nancy Bunt and Kevin Kelly from Southwest Pennsylvania, and they are joined by Michael Kestner who is with the Department of Education, has worked with the MSP program, and he is going to comment on the kinds of assessment issues that these MSPs either are anticipating or have found, because I think one of you is a cohort one MSP and one of you is a cohort two MSP, and Michael is going to I think comment on those as someone who is a practitioner of Andy Porter’s tools, which he will introduce in the next session. So Michael is going to be the glue between these two MSP discussions and Andy’s presentation, that’s the theory. Did I get that right?

DR. KESTNER: We’ll see how it goes.

DR. GEORGE: Well, without further ado this is a good example of trying to help you really put what you learned here into practice by having real live colleagues of yours talk about things they are finding or have found in hopes that some of what we have learned will be helpful to you as well. So without further ado I’ll call on Debbie and Wendy from Stark County, Ohio.

Agenda Item: What Assessment Issues are MSPs Currently Confronting? Panel: Two MSP Teams Discuss Assessment Decisions - Dr. Williams and Dr. Poland

DR. WILLIAMS: Before we start and explain this, please understand that this presentation was put together ten minutes after we arrived here, so keep that in mind as we speak.

My name is Wendy Williams, I was a math teacher, I was in the math classroom for 28 years, and when our county got this MSP grant, part of that grant was to release teachers from the classroom and we now have the title of, we’re called county coaches. Our local districts are still our fiscal agent, the grant reimburses those local districts, but we are now housed at our county educational service center and we service, there are 18 school districts in our county and so we service those 18 districts as part of this grant.

The idea behind the grant was to improve math and science education and it has a four-pronged approach. One is there are urban districts, four urban districts in our county, and those urban districts, their goal is to close the achievement gap. And in doing so and in helping that particular process each of those urban districts has a specific math coach and science coach like Debbie and I but they only work specifically for that district.

For those of you that are into the Super Bowl tonight, we are from Canton, Ohio, home of the Football Hall of Fame, and the large school district in our county is Canton City School District and it’s large enough that it has two math coaches and two science coaches working just with that urban district. And then we have three other urban districts within there and they have each one math and one science coach that works specifically with their district with a variety of initiatives that they have going on within those urban districts.

Debbie and I are county coaches, I’m a math county coach, she’s a science county coach. There are four county coaches, two math, two science. When we took this job, the job was not defined. We’ve talked over the last four months about what we thought we were going to be doing and what we are doing. We’ve ended up providing a lot of in-services, going out and working with teaching staffs on aligning content standards, as well as we’re now getting into the assessment issues. We’re from Ohio, and we have a graduation test that will kick in next year and so we are in the process of going from a proficiency test to a graduation test, and all of those issues Debbie will speak to, particularly in what we’re dealing with there.

We also have another component of our grant that is through the colleges, the local colleges, and this is communication between colleges and public schools. That’s been the hardest component to bring about. We thought we would have money for these college coaches to get involved and they would flock to us, and they have not been flocking. We had hoped, and not to say that this part is not out there, it’s just slower at developing. We were hoping for more involvement of those college coaches to bring tutors into our schools, particularly with the urban districts, that was a particular tie that they are just building on now to get students at the college level to come to the public schools and tutor.

The fourth part of our grant is a business approach in which there is communication between the businesses and the public school. For part of that, last summer we had a particular person who develops internships and there were I believe 20 internships last summer where teachers over the summer were hired by a local business, although the business did not pay it, the grant paid for the two weeks that the teacher worked with that business. Some of the teachers were hired to stay on afterwards but it was for teachers to see and develop lessons dealing with business in the real world context.

Now during the school year, the second approach through businesses is a ten-week session going on where teachers are going out and each week they’re visiting a different business, and then from that they’re to develop lessons that they put into practice in their classrooms.

So those four prongs of our grant are out there and assessment is now really where we’re going next. Last year, I wasn’t released from the classroom until the beginning of this year, everybody else was released, well, Debbie wasn’t either, she started this year. Everybody else began last January, I taught two AP classes so I couldn’t leave my class in the middle of the year. So we just began this year but everybody else started last year and it took a while just to get it organized, which I’m sure you can all relate to. Getting the personnel in place and figuring out what you’re supposed to do and the forms and the papers can be a headache. So now I feel like we are really getting into the meat of accomplishing something.

DR. POLAND: We had in place the Seeds Program, which was a local systemic change grant for the elementary and the Saturn Grant, they’re both NSF, so this is just kind of a continuation in the science area. So we already had a lead teacher format in place where each of our districts at least had one lead teacher, many of them had one per building, and that was a model that was expanded to the other areas, so as of two, three years ago we had the MVP Program for math and it also went to the Emerson for language arts and the social studies, so this was a model that was developed through an elementary program and has really expanded through the county.

As Wendy was alluding to, we’re ending our Ohio proficiency test in fourth, sixth, and ninth grades, we’re in the middle of changing to the tenth-grade Ohio graduation test as well as varying levels of achievement and diagnostic tests. So as far as our high-stakes assessments we’re kind of right in the middle and of course the teachers are in distress. Especially at the high school level the format is going to change from a strictly multiple choice test to more open ended, and this was a big stretch for many of our junior high and high school people.

Wendy also mentioned that we are basically in the end phase of the alignment and we’re doing a lot of curriculum mapping, so as part of that we’re looking at the assessment piece. We have districts, buildings, teachers that are all over the place from just starting to dabble into assessment to already having common assessments and are doing test item analysis and those types of things. So one of the problems is getting everyone on board, getting everyone to that same level, and providing opportunities for all the different levels to proceed.

We do have DSL, which is Data for Student Learning, it’s a county-wide database, which is a great tool for data analysis, and Pinnacle is the software that we’re using and we’re looking at how can we build the standards into the Pinnacle system so that we can align the testing with the standards, and that would allow the teachers to have more assessment data, more formative data along the way.

DR. WILLIAMS: Pinnacle is our grade book, it’s the electronic grade book, and not every school in our county is using it yet. If you’re familiar with electronic grade books when you put it out there for a whole district, there’s a lot of issues with that, and that’s just kind of getting ironed out now as we go along.

DR. POLAND: As the coaches, we were trained through Nancy Love’s Using Data Project, which was really valuable, as far as getting a process to analyze data, give some common terminology, common processes to use, and we completed it last summer and are having some ongoing sessions with that group.

Within the county there was an initiative three or four years ago to get some Stiggins training and that’s the kind of assessment model that has been pretty much adopted throughout the county and we have a couple of administrators who are every year getting ongoing sessions so we had core groups each year that are taking that information back to their districts. So in some districts all the talk of formative assessments and those types of things is light years ahead of some of the other districts who did not participate in that initiative.

Through our other projects that we’re working on we’re doing a lot with looking at student work or critical friends, whichever term you want to use there, our differentiation that we’re working on. We have a new teacher network where we’re targeting first- and second-year math and science teachers to bring them on board, get them off to the right start. We’re using assessments as an integral part of that.

And leading into what was alluded to earlier, the grading issues, we’ve had some sessions with McTighe and Guskey on the grading issues, so that’s something that’s on the forefront as well. And looking at, if we’re aligning all of these assessments with standards, how do we use that information and how do we communicate that with and to the students and parents.

I’ll just end with the problems, which are that we have so many people at different stages of their understanding and implementation, looking at how you balance between what the state is saying with OGT and with what we know is good assessment for and of learning. And with our study groups in connection with the Seeds and Saturn and now into the MSP, this is volunteer participation so for the additional things we’re not getting the message to 100 percent of our staff.

DR. WILLIAMS: To clarify a little bit more of what the study group and lead teacher meetings are, there is a planning committee in each of the disciplines for math and science and English and social studies of about five or six people that are leaders in our county. They plan the lead teacher meeting. The lead teachers, theoretically we have a lead teacher from every middle school and every high school for each of those four disciplines, so for our math lead teacher meeting it meets once a month, we have about 55 people there. It meets after school, 4:00 to 6:00; teachers are given stipends from our MSP grant as well as all the workshops that we put on during the summer teachers receive stipends, and when we have workshops in the evening after school teachers receive stipends from that, so we are extremely fortunate to have money to pay teachers to come to those things.

Debbie and I come from two different backgrounds. Hers is more of an elementary and middle school background, mine is high school. I have learned so much from working with her because, as I listened to Lorrie talk about formative assessments, I believe strongly that’s foreign language to our high school students. They are focused on content, they are focused on a test, and if kids don’t get it, well, son of a gun, they don’t get it, on we go. And we’re going to pay the price for that with our new graduation test. So for me, as a personal goal getting the idea of the formative assessments and all that Lorrie talked about, that is going to be crucial for our high school teachers because I just don’t see it. Debbie’s background with more elementary, she says because it is more developmental, teachers have maybe been brought up to think that way. I don’t believe that’s true for high school teachers, they teach unfortunately the way they’ve been taught. So that is a big problem I believe.

DR. POLAND: I think even though it’s a middle school, we have ways of gaining a lot of data; we’re doing quarterly assessment, those types of things, some of the staff here are at the stage, okay, we have this data, what do we do with it. So now it’s looking at the differentiation, looking at the intervention strategies, so that is a really key piece because we have people who are ready, we have those things in place, we have the data, so what do we do with it.

One last thing I want to mention. Because of this initiative, we are starting a material center, which is really exciting with math and science materials. We’re starting and targeting the middle school level with more inquiry-based instruction and then we’re going to be moving toward both ends and that is going to require staff and professional development before they receive material and it will be ongoing support through sessions, through co-teaching, or whatever, so a lot of things have to be put in place to provide that support and ongoing development.

And just lastly as we’ve gotten into more of this it seems like the more data we gather the more questions we have, so those of you who are at the initial stage and you’re thinking okay, here’s the data that we want, we’re going to get the answers and we know how to fix it, it just doesn’t work that way. The more you get into it, the more questions you have.

DR. GEORGE: Thank you, we’ll cross the state line now --

Agenda Item: What Assessment Issues are MSPs Currently Confronting? Panel: Two MSP Teams Discuss Assessment Decisions - Dr. Bunt and Dr. Kelly

DR. BUNT: We are close but not that close, I’ve often wished we were in Ohio. But we’re in Pennsylvania actually right now and we began in 1994 with the formation of a math and science collaborative, which were districts, we have all these local control school districts that individually select their curricula, but knew that it made sense to come together about some things. And they identified in 1994 that they wanted to do that and the math science collaborative was formed.

And in 1999 we participated in the TIMSS (Third International Mathematics and Science Study) benchmarking projects as though we were a country and we did it as a workforce region, and we didn’t say who wants to do this, we said these 11 counties make up our workforce region, our corporations paid for us to do it, and then Westat picked 50 places that 8th grade math and science were taught out of a hat, not really, but I mean out of their computer and we had to go out and try and persuade people that they really wanted their 8th graders to take those tests. And we were lucky enough that we really did meet all of the very, very strict criteria that TIMSS places on being able to report that.

And our interest in that was because in our work from 1994 on we had been very excited about what insights we felt the national TIMSS had brought to our region. Now I don’t know about your region, any time anything is said that’s something national it’s like oh, but we’re different, they must not have gotten everybody or we would be something else. So we did TIMSS benchmarking to bring it home to our region and we learned that we’re just like the nation, that as a matter of fact we’re a mile wide and an inch deep in terms of our curriculum, and actually the chart that they did for us on our curriculum when you’re doing all the topics and all the grade levels had a dot in every box at every grade level. So we can verify for you that the mile wide inch deep is here and present.

And we also verified that the focus on procedure rather than deep conceptual understanding that I think TIMSS helped us see is also something that’s really present in our area. And I found most interesting out of Lorrie’s talk this morning that the challenge we’re facing is that with the No Child Left Behind mandate, which gets the attention of educators in a way that we perhaps didn’t have their attention before, can end up having very unintended consequences if we aren’t able to persuade our educators that it’s the in-depth learning that will enable kids to do well on any test. And that’s the message that I’m wishing and wanting to track Lorrie down and see if she’ll come talk to our folks because that’s not what teachers are thinking right now.

And what we’re looking at in our state is we’ve had a change in administration, a year ago we got a new governor and a shift of focus, which you know the way most teachers feel is well this too shall pass, so now that we have a different administration does that mean standards still really matter, and what will the assessment be like. And again, having No Child Left Behind at the federal level has built some sustainability in there.

So the issue that we’re looking at, and I should explain that our math science partnership is comprehensive, we just got the grant this year, but the math science collaborative has been working to bring people together for a number of years. We have included in our NSF partnership 40 school districts, Pittsburgh, while it’s the center of our metropolitan area is a supporting partner, not a core partner, they had their own prime plus LSCI (Local or Urban Systemic Change Initiative) urban systemic initiative change so they’ve been a model for us and actually moving in great directions and we’ve accessed a lot of their knowledge about what to do. But if you picture we’ve got these 40 school districts out of our 138 in the region who have agreed to be leadership school districts, meaning they’re tracking their data, we are supporting them in participating in our interventions. And in exchange for that we’re counting on them leading others as we go along.

We are located in the Allegheny Intermediate Unit, which is like Stark County’s Service Agency, we have a state department of education and 501 school districts in Pennsylvania. And our intermediate units serve as the intermediary between those two so that we have as core partners four intermediate units and we will in years four and five move into two more. So our interest early on was looking at sustainability because we’re not going to be in the business of drive-by professional development or a drive-by project, we’ve watched too many come and go. So we are very minimally staffed on this front.

We were having interesting conversations with Stark because our model is that we are bringing together teams from each of these districts in what we call a Leadership Action Academy. And they’re the ones that are looking at their district-wide data and that are doing strategic planning about how to use the resources of this MSP to strengthen math and science instruction. So we have focused very heavily on data, so then we look at what data do we have, and the only data that cross all of these school districts are our state assessments, and we have in Pennsylvania right now the math PSSA (Pennsylvania System of Student Assessment). We were going to have a science one field tested this year but with No Child Left Behind not requiring it until 2007 they’ve now put off developing the science assessment until then. So as an MSP we are going to use the TIMSS assessment, partnership with the MSP that is in Ohio and Michigan, not Stark County but the other one, and we’ll be administering that in the spring.

But we started last fall, we brought our leadership teams together in September for a full day of what we call data mining and we really used Nancy Love’s work as well to look at that and ask our districts to look at themselves on several different measures. One was the state assessment, and the staff that we did hire we call MSP coordinators, and Kevin Kelly sitting next to me is one and you’ll have a chance to hear from him. We call them coordinators not coaches because there’s no way that they could be coaching in 40 school districts. What they’re doing is coordinating the partnership and supporting the teacher leaders who we’re counting on as our major change agents within the districts.

But they worked with these teams using their PSSA score results to do bar graphs where we looked at breaking it down by standards and by subgroups so that they’d have an idea of not looking at a plain score but seeing how the test was weighted differently, and wonderfully our PSSA was very heavily weighted in problem solving. And most of our districts not surprisingly ended up being lower on that problem-solving end. We also had a number of open-ended questions on our PSSA and that, too, is where the districts were having trouble.

So that allowed an opening for us to say hmm, trouble with problem solving, trouble with really getting into those open-ended questions, you have to understand that. Because what TIMSS had shown us as well about our teaching was that we were focusing on isolated procedures, which I think comes back to the good example with only being able to do it one way and that direction, and heaven help you if they ask you to start at the other side. It was focused on procedure and focused on the memorization, the vocabulary being most important.
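[Editorial illustration, not part of the spoken remarks.] The breakdown described here, looking at results by standard and by subgroup rather than at one overall score, can be sketched roughly as below. This is a minimal sketch under stated assumptions: the record layout, the field names (`standard`, `subgroup`, `pct_correct`), and every number are hypothetical and are not PSSA data or the partnership's actual tooling.

```python
from collections import defaultdict

def mean_by(records, key):
    """Average the hypothetical pct_correct field, grouped by the given key."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for record in records:
        bucket = totals[record[key]]
        bucket[0] += record["pct_correct"]
        bucket[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}

# Made-up rows: one per (standard, subgroup) cell of a district report.
records = [
    {"standard": "computation", "subgroup": "A", "pct_correct": 80.0},
    {"standard": "computation", "subgroup": "B", "pct_correct": 70.0},
    {"standard": "problem solving", "subgroup": "A", "pct_correct": 55.0},
    {"standard": "problem solving", "subgroup": "B", "pct_correct": 45.0},
]

by_standard = mean_by(records, "standard")  # one bar per standard
by_subgroup = mean_by(records, "subgroup")  # one bar per subgroup
```

Each resulting dictionary corresponds to one bar graph; a single overall average would hide the gap between the two standards that the grouped view makes visible.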

So data needs, we had them look at those graphs, we had a series of questions we asked them to reflect on as a group. We had them do what we called the SWOT Analysis, the Strengths, Weaknesses, Opportunities, and Threats. One piece of evidence that they were to look at was the PSSA scores. Another piece of evidence was the science field test that had been done statewide that we could look at the state results and it could be broken down by bars of topics so again it was getting them more into the content.

The third thing is something that really is based very much on the concerns-based adoption model. We developed from our own experience what we called a district development matrix that moves from the idea of becoming aware, initiating, implementing, and institutionalizing, and moving through different levels of first being aware of that, then committing to it, and then taking action on it.

And I have journals, 25, that Kevin brought today, where we include that district development matrix in there and we asked each district team to place themselves on that matrix at the elementary, middle, and high school level in mathematics and in science.

The last thing we asked them to do was to gather information on district context, and that’s who does your professional development planning, what kind of time do you have accessible to you, what assessments are used, all of the questions that they’re going to need the answers to when they come back together in April and May to do the action planning for next year.

There are two other days that we are bringing them together and we call those days Network Connections, and they are major sort of mini conferences only it’s not a voluntary attendance, your district has appointed you as the leadership team to come together, and that’s what I want to see if I can get Lorrie to come. But anyway, we had one in October, one in February, and the agendas are in here, and our goal was to expose them to proven professional development tools that we can use by helping train teacher leaders in their use so they will go back and have that capacity within their districts to use them.

So one of those at the elementary math level is the Developing Mathematical Ideas, DMI, which is a seminar series for teachers that I think gets into a lot of the issues we were talking about this morning. At the middle school secondary math level we’re lumping 7th through 12th together, we’re going to be using Westat’s Video Cases for Mathematics for Professional Development. Again, we’re looking for modules that with scalability we’re not going to lose it, it’s not dependent on a wonderful presenter, but it’s a tool that those teacher leaders can engage their colleagues in discussing as they move through. In secondary science we’re using the S.C.I. Center at Biological Sciences Curriculum Study’s (BSCS) National Academy for Curriculum Leadership, which again follows the same building consistency.

Our goal is to have these leadership action teams identify two people from each of their districts to attend each of these academies we will be having in the summer and over the next two years, where we’ll be training them and using this during the summer. They’ll be going into the districts and doing it with their colleagues during the school year, we’ll bring them back for a week in the second summer, follow them for five and a half days the following year. That’s our teacher intervention part.

At the same time we’re using Lenses on Learning for principals and finding, I wish we’d had it to start with all the way back in 1994, if we had we’d be way farther I think. But if you’re not familiar with it, it really engages principals in doing mathematics so that they discover the challenging and exciting nature of actually puzzling with mathematics, which most of us have lost or never had in the procedural way that we were taught. And as they discover that they also learn how to look for it in the classroom and it helps them know how to supervise and support their mathematics teachers in the classroom.

Our principals have found and have said to us that they see that applying to other disciplines equally well, that it’s a shift on what kind of student learning you’re looking for but the focus on student learning is a very different focus from the kind of procedural transitional management issues that most principals were looking for before.

So we’ve got those interventions at the same time so they’re going the two days in October and February to experience just these options in a morning so that they’re dividing up their team and are able to see what a DMI seminar is like, what a Lenses on Learning session is like, what a VCMBT session is like. And then they’re asked to appoint their colleagues and identify at least two, and we’re doing the redundancy model because we’ve seen too many times that one gets pregnant, retires, moves, whatever, so it’s two from each district having the same experience, and they’re appointing them, they’ll begin, and we’re having them then come in April and May to plan how they will find the time for these in-district professional development experiences for the next year, so that’s when, they’ve done their final analysis, they see the opportunities that are available to them, now they plan how they’re going to use that newly found capacity in the following year.

So the data that we’ve been looking at in terms of their planning is having them look at those, we’re lucky to have a tool that was developed by the Allegheny Intermediate Unit called the Comprehensive Data Analysis System, which is a software that enables a district to put all its different pieces of information together so that you can look at attendance compared to your achievement test scores. You can look at your own achievement test scores compared to the PSSA. You can look at all of those different things. It’s just been developed and they’re just beginning it, so we’re subsidizing the use of the districts, an incentive for them to use that as an organizing tool.

I forgot to mention the other data analysis piece that we were doing and this goes all the way back from 1996 and 1997. When we began the collaborative we said, how will we know whether we’re making a difference. And most of us felt like there weren’t any tests out there that we really thought told us. We were interested in the new standards reference exams but they never quite were ready when we were ready and then the state came in with their PSSA. And we agreed as a steering council that the more students who find success by the time they graduate in higher-level math and science courses will be an indicator for us because we know right now that, and we didn’t know, I should say we asked the districts that, none of them could tell us, that’s not the way they were putting their information together, which comes back to a data question and the more that get raised. So we created an instrument that we simply asked the district how many students are there graduating this year, of those students that are graduating how many got a C or better in algebra 1, in algebra 2, geometry, in biology, chemistry, physics, and therefore you could work out a percentage of the graduating class who by the time they graduated were finding success in those courses.
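[Editorial illustration, not part of the spoken remarks.] The indicator described here is a simple ratio: for each course, the number of graduates who earned a C or better divided by the size of the graduating class. A minimal sketch follows, assuming a hypothetical `DistrictReport` record; the class name, field names, and counts are invented for illustration and are not the collaborative's actual instrument or data.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class DistrictReport:
    """Hypothetical shape of one district's yearly response."""
    graduates: int                   # students graduating this year
    c_or_better: Dict[str, int]      # course -> graduates with a C or better

def success_rates(report: DistrictReport) -> Dict[str, float]:
    """Percentage of the graduating class finding success in each course."""
    if report.graduates <= 0:
        raise ValueError("graduating class size must be positive")
    return {
        course: 100.0 * count / report.graduates
        for course, count in report.c_or_better.items()
    }

# Made-up counts for a single district.
example = DistrictReport(graduates=200, c_or_better={"algebra 1": 150, "physics": 40})
rates = success_rates(example)
```

Because the denominator is the whole graduating class, a student who never enrolled in a course counts the same as one who took it and earned below a C, which is what ties the figure to how students are counseled into courses as well as to how they perform.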

And we have tracked that over time, again, we’re not able to mandate anybody doing anything so we started out with 44 districts that thought that might be a good idea to gather and now we’re up to about 85 who are gathering that. And we’ve gone from seeing that in physics, we’re barely 20 percent as a regional average. In algebra we reached an all-time high this year of 75 percent of our high school graduates having found success. That means one out of every four is not finding success.

So our challenge now, and especially with NSF asking MSPs to report on course completion, is how to manage that data, but we fed that back to each of the districts too with reflection questions for their own saying what do you think is causing this, could it have anything to do with the courses and how they’re being counseled into things.

So that’s our data that we’re looking at, we’re looking at trying to pull together now the message to districts as to keep paying attention to deeper understanding and it will get you better results in even these short-term tests. And most recently our PSSA is rolling out this year, and there was a session to let all the districts know about the new rules for administering it. And there were five questions that had been posed to the teachers to answer and one of them was so, if you know that these are the assessment anchors on this test and this is the content, how can you use this information. And we chose to try and offer some answers to that because our concern was that the way that they would think to use it is start teaching those procedures every day between now and April when they’re going to be on there, and to try and make the case instead to not be the mile wide and inch deep all over again but to hold the faith with going deeper in terms of your content.

So I didn’t know I was doing this today, either, I thought I was just answering questions. And I think I’ve talked too long now but I want to let Kevin have a chance because he’s been working with us; one of the tools that we’re using to give districts feedback is wanting to know the level of content comfort that teachers have with the various topics they’re being asked to do because we’re providing subsidy for content deepening seminars. And we’re working with, again the MSP out of Michigan to take part of their teacher survey to ask teachers so, what college courses have you had in the following, physics, etc., what is your level of confidence with each of these topical areas, and we’ve been in the last week trying to pull that together so that we can get it to the districts for them to use as a formative assessment, that we can give them the results back by April and May so they can plan which areas they want to build more competency in in their district, more capacity, not remedial, but saying hey, kids aren’t doing well in physics, physical sciences at the middle school level, we need to send two teachers to learn about these particular things.

So Kevin?

DR. KELLY: We were working with the TIMSS questionnaires and if you’re familiar with them you know that there are essentially two parts of them, one’s a curriculum questionnaire, the other one has to do with teacher confidence and their background. We felt that, as Nancy alluded, we have a challenge in Pennsylvania because of the nature of the organization of the school districts. Curriculum is a big part of what our MSP is about. We have that challenge because if you have 501 school districts in Pennsylvania and some 143 in our area that means you have 143 different answers to the question, well, what do you teach your kids. And it’s true, so that’s why you end up with the TIMSS information that you get where they’re teaching everything all the time over and over again.

That’s fine to know but how well are the teachers prepared to teach that becomes another question we need to answer and thus the questionnaires.

The science curriculum frameworks and math curriculum frameworks that we have developed are our answer to that problem of the 143 different answers to the question. The science curriculum framework is really a vertically articulated K through 12 science framework that really focuses on essential learnings rather than have the mile-wide and inch-deep problem that TIMSS tells us we have. We focused on a very few number of essential learnings in each of the grade levels and sought out really scientifically based research-based instructional materials that foster deep understanding of those learnings and our work is really to direct the teachers in our MSP toward the use of those materials, how to use them correctly, and feeling that if they use those things two things will happen. First one that we won’t have 143 answers to that question, we’ll have one, so there’s a better chance for collegial dialogue, people will be on the same page. And two, hopefully that students will have the benefit of that unified vision as they approach it.

I’m the science teacher, I spent 20 years teaching chemistry in Pittsburgh Public Schools so I’m going to speak from the science perspective. The math teachers are also doing that as part of our MSP, and the math curriculum framework also does that same thing.

The questionnaires, the TIMSS questionnaires really are designed to have us work with the teachers in identifying those areas of strength and weakness in their curriculum understanding so that we can develop these content deepening seminars for them and we feel that one of the things that folks find out is that the deeper their understanding of content the better they have an opportunity to develop that deep understanding in their students. So if there’s a big deficit in one area, we want to find out what it is and we feel that the questionnaires will allow us to do that. And then fortunately we have a real short turnaround time because we’re hoping to get that back to them in April so the questionnaires came out, we’ve been crunching them, even as we speak there’s somebody working on getting them out there so that we can distribute them on February 12th.

Gathering that information is going to be the challenge; it always is when you’re dealing with as many districts as we’re dealing with. As you said, not everyone is on the same page developmentally, our district development matrix tells us that as well. Directing people’s attention to the importance of data gathering is something that we have to work on so that we can have a meaningful sample of data to work with.

DR. GEORGE: Thank you very much, we’re going to turn now to Michael and hope he can discuss some of the trends within their thinking about and -- also these people may not realize we’ll be back at 5:00 to be on the 5:00 to 5:45 -- right on the agenda, so you will have a chance to ask them questions then. So Michael, we’ll hear any other comments you may have and we’ll move right into Andy.

DR. KESTNER: Are you giving me about five minutes?

DR. GEORGE: I’m so generous I’m going to give you seven minutes.

Agenda Item: What Assessment Issues are MSPs Currently Confronting? Panel: Two MSP Teams Discuss Assessment Decisions - Dr. Kestner

DR. KESTNER: I think one of the reasons they’ve asked me to come sit up here is I’ve had a little longer to deal with these kinds of issues than the people who were just starting the MSP projects. I came out of a state education agency where back in the late 1980s we decided to make a difference in our curriculum and standards and that’s North Carolina, I came out of North Carolina where I headed up the math science curriculum area for the state for ten years.

And in the late 1980s we decided that we wanted to move away from arithmetic and more into mathematics, and at that point we moved into an era of standards and accountability, and so we had been at this for a long while. And one of the first things we learned was that alignment was a key issue, and when we started out standards and accountability, we need to align those. So we moved away from a California Achievement test and designed our own test to match up with the curriculum standards that we set.

Now whenever I went to national meetings and talked to colleagues from other states, I knew that in North Carolina we were in a different situation than a lot of other folks because we did from the state level put grade-by-grade objectives into our standards. And as a result from that, our accountability became a grade-by-grade testing program. And so when we put that out into states everything happened like you would expect. Oh, you can’t expect us to do that and those people in Raleigh got better resources than those people down on the east coast of North Carolina. But the first thing we found when we started putting assistance teams out in the field to help those low-achieving districts in schools was you had to pull those standards off the shelf, get them out of the shrink wrap, and let those teachers know what was being expected. And then we found remarkable results on this assessment end of things when people started teaching toward that.

Another problem arose though; we found real quickly that there was something missing, and what it is that’s missing between standards and accountability. It’s the instruction that goes on between and that’s what Lorrie led us out with and even Sally’s pyramid had that piece involved there. And so we found out real quick if we wanted to ratchet those standards up, in North Carolina we have a five-year cycle, every five years you review your standards and you look to decide whether you need to revise them or not.

What do you think, in five years you need to review your standards? Sure, in areas like math and science, technology is moving so quickly that you have to.

When you get results out of TIMSS studies that say hey, the rest of the world is moving ahead of you, you have to change and revise to keep up. And as we ratcheted those standards up, what we found was that teachers in the classroom didn’t understand. I looked at the last quotes that Lorrie put up and she wants to have this culture of learning, and she’s included students in there. You got to hit the teachers first because the teachers are not there.

We’re moving too fast for those teachers in the classrooms and that’s where things like this MSP project can really help. The places where we made the quickest changes and the biggest differences are where those outside resources were able to come in and make a difference in some of our districts.

We created our assessments by pulling teachers in and writing those items and they would start with the objective from the standard and write an item to that objective. And what do you think those items looked like?

Back to those process and procedure things. Even though every time we would reiterate our standards we would put terminology in there that said understanding, that really said those kinds of things, and have kids, the objective actually said kids would have to explain things, but what kind of items would we get when teachers came in to write them. That was one area we figured out real quick that teachers aren’t there with us. So what we decided to do was work with those teachers, and you have to get them to understand what the curriculum standards really mean and how to address those in their classrooms.

Now just like we said, those high school teachers are the worst people to work with. We started with our K-2 assessment program because we had legislation on the books that said you cannot use pencil and paper tests in grade K through 2, and so we started with a K-2 program, we created classroom assessments that were observational, we created profiles for every student that showed here’s a student, here’s an area that we expect in the standards, teachers can go and mark along during the year where students are on that kind of a topic. And the greatest use we got out of them was teachers would say we take these into parent conferences and here’s where we’re trying to work with your child and here’s where they are, and they have a continuum running from, well, the child doesn’t demonstrate knowledge or understanding at all to being consistent at that. So we took our K-2 assessment and we moved it up through 3-8 and we created the same kinds of materials for those teachers.

And then in the high school, what do high school teachers want. You got an assessment program in place, they know that test is coming at the end of the year, what do they want? They want item banks. You give me an item bank I can teach to that item bank. So what we created were some sample items along with sample activities. So if you look on the North Carolina website and it’s not at the Department of Public Instruction website, we actually pulled to the university that gave us a web page, it’s called Learn NC; you’ll find tons of resources that are created out of the department that are helping teachers to understand what the standards are. And if you can get teachers to understand what the standards are, the next step is for them to move along into finding out how to teach in a manner which develops that deep understanding of the content.

Now one of the things that happens is teachers want materials, so we tried to create materials and it’s hard as the dickens because half of our time at the state level was spent working with the testing people rejecting those process items that were there. We wanted to make sure we hit the breadth of the objectives that we had. And that’s one of the things we learned by working through a consortium out of the Council of Chief State School Officers, and that’s where Andy’s materials came into place.

We had a group of state-level math and science folks sitting around the table deciding how can we determine what goes on in the classroom. Standards are easy, you can look at standards, they’re on paper and pencil. You got assessments, you can look at those, those are concrete things. How can we determine what’s going on in the middle? And we started saying well here’s some kinds of things we’d like to know and so we started creating this survey. Well, the survey got to be five hours long and so we had to try to narrow that down. But Andy came into the group and started working and he had a matrix set up and he’ll share that with you I’m sure. And it got to more than just a content mapping, or a concept mapping, it got into the levels of where you get.

Now, we got a product that we were able to go out and field test and pilot, and the way I sold that to people, they tell you it takes a little over an hour to do that survey. Well, I hadn’t found a group of teachers that it took less than two hours to do that. First thing I did was I went to them, I didn’t mail it out to them, and I sat there while they took it for two hours. Second thing I did was I paid them while they were taking that so they’d at least sit there and be conscientious about doing that.

And so through that process one of the most significant things I found was that when teachers have to go through and process what they’re doing in their classrooms in relationship to something that’s written on a paper that goes beyond do you teach kids how to add fractions, then that causes a situation where those teachers start thinking in a different manner. And to me that’s the way I sold that was this is a conversation tool, and in one of the districts I worked with, it was a district where I’d built up a relationship because I had worked in that district, they bought into that, they took whole faculties and sat down and worked through this survey. They got the data back, they come back and they sit with the faculty, they analyze the data where it is.

Now the other thing I’ll say about data, when you look at state assessments and everything, what can you really glean out of that information? I’m going to tell you what teachers want. Teachers want a diagnostic tool, and I’m going to tell you what you don’t get with a state assessment, and that’s a diagnostic tool. So when you look at the data and you go down and you talk about the data mining, that needs to happen, that needs to happen at a school level, at a district level. But where the work needs to happen is with those individual teachers and with these kinds of tools that Andy brought to us so we were able to sit down with groups of teachers just like a lesson study, you sit down with the data in front of them that they’re looking at what they’re doing in their classrooms, and it gets them to thinking about, well, maybe this does tie to our results on our state assessment, and we want to focus on, is this what we want? If it’s what we want, fine and dandy, if it’s not, how can we get what we want. So that’s the kind of dialogue that goes on.

But the district I was referring to is Winston-Salem, and of course they’ve got an urban systemic grant where they’ve got a lot of extra outside resources, they put this in place with their schools, they go through the data, they added on another piece, which is a walk through. Now they’ve got the data in front of them what the teachers say they’re doing in their classrooms, and they go through with a walk through with protocol and they’re able to align that and match that up. So the more you can do with getting into those individual classrooms and getting to work one teacher at a time the more results you’ll be able to get in moving people toward teaching with understanding in mind rather than just those processes and procedures.

Now I was real skeptical in North Carolina because our scores on the NAEP you couldn’t put up there with Texas and Maryland because we showed same kinds of gains. I used to tell people all the time we were doing except 8th grade, there were three states above us in 8th grade, I don’t know how, they all started with M, and they were along with the Canadian border. If we moved our state up and called ourselves North Carolina we could join them.

But people would always call me in North Carolina and say what is this y’all have done in North Carolina. Well, it wasn’t something that happened overnight, I got a call one time from somebody that said what’s that $500,000 program y’all had that changed things? I said I don’t know what you’re talking about, it’s taken 15 years to get to where we are. We had a governor who didn’t pay his teachers well but he put money into schools and he was able to come in for four different terms. But the policy issues were a lot of what had to do with that as well.

And I’ve used up my seven minutes but that’s the kind of things that we found you needed to do because when you talk about changing education it has to be in the classroom, and when you talk about standards and accountability people are going to focus on accountability as those tests that are given. And it is a hard sell, we had a whole big initiative for five years dealing with classroom assessment based on those materials I talked about that we ratcheted up from the K-2 assessment. We started with administrators because we wanted administrators to do that. Now there are things like Windows on Learning that are great.

The other thing I’ll say before Mel pulls the hook on me is when you’re working with teachers and you know they don’t know what they need to know it’s real touchy, and one of the best ways we found to work with teachers was to use student data. When they can see student work in front of them it’s not quite as frustrating for them to say I don’t know this, but they can learn a lot from looking at student work.

DR. GEORGE: Thank you very much, all of you.

[Applause.]

DR. GEORGE: There have been a couple mentions that this has been put together at the last minute and I want to acknowledge that, and the reason was that we waited until we got your survey results back to see what you were thinking about and questioning and then picked MSPs that might raise some of the same questions that you did. So I appreciate your being good sports about this and so we’ll see you back here at 5:00, another opportunity to serve your fellow human beings. Thanks so much.

[Applause.]

It will take us a few minutes to get set up for Andy so I’m going to suggest we simply stand up, turn around for a few times to get your blood circulating and sit down again, and there is a break --

Thank you very much, it is our great pleasure now to introduce Professor Andy Porter. Andy is a member of the steering committee that put together the workshop and is doing double duty as a major presenter. It’s a good thing that I looked at his bio because forever and ever he’s been at the University of Wisconsin and I would probably have confused him in that way, but he went south last summer and is now at Vanderbilt University in a named professorship and is also director of something called the Learning Sciences Institute, which sounds fascinating in and of itself. But I know enough about Andy to know that whatever it is he’s doing at Vanderbilt is worth doing, so it’s a great pleasure to introduce him and the title of his presentation on the agenda is An Assessment Exercise, so I think you are going to get engaged in a way that you will find unique. So without further ado, Andy Porter.

Agenda Item: An Assessment Exercise - Dr. Porter

DR. PORTER: Thanks, Mel. Up until now I thought the toughest assignment I ever had was to talk about testing after the Reverend Jesse Jackson talked about his Push Program to a convention of the National PTA, and that was a tough assignment. Now this is the only thing that really stands between you and the Super Bowl --

Also I see some familiar faces, a few of us were here over at the Hyatt doing the NSF Math Science Partnership Meeting, so you’re very brave, or crazy, just like me. So it’s good to have you here. I do one of those MSPs as well, I co-direct the one in Wisconsin, so we’re all in this together.

Now like Lorrie said, well, she didn’t say this exactly but I extracted it from what she said. If you’re going to be around achievement testing the most important thing to know is what’s on that test, what’s this test about. And it’s surprising to me by the way that a lot of people will talk about tests and they’ve never even taken a test. And that just seems like kind of wrong headed to me. Of course I have to admit I love to take tests so I take a whole bunch of them. But I do think you ought to start there; but even taking a test, it’s a good thing to do but it’s not, you can do it in kind of a nonanalytic way, it’s possible to take the test and really end up still not knowing exactly what’s on the test, and perhaps more importantly what’s not on the test as well.

So what I want to do today is to give you some machinery to think about. Now I loved what you were saying and I have to say that some of the early words through this work are identical to the early words in the TIMSS work and the reason for that is that I had a research team at Michigan State when I was a professor there and Bill Schmidt was on my research team at that time, so you’re going to see some similarities here. And one other connection I make is you also mentioned Nancy Love and we’ve been working with her and Diana Nunelly to use some of those procedures, so she focuses on how you can use data to think about your instructional practices and focused a lot on testing data, and we’ve formed a partnership in putting with the testing data some data on instruction as well. And that data on instruction will look a lot like your TIMSS data, so you might be looking for some of those parallels as I go through this.

Now this is going to be an exercise, there’s a test in your packet, and Andy made this test but he made it by getting the public release items from me and just pulling together some so I figured if you haven’t taken NAEP, here’s a chance you can take a NAEP-like test, so you might want to find those materials. And also in there is another thing that looks like, it says coding procedures for mathematics and science content analysis, so pull that one out as well. So after I bore you for a little bit with a little talk to get you started, then what we’re going to do, this is not the greatest room for it but we’re going to break into small groups and actually content analyze some of these NAEP items so you can get a real good sense for how this works.

I’m going to now talk a little bit about how you can measure the content of the test, or how you can actually use the same tools to measure the content standards, or how you can use the same tools a little bit differently to measure the content of instruction. So that’s assessing the content of the intended, the enacted and the assessed curriculum if you want to think about it in those ways.

So before I do that, though, I want to recognize that, oh, I want to talk about then the extent to which there is alignment in content among tests and standards and instruction. So knowing what’s on the test is important but knowing how what’s on the test relates to what you want to have on the test or what teachers are teaching is important as well.

But when No Child Left Behind came out we had a skyrocketing interest in being able to measure at least the alignment of the test to the content standards because it’s the law. Actually it was the law from earlier, but it didn’t get quite as much attention because the people in the Department of Education didn’t hold your feet to the fire in quite the way they do now. So there are a number of people who have procedures for this and I want to mention my good friend Norm Webb, and actually his office and mine were just adjacent to each other when I was at Wisconsin, and he’s got a procedure, I have a procedure, and you can get access to that because we collaborate with the Council of Chief State School Officers and also the Intel Lab in Chicago. And I’ll give you some websites later.

And then there’s Achieve, the organization that got put together by the guy who was the CEO of IBM, what’s his name? So they created Achieve, it’s located right here in D.C., they had some procedures, actually Lauren Resnick worked with them to put their procedures together. And then the American Association for the Advancement of Science (AAAS) has got some procedures where they don’t look so much at alignment of standards but they do look at alignment of textbooks to standards, and they’ve done some pretty interesting work. And then the Council for Basic Education. I see a lot of reference to their procedures; I have to say I never have been able to find a description of their procedures so I’m still clueless as to what they’re about.

I just mentioned a little bit, obviously I’m going to spend more time on my own because, hey, I’m up here talking, I get to decide what’s important and what isn’t. But I’m going to show you just a little bit more about Norm Webb’s procedures, and I’d say two, three dozen states have probably used these procedures now, and he has these four things that he talks about when he asks the question of whether the test is aligned to the state standards. And his procedure starts with a particular, so your tests or whatever, you have your standards, you have your tests, and you say alright, let’s look at these standards and let’s look at this test and we’ll see whether they have categorical concurrence, and basically that’s what you might think: are the topics in the standards tested? And what he does with his procedure is he has some standards for how much alignment is enough to call it aligned. I have to say that I think those, well, any time you have standards they’re going to be somewhat arbitrary, that’s true for Norm’s as well, but states do have to decide how much is enough, so he says if there are six items per standard that’s enough.

Depth of knowledge has to do with what I call cognitive demand or I think TIMSS calls it performance expectations, and, if the topic is linear equations, what is it you want the students to know and be able to do with linear equations. You want them to be able to pick one out of a line-up of things that aren’t linear equations, that’d be one. You want them to be able to solve the linear equation, that would be another one, those kinds of things. So then Norm’s procedure asks what the depth-of-knowledge consistency is, is the cognitive demand in the standards the same as on the test. Range-of-knowledge correspondence, that’s do you test all of the stuff in the standards, and balance of representation really talks about do you have a balance, are the things that are most emphasized in the standards the things most emphasized in the test. So what he does is he has experts go through, make these judgments and give you a report.

There are a bunch of people who have done different applications of this, Karen Wixson did it in English and Language Arts, the Buros Center has done some of this, the Center for Research on Evaluation, Standards, and Student Testing (CRESST) did as well. They’ve made little different sorts of changes; Norm doesn’t really ask whether the item is any good, like it could be a bad item because it doesn’t test what it’s supposed to test because it’s got a bad format or something like that, the CRESST people add that.

But Achieve, they call it content centrality, and again, it’s the same thing, are the topics in the standards the topics that are tested; performance centrality, again, that’s like cognitive demand; source of challenge, this is the idea again of whether you’ve got a good item or not, is the source of challenge due to the content or is it due to some extraneous part of the item. The level of challenge has to do with the range of cognitive demand and the balance is the relative emphasis of the content. And anyway, the big difference here is that Achieve has teams of people do it together, not independently, but a group gets together and comes up with what I’d call a school solution and they report out in a narrative sort of a fashion. So I think it’s a pretty good procedure, it hasn’t been used as much as Webb but it makes a lot of sense.

The American Association for the Advancement of Science Project 2061, like I said they focus on textbooks, they do a lot of not content sorts of things but actually they look at the pedagogy and they’ve got seven different dimensions of pedagogy so they ask whether the pedagogy in the textbook is like the pedagogy in the standards, but they look at content and they look at the accuracy of the content in the textbooks as well; do you have bad science in your science textbook is a judgment that they make and they’re aligning it to the NRC science standards, so hopefully there’s no bad science there.

And they use two-person teams, and they have two of those teams, one practitioner and one pointy-headed professor type on each, and then the two work together to come up with an agreement and then you see whether the two teams agree. But they don’t take the average across the teams; what they do is they use Project 2061 staff to say who’s right. So just a little different.

Now my procedure is different than these, actually I didn’t start this to measure alignment, that wasn’t my idea, I was studying teachers’ decisions about what to teach and I wanted to have a good measure of what those decisions were. And Bill Schmidt, like I said, was with me and some other people, and we came up with, I just call it a language for describing the content, and we’ve done it in math and science and just recently some people created a language for reading language arts as well. I didn’t think that could be done actually because I don’t think that reading language arts is really a subject but they did it and they like it, I don’t know. So I say more power to them.

But the basic idea here is you have a two-dimensional matrix and the rows are, I’ll just call them topics, different kinds of math or science that we might or might not teach, and across the top these cognitive demands, like I said, it’s the different kinds of things you might want for any topic kids to know, be able to do, and so content then in this language is defined as the intersection of a particular row and a particular column, a particular topic by a particular cognitive demand.

We can have about 150 rows; I’ve tried three cognitive demands, I’ve tried ten, three was too few, ten was way too many, so I think five here is kind of my Goldilocks solution, not too many, not too few, it’s just right. But you can argue about what these distinctions are obviously.

Now the language is meant to be exhaustive, all of the content that might be taught is supposed to be represented. It never quite is but we’ve worked on it, that’s the way to think about it, it’s meant to be exhaustive. So what you can do is you can use, and we’re going to do this in a few minutes and try your hand at it, is you can use this kind of language and you can take a test and you can say, well, I’ve got an item here, what cells does it fall into. Now you’ve got some big fat items and some little skinny items, and a little skinny item might fit in one cell and a big old fat item might fit in a bunch of cells, a big old fat item might have ten score points associated with it and you might just distribute those ten points across those cells. You get the basic idea. And then when it’s all said and done you can say what’s the relative emphasis on this kind of content or the other by dividing by the total score points and you can get some, we’ll walk through that. The way we do it is we have people do it independently, then we take the average across them and that has some nice measurement properties as you’ll see.
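
The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not the procedure's actual software: the topic codes, the demand letters, and the sample codings are hypothetical, and splitting an item's score points equally across its cells is an assumption (the talk says only that the points get distributed).

```python
from collections import defaultdict

def content_proportions(codings):
    """Turn one rater's codings into cell proportions.

    codings: list of (score_points, [(topic, demand), ...]) tuples, one per item.
    Returns a dict mapping (topic, demand) -> share of total score points.
    """
    totals = defaultdict(float)
    for points, cells in codings:
        share = points / len(cells)  # assumption: equal split across the item's cells
        for cell in cells:
            totals[cell] += share
    grand_total = sum(totals.values())
    return {cell: pts / grand_total for cell, pts in totals.items()}

def average_raters(per_rater):
    """Average the cell proportions across independent raters."""
    cells = set().union(*per_rater)
    n = len(per_rater)
    return {c: sum(r.get(c, 0.0) for r in per_rater) / n for c in cells}

# Hypothetical codings of a two-item test by two raters.
rater_1 = [(1, [("203", "B"), ("203", "C")]), (3, [("103", "C")])]
rater_2 = [(1, [("203", "C")]), (3, [("103", "C")])]

averaged = average_raters([content_proportions(rater_1),
                           content_proportions(rater_2)])
for cell, proportion in sorted(averaged.items()):
    print(cell, proportion)
```

Because each rater's proportions sum to one, the averaged matrix sums to one as well, which is what the alignment comparison needs.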

So anyway, that’s the basic idea. So I just don’t want to make you guys go cross eyed trying to see that 150 x 5 matrix so I just got a little 4 x 3 matrix here and I’ve got a pair of them, one of them I call assessment, I just made this up, and one of them I called standards, just to make sure we’re all on the same page here. The idea then is if you want to look at alignment you can have an assessment and standards, you content analyze the standards and the assessment and you get these matrices of proportions, and the proportions across all the rows and columns are supposed to add up to one. And then you know you can just say, well, I’ll compare cell by cell these proportions to see if they’re identical. If they’re identical that would be alignment, 0.3 and 0.2 aren’t identical so that’s not perfect alignment but you can see some cells, take the one over to the right, that looks like perfect alignment. So you can say I just want an index of overall alignment here and one is pretty intuitive; I just wrote down here at the bottom you take and go cell by cell, take the absolute value of the differences of these proportions so like in that circled one there the absolute value of the difference would be 0.1 right, and then you can sum them up and if you divide it by two and subtract it from one you get something that ranges from zero to one and it will only be one when there’s perfect agreement across all these cells.
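
As a sketch of that index in Python: sum the absolute cell-by-cell differences, divide by two, and subtract from one. The two 4 x 3 matrices below are made up for illustration and are not the ones on the slide; only the 0.3 versus 0.2 cell echoes the example in the talk.

```python
def alignment_index(x, y):
    """1 - (sum over cells of |x - y|) / 2, for two matrices of proportions.

    x and y are lists of rows with the same shape, each summing to 1 overall.
    """
    total_abs_diff = sum(
        abs(a - b)
        for row_x, row_y in zip(x, y)
        for a, b in zip(row_x, row_y)
    )
    return 1.0 - total_abs_diff / 2.0

# Hypothetical 4 x 3 content matrices (rows: topics, columns: cognitive demands).
assessment = [
    [0.3, 0.1, 0.0],
    [0.1, 0.1, 0.1],
    [0.0, 0.1, 0.1],
    [0.0, 0.0, 0.1],
]
standards = [
    [0.2, 0.1, 0.1],
    [0.1, 0.1, 0.1],
    [0.1, 0.1, 0.0],
    [0.0, 0.0, 0.1],
]

print(round(alignment_index(assessment, standards), 3))
```

Dividing by two is what keeps the index between zero and one: two matrices with no content in common differ by a total of two (one full unit of proportion on each side).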

I’ve fooled around with a lot of different indices, I like this one because it’s so intuitive, you could just take these rows and columns and make them two single long vectors and do a Pearson correlation coefficient. I’ve done that for a dataset and they correlate 0.9 something or other. That doesn’t prove they always are like that but they tend to be conceptually similar. And then I’ve got some less intuitive ones, but that’s the idea.
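
The alternative mentioned here, flattening each matrix into one long vector and correlating the two, might look like the following Python sketch. The matrices are hypothetical and the helper names are made up for illustration.

```python
from math import sqrt

def flatten(matrix):
    """String the rows of a matrix together into one long vector."""
    return [value for row in matrix for value in row]

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (spread_u * spread_v)

# Hypothetical 4 x 3 content matrices of proportions.
assessment = [[0.3, 0.1, 0.0], [0.1, 0.1, 0.1], [0.0, 0.1, 0.1], [0.0, 0.0, 0.1]]
standards = [[0.2, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.0], [0.0, 0.0, 0.1]]

print(pearson(flatten(assessment), flatten(standards)))
```

Unlike the absolute-difference index, the correlation can go negative and is undefined when one matrix has identical proportions in every cell.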

Now you can measure, I mean you can go crazy with this alignment stuff, anything that you can figure out what the content is you can, what you do is with teachers you have a survey, and Mike was talking about it, and the survey is just mind bogglingly long, but you have them go through all this matrix. They go, they say did I teach this, did I teach this, you can go pretty quick because a lot of stuff hopefully you didn’t teach and then if you taught it you say well, when I taught it, what cognitive demands was I trying to get across. So you can imagine what that survey looks like and teachers will do it. Actually all the teachers will do it if you do it at a faculty meeting and you do it at the beginning of the meeting, you hand them the survey and at the end you pick it up, that’s the way to get really great returns. The way not to get such great returns is to mail them. But even then if you’re persistent you can get 75 percent. And if you do a study like this and you don’t get 75 percent you probably didn’t do it well; I’ve done a whole bunch of these studies and I always get 75 percent.

But you can have this horizontal and vertical alignment at the state level and your standards and tests agreeing, or you do a district test and a state test, do they agree, you get the idea.

Now I’m just going to move along because once you do this then you can display these data in some pretty interesting ways. These, I’ve got states B, D, E, and F, you probably don’t recognize them by those names but these are real states and I’m going to tell you right now Texas isn’t one of them. But what you can do is you can, see now since I didn’t start with the standard and then say does your test agree to the standard like some other procedures do, but rather I took the standard and mapped it onto a language and I took the test and mapped it onto a language, I could make comparisons across states as well as within states if I want. And that’s what this matrix shows, and these are numbers, these alignment numbers that I mentioned earlier, and I just shaded in the main diagonal so you could say, well, how well does the state’s assessment agree with the standards and for state B that’s 0.37, are you with me here? So within state alignment is just that main diagonal. With standards-based reform the idea here is that over time you want alignment within your state and between states.

Now I will admit this, if everybody, every state’s standards were identical to every other state’s standards you wouldn’t expect that the main diagonal would be any greater than the off diagonals. But that’s not the case, states do differ in the content that they emphasize and I’ll show you that in a little bit. So you expect these main diagonals to be bigger. Are they? No, they aren’t, so right now you’d have to say standards-based reform for these four states is not working the way we thought it would. The average is 0.4 versus 0.39, I just stuck the National Council of Teachers of Mathematics (NCTM) standards in there for fun, and that average is about 0.39, too, so everything is about equal or less, not as well on mine. However, notice those numbers are just curiously a little larger, the reason for that is those states had grade-level specific standards rather than grade band standards, so they presented a little crisper target, it was easier, I mean what can alignment be if your standards say math, teach math, well then things can’t be too tightly aligned because you’re not being very specific about what kind of mathematics you want. So you get the idea.
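
The within-state versus between-state comparison can be sketched in Python as a table of alignment indices, with standards down the rows and tests across the columns. The two states' proportions below are invented for illustration and have nothing to do with the real states B, D, E, and F on the slide.

```python
def alignment_index(x, y):
    """1 - (sum of absolute cell differences) / 2, for two flattened proportion vectors."""
    return 1.0 - sum(abs(a - b) for a, b in zip(x, y)) / 2.0

def alignment_table(standards, tests):
    """table[s][t] is the alignment of state s's standards to state t's test."""
    return {s: {t: alignment_index(standards[s], tests[t]) for t in tests}
            for s in standards}

def within_and_between(table):
    """Mean of the main diagonal and mean of the off-diagonal entries."""
    within = [table[s][t] for s in table for t in table[s] if s == t]
    between = [table[s][t] for s in table for t in table[s] if s != t]
    return sum(within) / len(within), sum(between) / len(between)

# Hypothetical flattened content matrices; each sums to 1.
standards = {"B": [0.5, 0.3, 0.2], "D": [0.2, 0.3, 0.5]}
tests = {"B": [0.4, 0.4, 0.2], "D": [0.3, 0.3, 0.4]}

table = alignment_table(standards, tests)
within, between = within_and_between(table)
print(f"within-state mean {within:.2f}, between-state mean {between:.2f}")
```

The comparison only works because every document is mapped onto the same language first; that is what makes an off-diagonal entry meaningful at all.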

You can do these different things. I’ve got instruction here; we had some states that put out these surveys and I asked well, how aligned is instruction to the tests, a lot of people say you teach what’s on the test. In fact when I got into looking at teacher content decisions, I started with that simple idea but then somebody else told me they taught what was in the textbook so I said well, geez, are the test and the textbook the same, I found they weren’t. So I knew there was going to be a more complicated problem. Twenty-seven years later I’m still studying how teachers make these decisions, but that is another story.

Anyway there’s some kind of curious results, look at those numbers. There is a state where their instruction is fundamentally not aligned with their test. We showed the state these results and they changed their test.

Because if you look at NAEP, I put NAEP in there, and if you look at NAEP the instruction in tests, or state O is pretty aligned, maybe it was the instruction in all the other states. But you get the idea, you can use these data in different sorts of ways.

I was beginning to get nervous about whether alignment could ever be very large so I thought, well, notice by the way the instruction is more aligned to NAEP on average than it is to the state tests themselves. Kind of a curious finding. But take a look at this, I was thinking I couldn’t get any big alignment numbers but yes I could. What I did was, I said, well, how much alike is instruction in one state and instruction in another state and yep -- instruction may not look like what’s in the standards but it sure does look the same between Ohio and Pennsylvania. So anyway, I’m just trying to give you a feel for the different kinds of ways that you can use this.

But now if I can get everything down to one number it makes me really happy; there are two kinds of people in this world, simplifiers and “complexifiers.” I’d make the world’s worst anthropologist, I’m just not a complexifier, but it is nice to know what goes behind those numbers and you can get these pictures, think topographical map here, topics are north and south, cognitive demand is east and west, and we’re looking for mountains and valleys. A mountain would be content emphasized and a valley would be content not emphasized. Now Lorrie is probably going to nail me here so I’m just going to say that while every point in these topographical maps is absolutely accurate, the area between them is not because this is nominal scale data, so you can do these, any of these little software programs, Excel or whatever, will allow you if you got the proportions to build these maps. And I tried things that would be better, which is what you could do is have three-dimensional bar graphs and you have little bars sticking up like a mountain, trouble is it gets so busy it’s hard to see the forest for the trees and working with a lot of people, as far as I can tell people get a quicker sense of what’s emphasized from these topographical maps. Even though we all know it isn’t just perfect, but all the data points are perfect, that’s where the lines intersect.

Now you say I really see something going on in number sense but I really want to take a closer look at that, well you can do that. Because number sense is just a general area and you’ve got a lot of specific topics underneath it so I can just give you a map of just number sense now and you can see, these are real states and these are real state standards so we have STE (auditory unclear) and math and the NCTM and you can say well, are you like a state, are you like the NCTM, well, it kind of is and it kind of isn’t. Of course you can’t find a single state that will declare I’m not like NCTM but then I guess everything can be done in degrees, so some people are more NCTM-like than others. So use these maps when the answer isn’t one and you want to know why, you can go and look and see content that’s similar and content that’s different.

So that’s the basic idea. Now back to the general idea of measuring alignment, you can look at alignment among instruction materials and assessments and standards, anything that you can figure out how to measure the content and get it down into this matrix with these proportions, then you can look at the alignment and you can get these topographical maps if you like them. And we give those maps with that Nancy Love program to teachers, you can do things, you can even say what’s the content progression from one grade to the next, you got a lot of content redundancy; I can tell you in elementary school mathematics that you do that unless you’re different than everybody else in this country. The message to kids must be don’t learn it this year because you’ll be taught it in your school next year, they get the same thing all over again. But I’m editorializing there.

Now people look at alignment, my stuff is all based on just content and you know what I mean by content, topics by cognitive demand, but other people have looked at other things as I mentioned and one of them that I talked about is the consistency of the philosophy. I wasn’t exactly sure what they meant by that but what is pretty intriguing to me is, is the philosophy of the standards represented in what you would infer is the philosophy, I think that’s a good thing to worry about, and I’m sure my procedures don’t do it. Should we have independent individual experts or teams? It is good, you know, to ask what the quality of these data is, and if you have four raters and you take the average across them -- there you go, the reliability of that average is right around 0.85 give or take a few points and it’s pretty much like that for tests and it’s pretty much like that for standards. Now that shocked me, I thought it would be lower for standards.

And by the way, this is pretty cheap to do so you can have nine of these folks doing this if you want but you’re not going to improve your reliability, you reach diminishing returns after about four of them. But I will tell you this, in all of the times that I’ve done it when I’ve looked at the individual raters and kind of looked at their data, there’s usually one odd person in there, kind of marches to a different drummer, I would say, I’ve not done this but I would say throw that person out, it doesn’t hurt to help your reliability, though I suspect it’s probably going to help your validity as well. But in any event.
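
One way to see the diminishing returns is the Spearman-Brown formula for the reliability of an average of k raters. Using it here is an assumption: the talk does not say how the 0.85 figure was computed, and the formula treats the raters as interchangeable. A minimal Python sketch under those assumptions:

```python
def single_rater_reliability(r_avg, k):
    """Back out one rater's reliability from the reliability of a k-rater average."""
    return r_avg / (k - (k - 1) * r_avg)

def reliability_of_average(r_single, k):
    """Spearman-Brown: reliability of the average of k interchangeable raters."""
    return k * r_single / (1 + (k - 1) * r_single)

# Start from the figure quoted in the talk: about 0.85 for four raters.
r_single = single_rater_reliability(0.85, 4)
for k in (1, 2, 4, 9):
    print(k, round(reliability_of_average(r_single, k), 3))
```

Under these assumptions the reliability keeps rising with more raters, but each added rater buys less than the one before, which is the sense in which four is about enough.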

Like I said with my procedure, the standard for how much is enough isn’t in there although I will say this, if alignment isn’t any better within your state than it is between your states probably you ought to work on getting a little more alignment.

Now there are lots of issues here about when you’re trying to measure the content of instruction you want a language that uses labels that teachers find have a shared meaning, this respondent thinks the labels mean the same things as that respondent does. And that’s very difficult to do. And then you have to say what grain size, how much detail do you need, here by the way the answer is a sad one, if your goal is to predict gains in student achievement you got to go topics by cognitive demand, you can’t throw one or the other away. If you have a data set that has topics and cognitive demand and if you’ve got gains in student achievement, you can then measure the alignment teacher by teacher of their instruction to the test and use that as a variable to predict gains in student achievement. And when you do you’re going to be surprised, the correlation is between 0.4 and 0.5, which argues as you’d expect that what is taught is a strong predictor of what is learned. But if you just take the data and say I’m only going to look at cognitive demand and see how well I can predict gains in student achievement, the correlation is just about zero. But what is really surprising to me is if you use just topics the correlation is down around 0.1. So unfortunately you’ve got to go the whole nine yards and use both of them.

You’ve got to recruit and train good experts and we’re going to do that in just a minute, you’re all going to become good, you’re recruited, you’re going to be good experts at doing this. You can take this skill, it’s great. But when you get to measuring, the hardest thing is to measure instruction and there are all kinds of issues here. How frequently should you ask people, do you ask them once a year or should you ask them every day, and obviously the more frequently you ask them the better the information you get but the more irritated they get at you, so you have to figure out what’s the right thing to do there.

You actually have to say when you’re measuring the content of instruction what is the period that I want to be describing, is it the full school year. And now let’s say I’m going to do it, I want the full school year but I’m only going to sample half the days. You know for this stuff that probably isn’t going to work because if you look at that matrix and if you define content as a particular cell it means all content is going to be a relatively rare event. Let’s say you sample half the days, you do a block in the spring, a block in the winter, and a block in the fall. Now you’re down in elementary school and maybe geometry is taught for two weeks consecutively. You might miss the whole thing, you might miss the whole thing, so you have to be careful there.

Well, I’m just going to skip over some of these things. And then if you’re interested in this stuff and want to know more there’s some websites that you can write down.

Now what I want to do with you, what I think, I hope you’re going to find it fun, is pull out that test, pull out the test and pull out these instructions, and what the instructions have in the back of them is they actually have one of these languages and find the one that says high school math, and then on the very last page it gives you some finer definitions of these cognitive demands of student performance. If you go to the very last page, you see that there, it says expectations for student performance in math, you find it? Does everybody find this thing? We’re going nowhere if you can’t find it.

Now go to the last page and there you’re going to see, read that, because that’s going to tell you a little bit more. If you didn’t know what we mean by memorizing facts, definitions, and formulas, that will give you a little more information on that. Just read through those things.

Alright now pull out the test, these are genuine NAEP items so, if you were always wondering what NAEP items looked like, here they are. Now why don’t we make three- or four-person teams, so get yourself squared away and we’ll just do this so you can see how easy or hard you think it is and why don’t you just do, why don’t we just do item one, we’ll start with item one as a little warm up item.

Now does everyone here understand what you’re going to try to do? You’re going to go to high school math, there’s this big long list of topics, and then you got the cognitive demands, you’re going to take that item one and you’re going to figure out what is the topic and what is the cognitive demand. Just make sure that everybody knows what we’re doing here. After you go through the text you’re going to find a list of high school math topics, it’s on page three or four. You find that? Alright, now here’s the task. You take item one and you go through that big long list of topics and you say this is the one that I think it represents. And you also then go to the last page which is cognitive demand and you say which cognitive demand. And just to control for personality types, we ordinarily limit it so you can only put things in up to three cells, because some people want to put an item in all cells, so you can put it in just one if that’s what you think it is or you can put it in up to three, and a particular topic by a particular cognitive demand, that’s a cell. So that’s the catch.

Let’s check in to see where we are right now. Is any team bold enough to tell me what they thought -- for item 1? Anybody ready to do that? Okay, what do you think?

PARTICIPANT: 203, which is convergence --

DR. PORTER: They picked 203. Anybody want, did you stop there?

PARTICIPANT: No.

DR. PORTER: Okay, 203, and what did you say for cognitive demand with that 203?

PARTICIPANT: B to C.

DR. PORTER: B and C. Okay now this is the point because that’s two different cells, it’s 203 B, that’s one piece of content, and 203 C, that’s another piece of content just to make sure we understand how it works. Now if they did that I said you could only pick three, you kind of have to say what is the most important stuff here, otherwise it kind of gets out of control. So what’s your next one?

PARTICIPANT: 103 which is fractions and C.

DR. PORTER: 103 C. Is there another team that wants to say what they thought or do you just want to say yes, that’s basically what we got too?

PARTICIPANT: I thought it would be C and D.

DR. PORTER: C and D.

PARTICIPANT: What about 212? -- out of time.

DR. PORTER: What about 212 he said. You’re all looking at the same items right? How many hours are equal to 150 minutes?

PARTICIPANT: I put it in C because you have to multiply a fraction.

DR. PORTER: Put it in C because you have to multiply a fraction. Now okay, it sounds to me like one, everybody basically gets the idea of the task, this is a good thing, I mean this is a good thing. And I think you’re probably also recognizing that there is some expert judgment involved, and you’re probably recognizing that even these simple-minded items when you’re measuring content at this level of detail can cover arguably several different cells. So why don’t you do another item, or have you already done another item? Or are we out of time?

I’m just going to say, now do you understand what you’re doing in this process is I might have been working with, in fact recently I was working with California and they wanted, they got their content standards and their assessments and they wanted them content analyzed. So what I did was I got a bunch of bright people like yourselves up there in frozen Wisconsin, and they went through and they did this task and they turned in their sheets. Now, I take the sheets and I create the proportions in those matrices, so that’s where you are, you’re on the front end, I got four to five of you, I take the average across you and then I create those proportions. So you’re at step one, what you’re doing is step one, but that’s the real measurement part of it and after that it’s all analysis.

So that’s it. We’re done.

[Applause.]

PARTICIPANT: -- from Duke University, it’s a pretty basic question, how long would it take to do something like this if I were to actually do some of the writing?

DR. PORTER: His question is how long does this take and for the normal one form of an achievement test, like the 8th grade math test that’s used in Kentucky say, it takes you about, it takes a person about an hour to do it, about an hour. So this stuff ends up to be pretty cheap, it takes longer for the content standards just because they’re longer, tend to be.

PARTICIPANT: I wonder is there a need to do any exercise, any training, before you get someone to do it? I think in our group it was maybe a little harder or a little easier than another group and I felt like it was beneficial to hear different opinions about what’s a C, what’s a D.

DR. PORTER: The answer is it’s good to use training, we found that training is a good thing to do.

PARTICIPANT: If you do it separately, at some point you want them not to collaborate apparently.

DR. PORTER: That’s exactly right. What you do is what you might think you would do. You get folks together in a room and you start out just like we are right here, and then you start discussing it and then people say, well, what exactly do you mean by this. And one of the big discussions you want to have is what are the boundaries on these levels of cognitive demand, especially between C and the other ones; that’s the hardest distinction to make and have agreement on. So basically what you do is just agree that it’s going to mean X, Y, and Z. Now obviously by putting those little definitions under them I tried to push it in that direction, but you want to have some conversation. It’s good to have that conversation in the context of having actually content analyzing in both items so that you’ve had to struggle with those decisions. And like I said it’s also good to recruit people who are thoughtful, somewhat anal for this particular task is very good, you can see where I’m coming from on that can’t you, and also it’s very helpful if they know the subject, very, very helpful.

DR. LABOV: Our reports proceedings will be reporting things out verbatim so you want to think about the terms you might be using --

DR. MOSES: Debra Moses -- and we were working with question four and we came up with 201 C but what we thought was interesting is question four, the reason we went for it is the first impulse that any student would be doing would be the opposite of what the answer would have been. So is there any weight giving something a “trick question” aspect to it? It’s not really a trick question but you can see, I guarantee that a student and half the room will go to 25 just looking at it without really reading the question. Is there any emphasis putting on there or --

DR. PORTER: No, good question though. In a way though that borders on that issue of is it a good item, which is such an important question. I believe, I’ve done a lot of this stuff of course, and I believe I’ve looked at items where I think I know what the content is but I also know it’s a bad item and that people might get it right or wrong for other reasons than knowing or not knowing that content. And this particular procedure doesn’t do anything about that. Of course what you’d like to do is throw it away.

DR. MOSES: No, I wouldn’t want to throw this away --

DR. PORTER: But if it is an item that is misleading in some way.

PARTICIPANT: Yesterday you talked about the importance of having alternate forms, talking about variation and types of items not really exactly the same but have about -- but the process here seems to be look at the test form that you’re using, that was just used or is going to be used, I mean -- you’d have to have all the alternate forms in front of you to get a sense of how will they map --

DR. PORTER: You understand this perfectly, that’s absolutely right. What he would do, now let’s say you’re in a state, like Kentucky, and you want to understand the alignment say of your assessment program to your standards. Now let’s say in Kentucky you do a different form every year. What you would want to do is not just content analyze one form, you would content analyze all of the forms, and have a description of that aggregate form. And you would not expect, I was just saying to Lorrie a minute ago, you would not, if you’re a state, if you just do one form, even if that form is a really good form of the test and a very nice representation of the content standards, you would not expect the alignment in there to be one with the content standards.
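The point about single forms versus the aggregate of all forms can be illustrated with a small sketch. It assumes the alignment measure is the index Dr. Porter has published elsewhere, one minus half the summed absolute differences between two proportion matrices; the formula is not spelled out in this session, and the cell labels and numbers below are made up for illustration.

```python
def alignment_index(x, y):
    """Assumed index: 1 - (sum of |x_i - y_i| over all cells) / 2.

    x and y map cells to proportions that each sum to 1. The result is
    1.0 for identical matrices and 0.0 when they share no cells.
    """
    cells = set(x) | set(y)
    return 1.0 - sum(abs(x.get(c, 0.0) - y.get(c, 0.0)) for c in cells) / 2.0

def aggregate_forms(forms):
    """Describe several test forms together by averaging their matrices."""
    cells = set().union(*forms)
    return {c: sum(f.get(c, 0.0) for f in forms) / len(forms) for c in cells}

# Hypothetical standards spread evenly over four cells, and two forms
# that each sample a different half of that domain.
standards = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
form_1 = {"a": 0.5, "b": 0.5}
form_2 = {"c": 0.5, "d": 0.5}

single = alignment_index(form_1, standards)
combined = alignment_index(aggregate_forms([form_1, form_2]), standards)
```

In this toy case each form alone aligns only partially with the standards, while the two forms taken together match them exactly, which is the sense in which more forms make the sample look more like the domain.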

And the reason is that any test, one form of it, is a sample of items from a domain of items, and the content standards they represent that domain and then you’re hoping this is a good sample. But it is nevertheless just a sample; however, if you had ten forms, and you looked at all ten forms together, now the sample is becoming more like the population. And at that point if you don’t see your alignment taking up on one then that’s something to be concerned about. So thank you for that question, David, it was helpful.

PARTICIPANT: Could you say a little bit more about how you do this process with content standards? Are we actually looking at each content standard and identifying in a topic area and a cognitive demand area? It might already be defined in a content area?

DR. PORTER: She says, how does it work for standards. Now the first thing you do is you try to get the most detailed statement of the standards that is available. And they do come in kind of macro and micro sorts of forms, and then what you do is you identify each specific objective or whatever, the most precise thing. And then you content analyze that. Now when you do the standards, we’ve never put any limit on for a particular piece of a standards, how many different cells it might go in because at that point you really are, you’ve got some pretty general statements that can cover a lot of cells. And what you find by the way is right away that some of these standards are very, very imprecise, I mean they are very general, but that’s the way it works. And that’s by the way why I thought you wouldn’t get a very decent inter-rater agreement because of their imprecision. You could have knocked me over with a feather that actually people agreed. It’s vague but they agree on what the vagueness is.

PARTICIPANT: So what do you do about cases where there are maybe two different, sometimes very different ways of approaching a problem? An example that just struck my eye here is question five, average weight of 50 tomatoes is 2.36 pounds, what’s the combined weight in pounds. Well, one way to do that is simply to multiply 50 by 2.36. The other way is to observe there’s only one possible answer, which is consistent with the fact that the first digit in 2.36 is 2, so it has to be above 100 pounds. And you might classify those two ways of approaching that problem in somewhat different ways.

DR. PORTER: Right. So you do both.

PARTICIPANT: So you do both.

DR. PORTER: Yes, absolutely.

PARTICIPANT: I have a question for you in terms of what you and your graduate students may have been doing. Have you ever used this process and this method to compare and contrast K-12 assessments with entering undergraduate math and science assessments?

DR. PORTER: No, it would be a good exercise to do though. These procedures, you can use them in lots of different ways, and if you use them in some new way, send me an email, I’m just curious to know how people are doing it. I want you to make sure you recognize this one thing because of all of the nice properties I think this is one of the nicest ones, and that is you can get a measure of alignment of instruction to something else, whatever it might be attached to the standards or the textbook, defined at the teacher level so that each teacher has his or her score on alignment. Now what this allows you to do is to define, you can now study between teacher variance in alignment. And you can say what influences the degree of alignment of the teacher’s instruction with say the content standards. And that’s very powerful, that allows you to start to address some questions about what kinds of standards-based performance and another kind of standards-based performance.
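A teacher-level alignment score and the between-teacher variance mentioned here could be computed along the following lines. This is a sketch under assumptions: the alignment formula is the one Dr. Porter has published elsewhere rather than anything defined in this session, and the teacher names and instruction profiles are invented.

```python
from statistics import mean, pvariance

def alignment_index(x, y):
    """Assumed index: 1 - (sum of |x_i - y_i| over all cells) / 2."""
    cells = set(x) | set(y)
    return 1.0 - sum(abs(x.get(c, 0.0) - y.get(c, 0.0)) for c in cells) / 2.0

def teacher_alignment(teachers, target):
    """Give each teacher his or her own alignment score against a target."""
    return {name: alignment_index(profile, target)
            for name, profile in teachers.items()}

# Hypothetical instruction profiles (proportion of time per content cell).
standards = {"algebra": 0.5, "statistics": 0.5}
teachers = {
    "teacher_1": {"algebra": 0.5, "statistics": 0.5},
    "teacher_2": {"algebra": 1.0},
}

scores = teacher_alignment(teachers, standards)
average_alignment = mean(scores.values())
between_teacher_variance = pvariance(scores.values())
```

With one score per teacher, the variance across teachers becomes an outcome that other teacher-level variables can be related to, which is the kind of analysis the remark points toward.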

PARTICIPANT: I have a sort of outside the parameters of our discussion so far thought which is this is primarily we’re looking at this as a kind of policy and district level -- but have any thoughts about using this in a way as a formative tool, thinking about work with students and the fact that students are often better explaining this to their peers because they more clearly understand what they did to solve this problem, but using this kind of technique with students to hear how they think, what they think of the cognitive demand of these questions.

DR. PORTER: That’s an interesting idea and no I never had that thought before. I will say that we have, NSF gave us some money to do a middle school math and science professional development program, this is this Ruth Love thing, that what we do is the teachers fill out this content questionnaire and then we give them back data that gives them pictures of what they’re doing and what the other teachers are doing and then we see alignment to their state standards. And then we just give them this information and they can do whatever they want with it.

And it’s kind of an interesting intervention because it’s real hard to evaluate the effect of it because one goes this direction and one goes that direction, obviously you want to know whether in the long haul it improves student achievement and you can also say does it improve alignment of instruction to the standards and things like that, but sometimes they look at it and they say, you know what? We’re not teaching any -- statistics, we didn’t realize we weren’t teaching any -- statistics and we want to. And so what they do is they start to teach -- statistics. Well, if you’re just doing an achievement test for mathematics that can get lost in the shuffle, so you really would like to know, okay, this school decides they want to do this and then let’s just evaluate them against their own specific decision of where they want to go. It’s more of an empowering, I’ve never used this word, but it’s more of an empowering sort of thing, put this information in the hands of bright dedicated professionals and then they’ll go with it in all kinds of interesting directions.

DR. GEORGE: Thank you very much.

DR. PORTER: You’re welcome.

[Applause.]

DR. GEORGE: It’s now 4:45 and there will be a 15 minute break, after which at 5:00 Lorrie and Andy and our MSP panelists and Michael will be available and the main item on the agenda is an opportunity for you to ask questions or follow-up on conversation and that sort of thing; there’s no real structure to this, it’s an opportunity for you to speak your piece and get your issue out before them. And even if it’s not to get an answer tonight but to get some of the other presenters to be thinking about it tomorrow that’s fine too. So we’re taking a break and then start promptly at 5:00.

[Brief break.]

DR. LABOV: You may notice I’m not Mel. I wanted to say a couple of words about the dedication of this committee to what we’re doing here. The reason that Mel is not going to be here and he won’t be here tomorrow is because he has a course, he’s president emeritus of the University of Missouri -- he’s teaching a course for -- in mathematics and music, and he is going back to Missouri tonight hopefully getting through the weather to be able to teach his course tomorrow and then he will rejoin us on Tuesday.

Another person who is a member of the committee who will remain nameless for now is from Boston and was actually offered tickets to the Super Bowl but is here.

[Applause.]

Just a couple of other things. A couple of you have asked about getting the complete citations to Lorrie Shepard’s presentation. You have most of them in a paper that she wrote that’s already in your packet and she’s given us another one that we’re going to photocopy to have for you tomorrow, so the combination of those two should cover just about all of them as far as she can tell.

The other thing that I wanted to let you know is that in addition to the MSP work we’re doing here we also have another MSP under the RETA program, which is actually a more traditional NRC study in which we’ve been asked by the National Science Foundation to look at science assessments to help people think about implementing science, developing and implementing science assessments, not only because of the requirements of No Child Left Behind for 2007, 2008, but to think about these more generally so that we advance science education improvement. So that committee is working this year, there should be a report out later this year and there will be a series of regional meetings and we’ll keep you posted about that on our website so that you’ll know about it. We are not developing tests, what we are trying to do is to develop frameworks and some design principles and we actually got three different teams that are developing different kinds of frameworks to give people the best opportunity to think about these things from your perspective and from your specific needs. So we’ll let you know more about that.

For this session what we thought we would do is first of all if our presenters and panelists have any brilliant flashes that they heard from other people, I should have said that, here is a chance to say that in the first couple of minutes, or if you just want to make a comment about something that anybody else said we’ll do that and then we’ll open the floor to questions and discussion. Thanks.

Agenda Item: Debriefing of Assessment Exercise and Participant Discussion - Dr. Porter, Dr. Shepard, MSP Team Panel

DR. LABOV: Anybody on the panel wish to say something first? Questions, discussion? We’re going to ask you to identify yourselves again and I’ll get the microphone so hang on. There are also microphones on each side if you just want to go to those, it would probably make it easier.

DR. SHULER: This question is for Andy. I was intrigued by your correlation of standards assessment instruction. One of the things that we’ve learned from the field is that state standards are more characteristically representative of a traditional --, the vast majority of it. How are you, or are you I guess could be the question, are you or have you been asked to assess the integrity of state standards with respect to national standards or the TIMSS framework?

DR. PORTER: Yeah. And we, well I haven’t done the TIMSS framework but we have content analyzed the NRC’s science standards, we have content analyzed as you saw the NCTM standards, and we can show degrees of alignment between various states’ standards and those national professional standards. We’ve done this in about I guess 30 some states now but, so you know, yeah.

DR. SHULER: Has that assessment in any way influenced them to change their state standards?

DR. PORTER: Well, I’m probably not up to date, I do know that that one state where their instruction was totally unaligned to their test changed their test, but no, I think by the way, it’s my view, if you looked at those maps, I was focused on the procedure rather than the results, but if you looked at those maps you noticed that the standards didn’t do what we thought they were supposed to do. They were supposed to focus our attention on a few important things that we would take to depth and mastery. And what you saw was basically if the languages represented all possible content, all possible content was basically somewhere in those standards. And I was telling some people last week but when the NRC standards came out I was asked to give a talk at a place called the Society of Scientific Society Presidents, something like that, and it was a bunch of big time scientists, representing, and all of them were saying my science isn’t in there. So there’s a great push to just put everything including the kitchen sink into these standards and that’s a problem because obviously they’re meant to focus and if they’re not focusing they won’t work the way we had in mind. So you could use this, I thought you could have this language and you could have your committees and you give them 100 red poker chips, they can only put 100 chips on the matrix, no more, so you’re going to have to make some tough decisions. I don’t know, maybe it should be like roulette, though, maybe you could put a chip on a corner, I haven’t thought about that.

DR. KESTNER: I’ll make this comment that in North Carolina one of the things I did not mention is that there is a state policy that says we will align our standards to NAEP whenever possible, and as a result, that was my role to set the standards in North Carolina. Of course we brought teachers in to do that but Andy’s group had done analysis of NAEP as well and even if all you do is look at that quick contour map and compare it to where your standards are you can readjust and we did a little bit of that.

DR. LANDENBERG: I want to make an observation, please forgive me because it’s not a very clearly articulated one. I’m Don Landenberg from the Maryland, Montgomery County MSP. This was inspired by something in Nancy Bunt’s presentation, she was talking about the 143 school districts that her MSP deals with. I was thinking about the contrast with my own; we’re dealing with one school district, which is an entire county. In Maryland counties are school districts, and this county has a population larger than my entire native state. We are dealing with all the high school science teachers, there are more than 400 of them. And I was thinking about the difference in the way those two MSPs have to function and that led me to thinking about what’s going to happen when the MSP is over and finished and we’ve done our 50 odd soon to be more grand experiments of various kinds and it’s time to take our successes and spread them and somehow you have to propagate it whatever you want to call it. It strikes me that that process is going to be very heavily dependent on scale, on institutional organization, on public policy, on politics, and on local cultures at all levels from the local school district to the state to the national and I think it’s going to be very complicated and very difficult. And I just wanted to observe that it’s probably time we all started thinking about that, what the NRC can do to help us, what we can do to help ourselves other than pray.

DR. LABOV: I think this is a great topic for general discussion if there are people in the audience who would like to say something about your own experiences with your MSP and what you’re planning to do that would be great. Anybody on the panel first?

DR. KESTNER: I’ll tell you one thing, I think that building leadership within schools and within districts is a key for a lot of these MSPs, and in my experience I’ve seen that that is something that can carry over whether the resources from external resources are gone or not if you build that capacity and leadership within a district. Let me give you an example. In one of our districts, North Carolina put money in for teacher assistants. One district decided they were going to take their teacher assistant money and buy math specialists for their elementary schools. Well, that money dried up, it didn’t dry up, it got put into a big ball of money and became more flexible for districts and so they no longer have those math specialists but those people that they trained are still there and there are champions within that district who continue to do good things.

PARTICIPANT: -- One of the things that struck me doing the MSP -- Lorrie mentioned it and I’d like to have more discussion around that, you were talking about sustainability and change, it’s been really noticeable that the NSF really has not laid out what they mean by systemic change for education reform, it’s been very loosey goosey, it’s been interesting, I’ve had dialogues with some project officers and there’s a real reluctance to talk about that and as I’ve listened to different presentations by different MSP projects it’s not always clear to me that the sustainability might mean looking at an organizational unit how at the county level or district if you have to hit certain areas, and as I listened to projects I find one or two areas -- might be professional development and leadership and I’m looking at the policies and I’m looking at all kinds of -- it doesn’t seem to be pulled together.

I guess the question would be as the challenge I’m trying to take a look at, how you approach looking at all the different aspects of schooling that need change and how they’re all fitting together so that if the political winds do change, if there is a change in Administration there’s enough systemic change that’s occurred that will weather and there will be some continuity group of people. I guess if you could comment on that it would be helpful.

DR. SHEPARD: I can say a couple of things. There is some cause for optimism and pessimism. The optimism is it is quite interesting that in my experience having watched from both the University and a state department level of experience many waves of reform, that the standards movement is the longest living so far of the decades that I’ve seen. It’s also the one interestingly enough that has a connection to a research base so my first experience when I was working in the California Department of Education in 1972 was minimum competency testing and had had no research base whatsoever except that it was a response to a test score decline that had been observed. But the intervention was invented by legislators, it didn’t have some grounding in research. So the standards movement has a more solidified political base and it has a more grounded basis in research though there are many simplifications that occur that are dangerous, so it can be corrupted often because people don’t have a deep understanding. So that’s the good news.

The bad news is that it is as hard to change teachers’ understandings as it is hard to change students’ understandings, perhaps harder because the older the student the more there is that has to be not undone but rethought. When we learn we’re having a bigger struggle than a six-year-old to learn, and I think an example was given of the fact that it makes more sense to teachers when they actually can engage these things and see what it really means, which is exactly like the slide that I showed that said that when students do self-assessment they can internalize and come to understand the things that are expected and what quality work would look like.

So I don’t think that the politicians who are imposing change have any idea of how challenging it is to try to provide those kinds of opportunities. And where people can create those kinds of opportunities and where they can, to me sustainability means how many of your colleagues are willing to stay and do the same thing with you. So it’s finding communities where people make those kinds of commitments and where we can get past the isolationism of individual teachers.

The other thing I’ll say that’s encouraging is we are training new teachers to know how to do this when they come, and in the fall we hosted a workshop for legislators where they go to one on one, every teacher was paired with, every legislator was paired with a first-year teacher, and they got to interview the first-year teachers. And what they found out was that those first-year teachers felt very confident that they were the people in their schools who knew how to implement standards, who knew how to plan lessons that focused on standards, who knew how to do these kinds of formative assessments, etc., and that they were actually then looked to by veteran teachers in their schools to make change.

So one thing I would say to MSP organizers is I hope you’re talking to your local teacher preparation institutions and I hope that those teachers are, I look for opportunities for our teachers not to have to go out by themselves, to go into a good induction program, to go into schools where there are two or three other like-minded folks and that to me is what sustainability is.

DR. BUNT: I realized when I finished my description that I hadn’t said anything about what we were doing with higher education, which is part of the wondering about that. We have four higher education partners that are smaller private colleges, all of whom are teacher prep as well, and I wanted to preface that in responding really to your concern about the NSF/MSP. I think the National Science Foundation is showing wonderful restraint in not defining how each of us should end up approaching this and so you’re going to end up with a lot of different models and our models are all different and I think are intended to be different, and they have to be rooted in what we’ve learned locally.

And I think the sustainability question becomes one and I wanted to correct, it’s 40 school districts that are part of our MSP but we’re in a region of 138, is rooted in drawing from the best research that has become more accessible to us so that we understand the idea of needing to build these communities that will sustain themselves. And will sustain themselves like children get excited about learning and want to learn more, that if we give them an opportunity as educators to engage in a learning experience that excites them and they know it’s going to be applicable to their classroom our long-term strategy is we begin with (auditory unclear) DMI in order to facilitate a curriculum that is the reason the colleagues are coming together and engaging in a common learning experience.

But we plan to move to lesson study and if none of you have read the Teaching Gap you should, it ought to be on the list up here; it’s not a National Academy of Sciences publication but I think it lays out for all of us the challenges that we’re facing and it’s coming right out of research that shows how our culture is one that fosters the idea of not getting in that proximal learning space. We believe we need to jump in and not allow that to happen, if any child is having difficulty we need to immediately solve it or we’re not good teachers.

And I think the engagement at looking at research that way and building communities that are building together so that that capacity, I mean I’m looking at saying gee, if we help these teacher leaders go through this and when we say teacher leader we don’t mean math specialists being paid by somebody else, we’re not paying any extra for any of these teachers, they’re subs that are being supported for this training time, but when they go back and are working with their colleagues, the districts are finding the time and the money to do that so that we’re believing that we want to try to begin with the minimalist support externally so that we are building a change they can all sustain. And different folks will have different strategies but we’re not about going away, we’re about using these resources to build capacity that will continue to grow.

DR. LABOV: Thank you. Don.

DR. LANDENBERG: I’m inspired by what Lorrie just said. Of our 400 high school science teachers, about 10 percent, about 40 have been identified as master science teachers and they are leading their colleagues through our MSP. I suspect that that group consists of a bunch of old geezers like me, long experienced senior geezers, the fact is more than half of them I think are very young and I think there’s a lesson to be learned there.

DR. EBERT-MAY: I’m Diane Ebert-May from Michigan State. So based on what Lorrie and Nancy just said, let me ask you what the penultimate assessment would be. And I have to make some gross assumptions, so the gross assumption is that every child affected by, some, most of the children, many of the children affected by MSP will go to college, some will not but some will. The majority of them I can say with a bit more certainty will go to state institutions in our states. So the penultimate assessment when they come into my large 400-person science class as freshmen I will be able to identify an MSP student from one who was not in a part of the country who didn’t have one.

And I say that philosophically but also whether this connection concerns me greatly between what’s going on and we have a history with the state systemics and all the other systemics that have occurred for the last 15 or 20 years in the higher education gap. But would that be an assessment you’d like to see?

DR. BUNT: I’d like to see there are an awful lot that are never getting into your classroom right now and never taking a lab-based science course, so the more that are choosing to actually enroll and take a lab-based science course, and I was having a conversation actually last night with my daughter in law who just told me that she’s been informed as a graduate student that she will teach the lab for the freshmen because she’s one of the few that could speak English, that could teach the lab. And I was trying to tell her what an important task she had because there were so many who were taking it only because it was a distribution requirement, they were going to get through it as fast as they could and out with the least amount of work, and I think you, if you’re teaching those freshmen students, have a tremendous opportunity and I’m not sure that I’m seeing higher education change in the way of making it inquiry based so that they’re learning the love of science and that they can do it at that point. But yeah, I hope there are lots more, but I think I’m not sure you’ll be able to recognize our students as much as you might hopefully have to change the way that they’re being taught at that level too.

DR. EBERT-MAY: So then the students could start their own illusion, they can walk into our classroom and say this is not what we’re paying for, and that would be the penultimate. But also I would say Nancy that there are a lot of large classes that aren’t precise measures of general distribution credits that extend besides the measures that we would like to see.

DR. LABOV: I think Diane raises a really interesting point, and one of the things that we’ve heard in doing our MSP workshops is that one of the hardest parts of this is getting institutions of higher education involved in the entire process, and the question is what is it going to take to do that and I’d like your thoughts about can assessment be the place where that happens, in other words by doing the kinds of exercises that we did here today, where people in higher education working with their colleagues in K-12 and trying to think through what these assessments mean and ultimately what they mean to their own students in college, in introductory courses and future teachers who are in those courses. And also the articulation issues that colleges face with K-12 districts, can assessment be the thing that begins to bring people together or do we need to look for something else in addition.

DR. SHEPARD: Let me, this is Lorrie Shepard, let me give you some examples. First I credit Bruce Alberts with this because his vision, when he became president of the National Academy of Sciences, was to change K-12 education and to have scientists have something to contribute to that. And so I feel like he recruited several of my science colleagues on the Colorado campus to be part of national committees, etc., and so he sort of converted Dick McCray and Carl Wieman who won the Nobel Prize and Bill Wood in biology, etc. And they’ve all served on various committees and have now decided that they’re going to transform undergraduate education. They are teaching freshmen and sophomore level courses, they are teaching small sections so that they can teach them in a fundamentally different way, we’ve invented a strategy of having students be learning coaches who are sophomores, reparticipating in a course they took as freshmen and they are getting funding from our NSF grant to come into the teacher preparation program.

It is important that his vision change at all levels, it is harder to do it at the college level than at any other level because the students are not calling for reform because they haven’t had it, they’re really annoyed, and it’s one of the things I warned my colleagues about because we’ve had the experience of the resistance to this kind of change because it’s very threatening to be asked to really learn and to explain and to work in a group, etc. But we’re having some really exciting successes.

Your question, Jay, was whether or not assessment can make a difference. I really think that the physicists on campus really got a lot from Eric Mazur’s experiences at Tufts and it has then become a focal point on our higher education projects that conceptual assessments are what you administer pre and post for the evaluation of this study but it’s also what you administer to get what it is that your students aren’t getting. And the contrast between the procedural items that they can do and the conceptual item that you assumed they had to have understood to have done the procedure is being very instructive to the professors. And Eric Mazur gives a great talk about that, he’s got a good book that helps and it really dovetails with everything we’ve been talking about.

DR. DONNAY: My name is Victor Donnay, I’m from the MSP -- and the math faculty at Bryn Mawr College and I wanted to respond to your last comment with what I guess the NSF people would refer to as a nugget. And the way I ended up over the last three or four years to get involved in our MSP was started by one of my undergraduate math students at Bryn Mawr who had been a high school student in Philadelphia in one of the first trial projects of the Interactive Math Program that was supported by a previous (auditory unclear) LCS grant and she came into my class and started saying, what you’re doing in your class is a lot like what we did in my high school math class and it sounded very interesting; I invited her professor to come speak to our group of faculty and that had various connections that led me to join the project. She ended up being an outstanding student, got an honors thesis with me with a piece of math research that we’re working to get published, stayed on to get a master’s degree at Bryn Mawr and what struck me about her was she was much more independent and self-teaching than a typical student who would expect you to kind of give the answers in the book, and it’s a small sample of one but it made a big enough impact on me that here I am today.

And the other point about sustainability, one of our goals in our MSP we are targeted but we have 48 school districts and 13 colleges and universities in our grant and we are getting $12.5 million, which does seem like a lot of money, but with a state education budget that’s a rather small amount, and so we are hoping to work with the state to have them continue to fund our next project and other ones in the state after our five years are up.

DR. LABOV: Thank you. Other questions, comments?

DR. SAYLER: I’m Ben Sayler from Black Hills State University. Both Lorrie and Andy talked about getting large-scale assessments and classroom assessment and classroom instruction and standards, which are probably state standards but could be district standards, to all be in alignment. It seems like a great utopia but for our MSP we have a limited amount of time, and making changes to get everything in alignment seems like it probably won’t happen before our funding is over. So I’m wondering in the absence of having that alignment, do you have any comments about what to do, the things that we do have some control over are classroom assessments, instruction, district standards conceivably. So if you have any comments about what to do when you can’t get everything aligned.

DR. PORTER: Well, I don’t know whether having everything aligned would be a utopia, you’d want to make sure that whatever it was aligned to was something you thought was important, kind of a scary concept that it might not be. But I did want to comment that to me classroom assessments and instruction are indistinguishable, classroom assessments are instruction, I think Lorrie said this, but not only that they may be at least at the high school level the most important instruction in terms of where students place their effort.

And I did one study in which I collected all of the teacher’s tests in the schools and teachers that I was studying. I also had them describe for me in these procedures I told you about what their content was that they taught. And what they tested was only a small and very biased fraction in most cases of what they taught.

This is understandable. Doing good assessment activities, especially for reasoning, proving, problem solving, making connections, is not easy work and most teachers have not been trained to do this, one, and most teachers are extraordinarily busy. High school teachers see like 150 students a week, a day, well, over and over and over again, whatever. And so what I’m saying is that if you can’t have everything aligned one place that I would focus my energies on having these assessments that are these day in day out assessments that the kids experience as a part of instruction and teachers use to make grades on and kids care about these grades, get those to be aligned with important worthwhile content. And that could make any one thing you could do a big difference.

Having said that now I recognize it would be hard to do that because what you’re talking about is probably very intensive, probably teacher professional development, there would need to be some incentives, you would need to have some way to get people to understand that there is a problem there in the first place. So it wouldn’t be an easy project to do but if you could do it, it’d probably be one that paid big dividends.

DR. SHEPARD: Let me just say amen and add just a little bit more. An exercise that I go through with groups of teachers is to really explicitly figure out what is missing from the test, so it’s just a Venn diagram and it’s just trying to illustrate what’s overlapping. Andy’s exercise is much more detailed but the idea is what’s unique to the test that you don’t care about very much, what overlaps between your curriculum and the test, and what is important in your curriculum and your standards and expectations for students that is not captured in the state test, so this goes to your question about when the state test is not aligned, what should you do.

And I would say that projects should make a commitment to formally assess that important content that is left out of the state, and I’ve encouraged schools to do this and present it to their PTAs because a lot of teachers feel that they have to keep paying attention to the test in an inappropriate way because of the pressure from the community and because of the pressure from parents in the schools.

So I try to make explicit how much the test does overlap, and in Colorado we have an open-ended assessment and it does a pretty good job. But in some places if it’s all multiple choice, if you’re from Virginia for example, I’ve looked at that test pretty carefully, and it’s an impoverished assessment. I guess it’s going in the minutes. So teaching to that test has a different effect than “teaching to the test” in Colorado. So I think what you want to do is have a formal way of representing that other content and seeing that it’s assessed at least once a year with a district or an MSP joint assessment.

DR. BUNT: I also want to suggest the message that Lorrie sent in her earlier one, at least the one I took away with, is that if students are really engaging teachers in deep conceptual understanding, developing the deep conceptual understanding and spending enough time learning it they will do well on any test. So the best investment of their time is in that instruction and if they need to understand what that instruction needs to look like then those rich assessment tests that are open ended that she said could be used either for instruction or assessment, give them that idea of how much deeper that understanding is. And there’s a lot of research to back that up I’m thrilled to see based on Lorrie’s report to us earlier.

DR. LABOV: Any other questions, comments? Rebuttals?

PARTICIPANT: -- from Duke University. I teach biology at the University and work one on one with students who are having trouble with science and math learning and it’s been kind of an interesting world to come out from both ends because in the teaching side you’re trying to get students to think critically at the university level, which is most of my research these days, and it’s amazing how resistant they are and that speaks to what Lorrie said and other things. But on the other hand we discovered that the only way to actually help them do better when I meet with them one on one is to get them to do things like assess why they’re in that classroom, what are you doing with this problem, what are you supposed to be learning and that kind of gets to what Andy was saying and somebody I think Dave Smith up in front might have mentioned, uses the students, yes, that’s exactly what I do, I get them to deconstruct the problem and really figure out what are you doing this.

And I guess my comment would be that these problems are becoming more pervasive in my opinion and ultimately the universities are going to have to take responsibility for the fact that, if we keep getting students that are so unprepared to learn information the way that we expect them to, we better start demanding that they get prepared before they get there. And this is at Duke, I can imagine it can’t possibly be better at NC State or a variety of other students in the area that get a broader swath of students.

So maybe that’s the answer, maybe the higher education institutions really have to push for the reform once these kinds of programs run out.

DR. SHEPARD: But not just reform K-12 because the teachers who are teaching in K-12 were taught science by your colleagues.

DR. SHULER: I wanted to come back, cycle back to the issue of sustainability. I know this is not a criteria of the MSP but in the 18 years work that we’ve been working with school districts throughout the country I wanted to throw this out for you to think about. We have found that the active engagement of business and industry as a part of your work is absolutely critical to sustaining your work.

And a small investment with business leaders and getting them to understand the culture of school, what effective teaching and learning in mathematics is like, will yield what we call an ROI, return on your investment, that’s quite extraordinary. I just want to give you my case example. There are 22 textbook adoption states in this country, coincidentally most of those are below the Mason-Dixon Line with the exception of California and Oregon. And what we have found is that most of the instructional materials that the MSPs are using or plan to use are NSF research-based math and science materials, many of them.

However, most of the state adoptions are not in favor of those kinds of materials. I’ll give you an example of one state and I had a coalition of three business, three major businesses, that went to one of the seven states and demanded that these materials, both math and science, be replaced back on the textbook adoption. There is no state agency school district or university that’s been able to accomplish that. So they become critical in the change of policy at the state level and local level. Also they’re instrumental when the superintendent disappears after 2.2 years, that they come and advocate for sustainability. So I may be speaking from afar, but for those of you who haven’t engaged them I would recommend that.

DR. LABOV: Thank you. Don and then one final question after Don.

DR. LANDENBERG: Don Landenberg with a modest suggestion. Over the past several years as I’ve been learning some about learning, I have learned that quite a lot is known about learning and I found it very helpful. I also observed that much of what I have learned has been from sources that seem to be aiming at their professional colleagues or perhaps at teachers and professors, and that’s great. But wouldn’t it be nice if somebody would aim something at our students, your students at Duke for example, and I’ve been thinking it would be nice if somebody would write something that might be titled using a brain to learn, an owner’s manual, so that our students, perhaps in high school or middle school or somewhere, could learn something about how they strategically can guide their own learning process. And the impression I have is they know almost nothing about that, even less than their professors. It would be nice if we could prepare them to do that.

DR. LABOV: Of course How People Learn talks a great deal about the concept of metacognition, of understanding your own learning and the learning process and we definitely need to have more people understand how that operates and what it involves in K-12 and higher education.

There was one more --

DR. SIMONIS: Doris Simonis from Kent State, I just wanted to comment on that and kind of reinforce it because I was at a meeting this week for a business leader coalition -- we have the best supporters a university can have, we need to have good science and math skills and we’re not getting them and we’re tired of them -- up from other countries --. And I think we should play on that, they are aware of assessments in the United States not being number one or number two or even number three. And they are aware that our state -- we would cut back on funding for education -- we have to compete in a global economy, so that may be something we should -- that we do need to consider as we try to professionalize the teaching profession, try to make the kids be the best they can be so they can help their children to enlist some of our colleagues --

DR. LABOV: Thank you. It’s 5:45 now and I would like to thank all our panelists and presenters for a very rich discussion, please join me in thanking them.

[Applause.]

I also want to thank all of you for your participation, I find personally that this has been a wonderful set of discussions this afternoon and we look forward to many more tomorrow and Tuesday.

A couple of logistics. We will be having dinner for you on the third floor of this building in about 15 minutes and then we will begin with breakfast tomorrow morning at 8:00. At about 7:00 this evening for those of you who are staying at the Wyndham Hotel we will have a shuttle service that will take you back there, most of you should be staying there, I think a few of you may be staying over at the Hyatt from having attended the MSP meeting. We didn’t know about that so we only have a shuttle back to the Wyndham.

Also on your way out, so that you can think about tomorrow, we have a time period set aside for breakout sessions and there are five different breakout sessions and we have a handout that describes to you what those different breakout sessions will be. Please pick this up at the registration table on your way out so that you can take a look at this tonight and tomorrow morning and decide where it is that you would like to go. These are the places where we’re really going to be offering a lot more depth and detail about many of the more general concepts and issues that have been discussed here.

So thank you very much and we look forward to having you join us for dinner.

[Whereupon the meeting was recessed at 5:47 p.m. to reconvene the following day, Monday, February 2, 2004.]
