ETS® Interview with Mark D. Reckase on Educational Assessments and Achievement Measures, March 29, 2017

On-screen: [ETS®]

On-screen: [You discussed two types of tests – the continuum model and the domain model. What are some distinctions between the two?]

On-screen: [Mark D. Reckase, University Distinguished Professor Emeritus, Michigan State University.]

Mark Reckase - The continuum model is basically the one that we're used to. This is what we use with our kids. We stand them up against the wall. We make marks on the wall to show their height and see how much they grow. Then we have a continuum of height and we see how much improvement they've made over time and how much they're growing.

The domain model is more — I was trying to think of a good example for this. If you're going to have a neighborhood garage sale and you have hundreds of things that people are going to sell and at the end of the day people say, "How did we do?" You say, "Well, we sold 80% of the stuff we had." It's a whole collection of things, and some people may be happy because we sold 80%, but the person who didn't sell anything will be unhappy because the domain is not a continuum. It's a big collection and we're just saying how much of that collection was accomplished.
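
As a rough illustration of the garage-sale analogy, the sketch below computes a domain-style score as the share of a collection that was "accomplished". The seller names and counts are hypothetical and are not from the interview; they are chosen only so that the overall figure comes out to 80% while one seller sells nothing.

```python
# Hypothetical garage-sale data: items offered and items sold per seller.
# All names and numbers are illustrative assumptions, not from the interview.
sales = {
    "seller_a": {"offered": 40, "sold": 40},
    "seller_b": {"offered": 50, "sold": 40},
    "seller_c": {"offered": 10, "sold": 0},
}


def domain_score(records):
    """Proportion of the whole collection that was sold."""
    offered = sum(r["offered"] for r in records.values())
    sold = sum(r["sold"] for r in records.values())
    return sold / offered


def per_seller_scores(records):
    """Proportion sold by each seller, taken separately."""
    return {name: r["sold"] / r["offered"] for name, r in records.items()}


overall = domain_score(sales)
by_seller = per_seller_scores(sales)

print(f"overall: {overall:.0%}")
for name, score in by_seller.items():
    print(f"{name}: {score:.0%}")
```

The single overall proportion says how much of the collection was covered, but it does not place any one seller on a line, which is the contrast with the height-on-the-wall example.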

On-screen: [What are the advantages of one model over the other?]

Mark Reckase - The big advantage of the continuum model is, we've developed a lot of technology to support that model and it's easy to understand. It's easy to think about growth and things when we're on a continuum model. People have moved from one place to another. We learn about it in school.

The domain model is also easy to understand because it's tied to instructional models. You specify a curriculum and you say we want students to learn all these things in this curriculum, but the curriculum can be very complex if it's in social studies, or science, or in some of those areas. It's more like that garage sale. There's a whole series of ideas, and it's hard to put those onto some single line to go over and say how much people have acquired.

On-screen: [What does the continuum of achievement mean?]

Mark Reckase - The continuum of achievement is closest to when you think about the continuum we have for time. Time doesn't really have a true zero point. I don't know where zero time is. We usually measure time from some reference point. I could measure it from midnight and say this is what time it is in this day, or we can measure it from some major event in life; your birth or sometimes an event in history. We go over and talk about a distance away from these particular points. That's what this continuum is like. We have this scale, this line, and we're locating things along this line. We say people who are at one end of the line are performing well and people at the other end of the line are performing less well. We'll have reference points that are defined by either standards, or averages, or things like that to go over and show the location of a person relative to some point that we know what that point means. This is the continuum idea.

On-screen: [What are some of the ways the policy makers are trying to have it all?]

Mark Reckase - The having-it-all, I've seen this many, many times in meetings. First of all, they would like to have a short test. I actually heard this earlier today. They said, "We've got 30 minutes or we've got 45 minutes for a test." Then we want to have that short test, but we want to measure growth over time. We also would like to have sub scores to go over and give details. We'd also like to use big items. These are items that are good targets for instruction so that we can make a good argument for why we want to have people teach towards the test. We want to do all these things, but we still have this 30-minute time limit to try and do it. This is where we're trying to compromise to give as much of that as we can, but this is where we run into problems.

On-screen: [Could you define big items?]

Mark Reckase - Big items are like classroom activities. If a teacher gives an assignment that says, "You're working for McDonald's. The cash register is broken down. You now have to start figuring out the cost of meals by hand instead of using the cash register. Here's somebody who's ordered these things. Figure out exactly what that cost would be and how much change they would get back from the amount of money that they give you." This would be considered as a big item. It's supposed to be realistic. It's supposed to motivate students because it's like what they might have to actually do in life, but they take time. That's why we call them big items, and they've got a lot of parts to them.

On-screen: [And they are good targets for instruction?]

Mark Reckase - The idea is that they would be; that this is the kind of activity that a teacher might actually use in their classroom, so that if you're trying to teach towards the test then you're teaching towards what you would normally do in a classroom anyway. This is a good model. It may even be good things for a teacher to try if they haven't tried those kinds of things before.

On-screen: [What are some of the compromises that the makers and designers of achievement tests have to make?]

Mark Reckase - When we do these things what we try to do is, if somebody says there's 45 minutes, we'll try and make a test that will fit into that 45-minute timeslot and then go over and do all the other things that people want to do. They have to make compromises when that happens. They might say, "We want sub scores." We'll say, "We can give you one or two sub scores, but not five because there's not enough time," or, "The sub scores won't be as reliable as you'd like because we don't have enough time, or we'll score them in two or three categories because there's not enough time. We won't be able to give you growth measures because there aren't enough items." We'll have to make compromises to try and achieve all the things that people want.
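
One common way to see why shorter sub scores tend to be less reliable is the Spearman-Brown formula from classical test theory. The speaker does not name this formula; it is brought in here as an assumption for illustration, and it further assumes the sub score items behave like the items on the full test. The item counts and the 0.90 reliability below are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor.

    Classical Spearman-Brown formula; assumes the added or removed items
    are parallel to the existing ones (an idealization).
    """
    k = length_factor
    return k * reliability / (1 + (k - 1) * reliability)


# Hypothetical full test: 40 items with reliability 0.90.
full_test_items = 40
full_test_reliability = 0.90

# Predicted reliability for sub scores built from fewer items.
for n_items in (20, 8):
    k = n_items / full_test_items
    predicted = spearman_brown(full_test_reliability, k)
    print(f"{n_items} items: predicted reliability {predicted:.2f}")
```

Under these assumptions, the fewer items that feed a sub score, the lower its predicted reliability, which is one way to read the trade-off between the number of sub scores and a fixed testing time.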

On-screen: [How can testing companies respond to demands from states or districts who want less testing, but still want the validity and reliability?]

Mark Reckase - There are a couple models that are out there that people are working on. Unfortunately, these things are research models. Myself, I worked on one which was a portfolio model that tried to collect the actual activities from students that they produced during the year, take those and organize those into a portfolio, and then score those in a rigorous way and use that for scoring without giving any tests at all. That worked very well, except it was expensive and it took a lot of organization to go over and collect all those materials from the schools. Even though I thought it was a success from a research point of view, it never turned into a product that was actually marketed.

On-screen: [Why the interest in sub scores or diagnostic classifications?]

Mark Reckase - When I was a graduate student I tutored a lot of students in statistics, those who were having difficulty in statistics. The first thing I would usually do when I was tutoring them is, I would sit down with them and try and figure out where they were having problems so I knew how to fix what their problem was, either through better explanations or by giving them assignments that would get them to be mastering some kind of material.

In order to do that, you have to go over and do diagnosis. You have to figure out where their strengths and weaknesses are. That works very well in tutoring. When you have classrooms full of students and you have a standardized test that applies to all students, it's more of a challenge to try and produce sub scores or diagnostic categories that are going to be useful for that. I think people feel that they would like to do that because they're thinking about this tutoring model. They're trying to figure out how to best educate each individual student and they think that they can get information from the test to do that kind of improvement of instruction.

On-screen: [You call for more sophisticated models and procedures to give us information about what is learned. How do you think that that can be accomplished?]

Mark Reckase - This is a pipe dream of mine, and I've been trying to think about how to do this for years. Now we've got computers embedded in the classrooms; most classrooms have a lot of computers. If you get to the point where every student has their own computer and they're doing all their work on computers, we could collect all of the work that they've done over a full academic year and we could take that information and aggregate it. This is why there's a lot of interest in what they call big data, to go over and do data mining of all this information.

You also need instructional models and models of cognitive processing that will show how this information that you collect from all the students all the time shows what it is that they know and can do. If you can put all that together, then we won't need the formal standardized test. We'll just be collecting everything they do throughout a whole academic year and then we'll know. We don't have to worry about domain coverage, or scaling, or anything.

On-screen: [Do educational assessments yield educational achievement measures?]

Mark Reckase - In most cases the answer is that educational assessments do not yield achievement measurements in the sense of formal measurement, like the wall chart for height, where you can go over and have a zero point and figure out how far away from zero; or even the time analogy that I was using before from a reference point. The more nuanced answer is that, in some areas that are hierarchically arranged like mathematics, you can get a pretty good approximation to an achievement measurement. In other areas — the sciences, social studies — that are more amorphous in the way their content is organized, it's harder to think about how you're going to put all this on a scale, and there are a lot of different concepts that are being included in the instruction that students have.

On-screen: [ETS®. Copyright © 2017 by Educational Testing Service. All rights reserved. ETS and the ETS logo are registered trademarks of Educational Testing Service (ETS). MEASURING THE POWER OF LEARNING is a trademark of ETS.]

End of ETS® Interview with Mark D. Reckase on Educational Assessments and Achievement Measures video.

Video duration: 9:12