
The Good and the Bad of Game-Based Assessment

Focus on R&D

Issue 1

April 2016

By: Hans Sandberg

What is game-based assessment (GBA)? Can computer games provide evidence of what students have learned and can do? Are there good and bad GBAs? Focus on ETS R&D asked three experts in ETS’s Research & Development (R&D) division to discuss this new type of assessment via email.

What is a Good GBA?

Robert Mislevy: The words assessment and game both cover a lot of territory. When people talk about an assessment, they could refer to a certification test, a state accountability test, or a simulation game where a player’s every move can be observed and evaluated. When they talk about a game, they may think of many different things, from Madden Football™ and World of Warcraft™ to Angry Birds™ and the car race in Mavis Beacon Teaches Typing®.

How we view games in an assessment context, or assessments in a game context, depends on how and how well they are designed. There are well-designed games that can work well for assessment, but there are also games that are not at all suitable for assessment!

One big reason for the interest in GBAs is that they let players act and react to feedback in a dynamic setting, where the situation constantly changes in response to what the players do. This draws players into the situation and lets them know whether they are better or worse off, closer to the goal or farther from it. The feedback and their actions in the game allow players to learn by doing, which, besides being a very powerful experience, improves the information coming out of the assessment.

A good GBA requires a story, a mission and competing goals, all working in concert with the learning objective or the skill being assessed. Bad assessment happens when the game elements are distracting, overshadow the targeted skills or just look silly or boring to some students.

All in all, well-designed games can be great for developing knowledge in useful ways, explaining concepts in tangible forms and providing an assessment that gives feedback on learning. But we should be aware of a paradox: complex, complicated games, which may be great for learning, can work badly in higher-stakes assessments unless students are already very familiar with them.

Assessment Goals vs. Game Mechanics

Andreas Oranje: I agree with Bob, but I’m less comfortable with the notion of GBAs as a separate genre. On the one hand, there are assessment goals, which contain the claims we want to make about someone or something. On the other hand, there are game mechanics, which are rules or methods that regulate how players interact with a game, including what actions are possible, how the game reacts and where players can go.

In a good GBA, the assessment goals and the game mechanics work in unison. The things we ask the player to do in the game have to connect directly with the claims we want to make about the player. For example, if we want to assess "argumentation" as a skill, it would not do to create a game that involves mostly running, climbing and jumping. A better design would have the player interact with various virtual characters and try to convince them of something.
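
As a very rough sketch of that alignment check, consider the Python snippet below. The Claim and Mechanic structures and the example mechanics are hypothetical illustrations invented here, not part of any actual GBA design: the idea is simply that a claim is only supportable when at least one mechanic can yield evidence about it.

    # A minimal sketch of checking that game mechanics align with assessment
    # claims. Claim, Mechanic and the example mechanics are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class Claim:
        skill: str              # what we want to say the player can do

    @dataclass
    class Mechanic:
        name: str               # an action the game lets the player take
        evidence_for: tuple     # skills this action can give evidence about

    def aligned(claim, mechanics):
        """A claim is supportable only if some mechanic yields evidence for it."""
        return any(claim.skill in m.evidence_for for m in mechanics)

    argumentation = Claim("argumentation")

    platformer = [Mechanic("run", ()), Mechanic("jump", ()), Mechanic("climb", ())]
    dialogue = [Mechanic("persuade_character", ("argumentation",)),
                Mechanic("cite_evidence", ("argumentation",))]

    print(aligned(argumentation, platformer))  # False: mechanics miss the claim
    print(aligned(argumentation, dialogue))    # True: mechanics match the claim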

The reason I would rather not see GBAs as a separate genre is that many of the game mechanics we could consider are not specific to games alone. There is a lot of overlap between GBAs, simulations and scenario-based assessments, and it is often difficult to determine whether something is a game, a simulation or something else altogether. A physics simulation may contain many game elements (e.g., a discovery mode, direct feedback, goals), and a good game may contain a lot of simulation (e.g., avatars, operating vehicles). The reason we focus on games is that they are designed for sustained engagement in a number of ways, which can be critical to meeting particular assessment goals. One such goal might be that we want students to be highly engaged and give it their all during a test, so that a claim based on the assessment reflects their best performance.

Matching Genres with Learning Goals

Malcolm Bauer: We know that games can be used for many different kinds of learning and assessment. We also know that some genres of games match particular learning goals and assessment purposes better than others. Let me give you two contrasting cases.

The first example is drill-type games in which kids practice symbolic addition or subtraction and the level of difficulty increases (harder questions or less time per question). Such games, which are already replacing the traditional worksheet in some schools, can increase engagement, provide immediate feedback and select tasks adaptively. This can help kids develop well-defined component skills and fluency in math and other academic areas.
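
To make the adaptive mechanism concrete, here is a minimal sketch in Python. The thresholds, step sizes and ten-second limit are invented for illustration; a real drill game would tune them empirically. Difficulty rises and the time allowance shrinks after each correct answer, and both ease off after a miss (answers arriving past the limit are scored as misses).

    # A rough sketch of an adaptive drill loop: harder items and less time
    # after correct answers, easier items and more time after misses.
    # All parameters here are illustrative assumptions.
    import random
    import time

    def make_item(level):
        """Create an addition item whose operands grow with the level."""
        a = random.randint(1, 10 * level)
        b = random.randint(1, 10 * level)
        return f"{a} + {b} = ?", a + b

    def drill(num_items=10):
        level, time_limit = 1, 10.0
        for _ in range(num_items):
            prompt, answer = make_item(level)
            start = time.monotonic()
            response = input(f"{prompt} [{time_limit:.0f}s] ").strip()
            took = time.monotonic() - start
            # Credit the answer only if it is right and arrived in time.
            if response == str(answer) and took <= time_limit:
                print("Correct!")
                level += 1                               # harder questions ...
                time_limit = max(3.0, time_limit - 1.0)  # ... and less time
            else:
                print(f"The answer was {answer}.")
                level = max(1, level - 1)                # ease off after a miss
                time_limit = min(10.0, time_limit + 1.0)

    if __name__ == "__main__":
        drill()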

The second example is a simulated marketplace where kids can engage in discovery learning by playing a buyer or seller. They have to come up with their own approaches and pricing strategies and change prices and quantities in their heads without using numeric symbols or scratch paper. This kind of game can support mathematical reasoning and problem-solving and lead to robust learning that transfers to many situations if it’s carefully designed to structure the learning in specific ways. This is an approach to learning that Keith Devlin, a mathematician and math education researcher at Stanford, advocates for many areas of mathematics.

These two examples represent very different kinds of math competencies. In both cases, the learning goals are part of existing standards (e.g., the Common Core) and are advocated by existing organizations of educators (e.g., the National Council of Teachers of Mathematics, or NCTM).

Not All Activities are Good for Assessment

Robert Mislevy: Great examples, Malcolm! They illustrate Andreas’ point about the need to match activities (some of which may have game-like parts) with goals. You really can’t say in isolation whether a game-like activity could be good for assessment. A simulated marketplace could, for example, let kids explore, learn about pricing strategies, practice mental math and have fun, all while getting assessment-like feedback. But using such an activity in an accountability assessment for every fourth-grader in a state could result in an assessment with low reliability, since the outcome would depend on children’s particular experiences in their respective classes and families. One child may do well on one activity while doing terribly on another, even though both were designed to assess the same proficiencies. This "low generalizability" problem occurs when tasks meant to measure "rich" performances are administered without regard to what students have been studying (i.e., "dropped in from the sky"). This is, of course, a lesson we learned back in the 1980s.

Worse, even activities that are effective for learning and formative assessment can lead to unfair assessment when used for high-stakes purposes. We know that variation in kids’ performances depends in part on their skills in exploring, doing math and figuring out strategies. But it can also depend on things that shouldn’t play a big role in the results, like familiarity with the game setting, with the language and representations used, and with cultural aspects such as the marketplace itself. The results also depend on how kids react to the game mechanics, the open-endedness, the storyline and the other challenges of the game itself. The same features that make the activity a more valid assessment for the kids it engages can make it a less valid one for a kid who hates it.

The Role of Familiarity

Andreas Oranje: Familiarity with the assessment environment is an interesting aspect of GBA. We rarely think about it for paper-and-pencil tests, probably because that mode is so uniform and has been practiced for so long. We have seen that familiarity plays a role when we compare constructed-response items (which require test takers to compose a written answer) with multiple-choice items across states; the effects appear related to the kind of state tests the students are used to. Such familiarity effects will be harder to deal with in game-based and other virtual performance assessments, since they come in so many variations. Designers of such assessments must therefore create robust tutorials, pretest the materials extensively across diverse groups of students and avoid limiting the design to niche devices.

Bob mentioned the "low generalizability" conundrum in these kinds of assessments and its connection to how well "rich" tasks are aligned with what students have been studying. One could add that we should consider the role context plays in making an assessment compelling. Successful games tend to rely on compelling narratives, adversaries to face and characters that players can identify with. If we rely less on such context, we could undermine the reasons for using these assessments in the first place.

Needless to say, GBAs are currently not ready for summative use (for making decisions about grade advancement and graduation, admission to an educational institution, or certification) and for now look most promising in formative settings (which directly inform learning and teaching). This might, of course, change if mini- and micro-game-based assessments begin to flourish. Those are typically very small game units that can be used to provide highly targeted evidence. Maybe we could consider them the equivalent of multiple-choice items in a game-based setting.
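
As a loose illustration of that analogy, the sketch below reduces a hypothetical micro-game’s play log to a single scored observable, much as a multiple-choice item reduces to right or wrong. The log fields and the "balance_equation" strategy are invented for illustration, not drawn from any actual micro-game.

    # A loose sketch: score a micro-game play log into one dichotomous
    # observable, the analogue of a right/wrong multiple-choice item.
    # The log fields and the target strategy are hypothetical.

    def score_micro_game(log):
        """Credit the play only if the goal was reached using the
        strategy the micro-game was designed to elicit."""
        reached_goal = log.get("reached_goal", False)
        used_target_strategy = log.get("strategy") == "balance_equation"
        return int(reached_goal and used_target_strategy)

    # Several micro-games with different contexts, each yielding one data point:
    logs = [
        {"reached_goal": True, "strategy": "balance_equation"},   # scores 1
        {"reached_goal": True, "strategy": "guess_and_check"},    # scores 0
        {"reached_goal": False, "strategy": "balance_equation"},  # scores 0
    ]
    print([score_micro_game(log) for log in logs])  # [1, 0, 0]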

Context and Usage

Malcolm Bauer: We must consider the context for any assessment and for any type of task. More importantly, we must consider the background knowledge that students need to perform well, even if it is irrelevant to what we want to assess. Tests typically include a wide range of contexts. A reading assessment may, for example, include passages about many different topics in order to even out the fact that some students are more familiar with some contexts and less with others. The same approach can, as Andreas suggests, be applied in GBA by having many small games with different contexts and game mechanics. Tanner Jackson, one of the leaders of R&D’s game research, is investigating the potential of such micro-games. However, I see value in complex games and assessments with rich contexts as well, since the competencies we wish to assess are going to be applied in the real world, which often consists of complex and context-rich situations.

Game-based and simulation-based assessment gives us opportunities to create assessment experiences that are closer to the real-life situations in which we care about predicting human performance. For example, performance in a flight simulator may predict performance piloting an actual plane very well, provided the simulation captures the critical characteristics on which piloting depends. And unlike assessing pilots in an actual plane, the simulator makes it possible to estimate how they would perform in a real situation without putting them in danger. Games and simulations could allow deeper learning and assessment experiences for students in a wide range of academic subjects (science, mathematics, ELA, social studies, etc.) and for 21st-century skills such as collaboration, communication and social and cross-cultural skills, by creating virtual experiences that are impossible to have in real life, such as simulated historical events in social studies or planetary experiments in science.

Andreas Oranje is a Principal Research Director and Malcolm Bauer is a Managing Senior Research Scientist in ETS’s R&D division. Robert Mislevy holds the Frederic M. Lord Chair in Measurement and Statistics at ETS and is Professor Emeritus of Measurement, Statistics and Evaluation at the University of Maryland.

More Resources About Gaming, Simulation and Assessment

  • J. P. Gee (2014). What video games have to teach us about learning and literacy. New York, NY: Macmillan.
  • R. J. Mislevy, J. T. Behrens, K. E. DiCerbo, D. C. Frezzo, & P. West (2012). Three things game designers need to know about assessment. In D. Ifenthaler, D. Eseryel, & X. Ge (Eds.), Assessment in game-based learning: Foundations, innovations, and perspectives (pp. 59–81). New York, NY: Springer.
  • D. S. McNamara, G. T. Jackson, & A. C. Graesser (2010). Intelligent tutoring and games (ITaG). In Y. K. Baek (Ed.), Gaming for classroom-based learning: Digital role-playing as a motivator of study. Hershey, PA: IGI Global.
