Assessment Development and Analysis

Supporting the Quality of Operational Tests

For more than 60 years, our team has worked with organizations worldwide to design, equate, scale and score assessments that meet the highest quality standards. We accomplish this through the following capabilities:

Developing Content for Fair, Valid and Reliable Tests

Since our founding, we have been producing test items, test forms and supporting materials for ongoing testing programs, some owned by ETS and some owned by clients. Our assessment specialists have expertise in many subject areas, which allows us to create tests for a variety of purposes, such as:

Test purpose: Example
Licensure or certification in education: The Praxis® Tests
English proficiency assessment: The TOEFL® test
Admissions tests: SAT®; The GRE® Tests
Placement: Advanced Placement® (AP)
Course credit: College-Level Examination Program® (CLEP)
Accountability: K–12 tests for a number of states; Major Field Tests
Curriculum-based end-of-course tests: Tennessee End-of-Course
Exit testing: Maryland High School Assessment
Group surveys to inform public policy: National Assessment of Educational Progress (NAEP)
Counseling: PSAT/NMSQT®

We develop, analyze and validate all content according to each testing program’s specifications and according to guidelines that are based on recognized standards in the field of educational measurement.

Using Evidence-centered Design

Whether developing new assessments, updating existing assessments or designing alternative assessments, our test designers start by considering the claims that score users want to make about what test takers know and can do. They then design assessment tasks to yield evidence supporting those claims. Our improvements to the TOEFL test and the GRE General Test are concrete examples of how researchers and test developers apply this approach, known as evidence-centered design. This approach is also an important tool for creating assessments that are accessible to students with special needs, such as English language learners and students with disabilities.

Equating, Scaling and Scoring Tests Accurately and Reliably

ETS has long been a measurement pioneer, developing statistical methods and processes that improve the reliability and fairness of test results. Our innovations contributed significantly to the operational use of now-common psychometric methods, such as Item Response Theory (IRT) and Differential Item Functioning (DIF). ETS's highly qualified psychometricians and research scientists have vast experience not only in conducting the foundational work to develop methodologies, but also in implementing them to equate, scale and score tests. These tests include traditional multiple-choice tests as well as tests that contain constructed-response questions, which are items that elicit open-ended responses such as short written answers, essays and recorded speech. Our psychometricians have expertise in scoring these questions using both human raters and automated scoring capabilities, such as the e-rater® essay scoring engine.
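As a rough illustration of what an IRT model looks like, the sketch below computes the probability of a correct response under the two-parameter logistic (2PL) model, one common IRT form. It is an assumption chosen for illustration, not a description of the model used in any particular ETS program. The function name and the item parameter values are hypothetical.

```python
import math


def prob_correct_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL IRT model.

    theta: test taker ability
    a:     item discrimination (how sharply the item separates abilities)
    b:     item difficulty (ability at which the probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: discrimination 1.2, difficulty 0.5.
# A test taker whose ability equals the item difficulty has a 50% chance.
p_equal = prob_correct_2pl(theta=0.5, a=1.2, b=0.5)

# A more able test taker has a higher chance on the same item.
p_higher = prob_correct_2pl(theta=1.5, a=1.2, b=0.5)
```

In this form, the difficulty parameter shifts the curve along the ability scale and the discrimination parameter controls its steepness.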

Analyzing and Managing Complex Data for Client Programs

In addition to measuring individual skills, assessments can measure collective progress toward state or national policy goals. Our state-of-the-art data analysis and management capabilities include the procedures, software and systems necessary to perform complex analysis of assessments on any scale.

We use these capabilities to serve clients worldwide, including the Organisation for Economic Co-operation and Development (OECD). On behalf of the OECD, we manage the consortium responsible for the Programme for the International Assessment of Adult Competencies (PIAAC), a five-year study of adult literacy and numeracy in more than two dozen countries. On behalf of the U.S. National Center for Education Statistics (NCES), we develop content and analyze data for the National Assessment of Educational Progress (NAEP). Policymakers use NAEP results to compare the educational performance of U.S. states and some larger school districts, and to examine the performance of the nation’s educational system as a whole.

In collaboration with the International Association for the Evaluation of Educational Achievement, we host semiannual training academies to disseminate and advance the knowledge behind these capabilities.

Promoting Fair and Valid Use of Test Scores

Our operational testing capabilities include validity research designed to ensure that tests yield meaningful scores for all test takers. We use this research capability to support our own testing programs, such as the TOEFL test and the GRE tests, as well as those of our clients. Such research involves raising awareness about the valid interpretations of a test’s results, ensuring that tests cover the appropriate amount and scope of content and investigating the relationship between test scores and what test takers actually know and can do.
