Frequently Asked Questions About the Criterion® Online Writing Evaluation Service 

Using the Criterion® Service in Teaching
How can the Criterion® service help students?

Students get a response to their writing while it is fresh in their minds. They find out immediately how their work compares to a standard and what they should do to improve it. The Criterion service also provides an environment for writing and revising that capable and motivated students can use independently. This environment, coupled with the opportunity for instant feedback, provides the directed writing practice so beneficial to students.

How many topics are available?

Currently, there are 61 College Level I topics appropriate for first-year writing courses, practice and placement; 64 College Level II topics appropriate for second-year writing courses and practice; 10 College Preparatory topics; 14 GRE® test topics; and 35 TOEFL® test topics.

The Criterion topics library also contains a group of basic skills writing assignments drawn from grade 11 and 12 topics called "College Level Preparatory." These topics are graded against a lower-level scoring rubric and can be assigned to gradually move incoming freshmen up to the first-year writing level.

In addition, when educators want students to write on a topic not available in the Criterion library, they can create and assign their own prompt for a student assignment. Although essays written on educator-created topics do not receive a holistic score, all of the diagnostic feedback features are reported when the essay is submitted. Colleges and universities can also work with ETS to create new topics tailored to their needs.

The Criterion library of topics contains assignments representing the following writing genres: persuasive, informative, narrative, expository, issue and argumentative.

The Criterion service also offers a library of topics for high school, middle school and elementary school, beginning at grade 4.

Where do Criterion topics come from?

Criterion topics come from a number of sources, including ETS testing programs such as the NAEP® assessment, California State University, The Praxis Series™ assessments, the GRE test and the TOEFL test. Criterion topics are developed from representative, mode-specific samples and use 6-point holistic scales based on widely accepted writing standards.

How does the Criterion service handle an unusual writing style?

The Criterion service looks for specific features of syntax, organization and vocabulary. If the essay under consideration is not sufficiently similar to those in its database of already-scored essays, the Criterion service posts a warning, called an Advisory, saying that it is unable to give an accurate score. Advisories usually result from essays that are too brief or those in which the vocabulary is unusual or the content is off-topic.
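As a rough sketch only (this is not ETS's actual advisory logic, and the word-count and vocabulary-overlap thresholds below are invented), such checks might look like:

```python
def advisory_check(essay, topic_keywords, min_words=50, min_overlap=0.10):
    """Return an advisory message when an essay looks unscorable,
    or None when it appears scorable. All thresholds are invented."""
    words = essay.lower().split()
    if len(words) < min_words:
        return "Advisory: essay is too brief to score reliably."
    overlap = len(set(words) & set(topic_keywords)) / len(set(words))
    if overlap < min_overlap:
        return "Advisory: vocabulary suggests the essay may be off-topic."
    return None

# A two-word essay trips the brevity check.
print(advisory_check("Too short.", {"school", "uniforms"}))
```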

Will the use of the Criterion service stifle creative writing among students?

No. The Criterion service is designed to evaluate writing done under testing conditions — situations in which even the most creative writers concentrate on "playing it safe" with straightforward and competent writing.

Will the Criterion service catch cheating or plagiarism?

No. The Criterion service simply evaluates the essay. It is up to the institution to ensure that students are working independently and submitting their own work.

Instructors can opt to display sample essays for some topics on the Create Assignment screen. Students can then view the samples and refer to them while they write their own essays. The sample essays are in a read-only format and cannot be copied and pasted into another document.

What information does the Criterion service report to educators?

Educators have easy and secure access to each student's portfolio of essays, diagnostic reports and scores, as well as summary information on the performance of entire classes.

What information does the Criterion service report to students?

Typically, students get diagnostic feedback, as well as a holistic evaluation score, each time they submit an essay. However, educators can block students from seeing their scores — and may choose to do so if they use the Criterion service for testing. Educators also have the option of allowing students access to their own individual portfolios of essays, diagnostic reports and scores.

Can professors limit student feedback?

Yes. Professors can elect to report all, some or none of the feedback analysis. When creating an assignment, professors turn the score analysis feature on or off, as well as select which diagnostic feedback to report.

Can professors limit access to assignments?

Yes, professors can limit access when selecting assignment options. For example, the date and time an assignment is available are selected by professors during setup. They can also limit how many times a student can write and revise an assignment.

Can professors impose time limits on assignments?

Yes. Many assignments available from the Criterion library of topics have time limits associated with them. When creating the assignment, professors decide whether to impose a time limit. They can also turn off the time-limit function to allow unlimited writing time.

How is the Criterion service feedback different from Microsoft Word's Spelling and Grammar tool?

Microsoft Word's Spelling and Grammar tool is often used by good writers looking for quick analysis of common errors. Students who are learning to write need more accurate feedback on organization and development. The Criterion service provides it.

What is the Writer's Handbook?

The Writer's Handbook is an intuitive online tool that a student can access while reviewing diagnostic feedback. It explains every error or feature reported by defining it and providing examples of correct and incorrect use. There are five Writer's Handbook versions available to assign, including an ELL version and a bilingual (Spanish/English) version.

Does the Criterion service discriminate against students who struggle with standard English — for example, minorities and ESL students?

No. The Criterion service is incapable of discriminating on the basis of race, sex, national origin or a student's history because these factors do not exist in its analysis. The program simply measures features in an essay and compares them to features in previously scored essays used to define the rubric. So, if the collection of sample essays includes essays that use non-standard English and also earn high scores, then the Criterion service will assign a high score to other essays with the same features.

Using the Criterion Service for Remediation, Placement and Assessment
How often does the computer's score agree with the score of a faculty reader?

ETS researchers found exact or adjacent agreement (within one point) between Criterion scores and those of a trained essay reader in the vast majority of cases. Both used the same scoring guidelines and scoring system.
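To make these agreement measures concrete, the following sketch (with invented scores) shows how exact and adjacent agreement rates are computed on a 6-point scale:

```python
# Invented scores for illustration only.
faculty = [4, 5, 3, 6, 4, 2, 5, 3]
criterion = [4, 4, 3, 5, 6, 2, 5, 4]

exact = sum(f == c for f, c in zip(faculty, criterion))
adjacent = sum(abs(f - c) <= 1 for f, c in zip(faculty, criterion))

n = len(faculty)
print(f"exact agreement: {exact / n:.0%}")       # scores identical
print(f"exact or adjacent: {adjacent / n:.0%}")  # within one point
```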

How can the Criterion service be used for writing remediation and in basic skills writing classes?

Professors assign the Criterion standard topics or use their own topics to give students opportunities for additional writing practice. The Criterion topics library contains a group of writing assignments called "College Level Preparatory." These topics are graded against a lower-level scoring rubric and can be assigned to gradually move incoming students up to the first-year writing level. Professors may assign topics to encourage students to focus on problem areas that will improve their writing. The immediate feedback features of the Criterion service provide additional motivation for students to write and revise their essays when writing on their own.

How do institutions use the Criterion scores for placement?

Some colleges assign students to composition classes on the basis of their scores on a Criterion-scored essay — or the combination of a Criterion score and other indicators. The electronic score should not be the sole basis for a placement decision. It is best to combine a Criterion score with the score of a faculty reader in the same way that institutions combine scores from two different faculty readers. If the two scores differ by more than one point, a different faculty reader should also evaluate the essay.
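A minimal sketch of this adjudication rule appears below; the averaging step is an assumption, since institutions may sum or weight the two scores differently:

```python
def combined_placement_score(criterion_score, faculty_score):
    """Combine a Criterion score with one faculty reader's score.
    If they differ by more than one point, return None so the essay
    can be routed to another faculty reader for adjudication."""
    if abs(criterion_score - faculty_score) > 1:
        return None
    return (criterion_score + faculty_score) / 2  # averaging is an assumption

print(combined_placement_score(4, 5))  # 4.5
print(combined_placement_score(3, 6))  # None -> needs another reader
```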

How do institutions use the Criterion service for assessment purposes?

Some institutions use the Criterion scores for exit testing — combining a Criterion score with the score from a faculty reader in the same way they combine scores from two different faculty readers. If the two scores differ by more than one point, a different faculty reader also evaluates the essay. Some institutions use the Criterion service for benchmark testing, assigning the Criterion-scored essays at specified points during a semester.

How can the Criterion service be used in a campus writing lab?

When the Criterion service is available in a campus writing lab, tutors and writing mentors have access to topics, feedback and student portfolios. They also have a way to communicate with professors about student progress. Using the Criterion service in a writing lab facilitates writing instruction across the curriculum when students use the lab to check writing in-progress for all of their courses. Providing access to an open-ended, professor-created topic allows students to write an essay about any subject assigned by any professor. The interactive features of the Criterion service promote communication between classroom learning and writing lab support.

How do students feel about being scored by a machine?

Most of today's students have had experience with instant feedback in computer programs and are becoming more comfortable with the idea of computerized scoring.

Can the Criterion service score essays on other topics?

Yes. Using the Scored Instructor Topic feature, teachers can create their own topics that are parallel to the Criterion library prompts, and the students' essays will receive holistic scores upon completion.

A Criterion pop-up window explains the requirements, and a button link offers step-by-step instructions on how to create either a persuasive or expository topic that can be scored.

Understanding the Technology
What is a holistic score?

A holistic score is an overall score (usually on a 4- or 6-point scale) that is given to an essay. The Criterion holistic scoring compares a student's writing to thousands of essays written and evaluated by writing instructors.

The essays used to build the scoring models have been scored by trained readers and were written by students under timed testing conditions. The writers had no opportunity to revise, use a spell-checker or reflect on what they had written. So when students write on the Criterion topics in a regular class, working under more relaxed conditions, instructors and students should recognize that students' scores may not precisely compare to those of the samples.

What are trait level indicators?

The Criterion service includes the option of reporting trait level indicators in addition to the holistic score and diagnostic feedback currently reported by the application. Individual trait level indicators can be enabled or disabled by the instructor on the Create Assignment screen.

The three traits for which information is provided are:

  • Grammar, Usage and Mechanics (grouped together for effective trait level feedback)
  • Style
  • Organization & Development

If a trait receives more errors or comments than expected for the essay's holistic score, a message displays to the student indicating that the trait category needs attention.

These messages serve as a clear signal when the proficiency level of one or more traits is not consistent with the holistic score received for that essay. The trait identified as needing attention is specific to that essay submission and is not necessarily indicative of the student's writing in general.

For example, in an essay that receives a holistic score of 5 on a 6-point scale, the expectation is that there will be relatively few errors in grammar, usage and mechanics, and few suggestions for improvement in style and in organization and development. We know this because the e-rater® scoring engine analyzes these components, among several others, as part of the computation of the holistic score. If, however, the essay includes more errors in grammar, usage and mechanics than is typical for essays receiving this score, a trait feedback message will display. This information focuses attention on the specific aspects of the essay where improvements are likely to have the most impact on the overall holistic score.

The diagnostic feedback also provided in the Criterion service can provide additional detail to help find and address weak areas in the essay.
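As a toy illustration of the flagging logic described above (the threshold table is entirely invented; the e-rater® engine derives its actual expectations from its scoring models):

```python
# Expected maximum error/comment counts per trait for each holistic
# score on a 6-point scale. All numbers are made up for illustration.
EXPECTED_MAX = {
    6: {"grammar_usage_mechanics": 2, "style": 2, "organization_development": 1},
    5: {"grammar_usage_mechanics": 4, "style": 3, "organization_development": 2},
    4: {"grammar_usage_mechanics": 7, "style": 5, "organization_development": 4},
}

def traits_needing_attention(holistic_score, trait_counts):
    """Return the traits whose error/comment counts exceed what is
    typical for essays at the given holistic score."""
    expected = EXPECTED_MAX.get(holistic_score, {})
    return [trait for trait, count in trait_counts.items()
            if count > expected.get(trait, float("inf"))]

# Example: an essay scored 5 with unusually many grammar errors.
flags = traits_needing_attention(5, {"grammar_usage_mechanics": 9,
                                     "style": 2,
                                     "organization_development": 1})
print(flags)  # ['grammar_usage_mechanics']
```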

How are holistic scores and trait level indicators related?

Any trait level indicator displayed is relative to the holistic score. It indicates that the proficiency level of the trait is not consistent with the holistic score received for that essay and is lower than expected.

How does the Criterion service come up with its scores?

The Criterion service is based on a technology called e-rater that was developed at Educational Testing Service. The e-rater scoring engine compares the new essay to samples of essays previously scored by faculty readers. It looks for similarities in sentence structure, organization and vocabulary. Essays earning high scores are those with characteristics most similar to the high-scoring essays in the sample group. Essays earning low scores share characteristics with low-scoring essays in the sample group. As you might expect, the sample essays are scored very carefully, and the collection must include a sufficient number of essays for each score point.
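The following sketch is conceptual only, not the e-rater algorithm itself: it illustrates similarity-based scoring with a nearest-neighbor comparison over invented feature vectors.

```python
import math

# Each sample pairs an invented feature vector with a faculty-assigned
# score. Features might be, e.g., average sentence length and
# vocabulary diversity.
samples = [((12.0, 0.40), 3), ((18.5, 0.55), 5), ((21.0, 0.62), 6)]

def score(essay_features):
    """Assign the score of the most similar previously scored sample."""
    nearest = min(samples, key=lambda s: math.dist(essay_features, s[0]))
    return nearest[1]

print(score((19.0, 0.58)))  # closest to the second sample -> 5
```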

What is the technology used in the e-rater® scoring?

E-rater scoring is an application of Natural Language Processing (NLP), a field of computer technology that uses computational methods to analyze characteristics of text. Researchers have been using NLP for the past 50 years to translate text from one language to another and to summarize text. Internet search engines currently use NLP to retrieve information.

E-rater scoring uses NLP to identify the features of the faculty-scored essays in its sample collection and store them — with their associated weights — in a database. When it evaluates a new essay, e-rater scoring compares its features to those in the database in order to assign a score.

Because the e-rater engine is not doing any actual reading, the validity of its scoring depends on the scoring of the sample essays from which the e-rater database is created.
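As a heavily simplified sketch of the "features with weights" idea (the two features and their weights below are invented; the real engine extracts many more features and derives its weights from large samples of faculty-scored essays):

```python
import re

# Invented weights; a real model learns these from scored training essays.
WEIGHTS = {"avg_sentence_length": 0.08, "vocab_diversity": 4.0}
INTERCEPT = 1.0

def extract_features(text):
    """Compute two toy text features."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.lower().split()
    return {
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "vocab_diversity": len(set(words)) / max(len(words), 1),
    }

def predict_score(text):
    """Weighted sum of features, clamped to a 1-6 holistic scale."""
    feats = extract_features(text)
    raw = INTERCEPT + sum(WEIGHTS[name] * feats[name] for name in WEIGHTS)
    return max(1, min(6, round(raw)))
```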

Can students trick the Criterion service?

Yes. Since the e-rater engine cannot really understand English, it can be fooled by an illogical, but well-written, argument. Educators can stop students from deliberately trying to fool the Criterion service by announcing that a random sample of essays will be read by independent readers. The Criterion service will also display an "Advisory" along with the e-rater score when an essay displays certain characteristics that warrant attention compared to other essays scored against the same topic.

Must students be connected to the Internet to use the Criterion service?

Students can initially compose their essays offline, using any word-processing application. They will ultimately need an Internet connection, however, to be able to cut and paste their essays into the Criterion essay-submission box so their work can be scored and analyzed. For assignments that are timed, essays should be composed online to ensure accountability by all students and to accurately reflect their writing skills in the timed environment.

Can I import student identifiers from my data management system?

Yes, the Criterion service has import capabilities at several levels.

A Criterion Administrator will use the "Advanced Import" feature to create college, course and class levels by importing the required data into the Criterion service from a comma-delimited (.csv) file.

Faculty will also be able to import student information into the Criterion service using the "Import Student Information" function on the Classes Report screen.

Details are provided in both the HELP text and the Criterion® User Manual and Administrator Supplement.
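For illustration only, a comma-delimited import file and the code to read it might look like the sketch below; the column headings here are hypothetical, and the actual required columns are specified in the Criterion documentation.

```python
import csv
import io

# Hypothetical roster in comma-delimited (.csv) format; the real
# required columns are given in the Administrator Supplement.
roster_csv = """student_id,last_name,first_name,class_code
1001,Rivera,Ana,ENG101-A
1002,Chen,Wei,ENG101-A
"""

for row in csv.DictReader(io.StringIO(roster_csv)):
    print(row["student_id"], row["last_name"], row["class_code"])
```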

Can I save my data?

Yes, the Criterion service has both an "Export Report Data" and an "Archive Portfolios" feature that can be used to create export files in a comma-delimited format (.csv) that can be opened by most text editors and spreadsheet programs. Detailed instructions for both features are provided in the Criterion® User Manual and Administrator Supplement.

What technical requirements must a user have to access the Criterion site?

The Criterion service requires only an Internet connection and a web browser. It is PC, Linux and Mac compatible.

For a complete description of minimum and recommended standards and network configuration suggestions, please refer to the System Requirements Sheet.

Understanding the Analysis of Organization and Development in Student Essays
Why do educators value the Criterion automated analysis of essay-based organizational elements in student essays?

The Criterion capability to analyze organizational elements serves as a critical complement to other tools in the application that provide feedback on grammar, usage, mechanics and style features in student essays. There is now broad acceptance of automated essay-scoring technology for large-scale assessment and classroom instruction. Faculty and educational researchers encourage the development of improved essay-evaluation applications that not only generate a numerical rating for an essay but also analyze grammar, usage, mechanics and discourse structure. In classroom instruction, the goal is to develop applications that give students more opportunity to practice writing on their own, with automated feedback that helps them revise their work and ultimately improve their writing skills. This technology is a helpful supplement to traditional faculty instruction. In particular, feedback that refers explicitly to students' own writing is more effective than general feedback alone.

Which organizational elements are analyzed?

This leading-edge technology employs machine learning to identify organizational elements in student essays, including introductory or background material, thesis statements, main ideas, supporting ideas and conclusions. The system makes decisions that mirror how educators perform this task. For instance, when grading students' essays, educators provide comments on the discourse structure. Professors may indicate that there is no thesis statement or that the main idea has insufficient support. This kind of feedback from a professor helps students reflect on the discourse structure of their writing.

How did the system learn how to do the analysis?

Faculty readers annotate large samples of student essay responses with essay-based organizational elements. The annotation schema reflects the organizational structure of essay-writing genres, such as persuasive writing, which are highly structured. The increased use of automated essay-scoring technology allows for the collection of a large corpus of student essay responses that we use for annotation purposes.

How can this analysis help students?

As students become more sophisticated writers and start to think about the organizational structure of their writing, the Criterion application offers them organizational feedback. Students who use the tool can see a comprehensive analysis of the existing organizational elements of their essays. For instance, if the system feedback indicates that an essay has no conclusion, the student can begin to work on that specific organizational element. This kind of automated feedback serves as an initial step toward improving the organization and development of an essay, and it resembles the traditional feedback a student might receive from a professor.

Understanding Organization and Development Feedback
How does the automated system make decisions about text segments in a student essay and the corresponding organizational labels?

The algorithm developed to automatically identify essay-based organizational elements is based on samples of faculty-annotated essay data. Two faculty readers were trained to annotate essay data with appropriate organizational labels. Readers labeled approximately 1,400 essays on college-level topics.

What is the agreement rate between two faculty readers on the labeling task?

The two faculty readers were in general agreement across all labeling tasks.

What is the agreement rate between the system and the faculty reader?

The system's labeling is in general agreement with that of the faculty reader.

Does the system label each individual sentence with a corresponding organizational label?

Yes. Sometimes multiple sentences are associated with a single organizational element, and the entire block of text is highlighted and appears to be assigned a single label. In fact, each sentence is labeled individually.

Does the system label according to sentence position only?

No. Many features, including word usage, rhetorical strategy information, possible sequence of organizational elements and syntactic information are used to determine the final organizational label.
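As a toy illustration of feature-based labeling, the sketch below uses cue words, sentence position and label sequence as simplified stand-ins for the richer feature set described above; the rules are invented, whereas the actual system learns from annotated data.

```python
# Toy sentence labeler: real systems combine many learned features;
# these hand-written rules only illustrate the idea.
CUES = {
    "thesis": ("i believe", "in my opinion", "should"),
    "conclusion": ("in conclusion", "to sum up", "finally"),
}

def label_sentences(sentences):
    labels = []
    for i, s in enumerate(sentences):
        text = s.lower()
        if any(c in text for c in CUES["conclusion"]) or i == len(sentences) - 1:
            labels.append("conclusion")
        elif any(c in text for c in CUES["thesis"]) or i == 0:
            labels.append("thesis")
        elif labels and labels[-1] in ("thesis", "main idea"):
            labels.append("supporting idea")
        else:
            labels.append("main idea")
    return labels

essay = ["Schools should adopt year-round calendars.",
         "Students forget less over shorter breaks.",
         "In conclusion, the change benefits everyone."]
print(label_sentences(essay))
# ['thesis', 'supporting idea', 'conclusion']
```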

Getting More Information
Where can I find additional information about the Criterion service and the e-rater technology?

The research papers on the ETS website are sources of more information about the Criterion service and its underlying technology.

Is there a publication that contains a detailed description of the current system?

Yes. See Burstein, J., Marcu, D., & Knight, K. (2003). Finding the WRITE stuff: Automatic identification of discourse structure in student essays. In S. Harabagiu & F. Ciravegna (Eds.), IEEE Intelligent Systems: Special Issue on Advances in Natural Language Processing, 18(1), 32-39, available at e-rater research.

 

Winner of the following awards:

IMS 2007 Award

Codie Award Winner

Teaching and Learning Magazine's 2005 Award of Excellence