In the context of TOEFL 2000, performance assessment tasks may be used to evaluate foreign students' English language proficiency in a manner that closely resembles the tasks these students would be required to perform in an academic setting. The implications of using alternative assessments in other high-stakes testing environments similar to TOEFL have been discussed in the recent literature. This paper summarizes the psychometric and consequential issues involved in the use of performance assessments that are of relevance to the TOEFL 2000 project. Based on this review, several findings are of note: (1) On the whole, results from performance assessments show a high degree of task-specific variance, resulting in lower levels of score reliability than are found in traditional assessments. (2) With careful design of scoring rubrics and training of raters, the magnitude of variance due to raters or to interactions of raters with examinees can be kept at a level substantially smaller than other sources of error variance, the most notable of which is topic or task specificity. (3) Because performance assessment tasks are typically complex, thereby limiting the number of tasks that can be given in a fixed testing time, and because the method and content of task-based measurement can have a large effect on test scores, performance assessments are particularly context-bound and of limited generalizability.
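The link between task-specific variance and score reliability in findings (1) and (3) can be illustrated with a small generalizability-theory sketch. The simulation below is not from the paper: it uses hypothetical variance components for a crossed persons-by-tasks design, estimates them from simulated scores via the standard two-way random-effects ANOVA, and shows how the generalizability (reliability) coefficient for a mean score depends on the number of tasks administered.

```python
# Illustrative sketch (hypothetical numbers, not the paper's data): when the
# residual (task-specific) variance is large relative to true person variance,
# a test composed of only a few tasks yields a low reliability coefficient.
import random

random.seed(42)

N_PERSONS, N_TASKS = 200, 4
SD_PERSON, SD_TASK, SD_RESID = 1.0, 0.5, 1.2   # residual = person-x-task interaction + error

person_eff = [random.gauss(0, SD_PERSON) for _ in range(N_PERSONS)]
task_eff = [random.gauss(0, SD_TASK) for _ in range(N_TASKS)]
scores = [[person_eff[p] + task_eff[t] + random.gauss(0, SD_RESID)
           for t in range(N_TASKS)] for p in range(N_PERSONS)]

grand = sum(sum(row) for row in scores) / (N_PERSONS * N_TASKS)
p_means = [sum(row) / N_TASKS for row in scores]
t_means = [sum(scores[p][t] for p in range(N_PERSONS)) / N_PERSONS
           for t in range(N_TASKS)]

# Mean squares for a two-way random-effects ANOVA with one observation per cell.
ms_p = N_TASKS * sum((m - grand) ** 2 for m in p_means) / (N_PERSONS - 1)
ss_res = sum((scores[p][t] - p_means[p] - t_means[t] + grand) ** 2
             for p in range(N_PERSONS) for t in range(N_TASKS))
ms_res = ss_res / ((N_PERSONS - 1) * (N_TASKS - 1))

var_res = ms_res                     # task-specific variance (interaction + error)
var_p = (ms_p - ms_res) / N_TASKS    # true person (universe-score) variance

def g_coefficient(n_tasks):
    """Relative G coefficient: reliability of a mean over n_tasks tasks."""
    return var_p / (var_p + var_res / n_tasks)

for k in (1, 4, 12):
    print(f"{k:2d} tasks -> G = {g_coefficient(k):.2f}")
```

With these hypothetical components, reliability climbs only as more tasks are averaged; since complex performance tasks limit how many can fit in a fixed testing time, the coefficient stays modest, which is the generalizability constraint noted in finding (3).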