This evaluation study compares the performance of a prototype tool called SourceFinder with that of highly trained human test developers. SourceFinder, a specialized search engine developed to locate source material for Graduate Record Examinations (GRE) reading comprehension passages, uses a variety of shallow linguistic features to model the search criteria applied by expert test developers, thereby automating source selection and reducing source-processing time. The evaluation provides detailed information about the aspects of source variation that the current prototype does not model well, and approaches for improving performance in those areas are discussed. The study also offers a more explicit description of the source selection task, along with a rich data set for developing a less subjective, more explicit definition of the types of documents test developers prefer.
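The general approach of ranking candidate documents by shallow linguistic features can be sketched as follows. This is an illustrative sketch only: the specific features, target ranges, and scoring scheme shown here (word count, average sentence length, type-token ratio) are hypothetical stand-ins, not SourceFinder's actual model.

```python
import re

def shallow_features(text):
    """Compute a few hypothetical surface features of a document."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    return {
        "word_count": n_words,
        "avg_sentence_len": n_words / max(len(sentences), 1),
        "type_token_ratio": len({w.lower() for w in words}) / max(n_words, 1),
    }

# Hypothetical target ranges a test developer might prefer for a passage source.
TARGETS = {
    "word_count": (400, 900),
    "avg_sentence_len": (15, 30),
    "type_token_ratio": (0.4, 0.7),
}

def score(text):
    """Count how many features fall inside their preferred range."""
    feats = shallow_features(text)
    return sum(lo <= feats[k] <= hi for k, (lo, hi) in TARGETS.items())

def rank(candidates):
    """Order candidate documents best-first by feature score."""
    return sorted(candidates, key=score, reverse=True)
```

A real system would weight features, learn target ranges from documents that test developers actually selected, and combine the score with conventional retrieval relevance; the point here is only that cheap surface statistics can serve as a proxy for expert selection criteria.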