Building tests out of large items brings with it a number of problems. One major problem is that it is often too difficult and too expensive to pretest large items extensively. Thus the kind of screening for flaws that is pro forma for multiple-choice items is seldom done for large items. In addition, because an operational test contains so few large items, declining to count an entire item that is found to be flawed in an operational administration may be tantamount to aborting that administration. In this paper we examine the efficacy of an alternative, continuous item weighting. This alternative is illustrated with data from the 1988 administration of the College Board's Advanced Placement History Test.