As more assessment programs move toward constructed-response and performance assessment items, the use of human raters to score these items will necessarily increase. Different designs can be created for assigning raters to score examinee responses, and in this study several such designs are evaluated in terms of their impact on the accuracy of examinee ability estimation. As expected, the optimal, fully crossed design, in which every rater judges every performance by every examinee, yields ability estimates with minimum bias and small standard errors. Because this design is rarely, if ever, practical in operational assessment, the results for nested and spiral rater designs are of more interest to practitioners. The nested rater designs produce biased ability estimates for examinees judged by the most extreme raters, whereas the spiral rater designs examined prove surprisingly robust to rater tendencies.
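The three assignment schemes can be made concrete with a small sketch. This is an illustrative construction, not the study's actual design or simulation code; the function names, the block-nesting rule, and the rotation rule for the spiral design are assumptions chosen to match the descriptions above.

```python
def crossed_design(n_examinees, n_raters):
    """Fully crossed: every rater scores every examinee's response."""
    return {e: list(range(n_raters)) for e in range(n_examinees)}

def nested_design(n_examinees, n_raters, raters_per_examinee=2):
    """Nested: examinees are split into blocks, each scored only by its
    own fixed team of raters. An extreme team biases its whole block."""
    n_teams = n_raters // raters_per_examinee
    assignment = {}
    for e in range(n_examinees):
        team = e % n_teams                      # which fixed team scores this examinee
        start = team * raters_per_examinee
        assignment[e] = list(range(start, start + raters_per_examinee))
    return assignment

def spiral_design(n_examinees, n_raters, raters_per_examinee=2):
    """Spiral: rater pairings rotate across examinees, so each rater is
    paired with many others and rater tendencies tend to average out."""
    return {e: [(e + k) % n_raters for k in range(raters_per_examinee)]
            for e in range(n_examinees)}
```

Under the nested scheme an examinee's estimate depends entirely on one rater team, which is why extreme teams bias their block's estimates; the spiral scheme links all raters through overlapping pairings, consistent with its observed robustness.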