Randomly equivalent forms (REFs) of listening and reading tests for nonnative speakers of English were created by stratified random assignment of items to forms, stratifying on item content and predicted difficulty. The study included 50 replications of the procedure for each test, and each replication generated 2 REFs. The equivalence of those 2 forms was evaluated by comparing their raw-score distributions, focusing on the greatest difference between the cumulative distributions. For listening, 10 replications produced cumulative distributions that differed at some point by more than 0.10, and 4 replications produced differences greater than 0.15. For reading, only 3 replications produced differences greater than 0.10. The difference between the results for listening and reading reflects the greater variation, within strata, in the difficulty of the listening items. The REF procedure may become more effective if item difficulty can be predicted more accurately.
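
The two steps described above, stratified random assignment and comparison of cumulative raw-score distributions, can be illustrated with a minimal Python sketch. It is not the study's actual procedure: the function names `stratified_split` and `max_cdf_difference`, the `(item_id, content_area, difficulty_band)` tuple layout, and the handling of odd-sized strata are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(items, rng):
    """Assign items to two forms by shuffling within each stratum.

    `items` is a list of (item_id, content_area, difficulty_band) tuples.
    This layout is hypothetical; a stratum is one (content, band) pair.
    """
    strata = defaultdict(list)
    for item_id, content, band in items:
        strata[(content, band)].append(item_id)

    form_a, form_b = [], []
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        half = len(members) // 2
        # Assumption: if a stratum has an odd number of items, the
        # leftover item is left unused so both forms stay the same length.
        form_a.extend(members[:half])
        form_b.extend(members[half:2 * half])
    return form_a, form_b


def max_cdf_difference(scores_a, scores_b, max_score):
    """Largest absolute gap between two empirical cumulative distributions.

    Scores are integer raw scores in the range 0..max_score.
    """
    n_a, n_b = len(scores_a), len(scores_b)
    largest = 0.0
    for s in range(max_score + 1):
        cdf_a = sum(1 for x in scores_a if x <= s) / n_a
        cdf_b = sum(1 for x in scores_b if x <= s) / n_b
        largest = max(largest, abs(cdf_a - cdf_b))
    return largest
```

Under these assumptions, one replication would call `stratified_split(items, random.Random(seed))`, score examinees on each resulting form, and pass the two raw-score lists to `max_cdf_difference`. The returned value could then be compared with thresholds such as 0.10 or 0.15.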