The Mantel–Haenszel delta difference (MH D-DIF) and the standardized proportion difference (STD P-DIF) are two observed-score methods that have been used to assess differential item functioning (DIF) at Educational Testing Service since the early 1990s. Latent-variable approaches to assessing measurement invariance at the item level have also been proposed and studied over the same period. Previous research showed that using the weighted sum score as the matching variable may close the gap between the MH D-DIF statistic based on observed scores and its counterpart based on latent ability. In this study, we show that weighting likewise reduces the difference between the STD P-DIF statistic and its counterpart based on latent ability. In addition, we discuss the factors that influence this gap and examine approaches that may facilitate the use of weighted STD P-DIF.