A Rationale for an Asymptotic Lognormal Form of Word-Frequency Distributions

Author(s):: Carroll, John B.
Publication Year:: 1969
Report Number:: RB-69-90
Source:: ETS Research Bulletin
Document Type:: Report
Page Count:: 97
Subject/Key Words:: National Institute of Child Health and Human Development (NICHD), Computational Linguistics, Mathematical Models, Probability, Psycholinguistics

Abstract

The lognormal distribution has been found to fit word-frequency distributions satisfactorily if account is taken of the relations between populations and samples. A rationale for an asymptotic lognormal distribution is derived by supposing that the probabilities at the nodes of decision trees are symmetrically distributed around .5 with a certain variance. By the central limit theorem, the logarithms of the continued products of probabilities randomly sampled from such a distribution would have an asymptotically normal distribution. Two mathematical models incorporating this notion are developed and tested: in one, the number of factors in the continued products is assumed to be fixed; in the other, that number follows a Poisson distribution. Psycholinguistic processes corresponding to these models are postulated and illustrated with reference to two sets of data: (1) word associations to the stimulus LIGHT, and (2) the Lorge Magazine Count. Reasonable fits to observed data or to underlying lognormal distributions are obtained, but certain problems remain in estimating parameters.
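The central-limit argument in the abstract can be illustrated by simulation. The sketch below is not Carroll's procedure or estimation method; it assumes, hypothetically, that node probabilities are drawn from a symmetric Beta(alpha, alpha) distribution (mean .5, variance 1/(4(2·alpha + 1))), and all function names are illustrative. Each simulated word's probability is a continued product of such factors, so its log is a sum of independent terms. In the fixed model every product has `n_factors` terms; in the Poisson model the number of terms is drawn from Poisson(`lam`). With alpha = 1 the factors are uniform, so each log factor has mean -1 and variance 1, which gives an exact check on the simulated moments.

```python
import math
import random
import statistics


def log_continued_product(n_factors, alpha, rng):
    """Sum of log p over n_factors node probabilities p ~ Beta(alpha, alpha).

    Beta(alpha, alpha) is a hypothetical stand-in for a distribution of node
    probabilities that is symmetric around .5 with variance 1 / (4 * (2*alpha + 1)).
    """
    return sum(math.log(rng.betavariate(alpha, alpha)) for _ in range(n_factors))


def poisson_draw(lam, rng):
    """Draw a Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_fixed(n_factors, alpha, n_words, seed=0):
    """Fixed model: every word's probability is a product of n_factors terms."""
    rng = random.Random(seed)
    return [log_continued_product(n_factors, alpha, rng) for _ in range(n_words)]


def simulate_poisson(lam, alpha, n_words, seed=0):
    """Poisson model: each word's number of factors is drawn from Poisson(lam)."""
    rng = random.Random(seed)
    return [log_continued_product(poisson_draw(lam, rng), alpha, rng)
            for _ in range(n_words)]


def skewness(xs):
    """Population skewness of xs; values near 0 are consistent with a normal shape."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])


if __name__ == "__main__":
    # As the number of factors grows, the log-probabilities become less skewed,
    # i.e. the word probabilities approach a lognormal form.
    for n in (5, 20, 80):
        logs = simulate_fixed(n, alpha=1.0, n_words=5000)
        print(n, round(statistics.fmean(logs), 2), round(skewness(logs), 3))
```

With uniform factors, the fixed model's log-products should have mean near -n and variance near n. The Poisson model with mean `lam` should have mean near -lam and, because the count of factors itself varies, variance near 2·lam. The printed skewness should shrink in magnitude as n increases, which is the asymptotic normality of the logarithms that the abstract appeals to.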

http://dx.doi.org/10.1002/j.2333-8504.1969.tb00769.x