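Probability mass function: Yule–Simon PMF on a log-log scale. (The function is only defined at integer values of k; the connecting lines do not indicate continuity.)

Cumulative distribution function: Yule–Simon CMF. (The function is only defined at integer values of k; the connecting lines do not indicate continuity.)

| Property | Value |
|---|---|
| Parameters | $\rho > 0$ shape (real) |
| Support | $k \in \{1, 2, \dots\}$ |
| Probability mass function (pmf) | $\rho\,\mathrm{B}(k, \rho+1)$ |
| Cumulative distribution function (cdf) | $1 - k\,\mathrm{B}(k, \rho+1)$ |
| Mean | $\frac{\rho}{\rho-1}$ for $\rho > 1$ |
| Median | |
| Mode | $1$ |
| Variance | $\frac{\rho^2}{(\rho-1)^2(\rho-2)}$ for $\rho > 2$ |
| Skewness | $\frac{(\rho+1)^2\sqrt{\rho-2}}{(\rho-3)\,\rho}$ for $\rho > 3$ |
| Excess kurtosis | $\rho + 3 + \frac{11\rho^3 - 49\rho - 22}{(\rho-4)(\rho-3)\,\rho}$ for $\rho > 4$ |
| Entropy | |
| Moment-generating function (mgf) | $\frac{\rho}{\rho+1}\,{}_2F_1(1, 1; \rho+2; e^{t})\,e^{t}$ |
| Characteristic function | $\frac{\rho}{\rho+1}\,{}_2F_1(1, 1; \rho+2; e^{it})\,e^{it}$ |
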
In probability and statistics, the Yule–Simon distribution is a discrete probability distribution named after Udny Yule and Herbert Simon. Simon originally called it the Yule distribution[1].
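The probability mass function (pmf) of the Yule–Simon(ρ) distribution is

$$f(k;\rho) = \rho\,\mathrm{B}(k, \rho+1),$$

for integer k ≥ 1 and real ρ > 0, where B is the beta function. Equivalently, the pmf can be written in terms of the falling factorial as

$$f(k;\rho) = \frac{\rho\,\Gamma(\rho+1)}{(k+\rho)^{\underline{\rho+1}}},$$

where Γ is the gamma function. Thus, if ρ is an integer,

$$f(k;\rho) = \frac{\rho\,\rho!\,(k-1)!}{(k+\rho)!}.$$

The probability mass function f has the property that for sufficiently large k we have

$$f(k;\rho) \approx \frac{\rho\,\Gamma(\rho+1)}{k^{\rho+1}} \propto \frac{1}{k^{\rho+1}}.$$
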
This means that the tail of the Yule–Simon distribution is a realization of Zipf's law: f(k;ρ) can be used to model, for example, the relative frequency of the kth most frequent word in a large collection of text, which according to Zipf's law is inversely proportional to a (typically small) power of k.
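
As a rough numerical illustration of the pmf and its power-law tail, the following sketch evaluates f(k;ρ) = ρ B(k, ρ+1) through log-gamma functions using only the Python standard library. The function names (`yule_simon_pmf`, `yule_simon_cdf`, `tail_approximation`) are illustrative choices, not part of any library.

```python
import math


def log_beta(a, b):
    """Natural log of the beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def yule_simon_pmf(k, rho):
    """f(k; rho) = rho * B(k, rho + 1) for integer k >= 1 and real rho > 0."""
    if k < 1 or rho <= 0:
        raise ValueError("requires integer k >= 1 and rho > 0")
    return rho * math.exp(log_beta(k, rho + 1.0))


def yule_simon_cdf(k, rho):
    """P(K <= k) = 1 - k * B(k, rho + 1)."""
    if k < 1 or rho <= 0:
        raise ValueError("requires integer k >= 1 and rho > 0")
    return 1.0 - k * math.exp(log_beta(k, rho + 1.0))


def tail_approximation(k, rho):
    """Large-k approximation rho * Gamma(rho + 1) / k**(rho + 1)."""
    return rho * math.gamma(rho + 1.0) / k ** (rho + 1.0)


rho = 2.0
# The partial sum of the pmf should agree with the closed-form cdf.
partial_sum = sum(yule_simon_pmf(k, rho) for k in range(1, 101))
print(partial_sum, yule_simon_cdf(100, rho))
# The ratio of the exact pmf to the power-law approximation approaches 1.
print(yule_simon_pmf(1000, rho) / tail_approximation(1000, rho))
```

Working in log space avoids overflow of the gamma function for large k; the ratio printed last shows how closely the tail follows k^(−(ρ+1)).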
Occurrence
The Yule–Simon distribution arose originally as the limiting distribution of a particular stochastic process studied by Yule as a model for the distribution of biological taxa and subtaxa[2]. Simon dubbed this process the "Yule process" but it is more commonly known today as a preferential attachment process. The preferential attachment process is an urn process in which balls are added to a growing number of urns, each ball being allocated to an urn with probability linear in the number the urn already contains.
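
The urn process can be sketched in a few lines. The version below is one simple variant in the spirit of Simon's model, under assumptions that go beyond the text above: at each step a new urn holding one ball is created with a fixed probability `new_urn_prob` (a hypothetical parameter name), and otherwise a ball is added to an existing urn chosen with probability proportional to its current size. Under these assumptions the urn-size distribution approaches Yule–Simon with ρ = 1 / (1 − `new_urn_prob`).

```python
import random
from collections import Counter


def simulate_simon_process(steps, new_urn_prob, seed=0):
    """Simulate a preferential attachment urn process (illustrative variant).

    Returns the list of urn sizes after `steps` balls have been added.
    """
    rng = random.Random(seed)
    urn_sizes = [1]      # start with a single urn holding one ball
    ball_to_urn = [0]    # ball_to_urn[i] is the urn index of ball i
    for _ in range(steps):
        if rng.random() < new_urn_prob:
            # Open a new urn containing the new ball.
            urn_sizes.append(1)
            ball_to_urn.append(len(urn_sizes) - 1)
        else:
            # Picking a uniformly random existing ball selects its urn with
            # probability proportional to the urn's current size.
            urn = ball_to_urn[rng.randrange(len(ball_to_urn))]
            urn_sizes[urn] += 1
            ball_to_urn.append(urn)
    return urn_sizes


def size_frequencies(urn_sizes):
    """Empirical relative frequency of each urn size."""
    counts = Counter(urn_sizes)
    total = len(urn_sizes)
    return {size: counts[size] / total for size in sorted(counts)}


sizes = simulate_simon_process(steps=100_000, new_urn_prob=0.5, seed=1)
freqs = size_frequencies(sizes)
# With new_urn_prob = 0.5 the limiting shape is rho = 2, so f(1) = 2/3.
print(freqs[1], freqs[2])
```

Tracking one list entry per ball makes the size-proportional choice an O(1) operation, at the cost of memory linear in the number of balls.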
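The distribution also arises as a continuous mixture of geometric distributions. Specifically, assume that W follows an exponential distribution with scale 1/ρ or rate ρ:

$$W \sim \operatorname{Exponential}(\rho), \qquad h(w;\rho) = \rho\,e^{-\rho w}.$$

Then, conditional on W, a Yule–Simon distributed variable K has the following geometric distribution:

$$K \mid W \sim \operatorname{Geometric}(e^{-W}).$$

The pmf of a geometric distribution is

$$g(k;p) = p\,(1-p)^{k-1}$$

for k ∈ {1, 2, …}. The Yule–Simon pmf is then the following exponential-geometric mixture distribution:

$$f(k;\rho) = \int_0^{\infty} g(k; e^{-w})\,h(w;\rho)\,dw.$$
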
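
The mixture representation suggests a simple two-stage sampler: draw W from an exponential distribution with rate ρ, then draw K from a geometric distribution on {1, 2, …} with success probability exp(−W). The sketch below does this with the standard library only; the function name `sample_yule_simon` is an illustrative choice, and the geometric draw uses inverse-transform sampling.

```python
import math
import random


def sample_yule_simon(rho, rng):
    """Draw one Yule-Simon(rho) variate via the exponential-geometric mixture."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    w = rng.expovariate(rho)      # W ~ Exponential(rate = rho)
    p = math.exp(-w)              # success probability of the geometric stage
    if p >= 1.0:
        return 1                  # success is certain on the first trial
    if p <= 0.0:
        # exp(-w) underflowed; only plausible for extremely small rho.
        raise OverflowError("success probability underflowed to zero")
    u = 1.0 - rng.random()        # uniform on (0, 1], so log(u) is finite
    # Inverse transform: P(K > k) = (1 - p)**k for the geometric stage.
    return int(math.log(u) / math.log1p(-p)) + 1


rng = random.Random(42)
samples = [sample_yule_simon(2.0, rng) for _ in range(50_000)]
# For rho = 2 the exact values are f(1) = 2/3 and f(2) = 1/6.
print(sum(s == 1 for s in samples) / len(samples))
print(sum(s == 2 for s in samples) / len(samples))
```

Because the tail is heavy, sample means converge slowly for ρ close to 1, so comparing empirical frequencies of small k is a more reliable sanity check than comparing moments.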
Generalizations
The two-parameter generalization of the original Yule distribution replaces the beta function with an incomplete beta function. The probability mass function of the generalized Yule–Simon(ρ, α) distribution is defined as

$$f(k;\rho,\alpha) = \frac{\rho}{1-\alpha^{\rho}}\,\mathrm{B}_{1-\alpha}(k, \rho+1),$$

with 0 ≤ α < 1. For α = 0 the ordinary Yule–Simon(ρ) distribution is obtained as a special case. The use of the incomplete beta function has the effect of introducing an exponential cutoff in the upper tail.
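
The generalized pmf can be checked numerically. The sketch below assumes the form f(k;ρ,α) = ρ / (1 − α^ρ) · B_{1−α}(k, ρ+1) with 0 ≤ α < 1, where B_x(a, b) is the integral of t^(a−1) (1−t)^(b−1) from 0 to x. Since the standard library has no incomplete beta function, it is approximated here with composite Simpson's rule, which is a convenience for illustration rather than a recommended production method. All function names are hypothetical.

```python
import math


def incomplete_beta(x, a, b, intervals=2000):
    """Approximate B_x(a, b) with composite Simpson's rule.

    Assumes 0 <= x <= 1, a >= 1, b >= 1, and an even number of intervals.
    """
    if x <= 0.0:
        return 0.0
    h = x / intervals

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    total = integrand(0.0) + integrand(x)
    for i in range(1, intervals):
        weight = 4.0 if i % 2 else 2.0
        total += weight * integrand(i * h)
    return total * h / 3.0


def generalized_yule_simon_pmf(k, rho, alpha):
    """f(k; rho, alpha) = rho / (1 - alpha**rho) * B_{1 - alpha}(k, rho + 1)."""
    if k < 1 or rho <= 0 or not 0.0 <= alpha < 1.0:
        raise ValueError("requires k >= 1, rho > 0 and 0 <= alpha < 1")
    norm = rho / (1.0 - alpha ** rho)
    return norm * incomplete_beta(1.0 - alpha, k, rho + 1.0)


# alpha = 0 should reproduce the ordinary pmf rho * B(k, rho + 1).
ordinary = 2.0 * math.exp(math.lgamma(3) + math.lgamma(3.0) - math.lgamma(6.0))
print(generalized_yule_simon_pmf(3, 2.0, 0.0), ordinary)
# With alpha > 0 the tail is cut off exponentially, so a short sum is close to 1.
print(sum(generalized_yule_simon_pmf(k, 2.0, 0.3) for k in range(1, 201)))
```

The second printed value illustrates the exponential cutoff: with α = 0.3 the probability mass beyond k = 200 is negligible, whereas the ordinary distribution with the same ρ still has a power-law tail there.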
Bibliography
- Colin Rose and Murray D. Smith, Mathematical Statistics with Mathematica. New York: Springer, 2002, ISBN 0-387-95234-9. (See page 107, where it is called the "Yule distribution".)
References
- ^ Simon, H. A. (1955). "On a class of skew distribution functions". Biometrika 42: 425–440.
- ^ Yule, G. U. (1925). "A Mathematical Theory of Evolution, based on the Conclusions of Dr. J. C. Willis, F.R.S.". Philosophical Transactions of the Royal Society of London, Ser. B 213: 21–87.
This entry is from Wikipedia, the leading user-contributed encyclopedia. It may not have been reviewed by professional editors (see full disclaimer)
