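| Notation | $\mathrm{NM}(k_0, p)$ |
|---|---|
| Parameters | $k_0 \in \mathbb{N}_0$, the number of failures before the experiment is stopped; $p \in \mathbb{R}^m$, the m-vector of "success" probabilities; $p_0 = 1 - (p_1 + \dots + p_m)$, the probability of a "failure". |
| Support | $x_i \in \{0, 1, 2, \ldots\}$, $1 \le i \le m$ |
| PMF | $\Gamma\left(k_0 + \sum_{i=1}^m x_i\right) \frac{p_0^{k_0}}{\Gamma(k_0)} \prod_{i=1}^m \frac{p_i^{x_i}}{x_i!}$, where $\Gamma(x)$ is the Gamma function. |
| Mean | $\frac{k_0}{p_0}\, p$ |
| Variance | $\frac{k_0}{p_0^2}\, p p^{\top} + \frac{k_0}{p_0} \operatorname{diag}(p)$ |
| CF | $\left( \frac{p_0}{1 - \sum_{j=1}^m p_j e^{i t_j}} \right)^{k_0}$ |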
In probability theory and statistics, the negative multinomial distribution is a generalization of the negative binomial distribution (NB(r, p)) to more than two outcomes.[1]
Suppose we have an experiment that generates m+1≥2 possible outcomes, {X0,…,Xm}, each occurring with non-negative probabilities {p0,…,pm} respectively. If sampling proceeded until n observations were made, then {X0,…,Xm} would have been multinomially distributed. However, if the experiment is stopped once X0 reaches the predetermined value k0, then the distribution of the m-tuple {X1,…,Xm} is negative multinomial.
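The stopping rule described above can be sketched as a direct simulation. This is a minimal illustration, assuming an integer `k0`; the function name `sample_negative_multinomial` and the `rng` argument are hypothetical names chosen for this sketch, not part of any library.

```python
import random

def sample_negative_multinomial(k0, p, rng=random):
    """Draw one (X1, ..., Xm) by running trials until outcome 0 has occurred k0 times.

    p = (p1, ..., pm); p0 = 1 - sum(p) is the probability of outcome 0.
    """
    p0 = 1.0 - sum(p)
    if k0 <= 0 or p0 <= 0.0:
        raise ValueError("need k0 > 0 and p0 = 1 - sum(p) > 0")
    counts = [0] * len(p)
    stops = 0
    while stops < k0:
        u = rng.random()
        if u < p0:
            # Outcome 0 occurred: one step closer to stopping.
            stops += 1
            continue
        # Otherwise find which of outcomes 1..m the draw falls into.
        cumulative = p0
        for i, pi in enumerate(p):
            cumulative += pi
            if u < cumulative:
                counts[i] += 1
                break
        else:
            # Guard against floating-point round-off in the cumulative sum.
            counts[-1] += 1
    return counts

print(sample_negative_multinomial(10, (0.2, 0.1, 0.2), random.Random(0)))
```

Averaging many such draws should approach the mean vector of the distribution, which gives a quick sanity check on the parameters.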
The table below shows an example of 400 melanoma (skin cancer) patients, with the type and site of the cancer recorded for each subject.
| Type | Site: Head and Neck | Site: Trunk | Site: Extremities | Totals |
|---|---|---|---|---|
| Hutchinson's melanomic freckle | 22 | 2 | 10 | 34 |
| Superficial | 16 | 54 | 115 | 185 |
| Nodular | 19 | 33 | 73 | 125 |
| Indeterminate | 11 | 17 | 28 | 56 |
| Column Totals | 68 | 106 | 226 | 400 |
The sites (locations) of the cancer may be independent, but there may be positive dependencies among the types of cancer at a given site. For example, localized exposure to radiation implies that an elevated level of one type of cancer at a given location may indicate a higher level of another cancer type at the same location. The negative multinomial distribution may be used to model the site-specific cancer rates and to help measure some of the cancer-type dependencies within each location.
If $x_{i,j}$ denotes the cancer rate for cancer type $i$ at site $j$, then for a fixed site the cancer rates across types follow a negative multinomial distribution. That is, for each column index (site) the column vector $X$ has the distribution $X \sim \mathrm{NM}(k_0, p)$. Different columns in the table (sites) are considered to be different instances of the random negative multinomially distributed vector $X$.
For the first site (Head and Neck, $j=0$), suppose that $X \sim \mathrm{NM}(k_0 = 10, \{p_1 = 0.2, p_2 = 0.1, p_3 = 0.2\})$. Then:

$$p_0 = 1 - (p_1 + p_2 + p_3) = 0.5$$

$$\operatorname{cov}[X_1, X_3] = \frac{k_0\, p_1\, p_3}{p_0^2} = \frac{10 \times 0.2 \times 0.2}{0.5^2} = 1.6$$

$$\mu_2 = \frac{k_0\, p_2}{p_0} = \frac{10 \times 0.1}{0.5} = 2$$

$$\mu_3 = \frac{k_0\, p_3}{p_0} = \frac{10 \times 0.2}{0.5} = 4$$

and therefore

$$\operatorname{corr}[X_2, X_3] = \left( \frac{\mu_2 \times \mu_3}{(k_0 + \mu_2)(k_0 + \mu_3)} \right)^{\frac{1}{2}} = \left( \frac{2 \times 4}{(10 + 2)(10 + 4)} \right)^{\frac{1}{2}} \approx 0.2182.$$
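The arithmetic in this worked example can be checked with a short script. This is a sketch under the parameter values read off the formulas above ($k_0 = 10$, $p_0 = 0.5$); the helper names `nm_moments` and `nm_corr` are hypothetical.

```python
import math

def nm_moments(k0, p):
    """Return (p0, mean vector, covariance matrix) of NM(k0, p)."""
    p0 = 1.0 - sum(p)
    means = [k0 * pi / p0 for pi in p]
    m = len(p)
    # cov[i][j] = k0 * p_i * p_j / p0^2, plus k0 * p_i / p0 on the diagonal.
    cov = [
        [k0 * p[i] * p[j] / p0**2 + (means[i] if i == j else 0.0) for j in range(m)]
        for i in range(m)
    ]
    return p0, means, cov

def nm_corr(cov, i, j):
    """Correlation of components i and j (0-based) from a covariance matrix."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])

p0, means, cov = nm_moments(10, (0.2, 0.1, 0.2))
print("p0 =", p0)
print("mu_2 =", means[1], "mu_3 =", means[2])
print("cov[X1, X3] =", cov[0][2])
print("corr[X2, X3] =", nm_corr(cov, 1, 2))
```

Computing the correlation from the full covariance matrix, rather than from the closed form in terms of the means, gives an independent check that the two expressions agree.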
Notice that the pairwise NM correlations are always positive, whereas the correlations between multinomial counts are always negative. As the parameter $k_0$ increases, the paired correlations tend to zero. Thus, for large $k_0$, the negative multinomial counts $X_i$ behave as independent Poisson random variables with respect to their means $\mu_i$.
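The decay of the correlation can be illustrated numerically with the closed form $\operatorname{corr}[X_i, X_j] = \sqrt{\mu_i \mu_j / ((k_0 + \mu_i)(k_0 + \mu_j))}$, holding the means fixed while $k_0$ grows. The function name `nm_corr_from_means` and the choice of means (2 and 4) are illustrative assumptions.

```python
import math

def nm_corr_from_means(k0, mu_i, mu_j):
    """Pairwise NM correlation expressed through k0 and the two means."""
    return math.sqrt(mu_i * mu_j / ((k0 + mu_i) * (k0 + mu_j)))

# Hold the means fixed and let k0 grow: the correlation shrinks toward zero.
for k0 in (10, 100, 1000, 10000):
    print(k0, nm_corr_from_means(k0, 2.0, 4.0))
```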
The marginal distribution of each of the $X_i$ variables is negative binomial, as the $X_i$ count (considered as success) is measured against all the other outcomes (failure). But jointly, the distribution of $X = \{X_1, \ldots, X_m\}$ is negative multinomial, i.e., $X \sim \mathrm{NM}(k_0, \{p_1, \ldots, p_m\})$.
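The negative binomial marginal can be checked numerically by summing the joint probability mass function over the other coordinate. The sketch below assumes the standard form of the negative multinomial pmf, $\frac{\Gamma(k_0 + \sum_i x_i)}{\Gamma(k_0)}\, p_0^{k_0} \prod_i \frac{p_i^{x_i}}{x_i!}$, and a marginal with "failure" probability $q = p_0 / (p_0 + p_i)$; the function names and the parameter values are illustrative assumptions.

```python
import math

def nm_log_pmf(x, k0, p):
    """Log pmf of NM(k0, p) at the count vector x (standard form, assumed here)."""
    p0 = 1.0 - sum(p)
    out = math.lgamma(k0 + sum(x)) - math.lgamma(k0) + k0 * math.log(p0)
    for xi, pi in zip(x, p):
        out += xi * math.log(pi) - math.lgamma(xi + 1)
    return out

def nb_pmf(x, k0, q):
    """Negative binomial pmf: x successes before k0 failures, failure probability q."""
    log_value = (math.lgamma(k0 + x) - math.lgamma(k0) - math.lgamma(x + 1)
                 + k0 * math.log(q) + x * math.log(1.0 - q))
    return math.exp(log_value)

k0, p = 10, (0.2, 0.3)
p0 = 1.0 - sum(p)
x1 = 3
# Sum the joint pmf over x2; the truncation at 400 terms leaves a negligible tail.
marginal = sum(math.exp(nm_log_pmf((x1, x2), k0, p)) for x2 in range(400))
print(marginal, nb_pmf(x1, k0, p0 / (p0 + p[0])))
```

Working with log-gamma terms keeps the computation stable for large counts, where the factorials would otherwise overflow.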
The mean (expected) frequency counts ($\mu_i$) of each outcome ($X_i$) can be estimated using maximum likelihood. If we have a single observation vector $\{x_1, \ldots, x_m\}$, then $\hat{\mu}_i = x_i$. If we have several observation vectors, as in this case, where we have the cancer type frequencies for 3 different sites, then the MLE estimates of the mean counts are $\hat{\mu}_i = \frac{1}{J} \sum_{j} x_{i,j}$, where $i$ is the cancer-type index and the summation is over the $J$ observed (sampled) vectors. For the cancer data above, we have the following MLE estimates for the expectations for the frequency counts:
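The estimate for Hutchinson's melanomic freckle ($X_1$) is $\hat{\mu}_1 = 34/3 \approx 11.33$.
The estimate for the superficial type ($X_2$) is $\hat{\mu}_2 = 185/3 \approx 61.67$.
The estimate for the nodular type ($X_3$) is $\hat{\mu}_3 = 125/3 \approx 41.67$.
The estimate for the indeterminate type ($X_4$) is $\hat{\mu}_4 = 56/3 \approx 18.67$.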
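With several observed vectors, the mean estimate for each cancer type is simply the average of its counts across the three sites. A minimal sketch using the melanoma table follows; the dictionary layout and the name `mle_means` are illustrative choices.

```python
# Counts per cancer type across the three sites
# (Head and Neck, Trunk, Extremities), taken from the table above.
melanoma_counts = {
    "Hutchinson's melanomic freckle": (22, 2, 10),
    "Superficial": (16, 54, 115),
    "Nodular": (19, 33, 73),
    "Indeterminate": (11, 17, 28),
}

def mle_means(table):
    """Average each type's counts over the observed vectors (sites)."""
    return {name: sum(row) / len(row) for name, row in table.items()}

for name, mean in mle_means(melanoma_counts).items():
    print(f"{name}: {mean:.2f}")
```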
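No simple maximum likelihood estimate is available for the $k_0$ parameter.[1][2] However, there are approximate protocols for estimating the $k_0$ parameter using the chi-squared goodness of fit statistic. In the usual chi-squared statistic, $\chi^2 = \sum_i \frac{(x_i - \mu_i)^2}{\mu_i}$, we can replace the expected means ($\mu_i$) by their estimates, $\hat{\mu}_i$, and replace the denominators by the corresponding negative multinomial variances. Then we get the following test statistic for negative multinomial distributed data:

$$\chi^2(k_0) = \sum_i \frac{(x_i - \hat{\mu}_i)^2}{\hat{\mu}_i \left( 1 + \frac{\hat{\mu}_i}{k_0} \right)}.$$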
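The modified statistic can be written as a function of $k_0$, using the negative multinomial variance $\mu_i (1 + \mu_i / k_0)$ in each denominator. The sketch below is an assumption-laden illustration: the name `nm_chi_squared` is hypothetical, and using the Head and Neck column as the observed vector is a choice made here for demonstration, not necessarily the data behind the article's own calculation.

```python
def nm_chi_squared(k0, observed, means):
    """Chi-squared-type statistic with NM variances mu * (1 + mu / k0) as denominators."""
    return sum(
        (x - mu) ** 2 / (mu * (1.0 + mu / k0))
        for x, mu in zip(observed, means)
    )

# Illustrative inputs (assumed): Head and Neck counts and the per-type mean estimates.
observed = (22, 16, 19, 11)
means = (34 / 3, 185 / 3, 125 / 3, 56 / 3)

for k0 in (1.0, 2.0, 10.0, 50.0):
    print(k0, nm_chi_squared(k0, observed, means))
```

As $k_0$ grows, each denominator approaches $\hat{\mu}_i$ and the value approaches the usual Pearson statistic, which matches the large-$k_0$ Poisson behavior of the counts.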
The $k_0$ parameter can then be estimated by varying the values of $k_0$ in the expression $\chi^2(k_0)$ and matching the values of this statistic with the corresponding asymptotic chi-squared distribution. The following protocol summarizes these steps using the cancer data above.
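Mean count estimates: the estimates ($\hat{\mu}_i$) for the 4 different cancer types are: $\hat{\mu}_1 = 11.33$; $\hat{\mu}_2 = 61.67$; $\hat{\mu}_3 = 41.67$; and $\hat{\mu}_4 = 18.67$.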
Chi-squared equation: setting $\chi^2(k_0)$ equal to the matching value of the asymptotic chi-squared distribution gives an equation for the single variable of interest, the unknown parameter $k_0$. In the cancer example, the solution is an asymptotic chi-squared distribution driven estimate of the parameter $k_0$. Solving this equation for $k_0$ provides the desired estimate for the last parameter.
This equation has 3 distinct solutions for $k_0$: {50.5466, -21.5204, 2.40461}. Since $k_0 > 0$, there are 2 candidate solutions.
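Estimates of probabilities: given an estimate of $k_0$ and the relation $\frac{k_0\, p_i}{p_0} = \hat{\mu}_i$, then $p_i = \frac{\hat{\mu}_i\, p_0}{k_0}$ for each cancer type. Because $p_0 = 1 - \sum_i p_i$, it follows that $p_0 = \frac{k_0}{k_0 + \sum_i \hat{\mu}_i}$.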
Johnson, Norman L.; Kotz, Samuel; Balakrishnan, N. (1997). "Chapter 36: Negative Multinomial and Other Multinomial-Related Distributions". Discrete Multivariate Distributions. Wiley. ISBN 0-471-12844-9.
This entry is from Wikipedia, the leading user-contributed encyclopedia.