Share on Facebook Share on Twitter Email
Answers.com

multiple regression model

 
Statistics Dictionary: multiple regression model

The extension of the linear regression model to the case where there is more than one explanatory variable (see regression). For the case of p X-variables and n observations the model is E(Yj)=β0+β1x1j+β2x2j+...+βpxpj, j=1, 2,..., n,where β0, β1,..., βp are unknown parameters. An equivalent presentation is Yj=β0+β1x1j+β2x2j +...+ βpxpj+εj, j=1, 2,..., n,where ε1, ε2,..., εn are random errors.

In practice the explanatory variables may be related as in the quadratic regression model




,
the cubic regression model



,
and the general polynomial regression model (also termed a curvilinear regression model)



.
In matrix terms the model is written as E(Y)=Xβ,where Y is the n×1 column vector of random variables, β is the (p+1)×1 column vector of unknown parameters, and X is the n×(p+1) design matrix given by



.
Equivalently,Y=Xβ+ε,where ε is an n×1 vector of random errors.

Usually it is assumed that the random errors, and hence the Y-variables, are independent and have common variance σ2. In this case, the ordinary least squares estimates of the β-parameters are obtained by solving the set of simultaneous equations (the normal equations) which, in matrix form, are written as X′Xβ̂=X′y,where X′ is the transpose of X and y is the n×1 column vector of observations (y1 y2...yn)′. The matrix X′X is a symmetric matrix. If it has an inverse then the solution is β̂=(X′X)−1X′y.The variance-covariance matrix of the estimators of the β-parameters is σ2(X′X)−1. The Gauss–Markov theorem shows that β̂ is the minimum variance linear unbiased estimate of β.

If the random errors are not independent or have unequal variances then ordinary least squares is inappropriate and weighted least squares may be appropriate. If the random errors are influenced by the X-variables, then a possible approach is to identify if possible W-variables that are highly correlated with the X-variables, but not with the errors. These W-variables are called instrumental variables. The subsequent analysis, commonly used in econometrics, uses two-stage least squares, in which the first stage involves the regression of the X-variables on the W-variables to obtain fitted values:X̂=W(W′W)−1W′X.These values are then used in place of the X-values in the regression for Y to give the estimate β22=(X̂′X̂)−1X̂′y.A perennial problem with multiple regression models is deciding which of the X-variables are really needed and which are redundant.

A related problem is that of collinearity (or multicollinearity). Suppose, for example, that the two explanatory variables Xj and Xk approximately satisfy the relation Xj=a+bXk. In this case a multiple regression model that involves both Xj and Xk will run into problems, since either variable would be nearly as effective on its own. The situation can arise with other numbers of X-variables: in all cases the result is that the matrix X′X is nearly singular. The equation X′Xβ̂=X′y is then said to be ill-conditioned. One suggestion is to replace the usual parameter estimate by β̂*=(X′X+kI)−1X′y,where k is a constant and I is an identity matrix. This technique is called ridge regression. A plot of the elements of β̂* against k is called a ridge trace and can be used to determine the appropriate value for k. The estimates of the β-parameters obtained using this method are biased but have smaller variance than the ordinary least squares estimates.

For methods of evaluating the fit of a multiple regression model, see regression diagnostics. See also AIC; Mallows Cp; stepwise procedure.



Search unanswered questions...
Enter a question here...
Search: All sources Community Q&A Reference topics
 
 

 

Copyrights:

Statistics Dictionary. A Dictionary of Statistics. Second edition revised. Copyright © Oxford University Press, 2008. All rights reserved.  Read more