In mathematics, and in particular linear algebra, the pseudoinverse A+ of an m × n matrix A is a generalization of the inverse matrix.[1] More precisely, this article is about the Moore–Penrose pseudoinverse, which was independently described by E. H. Moore[2] in 1920 and Roger Penrose[3] in 1955. Earlier, Fredholm had introduced the concept of a pseudoinverse of integral operators in 1903. The term generalized inverse is sometimes used as a synonym for pseudoinverse.
A common use of the pseudoinverse is to compute a 'best fit' (least squares) solution to a system of linear equations that lacks a unique solution (see below under Applications). The pseudoinverse is defined and unique for all matrices whose entries are real or complex numbers. It can be computed using the singular value decomposition.
Contents |
Definition
The pseudoinverse A+ of an m-by-n matrix A (whose entries can be real or complex numbers) is defined as the unique n-by-m matrix satisfying all of the following four criteria:[3][4]
(AA+ need not be the general identity matrix, but it maps all column vectors of A to themselves);
(A+ is a weak inverse for the multiplicative semigroup);
(AA+ is Hermitian); and
(A+A is also Hermitian).
Here M* is the Hermitian transpose (also called conjugate transpose) of a matrix M. For matrices whose elements are real numbers instead of complex numbers, M* = MT.
Properties
Proofs for some of these relations may be found on the proofs page.
- The pseudoinverse exists and is unique: for any matrix A, there is precisely one matrix
that satisfies the four properties of the definition.[4] If A has real entries, then so does
; if A has rational entries, then so does
. - If the matrix A is invertible, the pseudoinverse and the inverse coincide:
.[5]:243 - The pseudoinverse of a zero matrix is its transpose.
- The pseudoinverse of the pseudoinverse is the original matrix:
.[5]:245 - Pseudoinversion commutes with transposition, conjugation, and taking the conjugate transpose:[5]:245
- The pseudoinverse of a scalar multiple of A is the reciprocal multiple of
:
-
for
.
- If A and B are such that the product AB is defined and
- A has orthonormal columns (
, unitarity is a special case), or - B has orthonormal rows (
), or - A is of full column rank, and B is of full row rank,
- A has orthonormal columns (
- then
.
- The third case does not cover the first two; an orthonormal (including non-square) matrix must be of full rank, but otherwise there is no assumption made on the matrix it multiplies.
is the orthogonal projector onto the range of A (the space spanned by the column vectors of A);
is the orthogonal projector onto the range of
, which agrees with the orthogonal complement of the kernel (null space) of A;
is the orthogonal projector onto the kernel (null space) of A.[4]- The kernel of
is the orthogonal complement of the range of A; the image of
is the orthogonal complement of the kernel of A. - If the pseudoinverse of
is already known, it may be used to compute
:
-
.
- Likewise, if (AA * ) + is already known:
-
.
- The pseudoinverse can be computed via a limiting process:

- (see Tikhonov regularization). These limits exist even if
and
do not exist.[4]:263
- In contrast to ordinary matrix inversion, the process of taking pseudoinverses is not continuous: if the sequence (An) converges to the matrix A (in the maximum norm or Frobenius norm, say), then
need not converge to
.
Identities
The following identies can be used to cancel certain subexpressions or expand expressions involving pseudoinverses. Proofs for these properties can be found in the proofs subpage.
Special cases
Orthonormal columns or rows
If A has orthonormal columns (A * A = I) or orthonormal rows (AA * = I), then A + = A * .
Full rank
If the columns of A are linearly independent, then A * A is invertible. In this case, an explicit formula is:[1]
- A + = (A * A) − 1A * .
It follows that A + is a left inverse of A: A + A = I.
If the rows of A are linearly independent, then AA * is invertible. In this case, an explicit formula is:
- A + = A * (AA * ) − 1.
It follows that A + is a right inverse of A: AA + = I.
If both columns and rows are linearly independent (that is, for square regular matrices), the pseudoinverse is just the inverse:
- A + = A − 1.
Scalars and vectors
It is also possible to define a pseudoinverse for scalars and vectors. This amounts to treating these as matrices. The pseudoinverse of a scalar x is zero if x is zero and the reciprocal of x otherwise:
The pseudoinverse of the null vector is the transposed null vector. The pseudoinverse of a non-null vector is the conjugate transposed vector divided by its squared magnitude:
For a proof, simply check that these definitions meet the defining criteria for the pseudoinverse.
Circulant matrices
For a Circulant matrix C the singular value decomposition is given by the Fourier transform, that is the singular values are the Fourier coefficients. Let
be the Fourier matrix, then
Block matrices
Optimized approaches exist for calculating the pseudoinverse of block structured matrices.
Finding the pseudoinverse of a matrix
Using regular inverses
Let k be the rank of a
matrix A. Then A can be decomposed as A = BC, where B is a
-matrix and C is a
matrix. Then
- A + = C * (CC * ) − 1(B * B) − 1B * .
If A has full row rank, so that k = m, then B can be chosen to be the identity matrix and the formula reduces to A + = A * (AA * ) − 1. Similarly, if A has full column rank (that is, k = n), we have A + = (A * A) − 1A * .
The QR method
However, computing the product AA * or A * A or its inverse explicitly is unnecessary and only causes additional rounding errors and computational cost. Consider first the case when A is full column rank. Then all that is needed is the Cholesky decomposition
- A * A = R * R,
where R is an upper triangular matrix. Multiplication by the inverse is then done easily by solving a system with multiple right-hand-side,
by back substitution. To compute the Cholesky decomposition without forming A * A explicitly, use the QR decomposition of A, A = QR where Q has orthonormal columns, Q * Q = I, and R is upper triangular. Then
,
so R is the Cholesky factor of A * A. The case of full row rank is obtained by swapping A and A * .
The general case and the SVD method
A computationally simpler and more accurate way to get the pseudoinverse is by using the singular value decomposition.[1][4][7] If A = UΣV * is the singular value decomposition of A, then A + = VΣ + U * . For a diagonal matrix such as Σ, we get the pseudoinverse by first transposing the matrix, and then taking the reciprocal of each non-zero element on the diagonal, and leaving the zeros in place. In numerical computation, only elements larger than some small tolerance are taken to be nonzero, and the others are replaced by zeros. For example, in the MATLAB function pinv, the tolerance is taken to be t = ε•max(m,n)•max(Σ), where ε is the machine epsilon.
The cost of this method is dominated by the cost of computing the SVD, which is several times higher than matrix-matrix multiplication, if a state-of-the art implementation (such as that of LAPACK) is used.
The above procedure shows why taking the pseudoinverse is not a continuous operation: if the original matrix A has a singular value 0 (a diagonal entry of the matrix Σ above), then modifying A slightly may turn this zero into a tiny positive number, thereby affecting the pseudoinverse dramatically as we now have to take the reciprocal of a tiny number.
The iterative method of Ben-Israel and Cohen
Another method for computing the pseudoinverse uses the recursion
which is sometimes referred to as hyper-power sequence. This recursion produces a sequence converging quadratically to the pseudoinverse of A if it is started with an appropriate A0 satisfying A0A = (A0A) * . The choice A0 = αA * (where 0 < α < 2 / σ1(A), with σ1(A) denoting the largest singular value of A) [8] has been argued not to be competitive to the method using the SVD mentioned above, because even for moderately ill-conditioned matrices it takes a long time before Ai enters the region of quadratic convergence [9]. However, if started with A0 already close to the Moore-Penrose pseudoinverse and A0A = (A0A) * , for example A0: = (A * A + δI) − 1A * , convergence is fast (quadratic).
Updating the pseudoinverse
For the cases where A has full row or column rank, and the inverse of the correlation matrix (AA * for A with full row rank or A * A for full column rank) is already known, the pseudoinverse for matrices related to A can be computed by applying the Sherman–Morrison–Woodbury formula to update the inverse of the correlation matrix, which may need less work. In particular, if the related matrix differs from the original one by only a changed, added or deleted row or column, incremental algorithms[10] exist that exploit the relationship.
Similarly, it is possible to update the Cholesky factor when a row or column is added, without creating the inverse of the correlation matrix explicitly. However, updating the pseudoinverse in the general rank-deficient case is much more complicated.[11][12]
Software libraries
High quality implementations of SVD, QR, and back substitution are available in standard libraries, such as LAPACK. Writing one's own implementation of SVD is a major programming project that requires a significant numerical expertise. In special circumstances, such as parallel computing or embedded computing, however, alternative implementations by QR or even the use of an explicit inverse might be preferable, and custom implementations may be unavoidable.
Applications
The pseudoinverse provides a least squares solution to a system of linear equations.[13] Given a system of linear equations
in general we cannot always expect to find a vector x which will solve the system (i.e. for an over-determined system); or if one exists, it may not be unique (i.e. for an under-determined system). For both the under- and over-determined case, the pseudoinverse gives the "least-squares" answer. For the over-determined case, we may want a vector x that brings Ax "as close as possible" to b, i.e. a vector that minimizes the Euclidean norm
If A has full column-rank (a common assumption), there is only one solution to this problem, and it is given by the pseudoinverse:
This description suggests the following geometric construction of the pseudoinverse of an m×n matrix A. To find A + b for given b in Rm, first project b orthogonally onto the range of A, finding a point p(b) in the range. Then form A-1({p(b)}), i.e. find those vectors in Rn that A sends to p(b). This will be an affine subspace of Rn parallel to the kernel of A. The element of this subspace that has the smallest length (i.e. is closest to the origin) is the answer A + b we are looking for. It can be found by taking an arbitrary member of A-1({p(b)}) and projecting it orthogonally onto the orthogonal complement of the kernel of A.
For an under-determined system, there are usually many solutions x to Ax = b. In this case, the pseudoinverse solution x = A + b is the solution that has smallest Euclidean norm
among all solutions (see proofs subpage). Note that "least-squares" refers to the norm of x in this case, whereas it refers to the norm of the residual Ax − b in the over-determined case.
Using the pseudoinverse and a matrix norm, one can define a condition number for any matrix:
A large condition number implies that the problem of finding least-squares solutions to the corresponding system of linear equations is ill-conditioned in the sense that small errors in the entries of A can lead to huge errors in the entries of the solution.[14]
Generalizations
In order to solve more general least-squares problems, one could try to define Moore–Penrose pseudoinverses for all continuous linear operator A : H1 → H2 between two Hilbert spaces H1 and H2, using the same four conditions as in our definition above. It turns out that not every continuous linear operator has a continuous linear pseudo-inverse in this sense.[14] Those that do are precisely the ones whose range is closed in H2.
In abstract algebra, a Moore–Penrose pseudoinverse may be defined on a *-regular semigroup. This abstract definition coincides with the one in linear algebra.
See also
References
- ^ a b c Ben-Israel, Adi; Thomas N.E. Greville (2003). Generalized Inverses. Springer-Verlag. ISBN 0-387-00293-6.
- ^ Moore, E. H. (1920). "On the reciprocal of the general algebraic matrix". Bulletin of the American Mathematical Society 26: 394–395.
- ^ a b Penrose, Roger (1955). "A generalized inverse for matrices". Proceedings of the Cambridge Philosophical Society 51: 406–413.
- ^ a b c d e Golub, Gene H.; Charles F. Van Loan (1996). Matrix computations (3rd edition ed.). Baltimore: Johns Hopkins. pp. 257–258. ISBN 0-8018-5414-8.
- ^ a b c Stoer, Josef; Bulirsch, Roland (2002), Introduction to Numerical Analysis (3rd ed.), Berlin, New York: Springer-Verlag, ISBN 978-0-387-95452-3.
- ^ Stallings, W. T. (1972). "The Pseudoinverse of an r-Circulant Matrix". Proceedings of the American Mathematical Society 34: 385–388. doi:.
- ^ Linear Systems & Pseudo-Inverse
- ^ Ben-Israel, Adi; Cohen, Dan (1966). "On Iterative Computation of Generalized Inverses and Associated Projections". SIAM Journal on Numerical Analysis 3: 410–419. http://www.jstor.org/stable/2949637.
- ^ Söderström, Torsten; Stewart, G. W. (1974). "On the Numerical Properties of an Iterative Method for Computing the Moore- Penrose Generalized Inverse". SIAM Journal on Numerical Analysis 11: 61–74. http://www.jstor.org/stable/2156431.
- ^ Tino Gramß (1992) (printed dissertation). Worterkennung mit einem künstlichen neuronalen Netzwerk. Georg-August-Universität zu Göttingen.
- ^ Meyer, Carl D., Jr. Generalized inverses and ranks of block matrices. SIAM J. Appl. Math. 25 (1973), 597—602
- ^ Meyer, Carl D., Jr. Generalized inversion of modified matrices. SIAM J. Appl. Math. 24 (1973), 315—323
- ^ Penrose, Roger (1956). "On best approximate solution of linear matrix equations". Proceedings of the Cambridge Philosophical Society 52: 17–19.
- ^ a b Roland Hagen, Steffen Roch, Bernd Silbermann. C*-algebras and Numerical Analysis, CRC Press, 2001. Section 2.1.2.
External links
This entry is from Wikipedia, the leading user-contributed encyclopedia. It may not have been reviewed by professional editors (see full disclaimer)
















