Principal components analysis is a method in which the original data are transformed into a new set of variables that may better capture the essential information. Often some variables are highly correlated, so that the information contained in one variable is largely a duplication of the information contained in another. Instead of throwing away the redundant data, principal components analysis condenses the information in the intercorrelated variables into a few new variables, called principal components.
Principal components analysis is a special case of transforming the original data into a new coordinate system. If the original data involve n different variables, then each observation may be considered a point in an n-dimensional vector space. The change of coordinate system for a two-dimensional space is shown below.
In the above diagram the original data are given in terms of X1 and X2 values, which are plotted with respect to the X1, X2 axes. The axes of the new coordinate system are shown as red lines. The origin of the new coordinate system, at the intersection of the red lines, is different from the origin of the old coordinate system. The coordinates of a point P in the new coordinate system are measured along the X1' and X2' axes.
The coordinates of point P with respect to the new axes can be determined by geometry. Let (X1c, X2c) be the coordinates of the origin of the new system as measured in the old system. The diagram shown below gives the derivation of the new coordinates in terms of this origin and the angle of the new axes with respect to the old axes.
The net result of the derivation is that the new coordinates can be computed by equations of a simple form. There is a simpler way to express the relationship between the new coordinates and the old, one that makes use of the dot product of vectors.
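As an illustration of the dot-product form of the change of coordinates, the following sketch in Python with NumPy shifts a point to the new origin and projects it onto unit vectors along the new axes. The point, the new origin, and the rotation angle are made-up values chosen only for illustration; they do not come from the diagram above.

    import numpy as np

    # Hypothetical values: a point P, the new origin (X1c, X2c), and the
    # angle of the new axes relative to the old axes.
    P = np.array([4.0, 3.0])
    origin_new = np.array([1.0, 1.0])
    theta = np.radians(30.0)

    # Unit vectors along the new X1' and X2' axes, expressed in old coordinates.
    u1 = np.array([np.cos(theta), np.sin(theta)])
    u2 = np.array([-np.sin(theta), np.cos(theta)])

    # The new coordinates are dot products of the shifted point with the unit axis vectors.
    X1_new = np.dot(P - origin_new, u1)
    X2_new = np.dot(P - origin_new, u2)
    print(X1_new, X2_new)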
When the data are plotted as points in an n-dimensional vector space they often fall roughly within an ellipsoid. For the two-dimensional case the situation might be as shown below.
For this case an appropriate new coordinate system might be one in which the origin is at the mean values of all the observations. The axes of the new coordinate system would be the major and minor axes of the data ellipse.
In the new coordinate system there would be negative values for some coordinates, which is inappropriate for remote sensing work. This is handled by shifting the origin to the point given by the minimum values of the new coordinates.
While the plot of the data may be roughly an ellipsoid, it also might not be. The work of Kauth and Thomas argues that the data plot resembles a fuzzy tasseled cap rather than an ellipsoid. Furthermore, there is the problem of determining the ellipsoid if one does exist. A more general method is needed for determining the axes of the new coordinate system. The general method involves the covariances between the variables, which amount to the intercorrelations among the original variables. The process for computing the covariances is described below.
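A minimal sketch of the covariance computation, assuming the data are held in a NumPy array Z with one row per variable and one column per observation; the array name and the numbers in it are illustrative, not taken from the original.

    import numpy as np

    # Z has n rows (variables) and t columns (observations); the values are made up.
    Z = np.array([[2.0, 4.0, 6.0, 8.0],
                  [1.0, 3.0, 2.0, 6.0],
                  [5.0, 5.0, 7.0, 9.0]])

    # Reduce each variable to deviations from its mean.
    deviations = Z - Z.mean(axis=1, keepdims=True)

    # Covariance of variables i and k: the average product of their deviations.
    t = Z.shape[1]
    cov = deviations @ deviations.T / t

    # np.cov gives the same matrix up to the divisor convention (t versus t - 1).
    print(cov)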
Once the covariance matrix has been determined, the next question is how the new coordinate axes are to be determined from it. The answer is that there are certain vectors, called characteristic vectors or eigenvectors, that give the directions of the new axes.
The vector X is said to be transformed by the matrix M into the vector Y; i.e., Y = MX. A vector X that is transformed into a multiple of itself is special; i.e., the transformed vector Y is equal to X with every component multiplied by the same constant, say λ. Such a vector X is said to be an eigenvector of the matrix M and the constant λ is said to be an eigenvalue of the matrix. (Eigen is a German word meaning self or characteristic.) The equation defining eigenvectors and eigenvalues is:

MX = λX
The computational method for finding the eigenvalues and then the eigenvectors is not of great concern here. It suffices to say that the eigenvalues are the solutions of an n-th degree polynomial equation, the characteristic equation det(M − λI) = 0. The details are given in the appendix.
For an n-by-n matrix there will be n eigenvalues and n orthogonal (perpendicular) eigenvectors. The eigenvectors correspond to the axes of the n-dimensional ellipsoid that best fits the data. The eigenvectors can be normalized to unit length.
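A short sketch of this step, assuming a small symmetric covariance matrix whose entries are illustrative only. NumPy's eigh routine is used because a covariance matrix is symmetric, and it returns eigenvectors already normalized to unit length.

    import numpy as np

    # A small symmetric covariance matrix; the values are illustrative only.
    cov = np.array([[5.0, 2.0, 1.0],
                    [2.0, 3.0, 0.5],
                    [1.0, 0.5, 2.0]])

    # eigh returns eigenvalues in ascending order and unit-length eigenvectors
    # as the columns of the second array.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # Reorder so the largest eigenvalue (and its axis) comes first.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    print(eigenvalues)
    print(eigenvectors)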
The eigenvalues give the amount of the variation (sum of squared deviations from the means) in the original data that is explained by the variation along the corresponding axis (as represented by the corresponding eigenvector). Thus the sum of the eigenvalues is equal to the total variation in the original variables. The components in principal component analysis are labeled according to the size of the corresponding eigenvalue. In practice the first component in remote sensing explains the lion's share of the variation. The second and third components explain much smaller shares, and those beyond the third generally account for trivially small shares of the variation. The higher components may be considered to be random variation or noise. This means that with little or no loss of information the n variables of the original data can be reduced to three or so principal components.
Assume the original data are given in the form zij, where the index i stands for the variable number and j for the observation number. For simplicity it is assumed that the mean value of each variable is zero; i.e., the variables are reduced to deviations from their means. For each observation j there is a value pj, called the principal component, which is used with a set of coefficients xi to approximate the j-th observation of the i-th variable zij as xi·pj. The values of the xi's and the pj's are to be chosen so as to minimize the sum of the squared deviations between the actual data values and the approximations based upon the principal component; i.e., to minimize

S = Σi Σj (zij − xi·pj)²
In order for the minimization problem to have a solution it is necessary to impose a constraint on the values of the xi's. The most convenient form of the constraint is that

Σi xi² = 1
The minimization problem is thus to find values for the xi's and pj's that

minimize S = Σi Σj (zij − xi·pj)² subject to Σi xi² = 1
The solution to this constrained minimization problem is found by the Lagrangian multiplier method; i.e., by minimizing

L = Σi Σj (zij − xi·pj)² + λ(Σi xi² − 1)
The first order conditions for a minimum with respect to the pj's are:

∂L/∂pj = −2 Σi xi(zij − xi·pj) = 0 for each j
These conditions, making use of the constraint Σi xi² = 1, reduce to:

pj = Σi xi·zij
The problem now is to find a way of determining the coefficients xi. The first order conditions for a minimum of the Lagrangian with respect to these variables are:

∂L/∂xi = −2 Σj pj(zij − xi·pj) + 2λxi = 0 for each i
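Substituting pj = Σk xk·zkj into this condition is the step that turns it into an eigenvalue problem; the reduction sketched here is a standard piece of algebra stated in the notation of this section rather than quoted from the original. The condition rearranges to Σj zij·pj = xi(Σj pj² + λ), and since Σj zij·pj = Σk (Σj zij·zkj)·xk, the left-hand side is the i-th element of the matrix of sums of cross products (the matrix ZZ^T of the treatment below) applied to the vector of coefficients. Hence that matrix times the vector of xi's must be a multiple of the vector itself, which is precisely the eigenvector condition developed below.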
The mathematics of principal component analysis is expressed more succinctly in terms of matrix operations. The basics of the matrix approach are given below.
Let the data be represented as a matrix Z with n rows and t columns where n is the number of variables and t is the number of observations.
The matrix of approximations of the data based upon the vector of coefficients X and the vector of principal component values P is obtained by multiplying the n×1 vector X times the 1×t vector P^T, the transpose of P; i.e.,

XP^T
The matrix of deviations of the actual data from the values based upon X and P is

Z − XP^T
The sum of the squared deviations can be obtained in terms of matrix operations as the trace (the sum of the diagonal elements) of the product of the deviations matrix with its own transpose; i.e.,

S = tr[(Z − XP^T)(Z − XP^T)^T]
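A quick numerical check of this identity in Python/NumPy; the data array and coefficient vector are made-up examples, and the trace form is compared with the direct elementwise sum of squares.

    import numpy as np

    # Made-up example: 3 variables, 4 observations, and an X of unit length.
    Z = np.array([[2.0, -1.0, 0.5, 1.5],
                  [1.0,  0.0, -0.5, 2.0],
                  [0.5,  1.0,  1.0, -1.0]])
    X = np.array([[0.6], [0.8], [0.0]])   # n x 1, with X^T X = 1
    P = Z.T @ X                           # t x 1, the condition P = Z^T X

    E = Z - X @ P.T                       # deviations matrix
    print(np.trace(E @ E.T))              # trace form of the sum of squares
    print(np.sum(E ** 2))                 # direct elementwise sum, same value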
The constraint on the choice of X is that

X^TX = 1
The first order conditions for a minimum with respect to the elements of P are, in matrix form,

X^T(Z − XP^T) = 0, which, given that X^TX = 1, reduces to P^T = X^TZ; i.e., P = Z^TX
The first order conditions for a minimum with respect to the elements of X are, in matrix form,

ZP = (λ + P^TP)X
Substituting P = Z^TX, the equation defining the eigenvalues and eigenvectors of the matrix ZZ^T can be expressed as:

(ZZ^T)X = μX, where μ = λ + P^TP
In general there are n eigenvalues and n orthonormal eigenvectors. The eigenvalues correspond to the amounts of variation in the variables that are explained by the components, so the eigenvector belonging to the largest eigenvalue should be the one utilized in computing the first principal component. The second largest eigenvalue identifies the eigenvector to be used in computing the second principal component, and so forth. The proportion of the variation explained by a component is just its eigenvalue divided by the sum of the eigenvalues.
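Putting the pieces together, the following is a minimal sketch in Python/NumPy, under the assumption that the data sit in an array with one row per variable; the array and its values are illustrative and not from the original. It mean-centers the data, computes ZZ^T, extracts eigenvalues and eigenvectors, reports the proportion of variation explained by each component, and forms the principal components as P = Z^TX for each eigenvector.

    import numpy as np

    # Illustrative data: n = 3 variables (rows), t = 5 observations (columns).
    raw = np.array([[2.0, 4.0, 6.0, 8.0, 10.0],
                    [1.0, 3.0, 2.0, 6.0,  7.0],
                    [5.0, 5.0, 7.0, 9.0,  8.0]])

    # Reduce the variables to deviations from their means.
    Z = raw - raw.mean(axis=1, keepdims=True)

    # Eigenvalues and unit-length eigenvectors of ZZ^T (proportional to the covariance matrix).
    eigenvalues, eigenvectors = np.linalg.eigh(Z @ Z.T)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    # Proportion of the variation explained by each component.
    proportions = eigenvalues / eigenvalues.sum()
    print(proportions)

    # The k-th principal component: P = Z^T X for the k-th eigenvector X.
    components = Z.T @ eigenvectors   # column k holds the k-th principal component
    print(components[:, 0])           # the first principal component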