Principal components analysis is a method in which the original data are transformed into a new set of variables that may better capture the essential information. Often some variables are highly correlated, so that the information contained in one variable is largely a duplication of the information contained in another. Instead of throwing away the redundant data, principal components analysis condenses the information in the intercorrelated variables into a few new variables, called principal components.
Principal components analysis is a special case of transforming the original data into a new coordinate system. If the original data involve n different variables, then each observation may be considered a point in an n-dimensional vector space. The change of coordinate system for a two-dimensional space is shown below.
In the above diagram the original data are given in terms of X1 and X2 values, which are plotted with respect to the X1, X2 axes. The axes of the new coordinate system are shown as red lines. The origin of the new coordinate system, at the intersection of the red lines, is different from the origin of the old coordinate system. The coordinates of a point P in the new coordinate system are measured along the X1' and X2' axes.
The coordinates of point P with respect to the new axes can be determined by geometry. Let (X1c, X2c) be the coordinates of the origin of the new system as measured in the old system. The diagram shown below gives the derivation of the new coordinates in terms of this origin and the angle of the new axes with respect to the old axes.
The net result of the derivation is that the new coordinates can be computed by equations of a simple form. There is a simpler way to express the relationship between the new coordinates and the old, one that makes use of the dot product of vectors.
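As an illustration of the dot-product form of the change of coordinates, the following sketch in Python with NumPy shifts a point to the new origin and projects it onto unit vectors along the new axes. The point, the new origin, and the rotation angle are made-up values chosen only for illustration; they do not come from the diagram above.

    import numpy as np

    # Hypothetical values: a point P, the new origin (X1c, X2c), and the
    # angle of the new axes relative to the old axes.
    P = np.array([4.0, 3.0])
    origin_new = np.array([1.0, 1.0])
    theta = np.radians(30.0)

    # Unit vectors along the new X1' and X2' axes, expressed in old coordinates.
    u1 = np.array([np.cos(theta), np.sin(theta)])
    u2 = np.array([-np.sin(theta), np.cos(theta)])

    # The new coordinates are dot products of the shifted point with the unit axis vectors.
    X1_new = np.dot(P - origin_new, u1)
    X2_new = np.dot(P - origin_new, u2)
    print(X1_new, X2_new)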
When the data are plotted as points in an n-dimensional vector space they often fall roughly within an ellipsoid. For the two-dimensional case the situation might be as shown below.
For this case an appropriate new coordinate system might be one in which the origin is at the mean values of all the observations. The axes of the new coordinate system would be the major and minor axes of the data ellipse.
In the new coordinate system there would be negative values for some coordinates, which is inappropriate for remote sensing work. This is handled by shifting the origin to the point given by the minimum values of the new coordinates.
While the plot of the data may be roughly an ellipsoid, it also might not be. The work of Kauth and Thomas argues that the data plot resembles a fuzzy tasseled cap rather than an ellipsoid. Furthermore, there is the problem of determining the ellipsoid if one does exist. A more general method is needed for determining the axes of the new coordinate system. The general method involves the covariances between the variables, which amount to the intercorrelations among the original variables. The process for computing the covariances is described below.
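A minimal sketch of the covariance computation, assuming the data are held in a NumPy array Z with one row per variable and one column per observation; the array name and the numbers in it are illustrative, not taken from the original.

    import numpy as np

    # Z has n rows (variables) and t columns (observations); the values are made up.
    Z = np.array([[2.0, 4.0, 6.0, 8.0],
                  [1.0, 3.0, 2.0, 6.0],
                  [5.0, 5.0, 7.0, 9.0]])

    # Reduce each variable to deviations from its mean.
    deviations = Z - Z.mean(axis=1, keepdims=True)

    # Covariance of variables i and k: the average product of their deviations.
    t = Z.shape[1]
    cov = deviations @ deviations.T / t

    # np.cov gives the same matrix up to the divisor convention (t versus t - 1).
    print(cov)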
Once the covariance matrix has been determined, the next question is how the new coordinate axes are to be determined from it. The answer is that there are certain vectors, called characteristic vectors or eigenvectors, that give the directions of the new axes.
The vector X is said to be transformed by the matrix M into the vector Y; i.e., Y = MX. A vector X that is transformed into a multiple of itself is special; i.e., the transformed vector Y is equal to X with every component multiplied by the same constant, say λ. Such a vector X is said to be an eigenvector of the matrix M and the constant λ is said to be an eigenvalue of the matrix. (Eigen is a German word meaning self or characteristic.) The equation defining eigenvectors and eigenvalues is:

MX = λX
The computational method for finding the eigenvalues and then the eigenvectors is not of great concern here. It suffices to say that the eigenvalues are the solutions of an n-th degree polynomial equation, the characteristic equation det(M − λI) = 0. The details are given in the appendix.
For an n-by-n matrix there will be n eigenvalues and n orthogonal (perpendicular) eigenvectors. The eigenvectors correspond to the axes of the n-dimensional ellipsoid that best fits the data. The eigenvectors can be normalized to unit length.
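A short sketch of this step, assuming a small symmetric covariance matrix whose entries are illustrative only. NumPy's eigh routine is used because a covariance matrix is symmetric, and it returns eigenvectors already normalized to unit length.

    import numpy as np

    # A small symmetric covariance matrix; the values are illustrative only.
    cov = np.array([[5.0, 2.0, 1.0],
                    [2.0, 3.0, 0.5],
                    [1.0, 0.5, 2.0]])

    # eigh returns eigenvalues in ascending order and unit-length eigenvectors
    # as the columns of the second array.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # Reorder so the largest eigenvalue (and its axis) comes first.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    print(eigenvalues)
    print(eigenvectors)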
The eigenvalues give the amount of the variation (sum of squared deviations from the means) in the original data that is explained by the variation along the corresponding axis (as represented by the corresponding eigenvector). Thus the sum of the eigenvalues is equal to the total variation in the original variables. The components in principal component analysis are labeled according to the size of the corresponding eigenvalue. In practice the first component in remote sensing explains the lion's share of the variation. The second and third components explain much smaller shares, and those beyond the third generally account for trivially small shares of the variation. The higher components may be considered to be random variation or noise. This means that with little or no loss of information the n variables of the original data can be reduced to three or so principal components.
Assume the original data are given in the form zij, where the index i stands for the variable number and j for the observation number. For simplicity it is assumed that the mean value of each variable is zero; i.e., the variables are reduced to deviations from their means. For each observation j there is a value pj, called the principal component, which is used with a set of coefficients xi to approximate the j-th observation of the i-th variable zij as xi·pj. The values of the xi's and the pj's are to be chosen so as to minimize the sum of the squared deviations between the actual data values and the approximations based upon the principal component; i.e., to minimize

S = Σi Σj (zij − xi·pj)²
In order for the minimization problem to have a solution it is necessary to impose a constraint on the values of the xi's. The most convenient form of the constraint is that

Σi xi² = 1
The minimization problem is thus to find values for the xi's and pj's that

minimize S = Σi Σj (zij − xi·pj)² subject to Σi xi² = 1
The solution to this constrained minimization problem is found by the Lagrangian multiplier method; i.e., by minimizing

L = Σi Σj (zij − xi·pj)² + λ(Σi xi² − 1)
The first order conditions for a minimum with respect to the pj's are:

∂L/∂pj = −2 Σi xi(zij − xi·pj) = 0 for each j
These conditions, making use of the constraint Σi xi² = 1, reduce to:

pj = Σi xi·zij
The problem now is to find a way of determining the coefficients xi. The first order conditions for a minimum of the Lagrangian with respect to these variables are:

∂L/∂xi = −2 Σj pj(zij − xi·pj) + 2λxi = 0 for each i
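Substituting pj = Σk xk·zkj into this condition is the step that turns it into an eigenvalue problem; the reduction sketched here is a standard piece of algebra stated in the notation of this section rather than quoted from the original. The condition rearranges to Σj zij·pj = xi(Σj pj² + λ), and since Σj zij·pj = Σk (Σj zij·zkj)·xk, the left-hand side is the i-th element of the matrix of sums of cross products (the matrix ZZ^T of the treatment below) applied to the vector of coefficients. Hence that matrix times the vector of xi's must be a multiple of the vector itself, which is precisely the eigenvector condition developed below.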
The mathematics of principal component analysis is expressed more succinctly in terms of matrix operations. The basics of the matrix approach are given below.
Let the data be represented as a matrix Z with n rows and t columns where n is the number of variables and t is the number of observations.
The matrix of approximations of the data based upon the vector of coefficients X and the vector of principal component values P is obtained by multiplying the n×1 vector X times the 1×t vector P^T, the transpose of P; i.e.,

XP^T
The matrix of deviations of the actual data from the values based upon X and P is

Z − XP^T
The sum of the squared deviations can be obtained in terms of matrix operations as the trace (the sum of the diagonal elements) of the product of the deviations matrix with its own transpose; i.e.,

S = tr[(Z − XP^T)(Z − XP^T)^T]
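A quick numerical check of this identity in Python/NumPy; the data array and coefficient vector are made-up examples, and the trace form is compared with the direct elementwise sum of squares.

    import numpy as np

    # Made-up example: 3 variables, 4 observations, and an X of unit length.
    Z = np.array([[2.0, -1.0, 0.5, 1.5],
                  [1.0,  0.0, -0.5, 2.0],
                  [0.5,  1.0,  1.0, -1.0]])
    X = np.array([[0.6], [0.8], [0.0]])   # n x 1, with X^T X = 1
    P = Z.T @ X                           # t x 1, the condition P = Z^T X

    E = Z - X @ P.T                       # deviations matrix
    print(np.trace(E @ E.T))              # trace form of the sum of squares
    print(np.sum(E ** 2))                 # direct elementwise sum, same value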
The constraint on the choice of X is that

X^TX = 1
The first order conditions for a minimum with respect to the elements of P are, in matrix form,

X^T(Z − XP^T) = 0, which, given that X^TX = 1, reduces to P^T = X^TZ; i.e., P = Z^TX
The first order conditions for a minimum with respect to the elements of X are, in matrix form,

ZP = (λ + P^TP)X
Substituting P = Z^TX, the equation defining the eigenvalues and eigenvectors of the matrix ZZ^T can be expressed as:

(ZZ^T)X = μX, where μ = λ + P^TP
In general there are n eigenvalues and n orthonormal eigenvectors. The eigenvalues correspond to the amounts of variation in the variables that are explained by the components, so the eigenvector belonging to the largest eigenvalue should be the one utilized in computing the first principal component. The second largest eigenvalue identifies the eigenvector to be used in computing the second principal component, and so forth. The proportion of the variation explained by a component is just its eigenvalue divided by the sum of the eigenvalues.
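Putting the pieces together, the following is a minimal sketch in Python/NumPy, under the assumption that the data sit in an array with one row per variable; the array and its values are illustrative and not from the original. It mean-centers the data, computes ZZ^T, extracts eigenvalues and eigenvectors, reports the proportion of variation explained by each component, and forms the principal components as P = Z^TX for each eigenvector.

    import numpy as np

    # Illustrative data: n = 3 variables (rows), t = 5 observations (columns).
    raw = np.array([[2.0, 4.0, 6.0, 8.0, 10.0],
                    [1.0, 3.0, 2.0, 6.0,  7.0],
                    [5.0, 5.0, 7.0, 9.0,  8.0]])

    # Reduce the variables to deviations from their means.
    Z = raw - raw.mean(axis=1, keepdims=True)

    # Eigenvalues and unit-length eigenvectors of ZZ^T (proportional to the covariance matrix).
    eigenvalues, eigenvectors = np.linalg.eigh(Z @ Z.T)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    # Proportion of the variation explained by each component.
    proportions = eigenvalues / eigenvalues.sum()
    print(proportions)

    # The k-th principal component: P = Z^T X for the k-th eigenvector X.
    components = Z.T @ eigenvectors   # column k holds the k-th principal component
    print(components[:, 0])           # the first principal component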