Linear discriminant analysis
From Freepedia
Linear discriminant analysis (LDA) is sometimes referred to as Fisher's linear discriminant, although Fisher's original article, The Use of Multiple Measurements in Taxonomic Problems (1936), actually describes a slightly different discriminant that does not make some of the assumptions of LDA, such as normally distributed classes or equal class covariances. LDA is typically used as a feature extraction step before classification.
LDA is used for two-class classification, where a vector of observations <math>\vec x</math> (also referred to as features or measurements) is known for each sample, and the task is to predict a binary random class variable c. To solve this problem, it is assumed that the densities <math>p(\vec x|c=1)</math> and <math>p(\vec x|c=0)</math> are both normally distributed, with identical full-rank covariances <math>\Sigma_{c=0} = \Sigma_{c=1} = \Sigma</math>, but different means. Under these assumptions, a sufficient statistic for <math>P(c|\vec x)</math> is given by <math>\vec x \cdot \vec w</math>, where
- <math>\vec w = \Sigma^{-1} (\vec \mu_1 - \vec \mu_0)</math>
That is, the probability of an input x being in a class c is purely a function of this linear combination of the known observations.
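The computation can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the true means and covariance are known; the function names, the example numbers, and the `prior1` parameter are hypothetical and not part of the original text. Under the stated Gaussian assumptions the posterior is a logistic function of the projection plus a constant offset, which is what `posterior_class1` evaluates.

```python
import numpy as np


def lda_direction(mu0, mu1, sigma):
    """Return w = Sigma^{-1} (mu1 - mu0), solving a linear system
    instead of forming the inverse explicitly."""
    return np.linalg.solve(sigma, mu1 - mu0)


def posterior_class1(x, mu0, mu1, sigma, prior1=0.5):
    """P(c=1 | x) for two Gaussians with a shared covariance.

    The log-odds are linear in x: x.w + b, where b collects the terms
    that do not depend on x (midpoint offset and log prior ratio).
    """
    w = lda_direction(mu0, mu1, sigma)
    b = -0.5 * (mu1 + mu0) @ w + np.log(prior1 / (1.0 - prior1))
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))


# Hypothetical example values, chosen only for illustration.
mu0 = np.array([0.0, 0.0])
mu1 = np.array([2.0, 1.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

w = lda_direction(mu0, mu1, sigma)
midpoint = 0.5 * (mu0 + mu1)
p_mid = posterior_class1(midpoint, mu0, mu1, sigma)  # equal priors
p_at_mu1 = posterior_class1(mu1, mu0, mu1, sigma)
```

With equal priors, a point halfway between the two means is equally likely to belong to either class, and a point at the class 1 mean is assigned to class 1 with probability above one half.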
When the assumptions of LDA are satisfied, the above dot product is equivalent to Fisher's linear discriminant. This means that it is the one-dimensional projection which maximizes the ratio of the squared distance between the projected means to the variance of the projected distributions. Thus, in some sense, this projection maximizes the signal-to-noise ratio.
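A short worked derivation shows why this projection is the maximizer. It is a sketch under the shared-covariance assumption; the symbols J, <math>\vec d</math> and <math>\vec u</math> are introduced here for illustration only. Writing <math>\vec d = \vec \mu_1 - \vec \mu_0</math>, the ratio for a projection direction <math>\vec w</math> is
- <math>J(\vec w) = \frac{(\vec w \cdot \vec \mu_1 - \vec w \cdot \vec \mu_0)^2}{\vec w^T \Sigma \vec w} = \frac{(\vec w \cdot \vec d)^2}{\vec w^T \Sigma \vec w}</math>
Substituting <math>\vec u = \Sigma^{1/2} \vec w</math> and applying the Cauchy-Schwarz inequality gives
- <math>J = \frac{(\vec u \cdot \Sigma^{-1/2} \vec d)^2}{\vec u \cdot \vec u} \le \vec d^T \Sigma^{-1} \vec d</math>
with equality exactly when <math>\vec u</math> is proportional to <math>\Sigma^{-1/2} \vec d</math>, that is, when <math>\vec w \propto \Sigma^{-1} (\vec \mu_1 - \vec \mu_0)</math>.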
In practice, the class means and covariances are not known. They can, however, be estimated from labeled samples: assume that the two densities <math>p(\vec x|c=1)</math> and <math>p(\vec x|c=0)</math> have different means and a shared covariance, and compute either the maximum likelihood estimate or the maximum a posteriori estimate of these parameters. The estimates may then be used in place of the true values in the above equations.
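The maximum likelihood route can be sketched as follows. This is an illustrative example, not a reference implementation: the function name `fit_lda_ml` and the synthetic data are assumptions made here. For Gaussians with a shared covariance, the maximum likelihood estimates are the per-class sample means and the pooled scatter divided by the total number of samples.

```python
import numpy as np


def fit_lda_ml(X, c):
    """Maximum likelihood estimates for two-class LDA.

    X: (n, d) array of observations; c: (n,) array of 0/1 labels.
    Returns the two class means, the shared covariance, and the
    projection w = Sigma^{-1} (mu1 - mu0).
    """
    X = np.asarray(X, dtype=float)
    c = np.asarray(c)
    X0, X1 = X[c == 0], X[c == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Residuals about each class's own mean, pooled over both classes.
    residuals = np.vstack([X0 - mu0, X1 - mu1])
    # ML estimate divides by n (not n - 2, which is the unbiased variant).
    sigma = residuals.T @ residuals / len(X)
    w = np.linalg.solve(sigma, mu1 - mu0)
    return mu0, mu1, sigma, w


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
true_sigma = np.array([[1.0, 0.3],
                       [0.3, 0.5]])
X0 = rng.multivariate_normal([0.0, 0.0], true_sigma, size=500)
X1 = rng.multivariate_normal([2.0, 1.0], true_sigma, size=500)
X = np.vstack([X0, X1])
c = np.concatenate([np.zeros(500, dtype=int), np.ones(500, dtype=int)])

mu0_hat, mu1_hat, sigma_hat, w_hat = fit_lda_ml(X, c)
```

A maximum a posteriori estimate would additionally require a prior over the means and covariance, which the text does not specify, so it is not shown here.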
LDA can be generalized to multiple discriminant analysis, where c becomes a categorical variable with N possible states instead of only two. Analogously, if the class-conditional densities <math>p(\vec x|c=i)</math> are normal with a shared covariance, a sufficient statistic for <math>P(c|\vec x)</math> is the set of values of N projections, which lie in the subspace spanned by the N means, affinely projected by the inverse covariance matrix. These projections can be found by solving a generalized eigenvalue problem, in which the numerator is the covariance matrix formed by treating the means as the samples, and the denominator is the shared covariance matrix.
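One way to solve such a generalized eigenvalue problem is sketched below using only NumPy. The approach (whitening with a Cholesky factor so that a symmetric eigensolver can be used) is a choice made for this example rather than something the text prescribes, and the function name, the equal weighting of the class means, and the example numbers are assumptions.

```python
import numpy as np


def discriminant_directions(means, sigma):
    """Solve S_b v = lambda * Sigma v for the discriminant directions.

    means: (N, d) array, one row per class mean.
    sigma: (d, d) shared covariance, assumed full rank.
    Returns eigenvalues in descending order and the matching
    directions as the columns of V.
    """
    means = np.asarray(means, dtype=float)
    centered = means - means.mean(axis=0)
    # Covariance of the means treated as samples (equal class weights).
    s_b = centered.T @ centered / len(means)
    # Whitening: with Sigma = L L^T, the matrix A = L^{-1} S_b L^{-T}
    # is symmetric and shares its eigenvalues with the original problem.
    L = np.linalg.cholesky(sigma)
    tmp = np.linalg.solve(L, s_b)      # L^{-1} S_b
    A = np.linalg.solve(L, tmp.T)      # L^{-1} S_b L^{-T}
    eigvals, U = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]
    # Map back from whitened coordinates: v = L^{-T} u.
    V = np.linalg.solve(L.T, U[:, order])
    return eigvals[order], V


# Hypothetical three-class example in three dimensions.
means = np.array([[0.0, 0.0, 0.0],
                  [2.0, 0.0, 1.0],
                  [0.0, 3.0, 1.0]])
sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])

eigvals, V = discriminant_directions(means, sigma)
projected_means = means @ V[:, :2]  # class means in the top two directions
```

Because the N means are centered before forming their covariance, at most N - 1 eigenvalues are nonzero, so in this three-class example only the first two directions carry discriminative information.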
See also
- Decision tree
- Data mining
- Linear classifier
- Logit (for logistic regression)
- Machine learning
- Perceptron
- Statistics
References
- Duda, R.O., Hart, P.E., Stork, D.H. Pattern Classification (2nd ed.). Wiley Interscience (2000). ISBN 0471056693
- Fisher, R.A. The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7: 179-188 (1936)



