Citation
Lee DD, Seung HS. Learning the parts of objects by non-negative matrix factorization. Nature. 1999 Oct;401:788–91. DOI: 10.1038/44565
10 Word Summary
Decomposing neural signals with NMF gives a physiologically plausible representation.
Abstract
Is perception of the whole based on perception of its parts? There is psychological and physiological evidence for parts-based representations in the brain, and certain computational theories of object recognition rely on such representations. But little is known about how brains or computers might learn the parts of objects. Here we demonstrate an algorithm for non-negative matrix factorization that is able to learn parts of faces and semantic features of text. This is in contrast to other methods, such as principal components analysis and vector quantization, that learn holistic, not parts-based, representations. Non-negative matrix factorization is distinguished from the other methods by its use of non-negativity constraints. These constraints lead to a parts-based representation because they allow only additive, not subtractive, combinations. When non-negative matrix factorization is implemented as a neural network, parts-based representations emerge by virtue of two properties: the firing rates of neurons are never negative and synaptic strengths do not change sign.
Notes
- Also look at Lee and Seung – "Algorithms for Non-negative Matrix Factorization"
- Vector quantization (VQ)
- Principal components analysis (PCA)
- Non-negative matrix factorization (NMF)
- All three recreate structure using the decomposition $V \approx WH$, where V is an n × m data matrix (one data vector per column), W is an n × r matrix of basis vectors, and H is an r × m matrix of coefficients.
- Constraints on W and H:
  - VQ – each column of H must be a unary vector (a single 1, all other entries 0). This means each new object must be represented entirely by a single column of W.
  - PCA – columns of W must be orthonormal and rows of H must be orthogonal. This allows for an "eigenvector" representation; both positive and negative combinations are permitted, enabling complex reconstruction of the data.
  - NMF – the only constraint is that the elements of W and H must be non-negative. This results in a sparse, parts-based coding of basis features.
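The shared decomposition can be sketched with NumPy (a minimal sketch; the sizes are hypothetical, chosen to echo the paper's 19×19-pixel face images and r = 49 basis images):

```python
import numpy as np

# V ~ WH: m data vectors of dimension n, rebuilt from r basis vectors.
# Sizes are hypothetical (19x19-pixel faces, 100 images, 49 parts).
n, m, r = 361, 100, 49

rng = np.random.default_rng(0)
V = rng.random((n, m))   # data: one image per column
W = rng.random((n, r))   # basis: one part/prototype/eigenvector per column
H = rng.random((r, m))   # encoding: one coefficient column per image

assert (W @ H).shape == V.shape   # the reconstruction has V's shape
```

The three methods differ only in the constraints placed on W and H, not in the shape of this factorization.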
- VQ builds things up from a single whole-object prototype, as opposed to NMF, which builds things up from components.
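The unary constraint can be illustrated directly: encoding a vector under VQ amounts to picking the nearest prototype column of W (a toy two-prototype example, not from the paper):

```python
import numpy as np

W = np.array([[1.0, 0.0],
              [0.0, 1.0]])   # two prototype columns
v = np.array([0.9, 0.2])     # one data vector to encode

# Unary H column: a single 1 at the nearest prototype, zeros elsewhere.
k = np.argmin(((W - v[:, None]) ** 2).sum(axis=0))
h = np.zeros(W.shape[1])
h[k] = 1.0

# The reconstruction is one whole prototype, never a sum of parts.
assert np.allclose(W @ h, W[:, k])
```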
- PCA looks somewhat like a frequency decomposition: the first component looks like a mean, and subsequent components look like higher and higher frequencies.
- NMF solves for the elements of W and H by iterating the following multiplicative update rules:
  - $W_{ia} \leftarrow W_{ia} \sum_{\mu} \frac{V_{i\mu}}{(WH)_{i\mu}} H_{a\mu}$
  - $W_{ia} \leftarrow \frac{W_{ia}}{\sum_{j} W_{ja}}$
  - $H_{a\mu} \leftarrow H_{a\mu} \sum_{i} W_{ia} \frac{V_{i\mu}}{(WH)_{i\mu}}$
  - Need to check that the seed values for W and H contain no zeros; under multiplicative updates a zero entry remains zero forever.
- These updates are derived by maximizing the objective $F = \sum_{i=1}^{n} \sum_{\mu=1}^{m} \left[ V_{i\mu} \log (WH)_{i\mu} - (WH)_{i\mu} \right]$ subject to the non-negativity constraints.
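The multiplicative updates can be sketched as a small NumPy routine (a sketch, not the authors' code; the `eps` guard against division by zero is my addition):

```python
import numpy as np

def nmf(V, r, n_iter=500, eps=1e-9, seed=0):
    """Factor non-negative V (n x m) into W (n x r) and H (r x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive seeds: a zero entry can never leave zero
    # under multiplicative updates.
    W = rng.uniform(0.1, 1.0, (n, r))
    H = rng.uniform(0.1, 1.0, (r, m))
    for _ in range(n_iter):
        R = V / (W @ H + eps)              # V_iu / (WH)_iu
        W *= R @ H.T                       # W_ia <- W_ia * sum_u R_iu H_au
        W /= W.sum(axis=0, keepdims=True)  # W_ia <- W_ia / sum_j W_ja
        R = V / (W @ H + eps)
        H *= W.T @ R                       # H_au <- H_au * sum_i W_ia R_iu
    return W, H
```

Because the columns of W are normalized to sum to one, the H update needs no extra normalization.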
- NMF in general models a set of visible variables V using a set of "hidden" variables H, combined through the basis W.
  - VQ – has one-and-only-one active hidden variable for each visible vector.
  - PCA – allows combinations of multiple hidden variables for each visible vector, but permits "unnatural" combinations due to negative-valued basis vectors.
  - NMF – similar to PCA, but constrains combinations to summation (purely additive).
- Doesn't provide information about syntactic relationships between parts.
- Complex datasets might require a hierarchical decomposition that NMF does not capture.
- NMF performs "learning" and "inference" simultaneously:
  - Learning is the determination of the basis vectors (W) from data.
  - Inference is the setting of the weights of the hidden variables (H).
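Inference alone can be sketched by holding a learned W fixed and iterating only the H update (a sketch; the denominator normalizes for W columns that do not sum to one, and `eps` is my addition):

```python
import numpy as np

def infer_h(V, W, n_iter=300, eps=1e-9, seed=0):
    """Infer hidden encodings H for data V under a fixed, learned basis W."""
    rng = np.random.default_rng(seed)
    H = rng.uniform(0.1, 1.0, (W.shape[1], V.shape[1]))
    col_sums = W.sum(axis=0, keepdims=True).T   # (r, 1) column sums of W
    for _ in range(n_iter):
        # H_au <- H_au * [sum_i W_ia V_iu/(WH)_iu] / sum_i W_ia
        H *= (W.T @ (V / (W @ H + eps))) / (col_sums + eps)
    return H
```

Learning would additionally update W on each pass; freezing W is what makes this pure inference.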
- NMF suits neurological data "physiologically" because:
  - Synapses are either inhibitory or excitatory but do not change sign.
  - Firing rates cannot be negative.
