Suppose you have trained K class-specific Restricted Boltzmann Machines (RBMs) and would like to combine the K
RBMs for classification.
Unlike with, say, a mixture of Gaussians, you cannot simply apply Bayes' rule,
because each RBM has its own intractable partition function, so each class-conditional likelihood is known only up to a different unknown constant.
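To make the obstacle concrete, here is Bayes' rule written out. The notation is ours, not the original's: E_y is the energy of the RBM trained on class y, F_y(x) its free energy (the hiddens summed out), and Z_y its partition function.

```latex
p(y \mid x) = \frac{p(y)\, e^{-F_y(x)} / Z_y}{\sum_{y'=1}^{K} p(y')\, e^{-F_{y'}(x)} / Z_{y'}},
\qquad
F_y(x) = -\log \sum_{h} e^{-E_y(x, h)},
\qquad
Z_y = \sum_{x} \sum_{h} e^{-E_y(x, h)}
```

Each Z_y sums over every joint configuration of inputs and hiddens, so it is intractable for realistic models. Because the Z_y differ across classes, they do not cancel between numerator and denominator.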
As it turns out, however, there is a principled way to combine the RBMs for classification:
Just think of the set of K class-specific RBMs as a single conditional distribution p(inputs, hiddens|class).
Now compute p(class|inputs), marginalizing out the hiddens. The partition functions cancel, and you can compute both the probability and its derivatives with respect to all of the RBMs' parameters in polynomial time.
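A sketch of why this works follows. It assumes our own notation and the usual three-way RBM energy: D inputs x, H binary hiddens h, a one-hot class y among K, weights W_{iyk}, class-specific hidden biases b_{yk}, and class biases c_y. Any normalizer shared by all classes cancels in the ratio. The sum over the 2^H hidden vectors then factorizes over the individual hiddens:

```latex
\begin{aligned}
-E(x, h, y) &= \sum_{i=1}^{D} \sum_{k=1}^{H} W_{iyk}\, x_i h_k + \sum_{k=1}^{H} b_{yk} h_k + c_y \\
p(y \mid x) &= \frac{\sum_{h} e^{-E(x, h, y)}}{\sum_{y'=1}^{K} \sum_{h} e^{-E(x, h, y')}} \\
\sum_{h \in \{0,1\}^H} e^{-E(x, h, y)} &= e^{c_y} \prod_{k=1}^{H} \bigl(1 + e^{a_{yk}(x)}\bigr),
  \qquad a_{yk}(x) = \sum_{i=1}^{D} W_{iyk}\, x_i + b_{yk} \\
\log p(y \mid x) &= c_y + \sum_{k=1}^{H} \log\bigl(1 + e^{a_{yk}(x)}\bigr)
  - \log \sum_{y'=1}^{K} \exp\Bigl(c_{y'} + \sum_{k=1}^{H} \log\bigl(1 + e^{a_{y'k}(x)}\bigr)\Bigr)
\end{aligned}
```

Evaluating the last line costs O(DKH) operations. Its derivatives follow by ordinary differentiation of this expression.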
The resulting model is the "gated softmax classifier" (NIPS 2010).
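The closed-form computation of p(class|inputs) and its derivatives can be sketched in NumPy. This is our illustration, not the original modules, which run on the GPU via cudamat. It assumes the non-factored three-way energy given in the docstring, with hypothetical parameter names W, b_hid and c_cls:

```python
import numpy as np


def gated_softmax(x, W, b_hid, c_cls):
    """Closed-form log p(y|x) for a non-factored gated softmax (sketch).

    Assumed energy (hypothetical names, not the original modules' API):
      -E(x, h, y) = sum_{i,k} W[i, y, k] x_i h_k + sum_k b_hid[y, k] h_k + c_cls[y]
    Shapes: x (D,), W (D, K, H), b_hid (K, H), c_cls (K,); hiddens are binary.
    """
    # a[y, k]: total input to hidden unit k when the class is y
    a = np.einsum("i,iyk->yk", x, W) + b_hid
    # Summing exp(-E) over all 2^H hidden vectors factorizes into
    # prod_k (1 + exp(a[y, k])), i.e. a sum of softplus terms in log space.
    scores = c_cls + np.logaddexp(0.0, a).sum(axis=1)
    # Normalize over the K classes only (stable log-sum-exp).
    m = scores.max()
    log_probs = scores - (m + np.log(np.exp(scores - m).sum()))
    return log_probs, a


def log_prob_and_grads(x, t, W, b_hid, c_cls):
    """log p(t|x) for target class t and its gradients w.r.t. W, b_hid, c_cls."""
    log_probs, a = gated_softmax(x, W, b_hid, c_cls)
    delta = -np.exp(log_probs)             # -p(y|x) for every class y
    delta[t] += 1.0                        # d log p(t|x) / d scores[y]
    sig = np.exp(-np.logaddexp(0.0, -a))   # stable sigmoid = d softplus(a) / d a
    g_b = delta[:, None] * sig             # gradient w.r.t. b_hid, shape (K, H)
    g_W = np.einsum("i,yk->iyk", x, g_b)   # gradient w.r.t. W, shape (D, K, H)
    return log_probs[t], g_W, g_b, delta   # delta is the gradient w.r.t. c_cls
```

The gradient with respect to W[i, y, k] is (1[y = t] - p(y|x)) * sigmoid(a[y, k]) * x_i, so a training step has the same O(DKH) cost as the forward pass.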
An equivalent view of the model is as a log-bilinear classifier (or bilinear logistic regression) whose
hidden units can be viewed as "style" variables that capture within-class variability.
It can be shown that a model with H binary latent variables is exactly
equivalent to a mixture of 2^H logistic regression models with shared weights.
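The mixture claim can be checked numerically for small H. The sketch below uses hypothetical parameter names W, b_hid and c_cls under an assumed three-way energy. It enumerates all 2^H binary hidden vectors h. Each h selects a multinomial logistic regression whose class-y weights are sum_k h_k W[:, y, k], so all components share the columns of W. The mixing weight of component h is p(h|x):

```python
import itertools

import numpy as np


def mixture_of_logistic_regressions(x, W, b_hid, c_cls):
    """p(y|x) as an explicit mixture over all 2^H hidden configurations (sketch).

    Hypothetical shapes: x (D,), W (D, K, H), b_hid (K, H), c_cls (K,).
    Cost is exponential in H, so this is only for checking small models.
    """
    a = np.einsum("i,iyk->yk", x, W) + b_hid                                # (K, H)
    H = a.shape[1]
    hs = np.array(list(itertools.product([0, 1], repeat=H)), dtype=float)  # (2^H, H)
    # Row h holds the logits of component h: c_y + sum_k h_k a[y, k], i.e. a
    # logistic regression with weights sum_k h_k W[:, y, k] (shared columns of W).
    logits = c_cls[None, :] + hs @ a.T                                     # (2^H, K)
    unnorm = np.exp(logits - logits.max())
    # Per-component class posteriors p(y|x, h).
    p_y_given_h = unnorm / unnorm.sum(axis=1, keepdims=True)
    # Mixing weights p(h|x): each component's share of the total mass.
    p_h = unnorm.sum(axis=1) / unnorm.sum()
    return p_h @ p_y_given_h                                               # (K,)
```

The output matches the closed-form p(class|inputs) up to rounding, but the enumeration grows as 2^H. With H = 77 binary hiddens there are already 2^77 ≈ 1.5 × 10^23 components, whereas the closed form needs only O(DKH) operations.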
Thanks to this equivalence, one can train a mixture of about 100,000,000,000,000,000,000,000 (10^23) linear
classifiers and apply it to test data in closed form.
The following two Python modules implement two versions of the model.
Both modules make use of GPUs via V. Mnih's cudamat package.
The "factored" model, whose parameter tensor is represented by
This makes it possible to represent invariances using shared basis
functions as described in the paper.