Deep Belief Nets


Monday, April 16th -- Ruslan Salakhutdinov


Abstract:
 

Recently, Hinton et al. derived a fast, greedy procedure for learning deep belief networks (DBNs) one layer at a time, with the top two layers forming an undirected bipartite graph (an associative memory). The procedure trains a stack of Restricted Boltzmann Machines (RBMs), each having only one layer of latent (hidden) feature detectors. The important aspect of this layer-wise training procedure is that each extra layer increases a variational lower bound on the log probability of the data. The greedy layer-by-layer training can be repeated several times to learn a deep, hierarchical model in which each layer of features captures strong high-order correlations between the activities of features in the layer below. We will discuss three ideas based on greedily learning a hierarchy of features:

1. Nonlinear Dimensionality Reduction: The DBN framework allows us to make nonlinear autoencoders work considerably better than widely used methods such as PCA, SVD, and LLE.

2. Learning a Semantic Address Space (SAS) for Fast Document Retrieval: The DBN framework allows us to build a model that learns to map documents into "semantic" binary codes. Using these codes as memory addresses, we can learn a Semantic Address Space in which each document is mapped to a memory address such that a small Hamming ball around that address contains semantically similar documents. This representation allows us to retrieve a short list of semantically similar documents from very large document sets in time independent of the number of documents.

3. Learning a Nonlinear Similarity Measure: The DBN framework can also be used to efficiently learn a nonlinear transformation from the input space to a low-dimensional feature space in which K-nearest neighbour classification performs well. This can be viewed as a nonlinear extension of Neighbourhood Components Analysis (NCA).
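
The Hamming-ball lookup described in item 2 can be illustrated with a minimal sketch. It assumes the binary codes have already been produced by a trained model; here they are hand-written integers, and the names `hamming_ball`, `SemanticAddressSpace`, and `shortlist` are hypothetical, chosen only for this illustration. The point it shows is that a query probes a fixed set of addresses determined by the code length and the radius, so the number of probes does not grow with the number of stored documents.

```python
from collections import defaultdict
from itertools import combinations


def hamming_ball(code, n_bits, radius):
    """Yield every n_bits-bit address within `radius` bit flips of `code`."""
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            neighbour = code
            for p in positions:
                neighbour ^= 1 << p  # flip bit p
            yield neighbour


class SemanticAddressSpace:
    """Toy memory: each binary code is an address holding document ids."""

    def __init__(self, n_bits):
        self.n_bits = n_bits
        self.memory = defaultdict(list)

    def add(self, doc_id, code):
        # Assumption: `code` comes from some trained encoder (not shown).
        self.memory[code].append(doc_id)

    def shortlist(self, query_code, radius=1):
        # The addresses probed depend only on n_bits and radius,
        # not on how many documents are stored.
        found = []
        for address in hamming_ball(query_code, self.n_bits, radius):
            found.extend(self.memory.get(address, []))
        return found


# Usage with made-up 8-bit codes.
sas = SemanticAddressSpace(n_bits=8)
sas.add("doc_a", 0b10110010)
sas.add("doc_b", 0b10110011)  # differs from doc_a in one bit
sas.add("doc_c", 0b01001101)  # differs from doc_a in all eight bits
print(sas.shortlist(0b10110010, radius=1))
```

With a radius of 1, the query probes 9 addresses (the code itself plus 8 single-bit flips), so only the two nearby documents are returned. In practice the radius must stay small, since the number of addresses in the ball grows combinatorially with it.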

If time permits, I will briefly mention how RBMs can be successfully applied in the collaborative filtering domain, in particular to the Netflix data set.