Machine Learning Meetings and Events
Group Meetings: Group meetings are held Mondays from 11am to noon (talk starts 11:10am) in D.L. Pratt 290C unless otherwise noted. Meetings are coordinated by Hugo Larochelle.
Tea Talks: Tea talks are held every Wednesday at 4:00pm in D.L. Pratt 290C. Talks should be simple, accessible, and not exceed 15 minutes. Speakers bring snacks, make tea, and provide a copy of the presented paper.
Group Meeting Apr 6, 2009: On Non-Metric and Parametric Variants of t-SNE
- Speaker: Laurens van der Maaten (TiCC/Tilburg)
- Abstract:
Recently, van der Maaten and Hinton introduced a new dimensionality reduction technique called t-SNE. t-SNE converts distances into probabilities in both the data space and the latent space, and minimizes the Kullback-Leibler divergence between the resulting probability distributions in the two spaces. The exponential difference in volume between the data space and the latent space causes a crowding problem; to correct for it, t-SNE employs a heavy-tailed distribution (a Student-t distribution with one degree of freedom) in the latent space and a Gaussian distribution in the data space. Experiments have shown that t-SNE is particularly well suited for the visualization of high-dimensional data, and that it significantly outperforms alternative techniques in various visualization tasks.
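The cost described above can be illustrated with a short Python sketch: Gaussian affinities in the data space, Student-t (one degree of freedom) affinities in the latent space, and the Kullback-Leibler divergence between them. This is a minimal illustration, not the authors' implementation. The function names are hypothetical, and the sketch assumes a single fixed Gaussian bandwidth `sigma` for every point, whereas t-SNE proper sets a separate bandwidth per point from a user-chosen perplexity. It only scores given layouts; it does not optimize them.

```python
import math


def gaussian_affinities(X, sigma=1.0):
    """Symmetric joint probabilities p_ij in the data space.

    Simplification: one fixed bandwidth sigma for all points, instead of the
    per-point bandwidths that t-SNE calibrates from a perplexity target.
    """
    n = len(X)
    cond = []
    for i in range(n):
        row = [0.0] * n
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
                row[j] = math.exp(-d2 / (2.0 * sigma ** 2))
        total = sum(row)
        cond.append([v / total for v in row])  # conditional p_{j|i}
    # Symmetrize so that all n*n entries sum to 1.
    return [[(cond[i][j] + cond[j][i]) / (2.0 * n) for j in range(n)] for i in range(n)]


def student_t_affinities(Y):
    """Joint probabilities q_ij in the latent space from a heavy-tailed
    Student-t kernel with one degree of freedom: (1 + ||y_i - y_j||^2)^-1."""
    n = len(Y)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
                w[i][j] = 1.0 / (1.0 + d2)
    z = sum(map(sum, w))  # normalize over all ordered pairs i != j
    return [[v / z for v in row] for row in w]


def kl_divergence(P, Q):
    """t-SNE cost KL(P || Q) = sum over i != j of p_ij * log(p_ij / q_ij)."""
    return sum(p * math.log(p / q)
               for p_row, q_row in zip(P, Q)
               for p, q in zip(p_row, q_row) if p > 0.0)


# Two tight pairs of points in 3-D, far apart from each other.
X = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0], [5.1, 5.0, 5.0]]
P = gaussian_affinities(X)
good = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]  # keeps each pair together
bad = [[0.0, 0.0], [3.0, 3.0], [0.1, 0.0], [3.1, 3.0]]   # splits each pair
print(kl_divergence(P, student_t_affinities(good))
      < kl_divergence(P, student_t_affinities(bad)))  # True
```

The layout that keeps each tight pair together receives the lower cost. t-SNE itself obtains the map by minimizing this cost over the latent coordinates with gradient descent; the sketch only compares two fixed layouts.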
The talk addresses solutions to two main shortcomings of t-SNE: (1) the metric nature of the latent space and (2) the lack of a parametric mapping from the data space to the latent space and vice versa. Both problems and their respective solutions are outlined below.
(1) The metric nature of t-SNE prevents it from successfully modeling non-metric similarities, such as semantic relations, co-occurrence data, and association data. Non-metric similarities can, however, be modeled by a variant of t-SNE that constructs multiple complementary maps. This is very different from simply embedding in, say, ten dimensions, and then showing five two-dimensional maps. The multiple-map variant of t-SNE can successfully visualize, e.g., word association data and NIPS co-authorships. The talk also discusses some problematic properties of the multiple-map variant of t-SNE from the viewpoint of finding clusters of similar datapoints (or finding ‘topics’).
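Why a single map fails on non-metric similarities can be seen with a hypothetical word-association triple: "tie" is associated with both "suit" and "rope", but "suit" and "rope" are unrelated. In one metric map, placing both words within 0.1 of "tie" forces them within 0.2 of each other (triangle inequality). The Python sketch below assumes one possible multiple-map formulation, not necessarily the one presented in the talk: every point has a location in each map plus a per-map importance weight, and the latent similarity of a pair is a weighted sum of per-map Student-t similarities. Names and data are hypothetical, and only the similarity computation is shown, not the fitting of maps and weights.

```python
def multimap_affinities(maps, weights):
    """Joint latent-space probabilities q_ij for a multiple-map embedding.

    maps[m][i]: coordinates of point i in map m
    weights[i][m]: importance of point i in map m (assumed to sum to 1 over m)
    Assumed form: q_ij is proportional to
        sum_m weights[i][m] * weights[j][m] / (1 + ||maps[m][i] - maps[m][j]||^2)
    """
    n = len(weights)
    w = [[0.0] * n for _ in range(n)]
    for m, Y in enumerate(maps):
        for i in range(n):
            for j in range(n):
                if i != j:
                    d2 = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
                    w[i][j] += weights[i][m] * weights[j][m] / (1.0 + d2)
    z = sum(map(sum, w))  # normalize over all ordered pairs i != j
    return [[v / z for v in row] for row in w]


# Hypothetical word-association example: 0 = "tie", 1 = "suit", 2 = "rope".
maps = [
    [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0)],   # map 0: tie next to suit
    [(0.0, 0.0), (10.0, 10.0), (0.1, 0.0)],   # map 1: tie next to rope
]
weights = [(0.5, 0.5), (1.0, 0.0), (0.0, 1.0)]  # "tie" has weight in both maps
Q = multimap_affinities(maps, weights)
print(round(Q[0][1], 3), round(Q[0][2], 3), Q[1][2])  # 0.25 0.25 0.0
```

"Tie" is highly similar to both other words, while the similarity between "suit" and "rope" is exactly zero, because each of them has weight in a different map. No single two-dimensional map can reproduce this pattern.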
(2) The lack of a parametric mapping between the data space and the latent space (and vice versa) is problematic if one wants to generalize to unseen data (in either the data or the latent space). The talk discusses two approaches to parametrize the functions between the two spaces:
- The mapping between the data space and the latent space can be parametrized, e.g., by means of a deep network. When t-SNE is combined with recent advances in the training of deep networks, state-of-the-art results can be obtained on unseen test data. The experiments with this parametric variant of t-SNE also provide more insight into the nature of the crowding problem.
- The mapping between the latent space and the data space can be parametrized, e.g., with the help of the GPLVM. The GPLVM is a non-linear variant of probabilistic PCA that was proposed by Lawrence. It is possible to define a Gibbs prior over the latent space of the GPLVM whose energy function is the t-SNE objective function. The introduction of this prior significantly improves the performance of the GPLVM. The resulting model can be viewed as a variant of t-SNE with a (probabilistic) mapping from the latent space to the data space.
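The first of the two approaches above can be sketched in Python. Instead of optimizing free latent coordinates, the t-SNE cost is minimized over the parameters of a mapping y = f(x), so an unseen point is embedded by simply evaluating f. For brevity this sketch assumes a linear map f(x) = Wx as a stand-in for the deep network discussed in the talk, a fixed Gaussian bandwidth instead of perplexity calibration, and plain gradient descent without the pretraining used for deep networks. All function names and settings are hypothetical. The gradient with respect to W follows from the standard t-SNE gradient with respect to the map points via the chain rule.

```python
import math
import random


def joint_p(X, sigma=1.0):
    """Symmetric Gaussian affinities p_ij (fixed bandwidth, no perplexity search)."""
    n = len(X)
    cond = []
    for i in range(n):
        row = [0.0 if j == i else
               math.exp(-sum((a - b) ** 2 for a, b in zip(X[i], X[j])) / (2.0 * sigma ** 2))
               for j in range(n)]
        total = sum(row)
        cond.append([v / total for v in row])
    return [[(cond[i][j] + cond[j][i]) / (2.0 * n) for j in range(n)] for i in range(n)]


def embed(W, X):
    """Hypothetical parametric map y = W x (linear stand-in for a deep network)."""
    return [[sum(w * x for w, x in zip(row, xi)) for row in W] for xi in X]


def student_t_kernel(Y):
    """Unnormalized latent similarities (1 + ||y_i - y_j||^2)^-1, zero on the diagonal."""
    n = len(Y)
    return [[0.0 if i == j else
             1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(Y[i], Y[j])))
             for j in range(n)] for i in range(n)]


def kl_cost(P, W, X):
    """t-SNE cost KL(P || Q), where Q is computed from the mapped points W x_i."""
    K = student_t_kernel(embed(W, X))
    z = sum(map(sum, K))
    # p * log(p / q) with q = k / z
    return sum(p * math.log(p * z / k)
               for p_row, k_row in zip(P, K) for p, k in zip(p_row, k_row) if p > 0.0)


def grad_w(P, W, X):
    """dC/dW by the chain rule through y_i = W x_i, using the t-SNE gradient
    dC/dy_i = 4 * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + ||y_i - y_j||^2)."""
    Y = embed(W, X)
    K = student_t_kernel(Y)
    z = sum(map(sum, K))
    G = [[0.0] * len(X[0]) for _ in W]
    for i in range(len(X)):
        gy = [0.0] * len(W)
        for j in range(len(X)):
            if i != j:
                c = 4.0 * (P[i][j] - K[i][j] / z) * K[i][j]
                for t in range(len(W)):
                    gy[t] += c * (Y[i][t] - Y[j][t])
        for t in range(len(W)):  # dC/dW[t][s] += dC/dy_i[t] * x_i[s]
            for s in range(len(X[0])):
                G[t][s] += gy[t] * X[i][s]
    return G


def train(X, dims=2, lr=0.001, steps=500, seed=0):
    """Plain gradient descent on W; returns the learned map and P."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.1) for _ in X[0]] for _ in range(dims)]
    P = joint_p(X)
    for _ in range(steps):
        G = grad_w(P, W, X)
        W = [[w - lr * g for w, g in zip(w_row, g_row)] for w_row, g_row in zip(W, G)]
    return W, P


# Two small clusters in 3-D; the learned map also embeds points it never saw.
X = [[0.0, 0.0, 0.0], [0.2, 0.1, 0.0], [0.1, 0.2, 0.1],
     [2.0, 2.0, 2.0], [2.2, 2.1, 2.0], [2.1, 2.2, 2.1]]
W, P = train(X)
y_new = embed(W, [[0.1, 0.1, 0.0]])[0]
```

Generalizing to unseen data is then a single call to `embed`, which non-parametric t-SNE cannot offer without re-running the optimization. Replacing the linear `embed` with a multi-layer network changes only the chain-rule step in `grad_w`.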
References:
- L.J.P. van der Maaten and G.E. Hinton. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9(Nov): 2579–2605, 2008.
- L.J.P. van der Maaten. Learning a Parametric Embedding by Preserving Local Structure. To appear in Artificial Intelligence and Statistics, 2009.
- L.J.P. van der Maaten. Preserving Local Structure in Gaussian Process Latent Variable Models. To appear in Proceedings of Benelearn, 2009.