![]() |
Abstract:
We describe a probabilistic model for learning rich, distributed
representations of image transformations. The basic model is a gated
conditional random field that is trained to predict transformations of
its inputs using a factorial set of latent variables. Inference in the
model consists in extracting the transformation, given a pair of images,
and can be performed exactly and efficiently.
We show that, when trained on natural videos, the model develops domain
specific motion features, in the form of fields of locally transformed
edge filters. When trained on affine, or more general, transformations of
still images, the model develops codes for these transformations, and can
subsequently perform recognition tasks that are invariant under these
transformations. It can also fantasize new transformations on previously
unseen images. We describe several variations of the basic model and
provide experimental results that demonstrate its applicability on a
variety of tasks.