Abstract: The primary application of deep networks is to provide direct prediction of response variables, whether discrete or continuous, following a forward pass through the network. In some cases the values of certain well-defined but unobserved latent variables are key to successful prediction: for example, the scale or rotation of an object in an image.
With deep networks the typical solution is to provide extensive training sets in which the expected range of values of the latent variables is well represented. Depending on the application, the network can then predict the value of the variable as part of the forward pass. This approach has a number of limitations, which we discuss; in particular, it does not generalize well to ranges of the latent variables not observed during training. We show a number of examples, in both supervised and unsupervised learning, of how combining the deep network architecture with online optimization over the unobserved latent variables yields improved performance with smaller data sets, at the cost of more intensive computation.
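The idea of online optimization over an unobserved latent variable can be sketched as follows. This is a hypothetical toy illustration, not the paper's method: the "decoder" is a stand-in for a trained network pathway, the signal is one-dimensional, and the latent is a single scale factor recovered by grid search at inference time.

```python
import numpy as np

def render(template, scale, n=64):
    """Stand-in decoder: resample a 1-D template at a given scale.

    In the paper's setting this role would be played by a trained
    network; here it is a fixed interpolation for illustration.
    """
    xs = np.linspace(0.0, 1.0, n)
    src = np.clip(xs / scale, 0.0, 1.0)
    return np.interp(src, np.linspace(0.0, 1.0, len(template)), template)

def fit_latent(observation, template, grid=np.linspace(0.5, 2.0, 301)):
    """Online optimization: search over the latent scale for the value
    minimizing reconstruction error against the observation."""
    errs = [np.mean((render(template, s, len(observation)) - observation) ** 2)
            for s in grid]
    return grid[int(np.argmin(errs))]

# An observation generated at a scale the "training set" need not cover:
template = np.sin(np.linspace(0.0, np.pi, 32))
true_scale = 1.3
obs = render(template, true_scale)

est = fit_latent(obs, template)
print(round(est, 2))  # recovers a scale close to 1.3
```

Because the latent is optimized at inference time, the estimate does not depend on that scale having appeared in the training data, at the cost of an extra optimization loop per input.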