VTAE: Variational transformer autoencoder with manifolds learning
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables, and these models use a non-linear function (generator) to map latent samples into the data space. On the other hand, the non-linearity of the generator implies that the latent space gives an unsatisfactory projection of the data space, which results in poor representation learning. This weak projection, however, can be addressed by a Riemannian metric, and we show that geodesic computation and accurate interpolation between data samples on the Riemannian manifold can substantially improve the performance of deep generative models. In this paper, a Variational spatial-Transformer AutoEncoder (VTAE) is proposed to minimize geodesics on a Riemannian manifold and improve representation learning. In particular, we carefully design the variational autoencoder with an encoded spatial-Transformer to explicitly expand the latent variable model to data on a Riemannian manifold and to obtain global context modelling. Moreover, to obtain smooth and plausible interpolations while traversing between two different objects' latent representations, we propose a geodesic interpolation network, in contrast to existing models that use linear interpolation with inferior performance. Experiments on benchmarks show that our proposed model can improve predictive accuracy and versatility over a range of computer vision tasks, including image interpolation and reconstruction.
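To make the geodesic idea in the abstract concrete, the sketch below shows one standard way to approximate a geodesic under the pull-back metric induced by a generator: discretize the latent curve between two codes and minimize its curve energy in data space, rather than interpolating linearly. This is a minimal, hedged illustration only; the `decoder` function, the optimization settings, and the per-pair optimization loop are assumptions for demonstration, not the paper's actual VTAE architecture or its learned geodesic interpolation network.

```python
import torch

def geodesic_interpolate(decoder, z0, z1, n_steps=10, iters=200, lr=1e-2):
    """Approximate a geodesic between latent codes z0 and z1 (shape [d])
    under the metric pulled back through `decoder`, by minimizing the
    discretized curve energy  sum_t ||g(z_{t+1}) - g(z_t)||^2.
    Sketch only: the paper instead trains a network to produce such curves.
    """
    # Start from the straight (linear) interpolation in latent space.
    ts = torch.linspace(0.0, 1.0, n_steps + 2)[1:-1].unsqueeze(1)
    z_mid = ((1 - ts) * z0 + ts * z1).detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z_mid], lr=lr)

    for _ in range(iters):
        curve = torch.cat([z0.unsqueeze(0), z_mid, z1.unsqueeze(0)], dim=0)
        x = decoder(curve)                       # map latent curve into data space
        energy = (x[1:] - x[:-1]).pow(2).sum()   # discrete curve energy
        opt.zero_grad()
        energy.backward()
        opt.step()

    with torch.no_grad():
        return torch.cat([z0.unsqueeze(0), z_mid, z1.unsqueeze(0)], dim=0)
```

Compared with linear interpolation, the optimized curve bends to stay on the data manifold induced by the generator, which is why the resulting intermediate decodings tend to look smoother and more plausible.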
Author affiliation
School of Computing and Mathematical Sciences, University of Leicester
Version
- AM (Accepted Manuscript)