Posted on 2025-03-07, 10:14. Authored by Jaehoon Cha, Jinhae Park, Samuel Pinilla, Kyle L. Morris, Christopher S. Allen, Mark Wilkinson and Jeyan Thiyagalingam
Abstract
Learning meaningful representations of images in scientific domains that are robust to variations in centroids and orientations remains an important challenge. Here we introduce the centroid- and orientation-aware disentangling autoencoder (CODAE), an encoder–decoder-based neural network that learns the meaningful content of objects in a latent space. Specifically, a combination of a translation- and rotation-equivariant encoder, Euler encoding and an image moment loss enables CODAE to extract features invariant to the positions and orientations of objects of interest from randomly translated and rotated images. We evaluate this approach on several publicly available scientific datasets, including protein images from life sciences, four-dimensional scanning transmission electron microscopy data from materials science and galaxy images from astronomy. The evaluation shows that CODAE learns centroids, orientations and their invariant features, and outputs aligned reconstructions as well as exact-view reconstructions of the input images with high quality.
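To make two of the ingredients named above concrete, the sketch below illustrates, in isolation, the centroid of an image computed from its raw image moments (the quantity an image moment loss would constrain) and an Euler-style encoding of orientation as a point on the unit circle (which avoids the 0/2π wrap-around of a raw angle). This is a minimal NumPy illustration under our own assumptions; the names `image_centroid`, `euler_encode` and `moment_loss` are hypothetical and are not taken from the paper or its code.

```python
import numpy as np

def image_centroid(img):
    """Centroid of a 2-D intensity image from its raw moments:
    m_pq = sum_{x,y} x**p * y**q * I(y, x); centroid = (m10/m00, m01/m00)."""
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    return np.array([(xs * img).sum() / m00, (ys * img).sum() / m00])

def euler_encode(theta):
    """Encode an angle as (cos, sin) so that 0 and 2*pi map to the same point."""
    return np.array([np.cos(theta), np.sin(theta)])

def moment_loss(pred_centroid, img):
    """Hypothetical moment loss: squared distance between a predicted
    centroid and the one computed from the image's raw moments."""
    return float(np.sum((pred_centroid - image_centroid(img)) ** 2))
```

Note that for an all-zero image m00 vanishes, so a real implementation would guard the division; that detail is omitted here for brevity.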
Funding
Blueprinting AI for Science at Exascale - Phase II (BASE-II)
Ada Lovelace Centre and Chungnam National University Research Grant, 2020
Citation
Cha, J., Park, J., Pinilla, S. et al. Discovering fully semantic representations via centroid- and orientation-aware feature learning. Nat Mach Intell 7, 307–314 (2025).
Author affiliation
College of Science & Engineering
Physics & Astronomy
We used the XYRCS and dSprites datasets, which contain reliable ground-truth labels [7,19]. The XYRCS dataset is a simple yet effective synthetic dataset containing three shapes (a circle, a triangle and a rectangle) with varying x and y positions, orientations and colour information (specifically, the brightness). The dSprites dataset contains three shapes (a square, an ellipse and a heart) with varying x and y positions, orientations and scale information. In addition to these two synthetic datasets, we also use three real-world datasets from three different scientific domains, namely, the EMPIAR-10029 [22] (from life sciences), graphene CBED pattern [2] (from materials science) and Galaxy-Zoo [17] (from astronomy) datasets.
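For readers who want to reproduce this kind of setup, the sketch below generates an XYRCS-like sample: a fixed shape rendered at a random position and orientation, returned together with its ground-truth pose factors. It is our own illustrative assumption of how such data can be produced (using SciPy's `rotate` and `shift`), not the authors' dataset code; `make_sample` and its parameter ranges are hypothetical.

```python
import numpy as np
from scipy.ndimage import rotate, shift

def make_sample(size=64, rng=None):
    """One XYRCS-like sample: a bright rectangle at a random position and
    orientation. Returns the image and its ground-truth (dx, dy, angle)."""
    rng = rng or np.random.default_rng()
    img = np.zeros((size, size), dtype=np.float32)
    # Canonical (centred, axis-aligned) rectangle: the pose-free content
    # a model such as CODAE should recover regardless of translation/rotation.
    img[size // 2 - 6: size // 2 + 6, size // 2 - 12: size // 2 + 12] = 1.0
    angle = rng.uniform(0.0, 360.0)                        # orientation factor
    dx, dy = rng.integers(-size // 6, size // 6, size=2)   # translation factors
    img = rotate(img, angle, reshape=False, order=1)
    img = shift(img, (dy, dx), order=1)                    # (rows, cols) = (y, x)
    return img, (int(dx), int(dy), float(angle))
```

Feeding many such samples to a model while holding back the (dx, dy, angle) labels mirrors the evaluation protocol described above: the labels are used only to check whether the learned centroid and orientation codes match the true generative factors.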