Contrastive Hashing with Vision Transformer for Image Retrieval
Hashing techniques have attracted considerable attention owing to their efficient computation and economical storage. However, generating compact binary codes that deliver strong retrieval performance remains a challenging problem. In this paper, we propose a novel contrastive vision transformer hashing method, which seamlessly integrates contrastive learning and vision transformers (ViTs) with hashing in a well-designed model that learns informative features and compact binary codes simultaneously. First, we modify the basic contrastive learning framework by designing several hash layers to meet the specific requirements of hash learning. In our hash network, ViTs serve as backbones for feature learning, which few existing hashing methods have explored. Then, we design a multi-objective loss function, in which a contrastive loss learns discriminative features by maximizing agreement between different augmented views of the same image, a similarity preservation loss enforces pairwise semantic preservation to enhance the representational capability of the hash codes, and a quantization loss controls the quantization error. The model can thus be trained jointly end to end to improve retrieval performance. Encouraging experimental results on three widely used benchmark databases demonstrate the superiority of our algorithm over several state-of-the-art hashing algorithms.
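To make the multi-objective formulation concrete, the following is a minimal PyTorch sketch of the three loss terms described above, not the authors' exact formulation: the hash-layer design (a linear projection with tanh), the NT-Xent form of the contrastive loss, the inner-product form of the similarity preservation loss, and the weights alpha and beta are all assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveViTHash(nn.Module):
    """Hypothetical sketch: a ViT backbone followed by a hash layer producing K-bit codes."""
    def __init__(self, backbone, feat_dim=768, num_bits=64):
        super().__init__()
        self.backbone = backbone          # assumed: a pretrained ViT returning [B, feat_dim] features
        self.hash_layer = nn.Sequential(  # hash layer: projects features to K continuous code values
            nn.Linear(feat_dim, num_bits),
            nn.Tanh(),                    # squashes outputs toward {-1, +1}
        )

    def forward(self, x):
        return self.hash_layer(self.backbone(x))  # continuous relaxation of binary codes

def contrastive_loss(h1, h2, temperature=0.5):
    """NT-Xent over two augmented views of the same batch (SimCLR-style)."""
    z = F.normalize(torch.cat([h1, h2], dim=0), dim=1)   # [2B, K], unit-norm codes
    sim = z @ z.t() / temperature                         # pairwise cosine similarities
    n = h1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float('-inf'))                 # exclude self-similarity
    # positive for sample i is its other augmented view (i + n or i - n)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def similarity_preservation_loss(h1, h2, S):
    """Pairwise semantic preservation (one common form): scaled inner products of
    codes should match the semantic similarity matrix S (S_ij = 1 if similar, -1 otherwise)."""
    inner = h1 @ h2.t() / h1.size(1)
    return F.mse_loss(inner, S)

def quantization_loss(h):
    """Penalize the gap between continuous outputs and their binarized codes."""
    return F.mse_loss(h, torch.sign(h))

# Joint objective with hypothetical trade-off weights alpha and beta:
# loss = contrastive_loss(h1, h2) \
#        + alpha * similarity_preservation_loss(h1, h2, S) \
#        + beta * quantization_loss(torch.cat([h1, h2], dim=0))

Because all three terms are differentiable with respect to the network outputs, the backbone and hash layers can be optimized jointly end to end; the sign function is applied only at inference time to obtain the final binary codes.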
Funding
Natural Science Foundation of Shandong Province. Grant Numbers: ZR2020LZH008, ZR2021MF118, ZR2019MF071
Key Technology Research and Development Program of Shandong. Grant Numbers: 2021CXGC010506, 2021SFGC0104
Author affiliation
School of Computing and Mathematical Sciences, University of Leicester
Version
- AM (Accepted Manuscript)