posted on 2020-11-05, 13:54, authored by Ying Gao, Yanhai Gan, Lin Qi, Huiyu Zhou, Xinghui Dong, Junyu Dong
Similarity learning plays a fundamental role in multimedia retrieval and pattern recognition. Predicting perceptual similarity is challenging because, in most cases, we lack both human-labeled ground-truth data and robust models that mimic human visual perception. Although some studies in the literature have been dedicated to similarity learning, they mainly focus on evaluating whether or not two images are similar, rather than on predicting a perceptual similarity that is consistent with human perception. Inspired by the mechanism of human visual perception, we propose a novel framework to predict the perceptual similarity between two texture images. The framework is built on top of Convolutional Neural Networks (CNNs) and considers both powerful features and the perceptual characteristics of contours extracted from the images. The similarity value is computed by aggregating the resemblances between the corresponding convolutional layer activations of the two texture maps. Experimental results show that the predicted similarity values are consistent with human-perceived similarity data.
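The abstract does not specify the resemblance measure or the aggregation rule. As a rough illustration only, the sketch below assumes cosine similarity between the flattened activations of each convolutional layer and a weighted mean across layers. The function names, the layer weights, and the random stand-in activations are hypothetical, the contour branch is omitted, and this is not the authors' implementation.

```python
import numpy as np


def layer_resemblance(a: np.ndarray, b: np.ndarray, eps: float = 1e-12) -> float:
    """Cosine similarity between two same-shaped activation tensors.

    Assumption: cosine similarity is used here as a stand-in resemblance measure.
    """
    a_flat = a.ravel().astype(np.float64)
    b_flat = b.ravel().astype(np.float64)
    denom = np.linalg.norm(a_flat) * np.linalg.norm(b_flat) + eps
    return float(np.dot(a_flat, b_flat) / denom)


def aggregate_similarity(acts_x, acts_y, weights=None) -> float:
    """Weighted mean of per-layer resemblances (hypothetical aggregation rule).

    acts_x, acts_y: lists of activation arrays, one per layer, in the same order.
    weights: optional per-layer weights; equal weights if omitted.
    """
    if len(acts_x) != len(acts_y):
        raise ValueError("both images need activations from the same layers")
    scores = np.array([layer_resemblance(a, b) for a, b in zip(acts_x, acts_y)])
    if weights is None:
        weights = np.ones_like(scores)
    weights = np.asarray(weights, dtype=np.float64)
    return float(np.sum(weights * scores) / np.sum(weights))


if __name__ == "__main__":
    # Random non-negative arrays stand in for post-ReLU CNN activations
    # (channels, height, width) of two texture images.
    rng = np.random.default_rng(0)
    shapes = [(64, 32, 32), (128, 16, 16), (256, 8, 8)]
    acts_a = [rng.random(s) for s in shapes]
    acts_b = [rng.random(s) for s in shapes]
    print(aggregate_similarity(acts_a, acts_a))  # self-similarity, close to 1
    print(aggregate_similarity(acts_a, acts_b, weights=[0.2, 0.3, 0.5]))
```

With non-negative activations, each per-layer score lies between 0 and 1, so the weighted mean does as well. A texture-oriented variant might compare per-layer Gram matrices instead of raw activations to discard spatial layout; that choice is likewise an assumption, not something the abstract states.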
Funding
J. Dong is supported by the National Key R&D Program of China (Grant
No. 2018AAA0100602) and the National Natural Science Foundation of China
(NSFC) (Grant Nos. 41576011, U1706218, and 41927805). (Corresponding
author: Junyu Dong.)
H. Zhou was supported in part by the U.K. EPSRC under Grant
EP/N011074/1, the Royal Society-Newton Advanced Fellowship under Grant
NA160342, and the European Union's Horizon 2020 Research and Innovation
Program through the Marie Skłodowska-Curie Actions under Grant 720325.
L. Qi is supported by National Natural Science Foundation of China (NSFC)
(Grant No. 61501417).
History
Citation
IEEE Transactions on Circuits and Systems for Video Technology (Volume: 30, Issue: 10, Oct. 2020)
Author affiliation
College of Science and Engineering, Department of Informatics
Version
AM (Accepted Manuscript)
Published in
IEEE Transactions on Circuits and Systems for Video Technology
Volume
30
Issue
10
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
The file associated with this record is under embargo until publication, in accordance with the publisher's self-archiving policy. The full text may be available through the publisher.