Posted on 2019-06-21, 10:47, authored by X Zhang, X Wang, X Tang, H Zhou, C Li
Image captioning generates a semantic description of an image. It combines image understanding with text generation and has made great progress in recent years. However, bridging the “semantic gap” between low-level features and high-level semantics in remote sensing images remains a great challenge, despite improvements in image resolution. In this paper, we present a new model with an attribute attention mechanism for generating descriptions of remote sensing images. Specifically, we explore the impact of attributes extracted from remote sensing images on the attention mechanism. Our experimental results demonstrate the validity of the proposed model: compared against several state-of-the-art techniques, it obtains higher scores on six of the seven evaluation metrics and a slightly lower score on one for the Sydney Dataset and the Remote Sensing Image Caption Dataset (RSICD), and higher scores on all seven metrics for the UCM Dataset, indicating that the proposed framework achieves robust performance for semantic description of high-resolution remote sensing images.
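This record does not include implementation details, but the general idea of attribute attention can be illustrated with a short sketch. The PyTorch snippet below is a minimal, hypothetical illustration, not the authors' architecture: it assumes image-level attribute probabilities (e.g., predicted land-cover concepts) are injected as an extra additive term when the caption decoder computes attention weights over spatial image features. All names here (AttributeAttention, feat_dim, attr_dim, hidden_dim) are ours, introduced only for this example.

```python
# Hypothetical sketch of an attribute attention mechanism (not the
# authors' implementation): image-level attribute probabilities bias
# the decoder's attention over spatial CNN features.
import torch
import torch.nn as nn


class AttributeAttention(nn.Module):
    def __init__(self, feat_dim, attr_dim, hidden_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, hidden_dim)     # spatial features
        self.attr_proj = nn.Linear(attr_dim, hidden_dim)     # attribute vector
        self.state_proj = nn.Linear(hidden_dim, hidden_dim)  # decoder state
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, feats, attrs, state):
        # feats: (batch, regions, feat_dim)  flattened CNN feature map
        # attrs: (batch, attr_dim)           predicted attribute probabilities
        # state: (batch, hidden_dim)         current decoder hidden state
        e = torch.tanh(
            self.feat_proj(feats)
            + self.attr_proj(attrs).unsqueeze(1)
            + self.state_proj(state).unsqueeze(1)
        )                                                    # (batch, regions, hidden)
        alpha = torch.softmax(self.score(e).squeeze(-1), dim=1)  # attention weights
        context = (alpha.unsqueeze(-1) * feats).sum(dim=1)       # (batch, feat_dim)
        return context, alpha


if __name__ == "__main__":
    att = AttributeAttention(feat_dim=512, attr_dim=100, hidden_dim=256)
    feats = torch.randn(2, 49, 512)  # e.g., a 7x7 CNN grid per image
    attrs = torch.rand(2, 100)       # attribute probabilities
    state = torch.randn(2, 256)
    ctx, alpha = att(feats, attrs, state)
    print(ctx.shape, alpha.shape)    # torch.Size([2, 512]) torch.Size([2, 49])
```

Under these assumptions, the attribute term shifts the attention distribution toward image regions consistent with the predicted semantics, which is one plausible way attributes could help close the semantic gap described in the abstract.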
Funding
This research was funded by the National Natural Science Foundation of China under Grant 61772400, Grant 61801351, Grant 61501353, Grant 61772399, and Grant 61573267. H. Zhou was supported by UK EPSRC under Grant EP/N011074/1, a Royal Society Newton Advanced Fellowship under Grant NA160342, and the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 720325. The APC was funded by the National Natural Science Foundation of China under Grant 61772400, Grant 61501353, Grant 61772399, and Grant 61573267.
History
Citation
Remote Sensing, 2019, 11(6), 612
Author affiliation
Department of Informatics, College of Science and Engineering