Version 2 2023-12-12, 12:49
Version 1 2023-12-11, 17:34
journal contribution
posted on 2023-12-12, 12:49, authored by Daqi Liu, Miroslaw Bober, Josef Kittler
<p>As a structured prediction task, scene graph generation aims to build a visually grounded scene graph that explicitly models the objects and their relationships in an input image. Currently, the mean-field variational Bayesian framework is the de facto methodology of existing methods, in which the unconstrained inference step is often implemented by a message passing neural network. However, such a formulation fails to explore other inference strategies and largely ignores the more general constrained optimization models. In this paper, we present a constrained structure learning method, for which an explicit constrained variational inference objective is proposed. Instead of applying the ubiquitous message-passing strategy, a generic constrained optimization method, entropic mirror descent, is utilized to solve the constrained variational inference step. We validate the proposed generic model on various popular scene graph generation benchmarks and show that it outperforms the state-of-the-art methods.</p>
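For readers unfamiliar with entropic mirror descent, the following is a minimal sketch of the generic technique on a toy problem. It is not the authors' model or objective. It minimizes a function over the probability simplex with a multiplicative update followed by renormalization. The function name `entropic_mirror_descent`, the step size, the iteration count, and the toy cost vector `c` are all assumptions made for illustration.

```python
import math

def entropic_mirror_descent(grad, x0, step=0.5, iters=200):
    """Minimize a function over the probability simplex (illustrative sketch).

    grad: callable returning the gradient (list of floats) at x.
    x0:   starting point with strictly positive entries summing to 1.
    """
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        # Mirror step under the negative-entropy map: a multiplicative update.
        # Subtracting min(g) keeps the exponents bounded and does not change
        # the result after normalization.
        g_min = min(g)
        w = [xi * math.exp(-step * (gi - g_min)) for xi, gi in zip(x, g)]
        z = sum(w)
        # Renormalize so the iterate stays on the simplex.
        x = [wi / z for wi in w]
    return x

# Toy objective (an assumption, not taken from the paper):
# f(x) = <c, x> + sum_i x_i * log(x_i), whose minimizer on the simplex
# is softmax(-c).
c = [1.0, 2.0, 0.5]

def grad(x):
    return [ci + math.log(xi) + 1.0 for ci, xi in zip(c, x)]

q = entropic_mirror_descent(grad, [1 / 3, 1 / 3, 1 / 3])
```

Because each iterate is renormalized, the simplex constraint holds at every step without a separate projection, which is the usual reason for choosing this update on probability-constrained problems.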
Funding
U.K. Defence Science and Technology Laboratory
Engineering and Physical Sciences Research Council (Grant Number: EP/R018456/1)
EPSRC (Grant Number: MVSE (EP/V002856/1) and JADE2 (EP/T022205/1))
History
Author affiliation
School of Computing and Mathematical Sciences, University of Leicester
Version
VoR (Version of Record)
Published in
IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume
45
Issue
10
Publisher
Institute of Electrical and Electronics Engineers (IEEE)