Posted on 2018-02-16, 09:09. Authored by Xin Hong, Yan Huang, Wenjun Ma, Sriram Varadarajan, Paul Miller, Weiru Liu, Maria Jose Santofimia Romero, Jesus Martinez del Rincon, Huiyu Zhou
This paper presents a new framework for multi-subject event inference in surveillance video, where the measurements produced by low-level vision analytics are usually noisy, incomplete or incorrect. Our goal is to infer the composite events undertaken by each subject from noisy observations. To achieve this, we consider the temporal characteristics of event relations and propose a method to correctly associate the detected events with individual subjects. The Dempster–Shafer (DS) theory of belief functions is used to infer events of interest from the results of our vision analytics and to measure conflicts occurring during the event association. Our system is evaluated on a number of videos, at different levels of complexity, that show passenger behaviours on a public transport platform, namely buses. The experimental results demonstrate that, by reasoning with spatio-temporal correlations, the proposed method achieves satisfactory performance when associating atomic events and recognising composite events involving multiple subjects in dynamic environments.
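For readers unfamiliar with DS theory, the following is a minimal sketch of Dempster's rule of combination, the standard way to fuse two mass functions and to quantify the conflict between them. It is not the authors' implementation: the function name `combine`, the event labels `sit` and `stand`, and the mass values are all hypothetical and chosen only for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Fuse two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to a mass, and the
    masses of each function sum to 1. Returns the normalised combined
    mass function and the conflict K (mass assigned to the empty set).
    """
    joint = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Compatible focal elements: their product supports the overlap.
            joint[intersection] = joint.get(intersection, 0.0) + wa * wb
        else:
            # Disjoint focal elements: their product counts as conflict.
            conflict += wa * wb
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    norm = 1.0 - conflict
    return {h: w / norm for h, w in joint.items()}, conflict


# Hypothetical example: two noisy detectors report on one subject.
SIT = frozenset({"sit"})
STAND = frozenset({"stand"})
EITHER = SIT | STAND  # ignorance: mass not committed to either event

detector_a = {SIT: 0.7, EITHER: 0.3}
detector_b = {STAND: 0.6, EITHER: 0.4}

combined, k = combine(detector_a, detector_b)
print("conflict K:", round(k, 4))
for hypothesis, mass in combined.items():
    print(sorted(hypothesis), round(mass, 4))
```

A large K indicates that the two sources largely disagree. A conflict measure of this kind is one way such disagreement can be quantified, although the paper's own measure may be defined differently.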
Funding
This work has been in part supported by UK EPSRC under Grants EP/G034303/1 and EP/N508664/1. Dr. H. Zhou is also supported by UK EPSRC under Grant EP/N011074/1.
Citation
Computer Vision and Image Understanding, 2016, 144, pp. 276-297 (22)
Supplementary material associated with this article can be found, in the online version, at 10.1016/j.cviu.2015.10.017. This is open data under the CC BY license http://creativecommons.org/licenses/by/4.0/