University of Leicester
AMIL(revise 8)-Pourya.pdf (2.45 MB)

AMIL: Adversarial Multi-instance Learning for Human Pose Estimation

journal contribution
posted on 2019-09-19, 10:47 authored by Pourya Shamsolmoali, Masoumeh Zareapoor, Huiyu Zhou, Jie Yang
Human pose estimation has an important impact on a wide range of applications, from human-computer interfaces to surveillance and content-based video retrieval. In human pose estimation, occluded joints and overlapping human bodies lead to inaccurate pose estimates. To address these problems, we integrate priors on the structure of human bodies and present a novel structure-aware network that takes such priors into account implicitly during training. Learning such constraints explicitly is typically a challenging task. Instead, we use generative adversarial networks as our learning model, in which we design two residual Multiple-Instance Learning (MIL) models with identical architecture: one is used as the generator and the other as the discriminator. The discriminator's task is to distinguish the actual poses from the fake ones. If the pose generator produces results that the discriminator cannot distinguish from the real ones, then the model has successfully learned the priors. In the proposed model, the discriminator differentiates the ground-truth heatmaps from the generated ones, and the adversarial loss is then back-propagated to the generator. This procedure helps the generator learn reasonable body configurations and is shown to improve pose estimation accuracy. In addition, we propose a novel pooling function for MIL. It is an adjustable structure for both instance selection and modeling that passes information appropriately between the instances in a single bag. In the proposed residual MIL neural network, the pooling operation updates each instance's contribution to its bag. The proposed pooling-based adversarial residual multi-instance neural network has been validated on two datasets for the human pose estimation task and outperforms other state-of-the-art models. The code will be made available on
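The adversarial step described in the abstract can be illustrated with a minimal sketch. It assumes the standard binary cross-entropy GAN objective, which the abstract does not specify; the function names `bce`, `discriminator_loss`, and `generator_adversarial_loss` are hypothetical, and the inputs stand for the discriminator's probabilities that a heatmap is ground truth. It is not the authors' implementation.

```python
import math

def bce(prob, target, eps=1e-12):
    """Binary cross-entropy for one probability and a 0/1 target."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))

def discriminator_loss(d_real, d_fake):
    """Assumed objective: score ground-truth heatmaps as 1, generated ones as 0.

    d_real: discriminator outputs on ground-truth heatmaps (probabilities)
    d_fake: discriminator outputs on generated heatmaps (probabilities)
    """
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_adversarial_loss(d_fake):
    """Assumed objective: the generator is rewarded when its heatmaps score as real."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)

# An undecided discriminator (0.5 everywhere) gives a loss of 2 * ln 2.
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))
# The generator's loss falls as its heatmaps become harder to tell apart from real ones.
print(generator_adversarial_loss([0.1]), generator_adversarial_loss([0.9]))
```

In a full model this adversarial term would typically be added to a heatmap regression loss for the generator; how the two are weighted is not stated in the abstract.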


This research is partly supported by NSFC, China (No. 61876107, U1803261) and the 973 Plan, China (No. 2015CB856004). H. Zhou was supported by UK EPSRC under Grant EP/N011074/1, Royal Society-Newton Advanced Fellowship under Grant NA160342, and the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 720325.



ACM Transactions on Multimedia Computing, Communications, and Applications, April 2020, 16 (15),

Author affiliation

College of Science and Engineering, Department of Informatics


  • AM (Accepted Manuscript)

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications






Association for Computing Machinery (ACM)




The file associated with this record is under embargo until publication, in accordance with the publisher's self-archiving policy. The full text may be available through the publisher links provided above.



