Posted on 2020-07-16, 16:05. Authored by Effie Lai-Chong Law, Samaneh Soleimani, Dawn Watkins, Joanna Barwick.
While voice communication of emotion has been researched for decades, the accuracy of automatic voice emotion recognition (AVER) has yet to improve. In particular, intergenerational communication is under-researched, as indicated by the lack of an emotion corpus of child–parent conversations. In this paper, we present our work applying Support Vector Machines (SVMs), well-established machine learning models, to analyze 20 pairs of child–parent dialogues on everyday life scenarios. Among the many issues facing the emerging work on AVER, we explore two critical ones: the methodological issue of optimising performance against computational cost, and the conceptual issue of the emotionally neutral state. We used the minimalistic and the extended acoustic feature sets extracted with OpenSMILE, together with a small and a large set of annotated utterances, to build models, and we analyzed the prevalence of the neutral class. Results indicated that the larger the combined sets, the better the training outcomes. Nevertheless, the classification models yielded only modest average recall when applied to the child–parent data, indicating their low generalizability. Implications for improving AVER and its potential uses are drawn.
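To make the described pipeline concrete, the sketch below illustrates one way such an AVER setup could be assembled: utterance-level functionals extracted with the openSMILE Python package (minimalistic eGeMAPS or extended ComParE sets) feeding an SVM classifier, evaluated with unweighted average (macro) recall. This is an illustrative approximation, not the authors' exact pipeline; the file paths and labels are hypothetical placeholders.

```python
# Minimal sketch: openSMILE acoustic functionals + SVM emotion classifier,
# scored with unweighted average (macro) recall.
# Assumes the `opensmile` and `scikit-learn` packages; paths/labels are hypothetical.
import opensmile
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import recall_score

# Minimalistic (eGeMAPS) or extended (ComParE) feature set, one vector per utterance.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,       # or opensmile.FeatureSet.ComParE_2016
    feature_level=opensmile.FeatureLevel.Functionals,
)

# Hypothetical annotated utterances: (wav path, emotion label) pairs;
# extend with the rest of the annotated corpus.
utterances = [
    ("utt_001.wav", "neutral"),
    ("utt_002.wav", "happy"),
]

X = pd.concat([smile.process_file(path) for path, _ in utterances])
y = [label for _, label in utterances]

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Linear-kernel SVM; class weighting softens the impact of a prevalent "neutral" class.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", class_weight="balanced"))
clf.fit(X_train, y_train)

# Unweighted average recall across emotion classes.
print(recall_score(y_test, clf.predict(X_test), average="macro"))
```

Swapping the feature set or the size of the annotated training set in this sketch corresponds to the performance-versus-cost comparisons discussed in the abstract.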
Citation
Behaviour and Information Technology, 2020, https://doi.org/10.1080/0144929X.2020.1741684