Researchers teach neural networks to determine crowd emotions
Scholars from the Higher School of Economics (HSE) have developed an algorithm that detects the emotions of a group of people in low-quality video. The solution reaches a final decision in just one hundredth of a second, which is faster than any other existing algorithm of similar accuracy. The results are described in the paper 'Emotion Recognition of a Group of People in Video Analytics Using Deep Off-the-Shelf Image Embeddings.'
Analysing people's social behaviour in images and videos is one of the most popular tasks for developers of smart man-machine interfaces. Researchers have achieved fairly high accuracy in group-level emotion recognition, but it has remained impossible to deploy these techniques on a mass scale. The obstacle is that most video systems require good-resolution close-ups of faces, whereas ordinary cameras installed on the street or in a supermarket have low resolution and are mounted rather high, so the facial regions in the footage they capture are typically very small.
Alexander Tarasov and Andrey Savchenko, researchers from HSE, have developed an algorithm whose recognition accuracy (75.5%) is comparable with that of existing group-level emotion recognition techniques. At the same time, it requires only 5 MB of system memory, processes one image or video frame in just one hundredth of a second, and can be used with low-quality video data.
The algorithm works in several stages. First, the image is processed with the MTCNN neural network, which is commonly used to detect small faces. Then, features are extracted from each face with a fully convolutional network that was pre-trained to classify the emotions of faces at very low resolution, no bigger than a profile picture on social media. The final decision on the emotion of the whole group (negative, positive or neutral) is made by an ensemble of well-known classifiers (random forest and support vector machines) applied to the weighted sum of the feature vectors of all detected faces.
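The last stage, pooling the per-face features and letting several classifiers vote, can be illustrated with a minimal Python sketch. It is not the researchers' implementation. The function names `aggregate_faces` and `ensemble_predict` are hypothetical. The article does not say how the face weights are chosen, whether they are normalised, or how the classifiers' outputs are combined, so the sketch assumes normalised weights and simple averaging of class probabilities. Face detection and feature extraction are assumed to have already produced one feature vector per face.

```python
from typing import Callable, List, Sequence

# The three group-level emotion classes named in the article.
LABELS = ("negative", "neutral", "positive")

# A classifier here is any callable that maps a group descriptor to one
# probability per label (hypothetical stand-in for a trained random forest or SVM).
Classifier = Callable[[Sequence[float]], Sequence[float]]


def aggregate_faces(features: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Combine per-face feature vectors into one group descriptor.

    Assumption: weights are normalised to sum to 1 before the weighted sum;
    the article only says "weighted sum" and does not define the weights.
    """
    if not features:
        raise ValueError("no faces detected")
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per face")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    group = [0.0] * len(features[0])
    for vector, weight in zip(features, weights):
        for i, value in enumerate(vector):
            group[i] += (weight / total) * value
    return group


def ensemble_predict(group: Sequence[float],
                     classifiers: Sequence[Classifier]) -> str:
    """Average the class probabilities of all classifiers and pick the best label.

    Assumption: probability averaging (soft voting) is used for illustration;
    the article does not specify the combination rule.
    """
    if not classifiers:
        raise ValueError("need at least one classifier")
    averaged = [0.0] * len(LABELS)
    for classifier in classifiers:
        for i, probability in enumerate(classifier(group)):
            averaged[i] += probability / len(classifiers)
    best = max(range(len(LABELS)), key=averaged.__getitem__)
    return LABELS[best]
```

In use, `aggregate_faces` would receive one feature vector per detected face and return a single vector for the whole frame, which `ensemble_predict` would then pass to each trained classifier. Because the group descriptor has a fixed length regardless of how many faces are found, the same classifiers can handle both a handful of people and a dense crowd.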
The new development could be used in various video surveillance systems. It can detect changes in group emotions at a concert, football match, or protest rally, which can help prevent conflicts in a timely manner. Integrated into a supermarket surveillance system, it can detect consumers' emotional reactions to various promotions. Together with cameras recording a public speech, it can assess the audience's response.