Automatic recognition of emotional subgroups in images

Summary: Researchers from Vrije Universiteit Amsterdam are developing better ways to combine social group detection and group emotion recognition in images, especially for use cases such as crowd surveillance or event analysis.

Challenge: Tracking individuals is not always the most efficient way of sensing emotions, especially in large crowds, where outputs would become cluttered. Moreover, an individual’s emotion can be better predicted by incorporating the emotions of others in their social group, and people tend to belong to social groups that feel and act in a similar manner. Recognizing emotional subgroups is therefore a more efficient way of detecting emotion or behavior within a crowd. Simply combining the tasks of group and emotion recognition is unlikely to suffice, since emotional subgroups can either split up or merge social groups, complicating the task.

Findings: Images that show agreement among annotators are most often those that elicit the summation strategy, while images with partial agreement more often elicit the emotion-based fusion strategy (putting more emphasis on emotion than on social groups) or the group-based fusion strategy (putting more emphasis on social groups than on emotion). Experiments with additional features suggest that face size and gaze direction carry meaningful information, yielding a modest performance improvement. This shows that emotional subgroup recognition is a complex task, but also that a relatively small feature vector can already represent human perception reasonably well.

How Labelbox was used: The researchers used Labelbox to have human annotators label a set of 171 images, and then analyzed the annotators’ recognition strategies. Three main labeling strategies were identified, each assigning either 1) more weight to emotions (emotion-based fusion), 2) more weight to spatial structures (group-based fusion), or 3) equal weight to both (summation strategy). Based on these strategies, algorithms were developed to automatically recognize emotional subgroups. In particular, K-means and hierarchical clustering were applied to location and emotion features derived from a fine-tuned VGG network. The researchers also experimented with face size and gaze direction as extra input features and found that the best performance came from hierarchical clustering with emotion, location, and gaze direction as input.
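The clustering step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature values are randomly generated stand-ins for the per-person location, emotion, and gaze-direction features, and the feature weights and cluster count are hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical per-person features (values are illustrative only):
# (x, y) image location, a 3-dim emotion embedding, and a gaze angle.
rng = np.random.default_rng(0)
n_people = 8
location = rng.uniform(0, 1, size=(n_people, 2))
emotion = rng.uniform(0, 1, size=(n_people, 3))
gaze = rng.uniform(-np.pi, np.pi, size=(n_people, 1))

# Concatenate the features; the weights trade off emotion against
# spatial structure, loosely mirroring the emotion-based vs.
# group-based fusion strategies (weights here are assumptions).
w_loc, w_emo, w_gaze = 1.0, 1.0, 0.5
features = np.hstack([w_loc * location, w_emo * emotion, w_gaze * gaze])

# Hierarchical (agglomerative) clustering into emotional subgroups.
clustering = AgglomerativeClustering(n_clusters=3)
labels = clustering.fit_predict(features)
print(labels)  # one subgroup label per person
```

In practice the emotion embedding would come from the fine-tuned VGG network and the number of subgroups would not be fixed in advance; a distance threshold on the dendrogram is one common way to let the cluster count vary per image.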

Read the full PDF here.