Emotion recognition in security: should it be implemented?

Posted on 15 Feb at 06:00h in Features by Mike Dingle

Emotion recognition is widely used in customer service, neuromarketing and education – but should it be introduced into security applications as well? Sergei Novikov, CTO, and Evgenia Marina, Business Development Director, MENA, both of RecFaces, advise proceeding with caution.

According to an October 2022 report from the analytical agency FutureWise, the market for emotion recognition technologies will grow by an average of 12.9% per year and will reach almost $49 billion by 2028. So why is how we’re feeling so valuable?

Emotion is personal data about an individual’s states and feelings, their thoughts and intentions (even if subconscious), and their responses to stimuli, people and environment.

Psychologists also distinguish other emotional processes: affect, mood and feeling. They differ in duration and degree of involvement. In an affective state, a person loses their will for a relatively short period of time, whereas moods and feelings last longer and are easier to cope with, serving as a background for emotions, which are a direct reaction to what is happening.

Of course, a person can control their appearance to a certain degree. However, it’s widely held that negative emotions are much more difficult to control than positive ones.

This partly explains why emotion recognition is a popular option in security: poorly controlled negative emotions can easily ‘fuel’ illegal actions, which can be prevented if such emotions are detected in time.

An average person can identify an emotion with an accuracy of up to 70%, but to do so they actively use contextual information such as voice and gestures. Some commercial emotion recognition systems based on neural networks deliver an accuracy of up to 90%. However, it turns out that even this figure is not enough for security purposes.

Why? First, let’s talk theory.

Technical implementation of emotion recognition

Emotion recognition can be implemented in different ways. For example, it can be based on classifying key points tied to the positions of the eyes, eyebrows, lips, nose and jaw. Descriptors based on this visual information are attached to each point, producing a vector that helps to identify emotions.
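As a rough illustration of the key-point approach, the sketch below turns a handful of facial key points into a small descriptor vector and assigns the emotion whose reference vector is closest. Everything in it is an assumption made for illustration: the key-point names, the four descriptors, the pixel coordinates and the reference vectors in `CENTROIDS` are hypothetical, and nearest-centroid matching is simply one of the simplest classifiers available, not the method of any particular product.

```python
import math

def feature_vector(points):
    """Turn facial key points (x, y in pixels) into a scale-invariant vector."""
    # Divide by the distance between the eyes so that face size does not matter.
    scale = math.dist(points["left_eye"], points["right_eye"])
    corner_y = (points["mouth_left"][1] + points["mouth_right"][1]) / 2
    return [
        math.dist(points["mouth_left"], points["mouth_right"]) / scale,  # mouth width
        math.dist(points["upper_lip"], points["lower_lip"]) / scale,     # mouth opening
        math.dist(points["left_brow"], points["left_eye"]) / scale,      # brow height
        # Corner lift: positive when the mouth corners sit above the upper lip
        # (image y grows downward, so "above" means a smaller y value).
        (points["upper_lip"][1] - corner_y) / scale,
    ]

def classify(vector, centroids):
    """Return the label whose reference vector is nearest to `vector`."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))

# Hypothetical reference vectors; a real system would learn these from data.
CENTROIDS = {
    "neutral":   [0.60, 0.10, 0.25, -0.05],
    "happy":     [0.80, 0.15, 0.25, 0.10],
    "surprised": [0.50, 0.50, 0.40, -0.05],
}

# Hypothetical key points for a smiling face.
smiling_face = {
    "left_eye": (30, 40), "right_eye": (70, 40), "left_brow": (30, 30),
    "mouth_left": (34, 76), "mouth_right": (66, 76),
    "upper_lip": (50, 80), "lower_lip": (50, 86),
}

print(classify(feature_vector(smiling_face), CENTROIDS))
```

Normalising by the distance between the eyes is a common way to make such descriptors independent of how close the person is to the camera. It does nothing to address the lighting, cultural or psychological factors discussed later in this article.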

A more advanced method uses highly accurate deep neural networks trained on large datasets. To increase accuracy, they use not just individual images but a series of images that shows how the positions of facial muscles change over time. Training on such data also makes it possible to include representatives of different ethnic and cultural groups, as well as people of different temperaments who express emotions and react to stress differently.
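To show why a series of images can be more reliable than a single frame, here is a minimal sketch that averages per-frame emotion scores over a sliding window. It is a much simpler stand-in for what the deep networks described above learn directly from image sequences. The class name `TemporalSmoother`, the window size and the scores are hypothetical.

```python
from collections import deque

class TemporalSmoother:
    """Average per-frame emotion scores over the last `window` frames."""

    def __init__(self, window=5):
        # Old frames drop out automatically once the window is full.
        self.frames = deque(maxlen=window)

    def update(self, scores):
        """Add one frame's scores (label -> probability); return the smoothed label."""
        self.frames.append(scores)
        mean = {
            label: sum(frame[label] for frame in self.frames) / len(self.frames)
            for label in scores
        }
        return max(mean, key=mean.get)

# Hypothetical per-frame scores: three calm frames, then one noisy frame.
stream = [{"neutral": 0.7, "angry": 0.3}] * 3 + [{"neutral": 0.2, "angry": 0.8}]

smoother = TemporalSmoother(window=5)
smoothed = [smoother.update(frame) for frame in stream]
print(smoothed)
```

Taken on its own, the last frame would be labelled "angry". Averaged with the three frames before it, the result stays "neutral", which is the kind of single-frame glitch that a sequence helps to suppress.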

Should we implement?

AI solutions serve primarily to automate work, reduce workload and eliminate the human factor. For example, watching for unusual behaviour in a crowd increases the workload on security personnel, which may lead to oversights. Emotion recognition systems ease this process by automatically classifying people in a crowd and reducing the amount of data that personnel have to go through manually (e.g. by tagging suspicious people).
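To get a feel for how much manual review such tagging would leave, the back-of-the-envelope sketch below estimates how many people a system flags and what share of those flags are genuine. The crowd size and the number of real threats are hypothetical. Treating the 90% accuracy figure quoted earlier as both the rate of catching real threats (`sensitivity`) and the rate of correctly clearing everyone else (`specificity`) is a simplifying assumption of this sketch, not a claim made by any vendor.

```python
def review_workload(crowd_size, true_threats, sensitivity, specificity):
    """Estimate how many people get flagged and what share of flags are genuine."""
    true_positives = true_threats * sensitivity
    false_positives = (crowd_size - true_threats) * (1 - specificity)
    flagged = true_positives + false_positives
    precision = true_positives / flagged if flagged else 0.0
    return flagged, precision

# Hypothetical scenario: 10,000 people in view, 10 of whom are real threats.
flagged, precision = review_workload(10_000, 10, sensitivity=0.90, specificity=0.90)
print(f"flagged for review: about {round(flagged)}")
print(f"share of flags that are genuine: {precision:.1%}")
```

Under these assumptions, staff would review roughly a tenth of the crowd instead of all of it, but fewer than one flag in a hundred would point at a real threat.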

However, it is not as easy as it seems. There are so many nuances to emotion recognition that its successful implementation depends on a particular business situation.

Technical limitations

Accurate emotion recognition requires a series of high-quality images. A car security system, for example, works only with images of its driver, while to control a crowd, a system would process images of dozens or even hundreds of people. This requires extremely expensive and sophisticated technology.

Cultural limitations

Different cultural and ethnic groups are characterised by different degrees and patterns of emotional expression. As a rule, one ethnic group prevails in the training dataset, and which group that is differs from region to region, whereas a real crowd will contain people of many different nationalities. The neural network will produce a response tuned to the ‘national majority’, which will inevitably be less accurate for a mixed crowd.

Psychological limitations

The intensity with which people show their emotions, as well as the content of the emotional reaction itself, depends on many factors: temperament, upbringing, education, age, social status etc. People react to stress in different ways: one falls into a stupor, another becomes hysterical, another keeps a neutral expression. A person’s emotions can flip to their opposite within a short time, in reaction to a telephone conversation or a random poster. Cameras will also record different background emotions depending on where they are installed. All of this confirms the importance of tailoring the system to the specific use case.

An emotion recognition system will work well with offenders who act spontaneously, for example in an unplanned fight provoked by rudeness. But in the case of a planned crime, when the offender has good self-control, mistakes are possible.

The very fact that people know emotion recognition systems are being used to monitor crowds will make them more careful about expressing their emotions.

Facial expressions

Wrinkles and overall facial structure (especially at a certain age) may form a so-called ‘mask’ that is typical of a particular emotion but appears even in a neutral state. It may have nothing to do with a person’s current emotional state, and instead reflect their lifestyle, age, sagging muscles (e.g. at the corners of the mouth) etc. A neural network may mistake such a ‘mask’ for a true emotion. Insufficient lighting, low camera resolution or poor camera placement can also significantly increase the probability of error.

Ethical and legal issues

Control over people’s emotions by law enforcement and security services erodes personal boundaries and infringes upon international human rights.

A fundamental legal principle is also at stake: the presumption of innocence. A person can fall under suspicion merely because they have the ‘wrong’ facial expression, even though they never thought about breaking the law. This was noted by ARTICLE 19, which released a report in January 2021 on the use of multimodal emotion recognition systems by Chinese law enforcement agencies and commercial organisations.

If emotion recognition systems are used in commercial enterprises, they can lead to psychological stress and burnout, as can other methods of control.

Limitations of standardisation and replication

Each commercial deployment is tied to a specific business case and, most likely, another enterprise will not be able to use it without serious reconfiguration. The emotions monitored, their intensity, the recognition accuracy and the interpretation of results all vary from implementation to implementation. Therefore, for now we are talking about developing customised solutions only. Sometimes such a solution has commercial potential, but more often it doesn’t.

Will it make its mark in the future?

The widespread use of multimodal emotion recognition systems raises serious ethical and legal issues. In addition, such systems require significant computing power and technical equipment, and are therefore extremely expensive and of no interest to either private customers or the public sector.

Emotion recognition systems, which are already common in marketing and the gaming industry, are not actively used in security. Currently, they are a product of ‘individual tailoring’ for specific requirements and scenarios. In addition, because facial reactions are so individual, the accuracy of such systems remains low.

The use of emotion recognition systems in security is still an open question. Such systems will come into demand when the commercial effect of their implementation surpasses the costs of development, installation and configuration.

Emotion recognition and security

Emotion recognition is expected to reduce the number of crimes and protect citizens. It’s already been used with some success in Chinese Smart Cities.

A system of CCTV cameras, along with other equipment, was connected to police servers, recording the voice, body temperature and movements of people coming into view. Such a multimodal system identifies and indicates the source of a potential threat quite accurately. In some locations (for example, in elevators) it helped to eliminate violations. However, this method recognises not simply emotions, but whole patterns of human behaviour.