HOME

TheInfoList



OR:

Emotion recognition is the process of identifying human
emotion Emotions are mental states brought on by neurophysiological changes, variously associated with thoughts, feelings, behavioral responses, and a degree of pleasure or displeasure. There is currently no scientific consensus on a definitio ...
. People vary widely in their accuracy at recognizing the emotions of others. Use of technology to help people with emotion recognition is a relatively nascent research area. Generally, the technology works best if it uses multiple modalities in context. To date, the most work has been conducted on automating the recognition of facial expressions from video, spoken expressions from audio, written expressions from text, and
physiology Physiology (; ) is the scientific study of functions and mechanisms in a living system. As a sub-discipline of biology, physiology focuses on how organisms, organ systems, individual organs, cells, and biomolecules carry out the chemic ...
as measured by wearables.


Human

Humans show a great deal of variability in their abilities to recognize emotion. A key point to keep in mind when learning about automated emotion recognition is that there are several sources of "ground truth," or truth about what the real emotion is. Suppose we are trying to recognize the emotions of Alex. One source is "what would most people say that Alex is feeling?" In this case, the 'truth' may not correspond to what Alex feels, but may correspond to what most people would say it looks like Alex feels. For example, Alex may actually feel sad, but he puts on a big smile and then most people say he looks happy. If an automated method achieves the same results as a group of observers it may be considered accurate, even if it does not actually measure what Alex truly feels. Another source of 'truth' is to ask Alex what he truly feels. This works if Alex has a good sense of his internal state, and wants to tell you what it is, and is capable of putting it accurately into words or a number. However, some people are alexithymic and do not have a good sense of their internal feelings, or they are not able to communicate them accurately with words and numbers. In general, getting to the truth of what emotion is actually present can take some work, can vary depending on the criteria that are selected, and will usually involve maintaining some level of uncertainty.


Automatic

Decades of scientific research have been conducted developing and evaluating methods for automated emotion recognition. There is now an extensive literature proposing and evaluating hundreds of different kinds of methods, leveraging techniques from multiple areas, such as
signal processing Signal processing is an electrical engineering subfield that focuses on analyzing, modifying and synthesizing '' signals'', such as sound, images, and scientific measurements. Signal processing techniques are used to optimize transmissions, ...
,
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
,
computer vision Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human ...
, and
speech processing Speech processing is the study of speech signals and the processing methods of signals. The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing, applied t ...
. Different methodologies and techniques may be employed to interpret emotion such as
Bayesian network A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Ba ...
s. , Gaussian
Mixture model In statistics, a mixture model is a probabilistic model for representing the presence of subpopulations within an overall population, without requiring that an observed data set should identify the sub-population to which an individual observati ...
s and
Hidden Markov Models A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process — call it X — with unobservable ("''hidden''") states. As part of the definition, HMM requires that there be an ...
and deep neural networks.


Approaches

The accuracy of emotion recognition is usually improved when it combines the analysis of human expressions from multimodal forms such as texts, physiology, audio, or video. Different
emotion Emotions are mental states brought on by neurophysiological changes, variously associated with thoughts, feelings, behavioral responses, and a degree of pleasure or displeasure. There is currently no scientific consensus on a definitio ...
types are detected through the integration of information from
facial expressions A facial expression is one or more motions or positions of the muscles beneath the skin of the face. According to one set of controversial theories, these movements convey the emotional state of an individual to observers. Facial expressions are ...
, body movement and
gestures A gesture is a form of non-verbal communication or non-vocal communication in which visible bodily actions communicate particular messages, either in place of, or in conjunction with, speech. Gestures include movement of the hands, face, o ...
, and speech. The technology is said to contribute in the emergence of the so-called emotional or emotive Internet. The existing approaches in emotion recognition to classify certain
emotion Emotions are mental states brought on by neurophysiological changes, variously associated with thoughts, feelings, behavioral responses, and a degree of pleasure or displeasure. There is currently no scientific consensus on a definitio ...
types can be generally classified into three main categories: knowledge-based techniques, statistical methods, and hybrid approaches.


Knowledge-based techniques

Knowledge-based techniques (sometimes referred to as
lexicon A lexicon is the vocabulary of a language or branch of knowledge (such as nautical or medical). In linguistics, a lexicon is a language's inventory of lexemes. The word ''lexicon'' derives from Greek word (), neuter of () meaning 'of or fo ...
-based techniques), utilize domain knowledge and the
semantic Semantics (from grc, σημαντικός ''sēmantikós'', "significant") is the study of reference, meaning, or truth. The term can be used to refer to subfields of several distinct disciplines, including philosophy, linguistics and comput ...
and
syntactic In linguistics, syntax () is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure (constituency) ...
characteristics of language in order to detect certain
emotion Emotions are mental states brought on by neurophysiological changes, variously associated with thoughts, feelings, behavioral responses, and a degree of pleasure or displeasure. There is currently no scientific consensus on a definitio ...
types. In this approach, it is common to use knowledge-based resources during the emotion classification process such as
WordNet WordNet is a lexical database of semantic relations between words in more than 200 languages. WordNet links words into semantic relations including synonyms, hyponyms, and meronyms. The synonyms are grouped into '' synsets'' with short defin ...
, SenticNet,
ConceptNet Open Mind Common Sense (OMCS) is an artificial intelligence project based at the Massachusetts Institute of Technology (MIT) Media Lab whose goal is to build and utilize a large commonsense knowledge base from the contributions of many thousands ...
, and EmotiNet, to name a few. One of the advantages of this approach is the accessibility and economy brought about by the large availability of such knowledge-based resources. A limitation of this technique on the other hand, is its inability to handle concept nuances and complex linguistic rules. Knowledge-based techniques can be mainly classified into two categories: dictionary-based and corpus-based approaches. Dictionary-based approaches find opinion or
emotion Emotions are mental states brought on by neurophysiological changes, variously associated with thoughts, feelings, behavioral responses, and a degree of pleasure or displeasure. There is currently no scientific consensus on a definitio ...
seed words in a
dictionary A dictionary is a listing of lexemes from the lexicon of one or more specific languages, often arranged alphabetically (or by radical and stroke for ideographic languages), which may include information on definitions, usage, etymologie ...
and search for their
synonym A synonym is a word, morpheme, or phrase that means exactly or nearly the same as another word, morpheme, or phrase in a given language. For example, in the English language, the words ''begin'', ''start'', ''commence'', and ''initiate'' are al ...
s and
antonym In lexical semantics, opposites are words lying in an inherently incompatible binary relationship. For example, something that is ''long'' entails that it is not ''short''. It is referred to as a 'binary' relationship because there are two members ...
s to expand the initial list of opinions or
emotion Emotions are mental states brought on by neurophysiological changes, variously associated with thoughts, feelings, behavioral responses, and a degree of pleasure or displeasure. There is currently no scientific consensus on a definitio ...
s. Corpus-based approaches on the other hand, start with a seed list of opinion or
emotion Emotions are mental states brought on by neurophysiological changes, variously associated with thoughts, feelings, behavioral responses, and a degree of pleasure or displeasure. There is currently no scientific consensus on a definitio ...
words, and expand the database by finding other words with context-specific characteristics in a large
corpus Corpus is Latin for "body". It may refer to: Linguistics * Text corpus, in linguistics, a large and structured set of texts * Speech corpus, in linguistics, a large set of speech audio files * Corpus linguistics, a branch of linguistics Music * ...
. While corpus-based approaches take into account context, their performance still vary in different domains since a word in one domain can have a different orientation in another domain.


Statistical methods

Statistical methods commonly involve the use of different supervised
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
algorithms in which a large set of annotated data is fed into the algorithms for the system to learn and predict the appropriate
emotion Emotions are mental states brought on by neurophysiological changes, variously associated with thoughts, feelings, behavioral responses, and a degree of pleasure or displeasure. There is currently no scientific consensus on a definitio ...
types.
Machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
algorithms generally provide more reasonable classification accuracy compared to other approaches, but one of the challenges in achieving good results in the classification process, is the need to have a sufficiently large training set. Some of the most commonly used
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
algorithms include Support Vector Machines (SVM), Naive Bayes, and Maximum Entropy.
Deep learning Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised. ...
, which is under the unsupervised family of
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
, is also widely employed in emotion recognition. Well-known
deep learning Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised. ...
algorithms include different architectures of Artificial Neural Network (ANN) such as Convolutional Neural Network (CNN), Long Short-term Memory (LSTM), and Extreme Learning Machine (ELM). The popularity of
deep learning Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised. ...
approaches in the domain of emotion recognition may be mainly attributed to its success in related applications such as in
computer vision Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human ...
,
speech recognition Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers with the ...
, and Natural Language Processing (NLP).


Hybrid approaches

Hybrid approaches in emotion recognition are essentially a combination of knowledge-based techniques and statistical methods, which exploit complementary characteristics from both techniques. Some of the works that have applied an ensemble of knowledge-driven linguistic elements and statistical methods include sentic computing and iFeel, both of which have adopted the concept-level knowledge-based resource SenticNet. The role of such knowledge-based resources in the implementation of hybrid approaches is highly important in the
emotion Emotions are mental states brought on by neurophysiological changes, variously associated with thoughts, feelings, behavioral responses, and a degree of pleasure or displeasure. There is currently no scientific consensus on a definitio ...
classification process. Since hybrid techniques gain from the benefits offered by both knowledge-based and statistical approaches, they tend to have better classification performance as opposed to employing knowledge-based or statistical methods independently. A downside of using hybrid techniques however, is the computational complexity during the classification process.


Datasets

Data is an integral part of the existing approaches in emotion recognition and in most cases it is a challenge to obtain annotated data that is necessary to train
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
algorithms. For the task of classifying different
emotion Emotions are mental states brought on by neurophysiological changes, variously associated with thoughts, feelings, behavioral responses, and a degree of pleasure or displeasure. There is currently no scientific consensus on a definitio ...
types from multimodal sources in the form of texts, audio, videos or physiological signals, the following datasets are available: #HUMAINE: provides natural clips with emotion words and context labels in multiple modalities #Belfast database: provides clips with a wide range of emotions from TV programs and interview recordings #SEMAINE: provides audiovisual recordings between a person and a virtual agent and contains
emotion Emotions are mental states brought on by neurophysiological changes, variously associated with thoughts, feelings, behavioral responses, and a degree of pleasure or displeasure. There is currently no scientific consensus on a definitio ...
annotations such as angry, happy, fear, disgust, sadness, contempt, and amusement #IEMOCAP: provides recordings of dyadic sessions between actors and contains
emotion Emotions are mental states brought on by neurophysiological changes, variously associated with thoughts, feelings, behavioral responses, and a degree of pleasure or displeasure. There is currently no scientific consensus on a definitio ...
annotations such as happiness, anger, sadness, frustration, and neutral state #eNTERFACE: provides audiovisual recordings of subjects from seven nationalities and contains
emotion Emotions are mental states brought on by neurophysiological changes, variously associated with thoughts, feelings, behavioral responses, and a degree of pleasure or displeasure. There is currently no scientific consensus on a definitio ...
annotations such as happiness, anger, sadness, surprise, disgust, and fear #DEAP: provides
electroencephalography Electroencephalography (EEG) is a method to record an electrogram of the spontaneous electrical activity of the brain. The biosignals detected by EEG have been shown to represent the postsynaptic potentials of pyramidal neurons in the neocorte ...
( EEG),
electrocardiography Electrocardiography is the process of producing an electrocardiogram (ECG or EKG), a recording of the heart's electrical activity. It is an electrogram of the heart which is a graph of voltage versus time of the electrical activity of the hear ...
( ECG), and face video recordings, as well as
emotion Emotions are mental states brought on by neurophysiological changes, variously associated with thoughts, feelings, behavioral responses, and a degree of pleasure or displeasure. There is currently no scientific consensus on a definitio ...
annotations in terms of valence,
arousal Arousal is the physiological and psychological state of being awoken or of sense organs stimulated to a point of perception. It involves activation of the ascending reticular activating system (ARAS) in the brain, which mediates wakefulness, th ...
, and dominance of people watching film clips #DREAMER: provides
electroencephalography Electroencephalography (EEG) is a method to record an electrogram of the spontaneous electrical activity of the brain. The biosignals detected by EEG have been shown to represent the postsynaptic potentials of pyramidal neurons in the neocorte ...
( EEG) and
electrocardiography Electrocardiography is the process of producing an electrocardiogram (ECG or EKG), a recording of the heart's electrical activity. It is an electrogram of the heart which is a graph of voltage versus time of the electrical activity of the hear ...
( ECG) recordings, as well as
emotion Emotions are mental states brought on by neurophysiological changes, variously associated with thoughts, feelings, behavioral responses, and a degree of pleasure or displeasure. There is currently no scientific consensus on a definitio ...
annotations in terms of valence,
arousal Arousal is the physiological and psychological state of being awoken or of sense organs stimulated to a point of perception. It involves activation of the ascending reticular activating system (ARAS) in the brain, which mediates wakefulness, th ...
, and dominance of people watching film clips #MELD: is a multiparty conversational dataset where each utterance is labeled with emotion and sentiment. MELD provides conversations in video format and hence suitable for multimodal emotion recognition and
sentiment analysis Sentiment analysis (also known as opinion mining or emotion AI) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjec ...
. MELD is useful for multimodal sentiment analysis and emotion recognition,
dialogue system A dialogue system, or conversational agent (CA), is a computer system intended to converse with a human. Dialogue systems employed one or more of text, speech, graphics, haptics, gestures, and other modes for communication on both the input and o ...
s and emotion recognition in conversations. #MuSe: provides audiovisual recordings of natural interactions between a person and an object. It has discrete and continuous
emotion Emotions are mental states brought on by neurophysiological changes, variously associated with thoughts, feelings, behavioral responses, and a degree of pleasure or displeasure. There is currently no scientific consensus on a definitio ...
annotations in terms of valence, arousal and trustworthiness as well as speech topics useful for multimodal sentiment analysis and emotion recognition. #UIT-VSMEC: is a standard Vietnamese Social Media Emotion Corpus (UIT-VSMEC) with about 6,927 human-annotated sentences with six emotion labels, contributing to emotion recognition research in Vietnamese which is a low-resource language in Natural Language Processing (NLP). #BED: provides
electroencephalography Electroencephalography (EEG) is a method to record an electrogram of the spontaneous electrical activity of the brain. The biosignals detected by EEG have been shown to represent the postsynaptic potentials of pyramidal neurons in the neocorte ...
( EEG) recordings, as well as
emotion Emotions are mental states brought on by neurophysiological changes, variously associated with thoughts, feelings, behavioral responses, and a degree of pleasure or displeasure. There is currently no scientific consensus on a definitio ...
annotations in terms of valence and
arousal Arousal is the physiological and psychological state of being awoken or of sense organs stimulated to a point of perception. It involves activation of the ascending reticular activating system (ARAS) in the brain, which mediates wakefulness, th ...
of people watching images. It also includes
electroencephalography Electroencephalography (EEG) is a method to record an electrogram of the spontaneous electrical activity of the brain. The biosignals detected by EEG have been shown to represent the postsynaptic potentials of pyramidal neurons in the neocorte ...
( EEG) recordings of people exposed to various stimuli ( SSVEP, resting with eyes closed, resting with eyes open, cognitive tasks) for the task of EEG-based
biometrics Biometrics are body measurements and calculations related to human characteristics. Biometric authentication (or realistic authentication) is used in computer science as a form of identification and access control. It is also used to identify i ...
.


Applications

Emotion recognition is used in society for a variety of reasons. Affectiva, which spun out of MIT, provides
artificial intelligence Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machines, as opposed to intelligence displayed by animals and humans. Example tasks in which this is done include speech ...
software that makes it more efficient to do tasks previously done manually by people, mainly to gather facial expression and vocal expression information related to specific contexts where viewers have consented to share this information. For example, instead of filling out a lengthy survey about how you feel at each point watching an educational video or advertisement, you can consent to have a camera watch your face and listen to what you say, and note during which parts of the experience you show expressions such as boredom, interest, confusion, or smiling. (Note that this does not imply it is reading your innermost feelings—it only reads what you express outwardly.) Other uses by Affectiva include helping children with autism, helping people who are blind to read facial expressions, helping robots interact more intelligently with people, and monitoring signs of attention while driving in an effort to enhance driver safety.
patent
filed by
Snapchat Snapchat is an American multimedia instant messaging app and service developed by Snap Inc., originally Snapchat Inc. One of the principal features of Snapchat is that pictures and messages are usually only available for a short time before the ...
in 2015 describes a method of extracting data about crowds at public events by performing algorithmic emotion recognition on users' geotagged
selfie A selfie () is a self-portrait photograph, typically taken with a digital camera or smartphone, which may be held in the hand or supported by a selfie stick. Selfies are often shared on social media, via social networking services such ...
s. Emotient was a
startup company A startup or start-up is a company or project undertaken by an entrepreneur to seek, develop, and validate a scalable business model. While entrepreneurship refers to all new businesses, including self-employment and businesses that never intend ...
which applied emotion recognition to reading frowns, smiles, and other expressions on faces, namely
artificial intelligence Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machines, as opposed to intelligence displayed by animals and humans. Example tasks in which this is done include speech ...
to predict "attitudes and actions based on facial expressions".
Apple An apple is an edible fruit produced by an apple tree (''Malus domestica''). Apple trees are cultivated worldwide and are the most widely grown species in the genus '' Malus''. The tree originated in Central Asia, where its wild ancest ...
bought Emotient in 2016 and uses emotion recognition technology to enhance the emotional intelligence of its products. nViso provides real-time emotion recognition for web and mobile applications through a real-time API. Visage Technologies AB offers emotion estimation as a part of their
Visage SDK Visage may refer to: *A synonym of face * Visage Mobile, an American software as a service company * Visage, Georgia, a community in the United States * ''Visage'' (film), also known as ''Face'', a 2009 French film * ''Visage'' (video game), a sur ...
for
marketing Marketing is the process of exploring, creating, and delivering value to meet the needs of a target market in terms of goods and services; potentially including selection of a target audience; selection of certain attributes or themes to emph ...
and scientific research and similar purposes. Eyeris is an emotion recognition company that works with
embedded system An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded ...
manufacturers including car makers and social robotic companies on integrating its face analytics and emotion recognition software; as well as with video content creators to help them measure the perceived effectiveness of their short and long form video creative. Many products also exist to aggregate information from emotions communicated online, including via "like" button presses and via counts of positive and negative phrases in text and affect recognition is increasingly used in some kinds of games and virtual reality, both for educational purposes and to give players more natural control over their social avatars.


Subfields of emotion recognition

Emotion recognition is probably to gain the best outcome if applying multiple modalities by combining different objects, including
text Text may refer to: Written word * Text (literary theory), any object that can be read, including: **Religious text, a writing that a religious tradition considers to be sacred **Text, a verse or passage from scripture used in expository preachin ...
(conversation), audio, video, and
physiology Physiology (; ) is the scientific study of functions and mechanisms in a living system. As a sub-discipline of biology, physiology focuses on how organisms, organ systems, individual organs, cells, and biomolecules carry out the chemic ...
to detect emotions.


Emotion recognition in text

Text data is a favorable research object for emotion recognition when it is free and available everywhere in human life. Compare to other types of data, the storage of text data is lighter and easy to compress to the best performance due to the frequent repetition of words and characters in languages. Emotions can be extracted from two essential text forms: written texts and
conversation Conversation is interactive communication between two or more people. The development of conversational skills and etiquette is an important part of socialization. The development of conversational skills in a new language is a frequent focus ...
s (dialogues). For written texts, many scholars focus on working with sentence level to extract "words/phrases" representing emotions.


Emotion recognition in audio

Different from emotion recognition in text, vocal signals are used for the recognition to extract emotions from audio.


Emotion recognition in video

Video data is a combination of audio data, image data and sometimes texts (in case of
subtitles Subtitles and captions are lines of dialogue or other text displayed at the bottom of the screen in films, television programs, video games or other visual media. They can be transcriptions of the screenplay, translations of it, or informa ...
Dhall, A., Goecke, R., Lucey, S., & Gedeon, T. (2012)
Collecting large, richly annotated facial-expression databases from movies
IEEE multimedia, (3), 34-41.
).


Emotion recognition in conversation

Emotion recognition in conversation (ERC) extracts opinions between participants from massive conversational data in social platforms, such as
Facebook Facebook is an online social media and social networking service owned by American company Meta Platforms. Founded in 2004 by Mark Zuckerberg with fellow Harvard College students and roommates Eduardo Saverin, Andrew McCollum, Dust ...
,
Twitter Twitter is an online social media and social networking service owned and operated by American company Twitter, Inc., on which users post and interact with 280-character-long messages known as "tweets". Registered users can post, like, and ...
, YouTube, and others.Poria, S., Majumder, N., Mihalcea, R., & Hovy, E. (2019)
Emotion recognition in conversation: Research challenges, datasets, and recent advances
IEEE Access, 7, 100943-100953.
ERC can take input data like text, audio, video or a combination form to detect several emotions such as fear, lust, pain, and pleasure.


See also

*
Affective computing Affective computing is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects. It is an interdisciplinary field spanning computer science, psychology, and cognitive science. While so ...
*
Face perception Facial perception is an individual's understanding and interpretation of the face. Here, perception implies the presence of consciousness and hence excludes automated facial recognition systems. Although facial recognition is found in other sp ...
*
Facial recognition system A facial recognition system is a technology capable of matching a human face from a digital image or a video frame against a database of faces. Such a system is typically employed to authenticate users through ID verification services, and ...
*
Sentiment analysis Sentiment analysis (also known as opinion mining or emotion AI) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjec ...
* Interpersonal accuracy


References

{{Nonverbal communication Emotion Applications of artificial intelligence Affective computing