Audio Speech Emotion Recognition
January 2024 (1184 Words, 7 Minutes)
awesome-speech-emotion-recognition
AWESOME-MER
📝 A reading list focused on Multimodal Emotion Recognition (MER) 👂 👄 👀 💬
(▫️ indicates a specific modality)
🔆 Datasets
🔆 Challenges
🔆 Projects
🔆 Related Reviews
🔆 Multimodal Emotion Recognition (MER)
Datasets
- (2018) CMU-MOSEI [▫️Visual ▫️Audio ▫️Language]
- (2018) ASCERTAIN Dataset [▫️Facial activity data ▫️Physiological data]
- (2017) EMOTIC Dataset [▫️Face ▫️Context]
- (2016) Multimodal Spontaneous Emotion Database (BP4D+) [▫️Face ▫️Thermal data ▫️Physiological data]
- (2016) EmotiW Database [▫️Visual ▫️Audio]
- (2015) LIRIS-ACCEDE Database [▫️Visual ▫️Audio]
- (2014) CREMA-D [▫️Visual ▫️Audio] (see the feature-extraction sketch after this list)
- (2013) SEMAINE Database [▫️Visual ▫️Audio ▫️Conversation transcripts]
- (2011) MAHNOB-HCI [▫️Visual ▫️Eye gaze ▫️Physiological data]
- (2008) IEMOCAP Database [▫️Visual ▫️Audio ▫️Text transcripts]
- (2005) eNTERFACE Dataset [▫️Visual ▫️Audio]
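For the audio-only corpora above, the label often lives in the filename. Below is a minimal sketch, assuming the common CREMA-D naming convention (e.g. `1001_DFA_ANG_XX.wav`, with the emotion code as the third underscore-separated field), of turning such clips into mean-pooled MFCC feature/label pairs for a speech-emotion baseline; the sample rate, MFCC count, and pooling are illustrative choices, not a prescribed recipe.

```python
# Hedged sketch: filename-labelled audio corpus (CREMA-D style) -> features.
# Assumes the emotion code is the third underscore-separated filename field.
from pathlib import Path

import librosa
import numpy as np

EMOTIONS = {"ANG": 0, "DIS": 1, "FEA": 2, "HAP": 3, "NEU": 4, "SAD": 5}

def load_clip(wav_path: Path, sr: int = 16000, n_mfcc: int = 40):
    """Return (mean-pooled MFCC vector, integer label) for one clip."""
    y, _ = librosa.load(wav_path, sr=sr)            # resample to a fixed rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    label = EMOTIONS[wav_path.stem.split("_")[2]]   # e.g. "ANG" -> 0
    return mfcc.mean(axis=1), label                 # pool over time

def load_corpus(root: str):
    """Stack all clips under `root` into (X, y) arrays."""
    pairs = [load_clip(p) for p in sorted(Path(root).glob("*.wav"))]
    X = np.stack([feats for feats, _ in pairs])
    y = np.array([label for _, label in pairs])
    return X, y
```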
Challenges
- Multimodal (Audio, Facial and Gesture) based Emotion Recognition Challenge (MMER) @ FG
- Emotion Recognition in the Wild Challenge (EmotiW) @ ICMI
- Audio/Visual Emotion Challenge (AVEC) @ ACM MM
- One-Minute Gradual-Emotion Behavior Challenge @ IJCNN
- Multimodal Emotion Recognition Challenge (MEC) @ ACII
- Multimodal Pain Recognition (Face and Body) Challenge (EmoPain) @ FG
Projects
- CMU Multimodal SDK (see the loading sketch after this list)
- Real-Time Multimodal Emotion Recognition
- MixedEmotions Toolbox
- End-to-End Multimodal Emotion Recognition
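As a concrete starting point, here is a minimal sketch of fetching and aligning CMU-MOSEI with the CMU Multimodal SDK (`mmsdk`). The recipe and field names (`cmu_mosei.highlevel`, `cmu_mosei.labels`, `"All Labels"`) follow the SDK's own published examples; treat them as assumptions if the API has since changed.

```python
# Hedged sketch: fetching and word-aligning CMU-MOSEI with the CMU
# Multimodal SDK. Recipe/field names follow the SDK's examples and
# may have changed; treat them as assumptions.
from mmsdk import mmdatasdk

# Download the precomputed high-level features into ./cmumosei/.
dataset = mmdatasdk.mmdataset(mmdatasdk.cmu_mosei.highlevel, "cmumosei/")

# Add the emotion/sentiment labels as another computational sequence.
dataset.add_computational_sequences(mmdatasdk.cmu_mosei.labels, "cmumosei/")

# Align every modality to the label intervals so examples line up.
dataset.align("All Labels")
```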
Related Reviews
- (IEEE Journal of Selected Topics in Signal Processing, 2020) Multimodal Intelligence: Representation Learning, Information Fusion, and Applications [paper]
- (Information Fusion, 2020) A snapshot research and implementation of multimodal information fusion for data-driven emotion recognition [paper]
- (Information Fusion, 2017) A review of affective computing: From unimodal analysis to multimodal fusion [paper]
- (Image and Vision Computing, 2017) A survey of multimodal sentiment analysis [paper]
- (ACM Computing Surveys, 2015) A Review and Meta-Analysis of Multimodal Affect Detection Systems [paper]
Multimodal Emotion Recognition
🔸 CVPR
- (2020) EmotiCon: Context-Aware Multimodal Emotion Recognition using Frege’s Principle [paper] [▫️Faces/Gaits ▫️Background ▫️Social interactions]
- (2017) Emotion Recognition in Context [paper] [▫️Face ▫️Context]
🔸 ICCV
- (2019) Context-Aware Emotion Recognition Networks [paper] [▫️Faces ▫️Context]
- (2017) A Multimodal Deep Regression Bayesian Network for Affective Video Content Analyses [paper] [▫️Visual ▫️Audio]
🔸 AAAI
- (2020) M3ER: Multiplicative Multimodal Emotion Recognition Using Facial, Textual, and Speech Cues [paper] [▫️Face ▫️Speech ▫️Text]
- (2020) An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos [paper] [▫️Visual ▫️Audio]
- (2019) Multi-Interactive Memory Network for Aspect Based Multimodal Sentiment Analysis [paper] [▫️Visual ▫️Text]
- (2019) VistaNet: Visual Aspect Attention Network for Multimodal Sentiment Analysis [paper] [▫️Visual ▫️Text]
- (2019) Cooperative Multimodal Approach to Depression Detection in Twitter [paper] [▫️Visual ▫️Text]
- (2014) Predicting Emotions in User-Generated Videos [paper] [▫️Visual ▫️Audio ▫️Attribute]
🔸 IJCAI
- (2019) DeepCU: Integrating both Common and Unique Latent Information for Multimodal Sentiment Analysis [paper] [▫️Face ▫️Audio ▫️Text]
- (2019) Adapting BERT for Target-Oriented Multimodal Sentiment Classification [paper] [▫️Image ▫️Text]
- (2018) Personality-Aware Personalized Emotion Recognition from Physiological Signals [paper] [▫️Personality ▫️Physiological signals]
- (2015) Combining Eye Movements and EEG to Enhance Emotion Recognition [paper] [▫️EEG ▫️Eye movements]
🔸 ACM MM
- (2019) Emotion Recognition using Multimodal Residual LSTM Network [paper] [▫️EEG ▫️Other physiological signals]
- (2019) Mutual Correlation Attentive Factors in Dyadic Fusion Networks for Speech Emotion Recognition [paper] [▫️Audio ▫️Text]
- (2019) Multimodal Deep Denoise Framework for Affective Video Content Analysis [paper] [▫️Face ▫️Body gesture ▫️Voice ▫️Physiological signals]
🔸 WACV
- (2016) Multimodal emotion recognition using deep learning architectures [paper] [▫️Visual ▫️Audio]
🔸 FG
- (2020) Multimodal Deep Learning Framework for Mental Disorder Recognition [paper] [▫️Visual ▫️Audio ▫️Text]
- (2019) Multi-Attention Fusion Network for Video-based Emotion Recognition [paper] [▫️Visual ▫️Audio]
- (2019) Audio-Visual Emotion Forecasting: Characterizing and Predicting Future Emotion Using Deep Learning [paper] [▫️Face ▫️Speech]
🔸 ICMI
- (2018) Multimodal Local-Global Ranking Fusion for Emotion Recognition [paper] [▫️Visual ▫️Audio]
- (2017) Emotion recognition with multimodal features and temporal models [paper] [▫️Visual ▫️Audio]
- (2017) Modeling Multimodal Cues in a Deep Learning-Based Framework for Emotion Recognition in the Wild [paper] [▫️Visual ▫️Audio]
🔸 IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
- (2020) Context Based Emotion Recognition using EMOTIC Dataset [paper] [▫️Face ▫️Context]
🔸 IEEE Transactions on Circuits and Systems for Video Technology
- (2018) Learning Affective Features With a Hybrid Deep Model for Audio–Visual Emotion Recognition [paper] [▫️Visual ▫️Audio]
🔸 IEEE Transactions on Cybernetics
- (2020) Emotion Recognition From Multimodal Physiological Signals Using a Regularized Deep Fusion of Kernel Machine [paper] [▫️EEG ▫️Other physiological signals]
- (2019) EmotionMeter: A Multimodal Framework for Recognizing Human Emotions [paper] [▫️EEG ▫️Eye movements]
- (2015) Temporal Bayesian Fusion for Affect Sensing: Combining Video, Audio, and Lexical Modalities [paper] [▫️Face ▫️Audio ▫️Lexical features]
🔸 IEEE Transactions on Multimedia
- (2020) Visual-Textual Emotion Analysis With Deep Coupled Video and Danmu Neural Networks [paper] [▫️Visual ▫️Text]
- (2020) Locally Confined Modality Fusion Network With a Global Perspective for Multimodal Human Affective Computing [paper] [▫️Visual ▫️Audio ▫️Language]
- (2019) Metric Learning-Based Multimodal Audio-Visual Emotion Recognition [paper] [▫️Visual ▫️Audio]
- (2019) Knowledge-Augmented Multimodal Deep Regression Bayesian Networks for Emotion Video Tagging [paper] [▫️Visual ▫️Audio ▫️Attribute]
- (2018) Multimodal Framework for Analyzing the Affect of a Group of People [paper] [▫️Face ▫️Upper body ▫️Scene]
- (2012) Kernel Cross-Modal Factor Analysis for Information Fusion With Application to Bimodal Emotion Recognition [paper] [▫️Visual ▫️Audio]
🔸 IEEE Transactions on Affective Computing
- (2019) Audio-Visual Emotion Recognition in Video Clips [paper] [▫️Visual ▫️Audio]
- (2019) Recognizing Induced Emotions of Movie Audiences From Multimodal Information [paper] [▫️Visual ▫️Audio ▫️Dialogue ▫️Attribute]
- (2019) EmoBed: Strengthening Monomodal Emotion Recognition via Training with Crossmodal Emotion Embeddings [paper] [▫️Face ▫️Audio]
- (2018) Combining Facial Expression and Touch for Perceiving Emotional Valence [paper] [▫️Face ▫️Touch stimuli]
- (2018) A Combined Rule-Based & Machine Learning Audio-Visual Emotion Recognition Approach [paper] [▫️Visual ▫️Audio]
- (2016) Analysis of EEG Signals and Facial Expressions for Continuous Emotion Detection [paper] [▫️Face ▫️EEG signals]
- (2013) Exploring Cross-Modality Affective Reactions for Audiovisual Emotion Recognition [paper] [▫️Face ▫️Audio]
- (2012) Multimodal Emotion Recognition in Response to Videos [paper] [▫️Eye gaze ▫️EEG signals]
- (2012) Context-Sensitive Learning for Enhanced Audiovisual Emotion Classification [paper] [▫️Visual ▫️Audio ▫️Utterance]
- (2011) Continuous Prediction of Spontaneous Affect from Multiple Cues and Modalities in Valence-Arousal Space [paper] [▫️Face ▫️Shoulder gesture ▫️Audio]
🔸 Neurocomputing
- (2020) Joint low rank embedded multiple features learning for audio–visual emotion recognition [paper] [▫️Visual ▫️Audio]
- (2018) Multi-cue fusion for emotion recognition in the wild [paper] [▫️Visual ▫️Audio]
- (2018) Multi-modality weakly labeled sentiment learning based on Explicit Emotion Signal for Chinese microblog [paper] [▫️Visual ▫️Text]
- (2016) Fusing audio, visual and textual clues for sentiment analysis from multimodal content [paper] [▫️Visual ▫️Audio ▫️Text]
🔸 Information Fusion
- (2019) Affective video content analysis based on multimodal data fusion in heterogeneous networks [paper] [▫️Visual ▫️Audio]
- (2019) Audio-visual emotion fusion (AVEF): A deep efficient weighted approach [paper] [▫️Visual ▫️Audio]
🔸 Neural Networks
- (2015) Towards an intelligent framework for multimodal affective data analysis [paper] [▫️Visual ▫️Audio ▫️Text]
- (2015) Multimodal emotional state recognition using sequence-dependent deep hierarchical features [paper] [▫️Face ▫️Upper-body]
🔸 Others
- (Knowledge-Based Systems, 2018) Multimodal sentiment analysis using hierarchical fusion with context modeling [paper] [▫️Visual ▫️Audio ▫️Text]
- (IEEE Journal of Selected Topics in Signal Processing, 2017) End-to-End Multimodal Emotion Recognition Using Deep Neural Networks [paper] [▫️Visual ▫️Audio]
- (Computer Vision and Image Understanding, 2016) Multi-modal emotion analysis from facial expressions and electroencephalogram [paper] [▫️Face ▫️EEG]
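Many of the audio-visual papers above share the same skeleton: encode each modality separately, then fuse the embeddings before classification. Below is a minimal late-fusion sketch in PyTorch; the layer sizes and the concatenation fusion are illustrative assumptions, not any single paper's architecture.

```python
# Hedged sketch: concatenation-based late fusion, the skeleton shared by
# many audio-visual papers above. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, audio_dim=40, visual_dim=512, hidden=128, n_classes=6):
        super().__init__()
        # One encoder per modality, trained jointly.
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.visual_enc = nn.Sequential(nn.Linear(visual_dim, hidden), nn.ReLU())
        # Fusion: concatenate the two embeddings, then classify.
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, audio, visual):
        z = torch.cat([self.audio_enc(audio), self.visual_enc(visual)], dim=-1)
        return self.head(z)  # unnormalized emotion logits

model = LateFusionClassifier()
logits = model(torch.randn(8, 40), torch.randn(8, 512))  # batch of 8 examples
```

Swapping the `torch.cat` for elementwise products or attention over modality embeddings recovers, roughly, the multiplicative and attention-based fusion variants several of the listed papers explore.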