A Comprehensive Review of Unimodal and Multimodal Emotion Detection: Datasets, Approaches, and Limitations
Priyanka Thakur
Nirmal Kaur (Corresponding Author)
Naveen Aggarwal
Sarbjeet Singh
Department of CSE, University Institute of Engineering and Technology, Panjab University, Chandigarh, India
Correspondence: Nirmal Kaur ([email protected])
Funding: This work was supported by IIT Mandi iHub & HCI Foundation under grant number IIT MANDI iHub/RD/2023-2025/04, as part of the project ‘Development of Multimodal and Multilingual Human Emotion Detection System’.
ABSTRACT
Emotion detection from face and speech is integral to human–computer interaction, mental health assessment, social robotics, and emotional intelligence. Traditional machine learning methods typically depend on handcrafted features and are primarily centred on unimodal systems. However, the person-specific variability of facial expressions and of speech features makes complex emotional states difficult to capture. Accordingly, deep learning models have proven effective at automatically extracting intrinsic emotional features with greater accuracy across multiple modalities. This article presents a comprehensive review of recent progress in emotion detection, spanning unimodal to multimodal systems, with a focus on the facial and speech modalities. It examines state-of-the-art machine learning, deep learning, and the latest transformer-based approaches for emotion detection. The review provides an in-depth analysis of both unimodal and multimodal emotion detection techniques, highlighting their limitations, popular datasets, open challenges, and the best-performing models. Such analysis aids researchers in the judicious selection of the most appropriate datasets and audio-visual emotion detection models. Key findings suggest that integrating multimodal data significantly improves emotion recognition, particularly when deep learning methods are trained on synchronised audio and video data. By assessing recent advancements and current challenges, this article serves as a foundational resource for researchers and practitioners in the field of emotional AI, thereby aiding the creation of more intuitive and empathetic technologies.
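To make the multimodal claim concrete, the sketch below illustrates one fusion pattern common in the surveyed literature: modality-specific encoders whose outputs are concatenated and passed to a joint classifier. It is a minimal illustration only, not a model from any cited work; the feature dimensions, layer sizes, seven-class label set, and the `AudioVisualFusionNet` name are all assumptions chosen for the example, which presumes pre-extracted, time-synchronised audio and visual features per clip.

```python
# Illustrative sketch only: minimal feature-level fusion for audio-visual
# emotion recognition. All dimensions and layer sizes are hypothetical.
import torch
import torch.nn as nn

class AudioVisualFusionNet(nn.Module):
    def __init__(self, audio_dim=40, video_dim=512, hidden=128, n_classes=7):
        super().__init__()
        # Modality-specific encoders project each stream into a shared space.
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.video_enc = nn.Sequential(nn.Linear(video_dim, hidden), nn.ReLU())
        # Fusion by concatenation, followed by a joint classifier.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, audio_feats, video_feats):
        a = self.audio_enc(audio_feats)    # (batch, hidden)
        v = self.video_enc(video_feats)    # (batch, hidden)
        fused = torch.cat([a, v], dim=-1)  # fuse synchronised modalities
        return self.classifier(fused)      # emotion logits

# Usage with random stand-in features for a batch of 8 clips.
model = AudioVisualFusionNet()
logits = model(torch.randn(8, 40), torch.randn(8, 512))
print(logits.shape)  # torch.Size([8, 7])
```

Concatenation is the simplest fusion operator; the attention-based and transformer fusion methods discussed in the review instead replace the `torch.cat` step with learned cross-modal weighting of the two streams.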
Open Research
Data Availability Statement
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
References
- Abdelhamid, A. A., E.-S. M. El-Kenawy, B. Alotaibi, et al. 2022. “Robust Speech Emotion Recognition Using CNN+ LSTM Based on Stochastic Fractal Search Optimization Algorithm.” IEEE Access 10: 49265–49284.
10.1109/ACCESS.2022.3172954 Google Scholar
- Agarwal, G., and H. Om. 2021. “Performance of Deer Hunting Optimization Based Deep Learning Algorithm for Speech Emotion Recognition.” Multimedia Tools and Applications 80, no. 7: 9961–9992.
- Agrawal, A., and N. Mittal. 2020. “Using CNN for Facial Expression Recognition: A Study of the Effects of Kernel Size and Number of Filters on Accuracy.” Visual Computer 36, no. 2: 405–412.
- Ahmed, N., Z. Al Aghbari, and S. Girija. 2023. “A Systematic Survey on Multimodal Emotion Recognition Using Learning Algorithms.” Intelligent Systems with Applications 17: 200171.
- Akhand, M. A. H., S. Roy, N. Siddique, M. A. S. Kamal, and T. Shimamura. 2021. “Facial Emotion Recognition Using Transfer Learning in the Deep CNN.” Electronics 10, no. 9: 1036.
- Al-Dujaili Al-Khazraji, M. J., and A. Ebrahimi-Moghadam. 2024. “An Innovative Method for Speech Signal Emotion Recognition Based on Spectral Features Using GMM and HMM Techniques.” Wireless Personal Communications 134: 1–753.
10.1007/s11277-024-10918-6 Google Scholar
- Alluhaidan, A. S., O. Saidani, R. Jahangir, M. A. Nauman, and O. S. Neffati. 2023. “Speech Emotion Recognition Through Hybrid Features and Convolutional Neural Network.” Applied Sciences 13, no. 8: 4750.
- Alnuaim, A. A., M. Zakariah, P. K. Shukla, et al. 2022. “Human-Computer Interaction for Recognizing Speech Emotions Using Multilayer Perceptron Classifier.” Journal of Healthcare Engineering 1: 6005446.
- Alslaity, A., and R. Orji. 2024. “Machine Learning Techniques for Emotion Detection and Sentiment Analysis: Current State, Challenges, and Future Directions.” Behaviour & Information Technology 43, no. 1: 139–164.
- Andayani, F., L. B. Theng, M. T. Tsun, and C. Chua. 2022. “Hybrid LSTM-Transformer Model for Emotion Recognition From Speech Audio Files.” IEEE Access 10: 36018–36027.
- Aranha, R. V., C. G. Corrêa, and F. L. S. Nunes. 2019. “Adapting Software With Affective Computing: A Systematic Review.” IEEE Transactions on Affective Computing 12, no. 4: 883–899.
10.1109/TAFFC.2019.2902379 Google Scholar
- Avots, E., T. Sapiński, M. Bachmann, and D. Kamińska. 2019. “Audiovisual Emotion Recognition in Wild.” Machine Vision and Applications 30, no. 5: 975–985.
- Bakhshi, A., A. Harimi, and S. Chalup. 2022. “CyTex: Transforming Speech to Textured Images for Speech Emotion Recognition.” Speech Communication 139: 62–75.
- Banskota, N., A. Alsadoon, P. W. C. Prasad, A. Dawoud, T. A. Rashid, and O. H. Alsadoon. 2023. “A Novel Enhanced Convolution Neural Network With Extreme Learning Machine: Facial Emotional Recognition in Psychology Practices.” Multimedia Tools and Applications 82, no. 5: 6479–6503.
- Bänziger, T., H. Pirker, and K. Scherer. 2006. “GEMEP-GEneva Multimodal Emotion Portrayals: A Corpus for the Study of Multimodal Emotional Expressions.” Proceedings of LREC 6: 15–19.
- Baveye, Y., E. Dellandrea, C. Chamaret, and L. Chen. 2015. “LIRIS-ACCEDE: A Video Database for Affective Content Analysis.” IEEE Transactions on Affective Computing 6, no. 1: 43–55.
- Bhangale, K. B., and M. Kothandaraman. 2023. “Speech Emotion Recognition Using the Novel PEmoNet (Parallel Emotion Network).” Applied Acoustics 212: 109613.
- Bhattacharya, S., S. Borah, B. K. Mishra, and A. Mondal. 2022. “Emotion Detection From Multilingual Audio Using Deep Analysis.” Multimedia Tools and Applications 81, no. 28: 41309–41338.
- Bhavan, A., P. Chauhan, and R. R. Shah. 2019. “Bagged Support Vector Machines for Emotion Recognition From Speech.” Knowledge-Based Systems 184: 104886.
- Bilquise, G., S. Ibrahim, and K. Shaalan. 2022. “Emotionally Intelligent Chatbots: A Systematic Literature Review.” Human Behavior and Emerging Technologies 1: 9601630.
- Burkhardt, F., A. Paeschke, M. Rolfes, et al. 2005. A Database of German Emotional Speech. Vol. 5. Interspeech.
10.21437/Interspeech.2005-446 Google Scholar
- Busso, C., M. Bulut, C.-C. Lee, et al. 2008. “IEMOCAP: Interactive Emotional Dyadic Motion Capture Database.” Language Resources and Evaluation 42: 335–359.
- Busso, C., S. Parthasarathy, A. Burmania, M. AbdelWahab, N. Sadoughi, and E. M. Provost. 2016. “MSP-IMPROV: An Acted Corpus of Dyadic Interactions to Study Emotion Perception.” IEEE Transactions on Affective Computing 8, no. 1: 67–80.
- Cai, L., J. Dong, and M. Wei. 2020. “ Multi-Modal Emotion Recognition From Speech and Facial Expression Based on Deep Learning.” In 2020 Chinese Automation Congress (CAC). IEEE.
10.1109/CAC51589.2020.9327178 Google Scholar
- Calvo, M. G., A. Fernández-Martín, G. Recio, and D. Lundqvist. 2018. “Human Observers and Automated Assessment of Dynamic Emotional Facial Expressions: KDEF-Dyn Database Validation.” Frontiers in Psychology 9: 2052.
- Canal, F. Z., T. R. Müller, J. C. Matias, et al. 2022. “A Survey on Facial Emotion Recognition Techniques: A State-Of-The-Art Literature Review.” Information Sciences 582: 593–617.
- Cao, H., D. G. Cooper, M. K. Keutmann, R. C. Gur, A. Nenkova, and R. Verma. 2014. “Crema-d: Crowd-Sourced Emotional Multimodal Actors Dataset.” IEEE Transactions on Affective Computing 5, no. 4: 377–390.
- Cen, L., F. Wu, Z. L. Yu, and F. Hu. 2016. “ A Real-Time Speech Emotion Recognition System and Its Application in Online Learning.” In Emotions, Technology, Design, and Learning, 27–46. Academic Press.
10.1016/B978-0-12-801856-9.00002-5 Google Scholar
- Chamishka, S., I. Madhavi, R. Nawaratne, et al. 2022. “A Voice-Based Real-Time Emotion Detection Technique Using Recurrent Neural Network Empowered Feature Modelling.” Multimedia Tools and Applications 81, no. 24: 35173–35194.
- Chaudhari, A., C. Bhatt, A. Krishna, and P. L. Mazzeo. 2022. “ViTFER: Facial Emotion Recognition With Vision Transformers.” Applied System Innovation 5, no. 4: 80.
- Chauhan, K., K. K. Sharma, and T. Varma. 2023. “Improved Speech Emotion Recognition Using Channel-Wise Global Head Pooling (Cwghp).” Circuits, Systems, and Signal Processing 42, no. 9: 5500–5522.
- Che, N., Y. Zhu, H. Wang, et al. 2025. “AFT-SAM: Adaptive Fusion Transformer With a Sparse Attention Mechanism for Audio–Visual Speech Recognition.” Applied Sciences (2076–3417) 15, no. 1: 199.
- Chen, J., Y. Lv, R. Xu, and C. Xu. 2019. “Automatic Social Signal Analysis: Facial Expression Recognition Using Difference Convolution Neural Network.” Journal of Parallel and Distributed Computing 131: 97–102.
- Cheng, F., J. Yu, and H. Xiong. 2010. “Facial Expression Recognition in JAFFE Dataset Based on Gaussian Process Classification.” IEEE Transactions on Neural Networks 21, no. 10: 1685–1690.
- Chowdary, M. K., T. N. Nguyen, and D. J. Hemanth. 2023. “Deep Learning-Based Facial Emotion Recognition for Human–Computer Interaction Applications.” Neural Computing and Applications 35, no. 32: 23311–23328.
- Cornejo, J. Y. R., and H. Pedrini. 2019. “ Audio-Visual Emotion Recognition Using a Hybrid Deep Convolutional Neural Network Based on Census Transform.” In 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). IEEE.
10.1109/SMC.2019.8914193 Google Scholar
- Costantini, G., I. Iaderola, A. Paoloni, and M. Todisco. 2014. “ EMOVO Corpus: An Italian Emotional Speech Database.” In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). European Language Resources Association (ELRA).
- Dhall, A., R. Goecke, S. Lucey, and T. Gedeon. 2011. “ Static Facial Expression Analysis in Tough Conditions: Data, Evaluation Protocol and Benchmark.” In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops). IEEE.
10.1109/ICCVW.2011.6130508 Google Scholar
- Di Luzio, F., A. Rosato, and M. Panella. 2023. “A Randomized Deep Neural Network for Emotion Recognition With Landmarks Detection.” Biomedical Signal Processing and Control 81: 104418.
- Do, L.-N., H.-J. Yang, H.-D. Nguyen, S.-H. Kim, G.-S. Lee, and I.-S. Na. 2021. “Deep Neural Network-Based Fusion Model for Emotion Recognition Using Visual Data.” Journal of Supercomputing 77, no. 10: 10773.
- Douglas-Cowie, E., C. Cox, J.-C. Martin, et al. 2011. “ The HUMAINE Database.” In Emotion-Oriented Systems: The Humaine Handbook, 243–284. Springer.
10.1007/978-3-642-15184-2_14 Google Scholar
- Du, G., S. Long, and H. Yuan. 2020. “Non-Contact Emotion Recognition Combining Heart Rate and Facial Expression for Interactive Gaming Environments.” IEEE Access 8: 11896–11906.
- Egger, M., M. Ley, and S. Hanke. 2019. “Emotion Recognition From Physiological Signal Analysis: A Review.” Electronic Notes in Theoretical Computer Science 343: 35–55.
10.1016/j.entcs.2019.04.009 Google Scholar
- Ekman, P. 1992. “An Argument for Basic Emotions.” Cognition & Emotion 6, no. 3–4: 169–200.
- Ekman, P., and W. V. Friesen. 1978. “ Facial Action Coding System.” In Environmental Psychology & Nonverbal Behavior.
- Elyoseph, Z., E. Refoua, K. Asraf, M. Lvovsky, Y. Shimoni, and D. Hadar-Shoval. 2024. “Capacity of Generative AI to Interpret Human Emotions From Visual and Textual Data: Pilot Evaluation Study.” JMIR Mental Health 11: e54369.
- Engberg, I. S., A. V. Hansen, O. Andersen, and P. Dalsgaard. 1997. Design, Recording and Verification of a Danish Emotional Speech Database. Eurospeech.
10.21437/Eurospeech.1997-482 Google Scholar
- Essa, I. A., and A. P. Pentland. 1997. “Coding, Analysis, Interpretation, and Recognition of Facial Expressions.” IEEE Transactions on Pattern Analysis and Machine Intelligence 19, no. 7: 757–763.
- Ezzameli, K., and H. Mahersia. 2023. “Emotion Recognition From Unimodal to Multimodal Analysis: A Review.” Information Fusion 99: 101847.
- Falahzadeh, M. R., F. Farokhi, A. Harimi, and R. Sabbaghi-Nadooshan. 2023. “Deep Convolutional Neural Network and Gray Wolf Optimization Algorithm for Speech Emotion Recognition.” Circuits, Systems, and Signal Processing 42, no. 1: 449–492.
- Farhoudi, Z., and S. Setayeshi. 2021. “Fusion of Deep Learning Features With Mixture of Brain Emotional Learning for Audio-Visual Emotion Recognition.” Speech Communication 127: 92–103.
- Fayek, H. M., M. Lech, and L. Cavedon. 2017. “Evaluating Deep Learning Architectures for Speech Emotion Recognition.” Neural Networks 92: 60–68.
- Filali, H., J. Riffi, I. Aboussaleh, A. M. Mahraz, and H. Tairi. 2022. “Meaningful Learning for Deep Facial Emotional Features.” Neural Processing Letters 54, no. 1: 387–404.
- Fourati, N., and C. Pelachaud. 2014. Emilya: Emotional Body Expression in Daily Actions Database. LREC.
- Fu, J., Q. Mao, J. Tu, and Y. Zhan. 2019. “Multimodal Shared Features Learning for Emotion Recognition by Enhanced Sparse Local Discriminative Canonical Correlation Analysis.” Multimedia Systems 25, no. 5: 451–461.
- Geetha, A. V., T. Mala, D. Priyanka, and E. Uma. 2024. “Multimodal Emotion Recognition With Deep Learning: Advancements, Challenges, and Future Directions.” Information Fusion 105: 102218.
- Ghaleb, E., J. Niehues, and S. Asteriadis. 2020. “ Multimodal Attention-Mechanism for Temporal Emotion Recognition.” In 2020 IEEE International Conference on Image Processing (ICIP). IEEE.
10.1109/ICIP40778.2020.9191019 Google Scholar
- Ghaleb, E., J. Niehues, and S. Asteriadis. 2023. “Joint Modelling of Audio-Visual Cues Using Attention Mechanisms for Emotion Recognition.” Multimedia Tools and Applications 82, no. 8: 11239–11264.
- Ghaleb, E., M. Popa, and S. Asteriadis. 2019. “ Multimodal and Temporal Perception of Audio-Visual Cues for Emotion Recognition.” In 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE.
10.1109/ACII.2019.8925444 Google Scholar
- Gideon, J., M. G. McInnis, and E. M. Provost. 2019. “Improving Cross-Corpus Speech Emotion Recognition With Adversarial Discriminative Domain Generalization (ADDoG).” IEEE Transactions on Affective Computing 12, no. 4: 1055–1068.
- Goncalves, L., and C. Busso. 2022. “Robust Audiovisual Emotion Recognition: Aligning Modalities, Capturing Temporal Information, and Handling Missing Features.” IEEE Transactions on Affective Computing 13, no. 4: 2156–2170.
- Goodfellow, I. J., D. Erhan, P. L. Carrier, et al. 2013. “ Challenges in Representation Learning: A Report on Three Machine Learning Contests.” In Neural Information Processing: 20th International Conference, ICONIP 2013. November 3–7, 2013. Proceedings, Part III 20. Springer Berlin Heidelberg.
10.1007/978-3-642-42051-1_16 Google Scholar
- Gross, R., I. Matthews, J. Cohn, T. Kanade, and S. Baker. 2010. “Multi-Pie.” Image and Vision Computing 28, no. 5: 807–813.
- Gu, X., Y. Shen, and J. Xu. 2021. “ Multimodal Emotion Recognition in Deep Learning: A Survey.” In 2021 International Conference on Culture-Oriented Science & Technology (ICCST). IEEE.
10.1109/ICCST53801.2021.00027 Google Scholar
- Guanghui, C., and Z. Xiaoping. 2021. “Multi-Modal Emotion Recognition by Fusing Correlation Features of Speech-Visual.” IEEE Signal Processing Letters 28: 533–537.
10.1109/LSP.2021.3055755 Google Scholar
- Guo, W., X. Zhao, S. Zhang, and X. Pan. 2023. “Learning Inter-Class Optical Flow Difference Using Generative Adversarial Networks for Facial Expression Recognition.” Multimedia Tools and Applications 82, no. 7: 10099–10116.
- Hajarolasvadi, N., E. Bashirov, and H. Demirel. 2021. “Video-Based Person-Dependent and Person-Independent Facial Emotion Recognition.” Signal, Image and Video Processing 15, no. 5: 1049–1056.
- Haq, S., P. J. B. Jackson, and J. D. Edge. 2008. Audio-Visual Feature Selection and Reduction for Emotion Classification. AVSP.
- He, J., X. Yu, B. Sun, and L. Yu. 2021. “Facial Expression and Action Unit Recognition Augmented by Their Dependencies on Graph Convolutional Networks.” Journal on Multimodal User Interfaces 15: 412–429.
- He, Z., Z. Li, F. Yang, et al. 2020. “Advances in Multimodal Emotion Recognition Based on Brain–Computer Interfaces.” Brain Sciences 10, no. 10: 687.
- Hossain, M. S., and G. Muhammad. 2019. “Emotion Recognition Using Secure Edge and Cloud Computing.” Information Sciences 504: 589–601.
- Hossain, S., S. Umer, R. K. Rout, and M. Tanveer. 2023. “Fine-Grained Image Analysis for Facial Expression Recognition Using Deep Convolutional Neural Networks With Bilinear Pooling.” Applied Soft Computing 134: 109997.
- Huang, Q., C. Huang, X. Wang, and F. Jiang. 2021. “Facial Expression Recognition With Grid-Wise Attention and Visual Transformer.” Information Sciences 580: 35–54.
- Issa, D., M. Fatih Demirci, and A. Yazici. 2020. “Speech Emotion Recognition With Deep Convolutional Neural Networks.” Biomedical Signal Processing and Control 59: 101894.
- Jahangir, R., Y. W. Teh, F. Hanif, and G. Mujtaba. 2021. “Deep Learning Approaches for Speech Emotion Recognition: State of the Art and Research Challenges.” Multimedia Tools and Applications 80, no. 16: 23745–23812.
- Jain, D. K., P. Shamsolmoali, and P. Sehdev. 2019. “Extended Deep Neural Network for Facial Emotion Recognition.” Pattern Recognition Letters 120: 69–74.
- Jain, D. K., Z. Zhang, and K. Huang. 2020. “Multi Angle Optimal Pattern-Based Deep Learning for Automatic Facial Expression Recognition.” Pattern Recognition Letters 139: 157–165.
- Jaratrotkamjorn, A., and A. Choksuriwong. 2019. “ Bimodal Emotion Recognition Using Deep Belief Network.” In 2019 23rd International Computer Science and Engineering Conference (ICSEC). IEEE.
10.1109/ICSEC47112.2019.8974707 Google Scholar
- Jeong, D., B.-G. Kim, and S.-Y. Dong. 2020. “Deep Joint Spatiotemporal Network (DJSTN) for Efficient Facial Expression Recognition.” Sensors 20, no. 7: 1936.
- Jia, X., S. Xu, Y. Zhou, L. Wang, and W. Li. 2023. “A Novel Dual-Channel Graph Convolutional Neural Network for Facial Action Unit Recognition.” Pattern Recognition Letters 166: 61–68.
- Jiang, P., H. Fu, H. Tao, P. Lei, and L. Zhao. 2019. “Parallelized Convolutional Recurrent Neural Network With Spectral Features for Speech Emotion Recognition.” IEEE Access 7: 90368–90377.
- Jiang, Y., W. Li, M. S. Hossain, M. Chen, A. Alelaiwi, and M. al-Hammadi. 2020. “A Snapshot Research and Implementation of Multimodal Information Fusion for Data-Driven Emotion Recognition.” Information Fusion 53: 209–221.
- John, V., and Y. Kawanishi. 2022. “ Audio and Video-Based Emotion Recognition Using Multimodal Transformers.” In 26th International Conference on Pattern Recognition (ICPR). IEEE.
- Kakuba, S., A. Poulose, and D. S. Han. 2022. “Attention-Based Multi-Learning Approach for Speech Emotion Recognition With Dilated Convolution.” IEEE Access 10: 122302–122313.
- Kalateh, S., L. A. Estrada-Jimenez, S. Nikghadam-Hojjati, and J. Barata. 2024. “A Systematic Review on Multimodal Emotion Recognition: Building Blocks, Current State, Applications, and Challenges.” IEEE Access 12: 103976–104019.
- Kansizoglou, I., L. Bampis, and A. Gasteratos. 2022. “An Active Learning Paradigm for Online Audio-Visual Emotion Recognition.” IEEE Transactions on Affective Computing 13, no. 2: 756–768.
- Khan, W. A., H. u. Qudous, and A. A. Farhan. 2024. “Speech Emotion Recognition Using Feature Fusion: A Hybrid Approach to Deep Learning.” Multimedia Tools and Applications 83: 1–75584.
- Kim, J.-H., B.-G. Kim, P. P. Roy, and D.-M. Jeong. 2019. “Efficient Facial Expression Recognition Algorithm Based on Hierarchical Deep Neural Network Structure.” IEEE Access 7: 41273–41285.
- Kim, N., S. Cho, and B. Bae. 2022. “SMaTE: A Segment-Level Feature Mixing and Temporal Encoding Framework for Facial Expression Recognition.” Sensors 22, no. 15: 5753.
- Koduru, A., H. B. Valiveti, and A. K. Budati. 2020. “Feature Extraction Algorithms to Improve the Speech Emotion Recognition Rate.” International Journal of Speech Technology 23, no. 1: 45–55.
- Koolagudi, S. G., R. Reddy, J. Yadav, et al. 2011. “ IITKGP-SEHSC: Hindi Speech Corpus for Emotion Analysis.” In 2011 International Conference on Devices and Communications (ICDeCom). IEEE.
10.1109/ICDECOM.2011.5738540 Google Scholar
- Kossaifi, J., R. Walecki, Y. Panagakis, et al. 2019. “Sewa Db: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild.” IEEE Transactions on Pattern Analysis and Machine Intelligence 43, no. 3: 1022–1040.
10.1109/TPAMI.2019.2944808 Google Scholar
- Kumar, P., S. Malik, and B. Raman. 2024. “Interpretable Multimodal Emotion Recognition Using Hybrid Fusion of Speech and Image Data.” Multimedia Tools and Applications 83, no. 10: 28373–28394.
- Kwon, S. 2021a. “MLT-DNet: Speech Emotion Recognition Using 1D Dilated CNN Based on Multi-Learning Trick Approach.” Expert Systems with Applications 167: 114177.
- Kwon, S. 2021b. “Att-Net: Enhanced Emotion Recognition System Using Lightweight Self-Attention Module.” Applied Soft Computing 102: 107101.
- Lakshmi, K. L., P. Muthulakshmi, A. A. Nithya, et al. 2023. “Recognition of Emotions in Speech Using Deep CNN and RESNET.” Soft Computing 5: 1–17.
- Langner, O., R. Dotsch, G. Bijlstra, D. H. J. Wigboldus, S. T. Hawk, and A. van Knippenberg. 2010. “Presentation and Validation of the Radboud Faces Database.” Cognition and Emotion 24, no. 8: 1377–1388.
- Latif, S., R. Rana, S. Khalifa, R. Jurdak, J. Qadir, and B. Schuller. 2021. “Survey of Deep Representation Learning for Speech Emotion Recognition.” IEEE Transactions on Affective Computing 14, no. 2: 1634–1654.
10.1109/TAFFC.2021.3114365 Google Scholar
- Latif, S., R. Rana, S. Younis, J. Qadir, and J. Epps. 2018. “Transfer Learning for Improving Speech Emotion Classification Accuracy.” arXiv preprint arXiv:1801.06353.
- Lei, Y., and H. Cao. 2023. “Audio-Visual Emotion Recognition With Preference Learning Based on Intended and Multi-Modal Perceived Labels.” IEEE Transactions on Affective Computing 14: 2954–2969.
- Li, C., J. Wang, Y. Zhang, et al. 2023. “The Good, the Bad, and Why: Unveiling Emotions in Generative Ai.” arXiv preprint arXiv:2312.11111.
- Li, D., Y. Zhou, Z. Wang, and D. Gao. 2021. “Exploiting the Potentialities of Features for Speech Emotion Recognition.” Information Sciences 548: 328–343.
- Li, S., W. Deng, and J. P. Du. 2017. “ Reliable Crowdsourcing and Deep Locality-Preserving Learning for Expression Recognition in the Wild.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE.
10.1109/CVPR.2017.277 Google Scholar
- Li, X., and M. Akagi. 2019. “Improving Multilingual Speech Emotion Recognition by Combining Acoustic Features in a Three-Layer Model.” Speech Communication 110: 1–12.
- Li, Y., J. Tao, L. Chao, W. Bao, and Y. Liu. 2017. “CHEAVD: A Chinese Natural Emotional Audio–Visual Database.” Journal of Ambient Intelligence and Humanized Computing 8: 913–924.
- Lian, Z., Y. Li, J.-H. Tao, J. Huang, and M.-Y. Niu. 2020. “Expression Analysis Based on Face Regions in Real-World Conditions.” International Journal of Automation and Computing 17: 96–107.
- Liu, D., L. Chen, L. Wang, and Z. Wang. 2022. “A Multi-Modal Emotion Fusion Classification Method Combined Expression and Speech Based on Attention Mechanism.” Multimedia Tools and Applications 81, no. 29: 41677–41695.
- Liu, D., Z. Wang, L. Wang, and L. Chen. 2021. “Multi-Modal Fusion Emotion Recognition Method of Speech Expression Based on Deep Learning.” Frontiers in Neurorobotics 15: 697634.
- Liu, F., Z. Fu, Y. Wang, and Q. Zheng. 2025. “TACFN: Transformer-Based Adaptive Cross-Modal Fusion Network for Multimodal Emotion Recognition.” arXiv preprint arXiv:2505.06536.
- Liu, L.-Y., W.-Z. Liu, J. Zhou, H.-Y. Deng, and L. Feng. 2022. “ATDA: Attentional Temporal Dynamic Activation for Speech Emotion Recognition.” Knowledge-Based Systems 243: 108472.
- Liu, M., A. N. Joseph Raj, V. Rajangam, K. Ma, Z. Zhuang, and S. Zhuang. 2024. “Multiscale-Multichannel Feature Extraction and Classification Through One-Dimensional Convolutional Neural Network for Speech Emotion Recognition.” Speech Communication 156: 103010.
- Liu, P., K. Li, and H. Meng. 2022. “Group Gated Fusion on Attention-Based Bidirectional Alignment for Multimodal Emotion Recognition.” arXiv preprint arXiv:2201.06309.
- Liu, Y., J. Peng, W. Dai, J. Zeng, and S. Shan. 2023. “Joint Spatial and Scale Attention Network for Multi-View Facial Expression Recognition.” Pattern Recognition 139: 109496.
- Livingstone, S. R., and F. A. Russo. 2018. “The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A Dynamic, Multimodal Set of Facial and Vocal Expressions in North American English.” PLoS One 13, no. 5: e0196391.
- Lotfian, R., and C. Busso. 2017. “Building Naturalistic Emotionally Balanced Speech Corpus by Retrieving Emotional Speech From Existing Podcast Recordings.” IEEE Transactions on Affective Computing 10, no. 4: 471–483.
10.1109/TAFFC.2017.2736999 Google Scholar
- Lucey, P., J. F. Cohn, T. Kanade, et al. 2010. “ The Extended Cohn-Kanade Dataset (Ck+): A Complete Dataset for Action Unit and Emotion-Specified Expression.” In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops. IEEE.
- Luna-Jiménez, C., D. Griol, Z. Callejas, R. Kleinlein, J. M. Montero, and F. Fernández-Martínez. 2021. “Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning.” Sensors 21, no. 22: 7665.
- Luna-Jiménez, C., R. Kleinlein, D. Griol, Z. Callejas, J. M. Montero, and F. Fernández-Martínez. 2022. “A Proposal for Multimodal Emotion Recognition Using Aural Transformers and Action Units on RAVDESS Dataset.” Applied Sciences 12, no. 1: 327.
- Ly, S. T., N.-T. Do, G.-S. Lee, et al. 2019. “Multimodal 2D and 3D for in-The-Wild Facial Expression Recognition.” Cvpr Workshops: 2927–2934.
- Ma, F., B. Sun, and S. Li. 2021. “Facial Expression Recognition With Visual Transformers and Attentional Selective Fusion.” IEEE Transactions on Affective Computing 14, no. 2: 1236–1248.
10.1109/TAFFC.2021.3122146 Google Scholar
- Ma, F., Y. Yuan, Y. Xie, et al. 2024. “Generative Technology for Human Emotion Recognition: A Scope Review.” arXiv preprint arXiv:2407.03640.
- Ma, F., W. Zhang, Y. Li, S.-L. Huang, and L. Zhang. 2020. “Learning Better Representations for Audio-Visual Emotion Recognition With Common Information.” Applied Sciences 10, no. 20: 7239.
- Ma, Y., Y. Hao, M. Chen, J. Chen, P. Lu, and A. Košir. 2019. “Audio-Visual Emotion Fusion (AVEF): A Deep Efficient Weighted Approach.” Information Fusion 46: 184–192.
- Mamieva, D., A. B. Abdusalomov, A. Kutlimuratov, B. Muminov, and T. K. Whangbo. 2023. “Multimodal Emotion Detection via Attention-Based Fusion of Extracted Facial and Speech Features.” Sensors 23, no. 12: 5475.
- Mao, K., Y. Wang, L. Ren, J. Zhang, J. Qiu, and G. Dai. 2023. “Multi-Branch Feature Learning Based Speech Emotion Recognition Using SCAR-NET.” Connection Science 35, no. 1: 2189217.
- Martin, O., I. Kotsia, B. Macq, and I. Pitas. 2006. “ The eNTERFACE'05 Audio-Visual Emotion Database.” In 22nd International Conference on Data Engineering Workshops (ICDEW'06). IEEE.
10.1109/ICDEW.2006.145 Google Scholar
- McKeown, G., M. Valstar, R. Cowie, M. Pantic, and M. Schroder. 2011. “The Semaine Database: Annotated Multimodal Records of Emotionally Colored Conversations Between a Person and a Limited Agent.” IEEE Transactions on Affective Computing 3, no. 1: 5–17.
10.1109/T-AFFC.2011.20 Google Scholar
- Meena, G., K. K. Mohbey, A. Indian, M. Z. Khan, and S. Kumar. 2024. “Identifying Emotions From Facial Expressions Using a Deep Convolutional Neural Network-Based Approach.” Multimedia Tools and Applications 83, no. 6: 15711–15732.
- Mehendale, N. 2020. “Facial Emotion Recognition Using Convolutional Neural Networks (FERC).” SN Applied Sciences 2, no. 3: 446.
- Meng, H., T. Yan, F. Yuan, and H. Wei. 2019. “Speech Emotion Recognition From 3D Log-Mel Spectrograms With Deep Learning Network.” IEEE Access 7: 125868–125881.
- Middya, A. I., B. Nag, and S. Roy. 2022. “Deep Learning Based Multimodal Emotion Recognition Using Model-Level Fusion of Audio–Visual Modalities.” Knowledge-Based Systems 244: 108580.
- Minaee, S., M. Minaei, and A. Abdolrashidi. 2021. “Deep-Emotion: Facial Expression Recognition Using Attentional Convolutional Network.” Sensors 21, no. 9: 3046.
- Mishra, S. P., P. Warule, and S. Deb. 2024. “Speech Emotion Recognition Using MFCC-Based Entropy Feature.” Signal, Image and Video Processing 18, no. 1: 153–161.
- Mocanu, B., R. Tapu, and T. Zaharia. 2023. “Multimodal Emotion Recognition Using Cross Modal Audio-Video Fusion With Attention and Deep Metric Learning.” Image and Vision Computing 133: 104676.
- Mollahosseini, A., B. Hasani, and M. H. Mahoor. 2017. “Affectnet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild.” IEEE Transactions on Affective Computing 10, no. 1: 18–31.
- Morais, E., R. Hoory, W. Zhu, et al. 2022. “ Speech Emotion Recognition Using Self-Supervised Features.” In ICASSP 2022–2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE.
10.1109/ICASSP43922.2022.9747870 Google Scholar
- Mordor Intelligence. n.d. “Emotion Detection and Recognition Market Size & Share Analysis - Industry Research Report - Growth Trends.” https://www.mordorintelligence.com/industry-reports.
- Murugaiyan, S., and S. R. Uyyala. 2023. “Aspect-Based Sentiment Analysis of Customer Speech Data Using Deep Convolutional Neural Network and Bilstm.” Cognitive Computation 15, no. 3: 914–931.
- Mustaqeem, and S. Kwon. 2020. “CLSTM: Deep Feature-Based Speech Emotion Recognition Using the Hierarchical ConvLSTM Network.” Mathematics 8, no. 12: 2133.
10.3390/math8122133 Google Scholar
- Mustaqeem, and S. Kwon. 2021. “Optimal Feature Selection Based Speech Emotion Recognition Using Two-Stream Deep Convolutional Neural Network.” International Journal of Intelligent Systems 36, no. 9: 5116–5135.
- Naga, P., S. D. Marri, and R. Borreo. 2023. “Facial Emotion Recognition Methods, Datasets and Technologies: A Literature Survey.” Materials Today Proceedings 80: 2824–2828.
10.1016/j.matpr.2021.07.046 Google Scholar
- Nguyen, D., D. T. Nguyen, R. Zeng, et al. 2022. “Deep Auto-Encoders With Sequential Learning for Multimodal Dimensional Emotion Recognition.” IEEE Transactions on Multimedia 24: 1313–1324.
- Noroozi, F., M. Marjanovic, A. Njegus, S. Escalera, and G. Anbarjafari. 2019. “Audio-visual emotion recognition in video clips.” IEEE Transactions on Affective Computing 10, no. 1: 60–75.
- Ortega, J. D. S., P. Cardinal, and A. L. Koerich. 2019. “ Emotion Recognition Using Fusion of Audio and Video Features.” In 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). IEEE.
10.1109/SMC.2019.8914655 Google Scholar
- Pan, B., K. Hirota, Z. Jia, and Y. Dai. 2023. “A Review of Multimodal Emotion Recognition From Datasets, Preprocessing, Features, and Fusion Methods.” Neurocomputing 561: 126866.
10.1016/j.neucom.2023.126866 Google Scholar
- Pan, B., K. Hirota, Z. Jia, L. Zhao, X. Jin, and Y. Dai. 2023. “Multimodal Emotion Recognition Based on Feature Selection and Extreme Learning Machine in Video Clips.” Journal of Ambient Intelligence and Humanized Computing 14, no. 3: 1903–1917.
- Pantic, M., M. Valstar, R. Rademaker, and L. Maat. 2005. “ Web-Based Database for Facial Expression Analysis.” In 2005 IEEE International Conference on Multimedia and Expo. IEEE.
10.1109/ICME.2005.1521424 Google Scholar
- Patel, K., D. Mehta, C. Mistry, et al. 2020. “Facial Sentiment Analysis Using AI Techniques: State-Of-The-Art, Taxonomies, and Challenges.” IEEE Access 8: 90495–90519.
- Pichora-Fuller, M. K., and K. Dupuis. 2020. “Toronto Emotional Speech Set (TESS).” Scholars Portal Dataverse 1: 2020.
- Pise, A. A., M. A. Alqahtani, P. Verma, et al. 2022. “Methods for Facial Expression Recognition With Applications in Challenging Situations.” Computational Intelligence and Neuroscience 1, no. 2022: 9261438.
- Poria, S., E. Cambria, R. Bajpai, and A. Hussain. 2017. “A Review of Affective Computing: From Unimodal Analysis to Multimodal Fusion.” Information Fusion 37: 98–125.
- Poria, S., D. Hazarika, N. Majumder, G. Naik, E. Cambria, and R. Mihalcea. 2018. “Meld: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations. arXiv preprint arXiv:1810.02508.”
- Pravin, S. C., V. B. Sivaraman, and J. Saranya. 2023. “Deep Ensemble Models for Speech Emotion Classification.” Microprocessors and Microsystems 98: 104790.
- Rao, K. P., M. V. P. C. S. Rao, and N. H. Chowdary. 2019. “An Integrated Approach to Emotion Recognition and Gender Classification.” Journal of Visual Communication and Image Representation 60: 339–345.
- Ringeval, F., A. Sonderegger, J. Sauer, et al. 2013. “ Introducing the RECOLA Multimodal Corpus of Remote Collaborative and Affective Interactions.” In 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG). IEEE.
10.1109/FG.2013.6553805 Google Scholar
- Roshanzamir, M., M. Jafari, R. Alizadehsani, et al. 2024. “What Happens in Face During a Facial Expression? Using Data Mining Techniques to Analyze Facial Expression Motion Vectors.” Information Systems Frontiers 26: 1–19.
- Rouast, P. V., M. T. P. Adam, and R. Chiong. 2019. “Deep Learning for Human Affect Recognition: Insights and New Developments.” IEEE Transactions on Affective Computing 12, no. 2: 524–543.
10.1109/TAFFC.2018.2890471 Google Scholar
- Russell, J. A. 1980. “A Circumplex Model of Affect.” Journal of Personality and Social Psychology 39, no. 6: 1161–1178.
- Sahoo, G. K., S. K. Das, and P. Singh. 2023. “Performance Comparison of Facial Emotion Recognition: A Transfer Learning-Based Driver Assistance Framework for In-Vehicle Applications.” Circuits, Systems, and Signal Processing 42, no. 7: 4292–4319.
- Said, Y., and M. Barr. 2021. “Human Emotion Recognition Based on Facial Expressions via Deep Learning on High-Resolution Images.” Multimedia Tools and Applications 80, no. 16: 25241–25253.
- Sharafi, M., M. Yazdchi, R. Rasti, and F. Nasimi. 2022. “A Novel Spatio-Temporal Convolutional Neural Framework for Multimodal Emotion Recognition.” Biomedical Signal Processing and Control 78: 103970.
- Siddiqui, M. F., and A. Y. Javaid. 2020. “A Multimodal Facial Emotion Recognition Framework Through the Fusion of Speech With Visible and Infrared Images.” Multimodal Technologies and Interaction 4, no. 3: 46.
- Singh, L., N. Aggarwal, and S. Singh. 2023. “PUMAVE-D: Panjab University Multilingual Audio and Video Facial Expression Dataset.” Multimedia Tools and Applications 82, no. 7: 10117–10144.
- Singh, P., M. Sahidullah, and G. Saha. 2023. “Modulation Spectral Features for Speech Emotion Recognition Using Deep Neural Networks.” Speech Communication 146: 53–69.
- Singh, Y. B., and S. Goel. 2023. “A Lightweight 2D CNN Based Approach for Speaker-Independent Emotion Recognition From Speech With New Indian Emotional Speech Corpora.” Multimedia Tools and Applications 82, no. 15: 23055–23073.
- Song, Y., Y. Cai, and L. Tan. 2021. “ Video-Audio Emotion Recognition Based on Feature Fusion Deep Learning Method.” In 2021 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS). IEEE.
10.1109/MWSCAS47672.2021.9531812 Google Scholar
- Spezialetti, M., G. Placidi, and S. Rossi. 2020. “Emotion Recognition for Human-Robot Interaction: Recent Advances and Future Perspectives.” Frontiers in Robotics and AI 7: 532279.
- Sun, L., B. Zou, S. Fu, J. Chen, and F. Wang. 2019. “Speech Emotion Recognition Based on DNN-Decision Tree SVM Model.” Speech Communication 115: 29–37.
- Sun, N., Q. Li, R. Huan, J. Liu, and G. Han. 2019. “Deep Spatial-Temporal Feature Fusion for Facial Expression Recognition in Static Images.” Pattern Recognition Letters 119: 49–61.
- Szwoch, M., and W. Szwoch. 2015. “ Emotion Recognition for Affect Aware Video Games.” In Image Processing & Communications Challenges 6. Springer International Publishing.
10.1007/978-3-319-10662-5_28 Google Scholar
- Talaat, F. M., Z. H. Ali, R. R. Mostafa, and N. El-Rashidy. 2024. “Real-Time Facial Emotion Recognition Model Based on Kernel Autoencoder and Convolutional Neural Network for Autism Children.” Soft Computing 28, no. 9: 6695–6708.
- Tang, G., Y. Xie, K. Li, R. Liang, and L. Zhao. 2023. “Multimodal Emotion Recognition From Facial Expression and Speech Based on Feature Fusion.” Multimedia Tools and Applications 82, no. 11: 16359–16373.
- Tellai, M., L. Gao, and Q. Mao. 2023. “An Efficient Speech Emotion Recognition Based on a Dual-Stream CNN-Transformer Fusion Network.” International Journal of Speech Technology 26, no. 2: 541–557.
10.1007/s10772-023-10035-y Google Scholar
- Thirumuru, R., K. Gurugubelli, and A. K. Vuppala. 2022. “Novel Feature Representation Using Single Frequency Filtering and Nonlinear Energy Operator for Speech Emotion Recognition.” Digital Signal Processing 120: 103293.
- Tiwari, P., H. Rathod, S. Thakkar, and A. D. Darji. 2023. “Multimodal Emotion Recognition Using SDA-LDA Algorithm in Video Clips.” Journal of Ambient Intelligence and Humanized Computing 14, no. 6: 6585–6602.
- Ullah, R., M. Asif, W. A. Shah, et al. 2023. “Speech Emotion Recognition Using Convolution Neural Networks and Multi-Head Convolutional Transformer.” Sensors 23, no. 13: 6212.
- Vaswani, A., N. Shazeer, N. Parmar, et al. 2017. “Attention is all You Need.” Advances in Neural Information Processing Systems 30: 5998–6008.
- Wagner, J., A. Triantafyllopoulos, H. Wierstorf, et al. 2023. “Dawn of the Transformer Era in Speech Emotion Recognition: Closing the Valence Gap.” IEEE Transactions on Pattern Analysis and Machine Intelligence 45: 10745–10759.
- Wang, C., Y. Ren, N. Zhang, F. Cui, and S. Luo. 2022. “Speech Emotion Recognition Based on Multi-Feature and Multi-Lingual Fusion.” Multimedia Tools and Applications 81, no. 4: 4897–4907.
- Wang, X., X. Chen, and C. Cao. 2020. “Human Emotion Recognition by Optimally Fusing Facial Expression and Speech Feature.” Signal Processing: Image Communication 84: 115831.
- Wang, Y., and L. Guan. 2008. “Recognizing Human Emotional State From Audiovisual Signals.” IEEE Transactions on Multimedia 10, no. 5: 936–946.
- Wang, Z., Y. Wang, J. Zhang, Y. Tang, and Z. Pan. 2023. “A Lightweight Domain Adversarial Neural Network Based on Knowledge Distillation for EEG-Based Cross-Subject Emotion Recognition.” arXiv preprint arXiv:2305.07446.
- Wei, J., X. Yang, and Y. Dong. 2021. “User-Generated Video Emotion Recognition Based on Key Frames.” Multimedia Tools and Applications 80: 14343–14361.
- Wei, W., Q. Jia, Y. Feng, G. Chen, and M. Chu. 2020. “Multi-Modal Facial Expression Feature Based on Deep-Neural Networks.” Journal on Multimodal User Interfaces 14: 17–23.
- Wu, M., W. Su, L. Chen, W. Pedrycz, and K. Hirota. 2022. “Two-Stage Fuzzy Fusion Based-Convolution Neural Network for Dynamic Emotion Recognition.” IEEE Transactions on Affective Computing 13, no. 2: 805–817.
- Xie, J., M. Zhu, and K. Hu. 2023. “Fusion-Based Speech Emotion Classification Using Two-Stage Feature Selection.” Speech Communication 152: 102955.
- Xie, Y., R. Liang, Z. Liang, C. Huang, C. Zou, and B. Schuller. 2019. “Speech Emotion Classification Using Attention-Based LSTM.” IEEE/ACM Transactions on Audio, Speech and Language Processing 27, no. 11: 1675–1685.
- Yalamanchili, B., S. K. Samayamantula, and K. R. Anne. 2022. “Neural Network-Based Blended Ensemble Learning for Speech Emotion Recognition.” Multidimensional Systems and Signal Processing 33, no. 4: 1323–1348.
- Yang, B., J. Wu, K. Ikeda, et al. 2023. “Deep Learning Pipeline for Spotting Macro-and Micro-Expressions in Long Video Sequences Based on Action Units and Optical Flow.” Pattern Recognition Letters 165: 63–74.
- Yin, L., X. Wei, Y. Sun, J. Wang, and M. J. Rosato. 2006. “ A 3D Facial Expression Database for Facial Behavior Research.” In 7th International Conference on Automatic Face and Gesture Recognition (FGR06). IEEE.
- Yolcu, G., I. Oztel, S. Kazan, et al. 2019. “Facial Expression Recognition for Monitoring Neurological Disorders Based on Convolutional Neural Network.” Multimedia Tools and Applications 78: 31581–31603.
- Yoon, W.-J., Y.-H. Cho, and K.-S. Park. 2007. “ A Study of Speech Emotion Recognition and Its Application to Mobile Services.” In Ubiquitous Intelligence and Computing: 4th International Conference, UIC, July 11–13, 2007. Proceedings 4. Springer Berlin Heidelberg.
10.1007/978-3-540-73549-6_74 Google Scholar
- Yu, M., H. Zheng, Z. Peng, J. Dong, and H. du. 2020. “Facial Expression Recognition Based on a Multi-Task Global-Local Network.” Pattern Recognition Letters 131: 166–171.
- Zadeh, A., R. Zellers, E. Pincus, and L.-P. Morency. 2016. “Mosi: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos. arXiv preprint arXiv:1606.06259.”
- Zadeh, A. A. B., P. P. Liang, S. Poria, E. Cambria, and L.-P. Morency. 2018. “ Multimodal Language Analysis in the Wild: Cmu-Mosei Dataset and Interpretable Dynamic Fusion Graph.” In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, vol. 1. Association for Computational Linguistics (ACL).
- Zarbakhsh, P., and H. Demirel. 2020. “4D Facial Expression Recognition Using Multimodal Time Series Analysis of Geometric Landmark-Based Deformations.” Visual Computer 36, no. 5: 951–965.
- Zhalehpour, S., O. Onder, Z. Akhtar, and C. E. Erdem. 2016. “BAUM-1: A Spontaneous Audio-Visual Face Database of Affective and Mental States.” IEEE Transactions on Affective Computing 8, no. 3: 300–313.
10.1109/TAFFC.2016.2553038 Google Scholar
- Zhang, G., T. Luo, W. Pedrycz, M. A. El-Meligy, M. A. F. Sharaf, and Z. Li. 2020. “Outlier Processing in Multimodal Emotion Recognition.” IEEE Access 8: 55688–55701.
- Zhang, H., B. Huang, and G. Tian. 2020. “Facial Expression Recognition Based on Deep Convolution Long Short-Term Memory Networks of Double-Channel Weighted Mixture.” Pattern Recognition Letters 131: 128–134.
- Zhang, H., A. Jolfaei, and M. Alazab. 2019. “A Face Emotion Recognition Method Using Convolutional Neural Network and Image Edge Computing.” IEEE Access 7: 159081–159089.
- Zhang, J. T. F. L. M., and H. Jia. 2008. “ Design of Speech Corpus for Mandarin Text to Speech.” In The Blizzard Challenge 2008 Workshop. Centre for Speech Technology Research (CSTR), University of Edinburgh.
- Zhang, L., S. Walter, X. Ma, et al. 2016. “ ‘BioVid Emo DB’: A Multimodal Database for Emotion Analyses Validated by Subjective Ratings.” In 2016 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE.
10.1109/SSCI.2016.7849931 Google Scholar
- Zhang, S., X. Pan, Y. Cui, X. Zhao, and L. Liu. 2019. “Learning Affective Video Features for Facial Expression Recognition via Hybrid Deep Learning.” IEEE Access 7: 32297–32304.
- Zhang, T., M. Liu, T. Yuan, and N. Al-Nabhan. 2020. “Emotion-Aware and Intelligent Internet of Medical Things Toward Emotion Recognition During COVID-19 Pandemic.” IEEE Internet of Things Journal 8, no. 21: 16002–16013.
- Zhang, T., and Z. Tan. 2024. “Survey of Deep Emotion Recognition in Dynamic Data Using Facial, Speech and Textual Cues.” Multimedia Tools and Applications 83: 1–40.
- Zhao, G., X. Huang, M. Taini, S. Z. Li, and M. Pietikäinen. 2011. “Facial Expression Recognition From Near-Infrared Videos.” Image and Vision Computing 29, no. 9: 607–619.
- Zhao, J., X. Mao, and L. Chen. 2019. “Speech Emotion Recognition Using Deep 1D & 2D CNN LSTM Networks.” Biomedical Signal Processing and Control 47: 312–323.
- Zhao, Z., Z. Bao, Y. Zhao, et al. 2019. “Exploring Deep Spectrum Representations via Attention-Based Recurrent and Convolutional Neural Networks for Speech Emotion Recognition.” IEEE Access 7: 97515–97525.
- Zhu, Z., W. Dai, Y. Hu, and J. Li. 2020. “Speech Emotion Recognition Model Based on bi-GRU and Focal Loss.” Pattern Recognition Letters 140: 358–365.