Volume 40, Issue 1 e13096
ORIGINAL ARTICLE

Effective deep learning based multimodal sentiment analysis from unstructured big data

Swasthika Jain Thandaga Jwalanaiah (Corresponding Author)
Department of Computer Science and Engineering, GITAM School of Technology, Bangalore, India
Email: [email protected]

Israel Jeena Jacob
CSE, GITAM School of Technology, Bangalore, India

Ajay Kumar Mandava
EECE, GITAM School of Technology, Bangalore, India
First published: 09 July 2022

Abstract

As images, memes and Graphics Interchange Format (GIF) animations have come to dominate social feeds, typographic and infographic visual content has emerged as an important component of social media. This multimodal text combines text and image into a visual language of its own, and it must be analysed because it can modify, confirm or grade the polarity of a post's sentiment. The challenge is to use the information in the visual and textual content of image-text posts effectively. This article presents a new deep learning-based multimodal sentiment analysis (MSA) model that handles images, text and multimodal text (images with embedded text). The system comprises a text analytics unit, a discretization control unit, an image analytics unit and a decision-making unit. The discretization unit separates the embedded text from the image using the variant and channel augmented maximally stable extremal regions (VCA-MSERs) technique; the resulting text and image elements are then analysed as discrete inputs by the corresponding text and image analytics units. The text analytics unit uses a stacked recurrent neural network with a multilevel attention and feedback module (SRNN-MAFM) to detect the sentiment of the text. A deep convolutional neural network (CNN) with parallel-dilated convolution and a self-attention module (PDC-SAM) is developed to predict the sentiment of the visual content. Finally, the decision-making unit applies a Boolean framework with an OR function to combine the two outputs and classify each post into three fine-grained sentiment classes: positive, neutral and negative. The proposed work is implemented in Python and evaluated on the STS-Gold, Flickr8k and B-T4SA datasets for text, image and multimodal-text sentiment analysis, respectively. Simulation results show that the proposed method achieves accuracies of 97.8%, 97.7% and 90% for text, image and multimodal sentiment analysis, respectively, outperforming comparable methods.
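To make the PDC-SAM idea concrete, the following is a minimal PyTorch sketch of a parallel-dilated convolution block followed by self-attention. It is an illustration only, not the paper's implementation: the channel sizes, dilation rates (1, 2, 4) and the use of four attention heads are assumptions, and the abstract does not specify how the branches are fused beyond concatenation.

import torch
import torch.nn as nn

class ParallelDilatedBlock(nn.Module):
    """Illustrative sketch of a PDC-SAM-style block: parallel conv branches
    with different dilation rates are concatenated, then multi-head
    self-attention re-weights the fused feature map over spatial positions.
    All hyperparameters here are assumptions, not the paper's values."""

    def __init__(self, in_ch=3, branch_ch=16, dilations=(1, 2, 4)):
        super().__init__()
        # One 3x3 branch per dilation rate; padding=d keeps spatial size fixed.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        )
        fused = branch_ch * len(dilations)
        self.attn = nn.MultiheadAttention(embed_dim=fused, num_heads=4,
                                          batch_first=True)

    def forward(self, x):
        # Concatenate the parallel dilated branches along the channel axis.
        feats = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        b, c, h, w = feats.shape
        seq = feats.flatten(2).transpose(1, 2)   # (B, H*W, C) position tokens
        out, _ = self.attn(seq, seq, seq)        # self-attention over positions
        return out.transpose(1, 2).reshape(b, c, h, w)

# Smoke test on a dummy image batch
y = ParallelDilatedBlock()(torch.randn(2, 3, 32, 32))
print(y.shape)  # torch.Size([2, 48, 32, 32])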
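The decision-making unit is described only as a Boolean framework with an OR function over three classes. Below is a hedged Python sketch of one plausible reading of that rule: a non-neutral signal in either modality carries the final polarity, and conflicting non-neutral signals fall back to neutral. The function name and the tie-breaking behaviour are assumptions for illustration, not the paper's exact rule.

def fuse_sentiments(text_label: str, image_label: str) -> str:
    """Combine text and image polarity labels with an OR-style rule.

    Assumption: a non-neutral label in either modality dominates;
    a positive-vs-negative conflict falls back to neutral."""
    if text_label == image_label:
        return text_label
    non_neutral = {l for l in (text_label, image_label) if l != "neutral"}
    if len(non_neutral) == 1:   # OR: one modality supplies the polarity
        return non_neutral.pop()
    return "neutral"            # positive vs. negative conflict

print(fuse_sentiments("positive", "neutral"))   # -> positive
print(fuse_sentiments("negative", "positive"))  # -> neutral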

DATA AVAILABILITY STATEMENT

Data sharing is not applicable to this article as no new data were created or analyzed in this study.
