Sentiment Classification of Emoji and Text: An Analysis of Model Families, Fusion Strategies, and Performance Gains

Sapandeep Singh Sandhu, Amanpreet Kaur Sandhu

Abstract

Social media posts often combine words with emojis that intensify, invert, or clarify sentiment. Because emojis carry information not present in the text alone, removing them cannot, in principle, reduce the optimal classification error; in practice, however, naïve handling can misread polarity. This study examines emoji-aware sentiment classification: the model families, fusion strategies, emoji representations, datasets, metrics, performance gains over text-only baselines, and recurring failure modes. We reviewed 20 English-language studies (2015–2025), evaluating each against a predefined schema covering reproducibility, baseline parity, ablations, and statistical reporting. Emoji representations and fusion strategies together form a multimodal taxonomy. Among the fusion strategies, intermediate fusion most often outperforms early and late variants despite higher resource costs; reported results include 95% accuracy (early-fusion BiLSTM on Weibo), gains of 2.3%, 10.9%, and 2.7% (cross-attention model on MSD/TD/ERD), 80.42% macro-F1 with 87.95% accuracy (mutual-attention text-emoji-image model), and an approximately 9-point accuracy improvement on Twitter Airline. Generalization to unseen emojis remains fragile without robust encoders, and reported gains should be validated with paired significance tests and confidence intervals while guarding against threats such as data leakage.
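
To make the early-versus-intermediate fusion distinction concrete, the sketch below contrasts the two strategies on pre-computed text and emoji embeddings: early fusion runs one encoder over the concatenated token streams, while intermediate fusion encodes each stream separately and combines them with cross-attention. This is a minimal, hypothetical PyTorch illustration, not the architecture of any reviewed study; the class names, dimensions, and random inputs are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EarlyFusionClassifier(nn.Module):
    """Early fusion: concatenate text and emoji tokens, encode once."""
    def __init__(self, dim=128, num_classes=3):
        super().__init__()
        self.encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * dim, num_classes)  # 2*dim: bidirectional

    def forward(self, text_emb, emoji_emb):
        # One joint sequence of shape (batch, text_len + emoji_len, dim)
        fused = torch.cat([text_emb, emoji_emb], dim=1)
        hidden, _ = self.encoder(fused)
        return self.head(hidden.mean(dim=1))  # mean-pool over tokens

class IntermediateFusionClassifier(nn.Module):
    """Intermediate fusion: separate encoders, then text-to-emoji cross-attention."""
    def __init__(self, dim=128, num_classes=3, heads=4):
        super().__init__()
        self.text_enc = nn.LSTM(dim, dim, batch_first=True)
        self.emoji_enc = nn.LSTM(dim, dim, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, text_emb, emoji_emb):
        text_h, _ = self.text_enc(text_emb)
        emoji_h, _ = self.emoji_enc(emoji_emb)
        # Text tokens act as queries over the encoded emoji stream
        attended, _ = self.cross_attn(text_h, emoji_h, emoji_h)
        return self.head(attended.mean(dim=1))

# Toy usage: random embeddings stand in for pretrained encoder outputs
text = torch.randn(2, 20, 128)   # batch of 2 posts, 20 text tokens each
emoji = torch.randn(2, 3, 128)   # 3 emoji tokens each
print(EarlyFusionClassifier()(text, emoji).shape)         # torch.Size([2, 3])
print(IntermediateFusionClassifier()(text, emoji).shape)  # torch.Size([2, 3])
```

In the review's terms, late fusion would instead train a separate classifier per modality and merge their predictions; the cross-attention step is what lets text tokens weight emoji evidence directly, at the cost of an extra encoder and attention layer.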
