Unmasking Online Aggression: Iterative Sentiment Analysis and Machine Learning for Enhanced Bully Tweet Detection

Dhanroop Mal Nagar, Narpat Singh Shekhawat, Pratap Singh Barth

Abstract

This research paper details the development and iterative evaluation of machine learning models for sentiment analysis applied to the detection of bully tweets. The study systematically evaluates a broad set of algorithms, including Support Vector Machine, Naïve Bayes, Random Forest, Logistic Regression, Bootstrap Aggregating, Gradient Boosting, Light Gradient Boosting Machine, Adaptive Boosting, and eXtreme Gradient Boosting, for classifying Twitter data. The primary objective is to use sentiment analysis to accurately identify and categorize aggressive sentiment indicative of cyberbullying in social media interactions, thereby contributing to safer online environments. Models were first built with advanced vectorization techniques, validated with rigorous cross-validation, and evaluated using the F1-score. A critical phase involved a detailed misclassification analysis of the initially top-performing models, which identified 18 specific instances where sentiment interpretation failed. This analysis informed the engineering of new sentiment-driven features, which were then integrated to refine model performance. The process culminated in the identification of eXtreme Gradient Boosting, combined with a Count Vectorizer and Stratified Shuffle Split, as the best-performing model. After optimization through iterative misclassification analysis and feature engineering, it achieved an F1-score of approximately 0.833, a notable improvement in discerning aggressive sentiment. This result underscores the potential of machine learning, and of refined sentiment analysis in particular, to address the pervasive problem of cyberbullying by analyzing the emotional nuances embedded in textual data.
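The evaluation pipeline described in the abstract, Count Vectorizer features scored with F1 across a Stratified Shuffle Split, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy tweets and labels are invented, and scikit-learn's GradientBoostingClassifier stands in for the eXtreme Gradient Boosting model named in the paper.

```python
# Minimal sketch of the abstract's evaluation setup: count-based text
# features, a stratified shuffle split, and F1 scoring per split.
# GradientBoostingClassifier is a stand-in for eXtreme Gradient Boosting;
# the example tweets and labels below are purely illustrative.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import f1_score

tweets = [
    "you are worthless and everyone hates you",
    "had a great day at the park with friends",
    "nobody wants you here, just leave",
    "congrats on the new job, so happy for you",
    "you're pathetic, stop posting",
    "lovely weather today, going for a run",
    "shut up, no one cares what you think",
    "thanks for the help, really appreciate it",
]
labels = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # 1 = aggressive, 0 = neutral

# Bag-of-words term counts, as with the paper's Count Vectorizer.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(tweets)

# Stratified shuffle split keeps the class balance in every fold.
splitter = StratifiedShuffleSplit(n_splits=3, test_size=0.25, random_state=42)
scores = []
for train_idx, test_idx in splitter.split(X, labels):
    clf = GradientBoostingClassifier(random_state=42)
    clf.fit(X[train_idx], labels[train_idx])
    preds = clf.predict(X[test_idx])
    scores.append(f1_score(labels[test_idx], preds))

mean_f1 = sum(scores) / len(scores)
print(f"mean F1 over {len(scores)} splits: {mean_f1:.3f}")
```

On a real corpus this loop would be repeated for each candidate algorithm, with the per-split F1 scores averaged to compare models as the study describes.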
