Harnessing Machine Learning to Analyse Sentiments in YouTube User Comments


Yogeshchandra Puranik, Ravikant Zirmite

Abstract

The rapid expansion of online platforms has produced an abundance of user-generated content, creating opportunities for sentiment analysis to extract valuable insights. This study investigates the sentiments expressed in YouTube comments, using machine learning techniques to uncover patterns, track trends, and interpret user opinions. Using a dataset of 1,500 annotated comments, six machine learning methods, namely Random Forest, K-Nearest Neighbour, Naïve Bayes, Decision Tree, Logistic Regression, and Support Vector Machine, are applied to classify sentiments as positive, negative, or neutral. Preprocessing steps, including tokenization, lemmatization, and removal of stop words, are employed to improve data quality and model performance. The models are evaluated using metrics such as accuracy and F-score to compare their effectiveness. The findings demonstrate notable connections between sentiment patterns and real-world events, offering insights into public opinion dynamics. This research contributes to advancing sentiment analysis techniques and demonstrates their applicability to understanding social media interactions.
Introduction


With the rapid growth of social media platforms, analysing user-generated content has become a crucial area of research within natural language processing (NLP). YouTube, one of the largest video-sharing platforms, generates billions of comments daily, reflecting diverse public opinions, emotions, and reactions. These comments provide a rich source of unstructured data, making sentiment analysis a valuable tool for extracting actionable insights. Sentiment analysis, or opinion mining, uses computational techniques to identify and analyse the sentiment polarity (positive, negative, or neutral) of textual data.


Despite significant advances in sentiment analysis for platforms such as Twitter, YouTube comments remain relatively underexplored because of the diversity and informal character of the text. YouTube comments are often influenced by cultural, social, and contextual factors, and they pose particular challenges such as noise, abbreviations, and sarcasm. The aim of this study is to tackle these challenges by applying machine learning algorithms to analyse the sentiments expressed in YouTube comments, providing a deeper understanding of public opinion dynamics.


In this research, a dataset comprising 1,500 annotated YouTube comments was processed and analysed using six machine learning algorithms: Random Forest (RF), Naïve Bayes (NB), K-Nearest Neighbour (KNN), Logistic Regression (LR), Decision Tree (DT), and Support Vector Machine (SVM). Key preprocessing steps, including lemmatization, tokenization, and n-gram analysis, were employed to improve the quality of the input data. The effectiveness of these models was evaluated using metrics such as accuracy and F-score.
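The kind of pipeline described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: it shows only one of the six algorithms (Naïve Bayes), it omits lemmatization because that step needs an external lexical resource, and the stop-word list, the NaiveBayes class, the helper functions, and the six toy comments are all hypothetical stand-ins for the 1,500-comment dataset. The F-score is computed here as a macro-averaged F1, which is an assumption, since the text does not state which averaging was used.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list (assumption, not from the study).
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "was", "to", "and", "of", "i"}


def preprocess(comment):
    """Lowercase, tokenize, drop stop words, and append bigrams as extra features."""
    tokens = [t for t in re.findall(r"[a-z']+", comment.lower()) if t not in STOP_WORDS]
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.feature_counts[label].update(preprocess(doc))
        self.vocab = {f for counts in self.feature_counts.values() for f in counts}
        return self

    def predict(self, doc):
        # Features never seen in training are ignored.
        features = [f for f in preprocess(doc) if f in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus summed log likelihoods of the observed features.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[f] + 1) / denom) for f in features)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the annotated label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


# Toy annotated comments (invented for illustration only).
train = [
    ("I love this video, great explanation", "positive"),
    ("Amazing content, really helpful", "positive"),
    ("This is terrible and boring", "negative"),
    ("Worst video ever, waste of time", "negative"),
    ("Uploaded on Monday", "neutral"),
    ("The video is ten minutes long", "neutral"),
]
model = NaiveBayes().fit([c for c, _ in train], [s for _, s in train])
print(model.predict("great video, love it"))
```

In practice, each of the six classifiers would be trained on the same preprocessed features and scored on a held-out portion of the annotated comments, so that the accuracy and F-score values are directly comparable across models.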


This work not only highlights the capabilities of machine learning techniques in sentiment analysis but also shows how these methods can reveal patterns and trends that correspond to real-world events. By focusing on YouTube comments, this research contributes to the broader understanding of user interactions on digital platforms, offering insights into public sentiment and its evolution over time.


