A Comparative Study of Ensemble and Transformer-Based Machine Learning Models for High-Dimensional Classification Problems


Pooja Katariya, Srashti Chauhan, Shailendra Singh Bhalla, Rekha Yadav

Abstract

High-dimensional classification problems are increasingly encountered in domains such as bioinformatics, finance, image processing, and text analytics, where datasets often contain thousands of features. These problems introduce challenges including the curse of dimensionality, overfitting, feature redundancy, and increased computational complexity, under which traditional machine learning algorithms often struggle to maintain performance. Ensemble learning techniques and transformer-based architectures have emerged as powerful approaches to addressing these challenges: ensemble models improve predictive accuracy by combining multiple base learners, while transformer models use attention mechanisms to capture complex feature relationships. This paper presents a comprehensive comparative study of ensemble methods, including Random Forest, Gradient Boosting, and XGBoost, and transformer-based models such as TabTransformer and FT-Transformer. The comparison is conducted on high-dimensional datasets, with performance evaluated in terms of accuracy, precision, recall, F1-score, AUROC, and computational efficiency. The results demonstrate that ensemble methods provide stable and interpretable performance on structured datasets, whereas transformer-based models excel at capturing intricate feature dependencies in highly complex data environments. The study concludes by highlighting the potential of hybrid approaches that integrate both methodologies for improved performance.
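To illustrate the kind of comparison protocol the abstract describes, the following is a minimal sketch, assuming scikit-learn and xgboost are available; the synthetic dataset, train/test split, and hyperparameters are illustrative placeholders, not the authors' actual experimental setup. It trains the three ensemble baselines on high-dimensional data and reports the metrics listed above; transformer baselines such as TabTransformer and FT-Transformer would be evaluated on the same splits and metrics, typically via a separate PyTorch-based implementation.

```python
# Sketch of the evaluation loop: ensemble models on synthetic high-dimensional data.
# Assumes scikit-learn and xgboost; all settings here are illustrative only.
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic high-dimensional dataset: many features, few of them informative.
X, y = make_classification(n_samples=2000, n_features=1000,
                           n_informative=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

models = {
    "RandomForest": RandomForestClassifier(n_estimators=300, random_state=0),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
    "XGBoost": XGBClassifier(n_estimators=300, eval_metric="logloss",
                             random_state=0),
}

for name, model in models.items():
    start = time.perf_counter()          # rough proxy for computational efficiency
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - start

    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]

    print(f"{name}: acc={accuracy_score(y_test, pred):.3f} "
          f"prec={precision_score(y_test, pred):.3f} "
          f"rec={recall_score(y_test, pred):.3f} "
          f"f1={f1_score(y_test, pred):.3f} "
          f"auroc={roc_auc_score(y_test, proba):.3f} "
          f"train_time={train_time:.1f}s")
```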
