A Hybrid Graph-Based and Evolutionary Learning Approach for Enhanced Software Fault Prediction Using Machine Learning and Deep Neural Architectures

Main Article Content

Durga, Anupa Sinha

Abstract

Software fault prediction has become an essential aspect of modern software engineering, aimed at improving software quality and reducing maintenance costs by identifying defect-prone modules at an early stage. This study presents a comprehensive comparative analysis of multiple predictive approaches, including traditional machine learning models, ensemble techniques, deep learning architectures, graph-based learning models, and a proposed hybrid optimization model. The research utilizes a structured dataset of student programming metrics, incorporating preprocessing techniques such as data cleaning, normalization, class balancing using SMOTE and SMOTE-Tomek, and feature selection through Chi-Square and Mutual Information methods. A variety of models, including Decision Tree, Random Forest, Support Vector Machine, AdaBoost, Gradient Boosting Machine, XGBoost, Deep Neural Network, Convolutional Neural Network, Graph Convolutional Network, and a hybrid Genetic Algorithm–Decision Tree (GA-DT) model, were implemented and evaluated under consistent experimental conditions. The performance of these models was assessed using standard evaluation metrics: Accuracy, Precision, Recall, F1-Score, and ROC-AUC. The results indicate that ensemble and deep learning models outperform traditional classifiers, while graph-based models further enhance prediction capability by capturing structural relationships within software systems. The proposed GA-DT hybrid model achieved the highest performance, with an accuracy of 97.13%, demonstrating the effectiveness of combining evolutionary optimization with machine learning techniques. Statistical validation using a paired t-test confirmed that graph-based models significantly improve prediction performance compared with conventional approaches. The findings highlight the importance of integrating structural learning, optimization techniques, and advanced neural architectures for accurate and reliable software fault prediction. This study contributes to the development of intelligent, data-driven solutions for enhancing software reliability and quality assurance processes.
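The preprocessing and evaluation steps described in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the paper's exact pipeline: the dataset is a synthetic stand-in for the student programming-metrics data, the feature counts and tree depth are placeholder choices, and class balancing with SMOTE/SMOTE-Tomek (from the separate imbalanced-learn library) is omitted here for brevity.

```python
# Illustrative sketch: normalization, Chi-Square feature selection, and a
# Decision Tree baseline evaluated with the abstract's metrics.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Synthetic, imbalanced stand-in for the defect dataset (placeholder sizes).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           weights=[0.8, 0.2], random_state=42)

# Normalize to [0, 1]; chi2 requires non-negative feature values.
X = MinMaxScaler().fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Chi-Square feature selection: keep the 10 highest-scoring features.
selector = SelectKBest(chi2, k=10).fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

clf = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_train_sel, y_train)
pred = clf.predict(X_test_sel)
proba = clf.predict_proba(X_test_sel)[:, 1]
print(f"Accuracy: {accuracy_score(y_test, pred):.3f}")
print(f"F1-Score: {f1_score(y_test, pred):.3f}")
print(f"ROC-AUC:  {roc_auc_score(y_test, proba):.3f}")
```

The same train/test split and selected features would then be reused across the other classifiers (Random Forest, SVM, boosting variants) to keep the comparison consistent, as the abstract specifies.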
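The GA-DT hybrid combines evolutionary optimization with a Decision Tree learner. A common way to realize such a hybrid, sketched below under assumed design choices, is to let a small genetic algorithm search the tree's hyperparameters (here `max_depth` and `min_samples_split`) using cross-validated accuracy as the fitness function. The population size, generation count, and operators are illustrative, not the configuration reported in the paper.

```python
# Hedged sketch of a Genetic Algorithm–Decision Tree (GA-DT) hybrid:
# a toy GA evolves (max_depth, min_samples_split) pairs by CV accuracy.
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

random.seed(0)
X, y = make_classification(n_samples=600, n_features=15, n_informative=6,
                           random_state=0)

def fitness(ind):
    """Mean 3-fold CV accuracy of a tree built from the individual's genes."""
    depth, min_split = ind
    clf = DecisionTreeClassifier(max_depth=depth, min_samples_split=min_split,
                                 random_state=0)
    return cross_val_score(clf, X, y, cv=3, scoring="accuracy").mean()

def random_individual():
    return (random.randint(2, 15), random.randint(2, 20))

def crossover(a, b):
    # One-point crossover: depth from one parent, split threshold from the other.
    return (a[0], b[1])

def mutate(ind):
    # Nudge one gene by +/-1, clamped to its valid range.
    depth, min_split = ind
    if random.random() < 0.5:
        depth = max(2, min(15, depth + random.choice([-1, 1])))
    else:
        min_split = max(2, min(20, min_split + random.choice([-1, 1])))
    return (depth, min_split)

population = [random_individual() for _ in range(8)]
for generation in range(5):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:4]                      # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(4)]
    population = parents + children

best = max(population, key=fitness)
print("Best (max_depth, min_samples_split):", best,
      "CV accuracy:", round(fitness(best), 3))
```

In the full system, the evolved tree would then be retrained on the balanced training set and compared against the other models under the same evaluation protocol.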

Article Details

Section
Articles

References

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953

Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). https://doi.org/10.1145/2939672.2939785

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018

Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139. https://doi.org/10.1006/jcss.1997.1504

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.

Hall, T., Beecham, S., Bowes, D., Gray, D., & Counsell, S. (2012). A systematic literature review on fault prediction performance in software engineering. IEEE Transactions on Software Engineering, 38(6), 1276–1304. https://doi.org/10.1109/TSE.2011.103

Holland, J. H. (1992). Adaptation in natural and artificial systems. MIT Press.

Kipf, T. N., & Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In Proceedings of the International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1609.02907

Lessmann, S., Baesens, B., Mues, C., & Pietsch, S. (2008). Benchmarking classification models for software defect prediction: A proposed framework and novel findings. IEEE Transactions on Software Engineering, 34(4), 485–496. https://doi.org/10.1109/TSE.2008.35

Pressman, R. S., & Maxim, B. R. (2014). Software engineering: A practitioner’s approach (8th ed.). McGraw-Hill Education.

Wang, S., Liu, T., Tan, L., & Xie, X. (2016). Automatically learning semantic features for defect prediction. In Proceedings of the 38th International Conference on Software Engineering (ICSE) (pp. 297–308). https://doi.org/10.1145/2884781.2884804

Zhou, Z.-H. (2012). Ensemble methods: Foundations and algorithms. CRC Press.