Advancing Automatic Math Word Problem Solvers: The UMLC Dataset and Comprehensive Model Evaluation

Michael Kurisappan

doi:10.52783/pmj.v35.i2s.2639

Published: Dec 1, 2024

Keywords:

Mathematical word problem, Deep Learning, Natural Language Processing, NLP Datasets, pre-trained language model, Large Language Models, MATH, GSM8K, BERTGen, GPT-2, GPT-3, UMLC, Graph2Tree

Michael Kurisappan, Kanmani

Abstract

With the recent rise of generative AI technologies, there has been significant improvement in solving math word problems. The task still poses its own challenges, chiefly answer accuracy on more complex, real-world logical questions. Despite their popularity, existing benchmark datasets are limited: there is no unified, standardized, fair, and comprehensive dataset for training AI models on complex mathematical reasoning. To overcome these shortcomings, we propose the Unified Mathematical Language Challenge (UMLC), a comprehensive dataset carefully crafted to support solving simple to complex Math Word Problems (MWPs). UMLC differentiates itself with its extensive range of diverse MWP combinations, spanning various mathematical concepts and problem complexities. Throughout its development, every effort was made to cover a diverse array of scenarios, making it an in-depth resource for testing and improving MWP solvers. Researchers are invited to explore the UMLC dataset, which is intended as a resource for advancing math word problem solving in both practical applications and academic research. The most significant observation of this study is that using a unified, consistent, and reasonable dataset with math word problems of varying complexity may result in a significant drop in the performance of the majority of well-known models.

Issue

Vol. 35 No. 2s (2025)

Section

Articles
