Comparative Study of Character Recognition for Handwritten Characters
Abstract
Optical Character Recognition (OCR) is an important technology in computer vision that facilitates the extraction of textual data from images. Although many investigations have examined individual OCR models, there is a notable absence of comparative studies evaluating different algorithms on a single standardized dataset. This research aims to fill this gap by assessing multiple OCR models across two distinct datasets: one consisting of 28×28-pixel images and the other comprising 64×64-pixel images. The models evaluated include ten Convolutional Neural Networks (CNNs) with diverse activation functions, architectural depths, and dropout rates, in addition to Long Short-Term Memory (LSTM) networks, Support Vector Machines (SVMs), Encoder-Decoder frameworks, and Random Forest classifiers.
Our analysis revealed that the CNN-based models perform exceptionally well: the leading 64×64 CNN model achieved an accuracy of 0.9882, while the best 28×28 CNN model reached 0.9763. The Encoder-Decoder model also produced strong results, achieving an accuracy of 0.9781 on the 64×64 dataset and 0.9810 on the 28×28 dataset. SVM performed robustly on the higher-resolution dataset, achieving an accuracy of 0.9777, but struggled on the 28×28 dataset, where it reached only 0.7252. The Random Forest classifier maintained a consistent accuracy of 0.9538 across both datasets. Conversely, the LSTM models failed to generalize effectively for OCR applications, with the best LSTM model achieving a mere 0.0346 accuracy on the 64×64 dataset and performing poorly on the 28×28 dataset as well.
A detailed study of the CNN models showed that Model 3 attained the highest accuracy of 0.9907 across both datasets, accompanied by minimal validation loss (0.0377 for 28×28 and 0.0577 for 64×64). The other CNN models exhibited varying levels of performance, with deeper architectures generally surpassing their shallower counterparts. Our methodology encompassed preprocessing the datasets, partitioning them into training and testing sets, and training each model with suitable hyperparameters. The findings underscore that CNN-based architectures are the most effective for OCR tasks, particularly at higher resolutions.
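The preprocessing and partitioning steps described above can be sketched as follows. This is a minimal illustration only: the normalization scheme (scaling 8-bit pixels to [0, 1]), the 80/20 split ratio, and the random seed are assumptions for the example, not details reported in the study.

```python
import numpy as np

def preprocess_and_split(images, labels, test_fraction=0.2, seed=42):
    """Normalize pixel values and split into training and testing sets.

    `test_fraction` and `seed` are illustrative defaults, not values
    taken from the paper.
    """
    x = images.astype(np.float32) / 255.0   # scale 8-bit pixels to [0, 1]
    x = x.reshape(len(x), -1)               # flatten each HxW image to a vector
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))         # shuffle before partitioning
    n_test = int(len(x) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (x[train_idx], labels[train_idx]), (x[test_idx], labels[test_idx])

# Example with synthetic 28x28 "images" standing in for the real dataset
images = np.random.randint(0, 256, size=(100, 28, 28), dtype=np.uint8)
labels = np.random.randint(0, 26, size=100)
(train_x, train_y), (test_x, test_y) = preprocess_and_split(images, labels)
print(train_x.shape, test_x.shape)  # (80, 784) (20, 784)
```

The same routine applies unchanged to the 64×64 dataset; only the flattened feature dimension (4096 instead of 784) differs.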
These results offer significant insights into the efficacy of various OCR models.