Deep Learning-Based Hybridized Bi-LSTM Model for Author Identification in the Marathi Language


Sunil D. Kale, Chudaman D. Sukte, Sumit A. Hirve, Swapnil K. Shinde, Amar Buchade, Nilesh Sable

Abstract

Author identification for written articles has become an important need, particularly for detecting copyright violations. It is the task of identifying the original author of an unknown text from the vocabulary used and the author's writing style. Many existing studies focus on widely used languages such as English, Spanish, and Chinese. In this paper, we propose an author identification system for the Indian language Marathi based on a hybridized bi-directional long short-term memory (Bi-LSTM) model. Marathi differs from these languages in that it is written in the Devanagari script, a complex script used for several Indian languages. The key idea of the proposed system is to extract high-level feature representations from Marathi text using a very deep convolutional neural network (VD-CNN) and to use the hybridized Bi-LSTM model to identify unknown authors. The VD-CNN feature extraction takes as input the output of two text analysis methods: Term Frequency-Inverse Document Frequency (TF-IDF) and word embedding. The proposed model is tested on the Author wise Marathi Language Text Corpus, and simulation results show that the average author identification accuracy reaches 96.01% with TF-IDF and 99.16% with word embedding. Moreover, the proposed hybridized Bi-LSTM model outperforms stand-alone convolutional neural network (CNN) and recurrent neural network (RNN) based models.
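As an illustration of the TF-IDF input mentioned above, the following is a minimal sketch, not the authors' implementation. The abstract does not state which TF-IDF variant is used, so this assumes the plain form tf(t, d) * log(N / df(t)) with whitespace tokenization. The function name `tf_idf` and the three toy Marathi sentences are hypothetical and are not taken from the corpus.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (whitespace tokens).

    Assumed weighting: (term count / document length) * log(N / df).
    """
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy, made-up sentences used only to exercise the function.
toy_docs = [
    "मी शाळेत जातो",
    "मी घरी जातो",
    "ती शाळेत जाते",
]
vectors = tf_idf(toy_docs)
```

Under this weighting, a word that appears in only one document (such as "घरी" in the second sentence) receives a higher weight than a word shared across documents (such as "मी"), and a word present in every document receives a weight of zero. In a pipeline like the one described, such per-document vectors would be one of the two candidate inputs to the feature extractor.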
