AutoETL: A Nonlinear Deep Learning Framework for ETL Automation
Abstract
This study presents a nonlinear deep learning framework for automating Extract, Transform, Load (ETL) processes. The framework combines natural language processing techniques, transformer-based models, and reinforcement learning to convert unstructured data into structured formats, with a focus on generating and refining transformation rules from observed data patterns. It addresses a central challenge in ETL automation: handling complex data relationships without manual input. The framework is evaluated on the TPC-DI dataset, where it transforms financial newswire data into a structured warehouse format in accordance with ACID and OpenClass standards. The pipeline has three stages: data preparation tokenizes and normalizes the input, a transformer-based model processes the resulting sequences to identify patterns, and reinforcement learning refines the transformation rules using feedback. Alignment of the structured output is measured with Intersection over Union (IoU), mean average precision (mAP), and mean squared error (MSE). The results show consistent performance across the data thresholds tested, indicating that the framework can handle diverse data patterns. The research outlines a method for automating data handling with reduced manual involvement, with potential applications across domains.
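As an illustration of the preparation and evaluation steps named in the abstract, the following minimal Python sketch shows one way tokenization, normalization, IoU, and MSE could look for text-to-record extraction. It is not the authors' implementation. The function names, the regular expression, and the choice to compute IoU over sets of (field, value) pairs are assumptions made for this example; the abstract does not specify how the metrics are applied, and mAP is omitted.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Unicode-normalize, lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split normalized text into alphanumeric tokens, keeping decimals whole."""
    return re.findall(r"[a-z0-9]+(?:\.[0-9]+)?", normalize(text))


def iou(predicted: set, expected: set) -> float:
    """Intersection over Union of two sets of (field, value) pairs.

    Assumption: alignment is scored by exact match on extracted pairs.
    Two empty sets are treated as a perfect match.
    """
    union = predicted | expected
    return len(predicted & expected) / len(union) if union else 1.0


def mse(predicted: list[float], expected: list[float]) -> float:
    """Mean squared error between two equal-length numeric sequences."""
    if not predicted or len(predicted) != len(expected):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - e) ** 2 for p, e in zip(predicted, expected)) / len(predicted)


# Hypothetical records, invented for illustration only.
predicted_record = {("symbol", "acme"), ("price", "3.5")}
expected_record = {("symbol", "acme"), ("price", "3.6"), ("date", "2020-01-01")}

tokens = tokenize("  ACME Corp.  rose 3.5%  ")
record_iou = iou(predicted_record, expected_record)
price_mse = mse([1.0, 2.0], [1.0, 4.0])
```

Here one of four distinct pairs matches, so the set-based IoU is 0.25. A set-based IoU penalizes both missing and spurious fields, which is why it is used in this sketch in place of plain accuracy.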