Revolutionary Data Deduplication with Fuzzy C-Means: Advancing Data Quality Management


P. Selvi

Abstract

Data deduplication, the identification and removal of duplicate records from a database, is essential to maintaining data integrity and precision. Conventional deduplication methods typically rely on finding exact or near-exact matches and may perform poorly when the data contains variation and uncertainty. This work proposes a new approach to data deduplication based on Fuzzy C-Means (FCM) clustering, a technique widely applied to data clustering but rarely to deduplication. FCM allows each data point to belong to multiple clusters with varying degrees of membership, which makes it well suited to accommodating variation and error in the data. The approach thus extends the use of Fuzzy C-Means clustering into the data deduplication field and offers practical support for data preprocessing and information quality management. In the context of global development and the growing importance of big data, improvements in this area expand the potential for more efficient and powerful analytic tools. The proposed research focuses on establishing whether the FCM-based deduplication strategy improves false positive and false negative rates, making it a reliable solution in settings where duplicated data is expected to occur.
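The abstract does not specify the algorithmic details, so the following is only an illustrative sketch of the general idea: a plain-NumPy Fuzzy C-Means implementation whose soft memberships are then used to flag candidate duplicate records. The function names (`fuzzy_c_means`, `candidate_duplicates`), the membership threshold, and the use of numeric feature vectors for records are all assumptions, not the paper's method.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard FCM: returns (cluster centers, c-by-n membership matrix U)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)            # memberships sum to 1 per point
    for _ in range(max_iter):
        Um = U ** m                              # fuzzified memberships
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # distance from every point to every center, shape (c, n)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)                    # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U

def candidate_duplicates(X, c, threshold=0.9):
    """Flag record pairs that share a dominant cluster with high membership."""
    _, U = fuzzy_c_means(X, c)
    labels = U.argmax(axis=0)                    # dominant cluster per record
    strength = U.max(axis=0)                     # confidence of that assignment
    pairs = []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if labels[i] == labels[j] and min(strength[i], strength[j]) >= threshold:
                pairs.append((i, j))
    return pairs

# Hypothetical usage: four records encoded as 2-D feature vectors,
# two near-identical pairs among them.
X = np.array([[1.0, 1.0], [1.02, 0.98], [10.0, 10.0], [10.1, 9.9]])
print(candidate_duplicates(X, c=2))
```

In a real pipeline the pairs returned would be candidates for closer comparison rather than automatic deletions; the soft memberships provide the tolerance to variation that exact-match deduplication lacks.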
