
Detection and removal of redundant and illegitimate data in data repository: an empirical analysis

by Senthilkumar P




Institution: Anna University
Department: Detection and removal of redundant and illegitimate data in data repository: an empirical analysis
Year: 2015
Keywords: Data cleansing; Levenshtein; Rabin's fingerprinting algorithm
Record ID: 1201900
Full text PDF: http://shodhganga.inflibnet.ac.in/handle/10603/38989


Abstract

Data cleansing is described as the sum of operations executed on existing data to eliminate anomalies and obtain a data collection that is a precise and exclusive representation. These data anomalies, which include errors, discrepancies, redundancies, ambiguities, and incompleteness, hinder the effectiveness of analysis or data mining. Decreasing the time and intricacy of the mining process and improving the quality of data present in the data warehouse are the important objectives of data cleansing. To this end, an efficient technique is proposed that is capable of providing accurate data records by removing errors such as duplicate records, near-duplicate records, misspelling errors, and illegal value errors, which usually arise when data is warehoused from external sources. In our proposed technique, after the preprocessing steps, Rabin's fingerprinting algorithm and Levenshtein distance are used for cleansing the dataset of duplicate records and near-duplicate records, respectively. For correcting misspelling errors, the Levenshtein edit distance method is utilized, and the illegal value errors are identified using a rule-based method.

References: pp. 133-140.
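
The abstract names four cleansing operations: exact-duplicate removal via Rabin's fingerprinting, near-duplicate removal and misspelling correction via Levenshtein edit distance, and illegal-value detection via rules. The Python sketch below only illustrates how such a pipeline could be wired together and is not the thesis implementation; the record layout, the flattened comparison key, the dictionary, the rule set, and the distance thresholds are all assumptions, and the fingerprint function is a simplified polynomial hash standing in for Rabin's scheme.

    # Illustrative cleansing pipeline (assumptions: records are flat dicts,
    # thresholds and rules are placeholders, fingerprint is a simplified
    # polynomial hash standing in for Rabin's fingerprinting).

    def rabin_fingerprint(text, base=256, prime=(1 << 61) - 1):
        """Polynomial fingerprint of a record string (simplified Rabin-style hash)."""
        fp = 0
        for ch in text:
            fp = (fp * base + ord(ch)) % prime
        return fp


    def levenshtein(a, b):
        """Classic dynamic-programming edit distance between two strings."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                cost = 0 if ca == cb else 1
                curr.append(min(prev[j] + 1,          # deletion
                                curr[j - 1] + 1,      # insertion
                                prev[j - 1] + cost))  # substitution
            prev = curr
        return prev[-1]


    def correct_spelling(word, dictionary, max_dist=2):
        """Replace a word with its closest dictionary entry within max_dist edits."""
        if not dictionary:
            return word
        best = min(dictionary, key=lambda w: levenshtein(word, w))
        return best if levenshtein(word, best) <= max_dist else word


    def is_legal(record, rules):
        """Rule-based check: every constrained field must satisfy its predicate."""
        return all(rule(record.get(field)) for field, rule in rules.items())


    def cleanse(records, dictionary, rules, near_dup_threshold=3):
        """Spell-correct, drop illegal-value records, then remove (near-)duplicates."""
        seen_fps, kept, kept_keys = set(), [], []
        for rec in records:
            rec = {f: correct_spelling(v, dictionary) if isinstance(v, str) else v
                   for f, v in rec.items()}
            if not is_legal(rec, rules):                       # illegal value error
                continue
            key = "|".join(str(rec[f]) for f in sorted(rec))   # flattened record key
            if rabin_fingerprint(key) in seen_fps:             # exact duplicate
                continue
            if any(levenshtein(key, k) <= near_dup_threshold for k in kept_keys):
                continue                                       # near duplicate
            seen_fps.add(rabin_fingerprint(key))
            kept.append(rec)
            kept_keys.append(key)
        return kept


    if __name__ == "__main__":
        dictionary = {"delhi", "chennai", "mumbai"}
        rules = {"age": lambda v: isinstance(v, int) and 0 < v < 120}
        records = [
            {"name": "ravi", "city": "chenai", "age": 34},   # misspelling, corrected
            {"name": "ravi", "city": "chennai", "age": 34},  # exact duplicate after correction
            {"name": "ravi", "city": "chennai", "age": 340}, # illegal value, dropped
        ]
        print(cleanse(records, dictionary, rules))

Note that comparing each incoming key against every retained record is quadratic; a production pipeline would typically add blocking or a sorted-neighbourhood window to limit the Levenshtein comparisons.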