Filtering Parallel Records over Data Results from Distributed Web Databases
Identifying records that represent the same real-world entity but are fetched from different sources is an
important step in data integration. A single search can return the same result from multiple databases, which
causes duplication. To avoid duplication and achieve integration, techniques such as record matching are needed;
however, record matching techniques may fail on the results of a new query. To address this problem, we propose
non-supervised duplicate filtering (NDF), which, for a given query, identifies and filters the duplicate records
from distributed databases. To achieve this, two classifiers, a summarising classifier and a similar record
machine (SRM) classifier, are used to iteratively identify the duplicates, so that an integrated data result is
obtained from the distributed databases.
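
The iterative idea can be illustrated with a minimal Python sketch. It is not the NDF implementation: the abstract does not specify how either classifier works, so the sketch assumes the first classifier scores a record pair by an equally weighted sum of per-field string similarities, and it substitutes a simple nearest-centroid rule for the second classifier. All names (field_sims, find_duplicates, filter_duplicates) and the threshold value are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def field_sims(a, b):
    """Per-field string similarity in [0, 1] for two records (tuples of strings)."""
    return [SequenceMatcher(None, x.lower(), y.lower()).ratio() for x, y in zip(a, b)]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def find_duplicates(records, threshold=0.85, max_rounds=5):
    """Return a set of index pairs (i, j), i < j, judged to be duplicates."""
    if len(records) < 2:
        return set()
    pairs = list(combinations(range(len(records)), 2))
    vecs = {p: field_sims(records[p[0]], records[p[1]]) for p in pairs}
    n_fields = len(records[0])
    weights = [1.0 / n_fields] * n_fields  # assumption: equal field weights

    # Step 1 (assumed first classifier): weighted sum of field similarities.
    dups = {p for p in pairs
            if sum(w * s for w, s in zip(weights, vecs[p])) >= threshold}

    # Step 2 (stand-in for the second classifier): nearest-centroid rule,
    # re-fitted each round on the pairs labelled so far.
    for _ in range(max_rounds):
        non_dups = [p for p in pairs if p not in dups]
        if not dups or not non_dups:
            break
        pos = centroid([vecs[p] for p in dups])
        neg = centroid([vecs[p] for p in non_dups])
        new = {p for p in non_dups if sq_dist(vecs[p], pos) < sq_dist(vecs[p], neg)}
        if not new:
            break  # no new duplicates found, so the iteration has converged
        dups |= new
    return dups


def filter_duplicates(records, **kwargs):
    """Keep the first record of every duplicate group, preserving input order."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in find_duplicates(records, **kwargs):
        ri, rj = find(i), find(j)
        parent[max(ri, rj)] = min(ri, rj)  # smallest index represents the group
    return [records[i] for i in range(len(records)) if find(i) == i]
```

Calling filter_duplicates on the merged query results from several databases returns one record per detected group. The threshold and the number of rounds are tuning parameters that would have to be chosen for the data at hand.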