Vision-Based Deep Web Data Extraction For Web Document Clustering: Approach to vision-based deep web data extraction for the clustering of the web document (VDEC) - M. Lavanya
Description
The VDEC approach comprises two phases: 1) vision-based web data extraction and 2) web document clustering. In phase 1, the web page is segmented into chunks, from which surplus noise and duplicate chunks are removed using three parameters: hyperlink percentage, noise score and cosine similarity. The relevant chunks are then identified using three further parameters: title word relevancy, keyword frequency-based chunk selection and position features. A set of keywords is extracted from these main chunks. Finally, in phase 2, the extracted keywords are subjected to web document clustering using Fuzzy C-Means clustering (FCM). The proposed vision-based deep web data extraction is implemented and tested on synthetic datasets, and the results are compared with two existing algorithms: Vision-based Data Record Extraction (ViDE) and Mining Data Region (MDR). In experiments on two different synthetic datasets, the proposed VDEC method achieved stable results, with precision values of about 99.2% and 99.1% on the two datasets across the different threshold values provided.
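To make the duplicate-removal step in phase 1 more concrete, the following is a minimal sketch of how cosine similarity could be used to discard near-duplicate chunks. It is not the book's implementation: the bag-of-words representation, the function names `cosine_similarity` and `drop_duplicate_chunks`, and the threshold of 0.9 are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two texts.

    Assumption: chunks are compared as lowercase, whitespace-split bags of words.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Dot product over the terms the two chunks share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    # An empty chunk has a zero vector; treat it as similar to nothing.
    return dot / norm if norm else 0.0


def drop_duplicate_chunks(chunks, threshold=0.9):
    """Keep a chunk only if it is below `threshold` against every kept chunk.

    The threshold value is hypothetical, not taken from the book.
    """
    kept = []
    for chunk in chunks:
        if all(cosine_similarity(chunk, k) < threshold for k in kept):
            kept.append(chunk)
    return kept


chunks = [
    "fuzzy c means clustering of web documents",
    "fuzzy c means clustering of web documents",
    "shipping and return policy",
]
unique_chunks = drop_duplicate_chunks(chunks)
```

Here the second chunk repeats the first and is dropped, while the third shares no words with it and is kept. A full pipeline in the spirit of the description would also apply the hyperlink percentage and noise score filters, which are not sketched here because the blurb does not define them.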
More Information
| Author | M. Lavanya |
|---|---|
| Publisher | LAP LAMBERT Academic Publishing |
| Release year | 2022 |
| Cover type | Softcover |
| EAN | 9786204956060 |