Sources
Web data integration extends and specializes data integration by treating the web as a collection of views of databases that are accessible over web protocols, including, but not limited to:
* Open data catalogs
* Government data catalogs
* Web applications and sites
** UI (
Data access and transformation
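As a rough illustration of treating a web page as a view of a database, the sketch below turns an HTML table into a list of records using only the Python standard library. The names `TableExtractor`, `table_to_records`, and `SAMPLE_PAGE` are hypothetical and chosen for this example; real pages are rarely this regular, so this is a minimal sketch rather than a general extraction method.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of each table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Use the first row as field names and return the other rows as dicts."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page fragment standing in for a fetched web page.
SAMPLE_PAGE = """
<table>
  <tr><th>product</th><th>price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""

records = table_to_records(SAMPLE_PAGE)
```

Every value comes back as a string, so type conversion and unit handling remain a separate transformation step.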
WDI poses technical challenges that conventional data integration does not, because web data sources are often unstructured or semi-structured and offer no standard query mechanism, which complicates both data access and transformation.
Data quality
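One way to monitor quality while web data is still being collected is to check each record as it arrives and stop early if too many fail. The sketch below assumes a simple "required fields are present" check; the class `QualityMonitor`, its thresholds, and the sample records are all hypothetical and stand in for whatever checks a real pipeline would need.

```python
from dataclasses import dataclass, field


@dataclass
class QualityMonitor:
    """Count records that fail basic checks while collection is running."""

    required: tuple                # field names every record must contain
    max_reject_rate: float = 0.2   # assumed threshold for stopping early
    min_seen: int = 5              # assumed minimum sample before judging
    seen: int = 0
    rejected: int = 0
    accepted: list = field(default_factory=list)

    def check(self, record):
        """Return a list of problems found in one record."""
        problems = []
        for name in self.required:
            value = record.get(name)
            if value is None or not str(value).strip():
                problems.append(f"missing {name}")
        return problems

    def add(self, record):
        """Check one record, keep it if clean, and return its problems."""
        self.seen += 1
        problems = self.check(record)
        if problems:
            self.rejected += 1
        else:
            self.accepted.append(record)
        return problems

    def should_stop(self):
        """True once enough records were seen and too many were rejected."""
        if self.seen < self.min_seen:
            return False
        return self.rejected / self.seen > self.max_reject_rate


# Hypothetical stream of collected records.
stream = [
    {"product": "Widget", "price": "9.99"},
    {"product": "", "price": "4.00"},
    {"product": "Gadget"},
    {"product": "Gizmo", "price": "1.00"},
]

monitor = QualityMonitor(required=("product", "price"), min_seen=2)
for record in stream:
    monitor.add(record)
    if monitor.should_stop():
        break  # stop before spending more time and money on a bad source
```

Stopping early trades completeness for cost: a source that is failing checks is abandoned before the full collection run is paid for.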
Understanding the quality and veracity of data is even more important in WDI than in conventional data integration, because web data is generally of lower quality and less implicitly trusted than data collected from a trusted source. There have been attempts to automate trust ratings for web data. In data integration, quality checks can generally take place after data access and transformation, but in WDI quality may need to be monitored while the data is being collected, because re-collecting it costs both time and money.
Applications
WDI has applications in many fields, including bioinformatics, search engines, price comparison, forensic search data analysis, business intelligence, e-commerce, healthcare, pharmaceuticals, and product development. Most price comparison engines and recommendation systems use user-generated data to create recommendations for their users. Similarly, healthcare systems use the results of competitions conducted on websites such as Kaggle to assess the accuracy of data and to create user-focused products. IBM estimates that poor-quality WDI costs companies over $3 trillion in revenue each year.