
Data Toolbar is a web scraping software add-on to the Internet Explorer, Mozilla Firefox, and Google Chrome web browsers. It collects structured data from web pages and converts it into a tabular format that can be loaded into a spreadsheet or database management program.
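The tabular conversion described above can be illustrated with a short Python sketch that turns a list of extracted records into CSV text, which a spreadsheet or database tool can import. The field names (`title`, `price`, `url`) and the function `records_to_csv` are hypothetical. This is not Data Toolbar's actual output format, which is not documented here.

```python
import csv
import io


def records_to_csv(records):
    """Convert a list of extracted records (dicts) into CSV text.

    Columns are the union of all keys in first-seen order, so records with
    missing fields still line up; missing cells are written as empty strings.
    """
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical records, as a scraper might extract them from a catalog page.
records = [
    {"title": "Blue mug", "price": "7.50", "url": "/item/1"},
    {"title": "Red mug", "price": "8.00"},
]
print(records_to_csv(records))
```

Using the union of keys as the header is one simple way to cope with irregular records. Values containing commas are quoted automatically by the `csv` module.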


Algorithm

The program implements a variation of the generalized tree matching algorithm for nested lists. For a given web page, the program recursively traverses the branches of the page's DOM tree to detect nested lists of data items that match the format of the specified content. This approach is reported to have several advantages over simple string matching (Nitin Jindal, Bing Liu, "A Generalized Tree Matching Algorithm Considering Nested Lists for Web Data Extraction", Proceedings of the Tenth SIAM International Conference on Data Mining, 2010).
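The general idea of DOM tree matching can be sketched in Python. The block below implements classic simple tree matching, a dynamic program that counts how many nodes two ordered subtrees share in a top-down matching. It then uses a hypothetical helper, `find_repeated_runs`, with an assumed similarity threshold of 0.8 to flag runs of similar adjacent siblings as candidate list items. This is a simplified illustration, not Data Toolbar's implementation. The nested-list extensions of the generalized algorithm in the cited paper are omitted, and a minimal `Node` class stands in for a real DOM.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal stand-in for a DOM element: a tag name and ordered children."""
    tag: str
    children: list = field(default_factory=list)


def size(node):
    """Number of nodes in the subtree rooted at node."""
    return 1 + sum(size(c) for c in node.children)


def simple_tree_match(a, b):
    """Number of node pairs in the largest top-down matching of two ordered trees.

    Roots must carry the same tag; children are aligned in order with a
    dynamic program similar to longest common subsequence.
    """
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = simple_tree_match(a.children[i - 1], b.children[j - 1])
            M[i][j] = max(M[i][j - 1], M[i - 1][j], M[i - 1][j - 1] + w)
    return M[m][n] + 1


def similarity(a, b):
    """Match count normalized to [0, 1] by the average subtree size."""
    return 2 * simple_tree_match(a, b) / (size(a) + size(b))


def find_repeated_runs(node, threshold=0.8, path="root"):
    """Recursively report runs of two or more adjacent, similar children.

    Each run (path, start, end) is a candidate list of data items.
    """
    runs = []
    kids = node.children
    start = 0
    for i in range(1, len(kids) + 1):
        # Close the current run at the end or when the next sibling differs.
        if i == len(kids) or similarity(kids[i - 1], kids[i]) < threshold:
            if i - start >= 2:
                runs.append((path, start, i))
            start = i
    for idx, child in enumerate(kids):
        runs.extend(find_repeated_runs(child, threshold, f"{path}/{child.tag}[{idx}]"))
    return runs


def item():
    # Hypothetical catalog row: image, link, and price.
    return Node("li", [Node("img"), Node("a"), Node("span")])


page = Node("body", [Node("h1"), Node("ul", [item(), item(), item()]), Node("div", [Node("p")])])
print(find_repeated_runs(page))  # [('root/ul[1]', 0, 3)]
```

Comparing only adjacent siblings keeps the sketch short. Real extractors also consider records spanning several sibling nodes, which this version does not handle.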


Features

* Collection of data and images directly from Internet Explorer
* Collection of information from details pages linked to the catalog
* Automatic processing of multi-page catalogs
* Support for irregular multi-row catalogs mixed with advertisements
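The multi-page and details-page features amount to following links until a catalog is exhausted and merging each item's details page into its row. The following is a minimal sketch under stated assumptions: a hypothetical in-memory `SITE` dictionary replaces real HTTP requests, pages are assumed to be already parsed into records, and the `next` and `details` keys are illustrative rather than Data Toolbar's data model.

```python
# Hypothetical site: each catalog page lists records and an optional "next" page;
# each record may link to a details page with extra fields.
SITE = {
    "/catalog?page=1": {"records": [{"title": "Blue mug", "details": "/item/1"}],
                        "next": "/catalog?page=2"},
    "/catalog?page=2": {"records": [{"title": "Red mug", "details": "/item/2"}],
                        "next": None},
    "/item/1": {"fields": {"color": "blue"}},
    "/item/2": {"fields": {"color": "red"}},
}


def fetch(url):
    """Stand-in for downloading and parsing a page."""
    return SITE[url]


def scrape_catalog(start_url, max_pages=100):
    """Follow 'next' links across catalog pages and merge details-page fields."""
    rows, url, seen = [], start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)  # guard against pagination loops
        page = fetch(url)
        for record in page["records"]:
            row = {k: v for k, v in record.items() if k != "details"}
            if record.get("details"):
                row.update(fetch(record["details"])["fields"])
            rows.append(row)
        url = page["next"]
    return rows


print(scrape_catalog("/catalog?page=1"))
# [{'title': 'Blue mug', 'color': 'blue'}, {'title': 'Red mug', 'color': 'red'}]
```

The `seen` set and `max_pages` cap are defensive choices so that a catalog whose "next" link points back to an earlier page cannot loop forever.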


Similar tools

* Automation Anywhere - the Web Extractor is part of the larger automation system
* Easy Web Extract - standalone application, Windows
* Mozenda - web-based service
* Newprosoft - standalone application, includes an Agent, Windows
* OutWit - standalone application and Firefox extension
* Data Scraping Studio - standalone application for Windows and Chrome extension
* Diggernaut - web platform with a standalone application for Windows, Linux, and macOS, plus a Google Chrome extension


Sources


External links

* http://datatoolbar.com/