Data Toolbar is a web scraping software add-on for the Internet Explorer, Mozilla Firefox, and Google Chrome web browsers that collects structured data from web pages and converts it into a tabular format that can be loaded into a spreadsheet or database management program.
Algorithm
The program implements a variation of the generalized tree matching algorithm that takes nested lists into account. That is, within a given web page, the program recursively traverses the branches of the page's DOM tree, aiming to detect nested lists of data items that match the format of the specified content. This approach is reported to have several advantages over a simple string-matching algorithm.
Nitin Jindal, Bing Liu. "A Generalized Tree Matching Algorithm Considering Nested Lists for Web Data Extraction". Proceedings of the Tenth SIAM International Conference on Data Mining, 2010.
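The recursive traversal described above can be illustrated with a minimal sketch. It is neither Data Toolbar's actual code nor the algorithm of Jindal and Liu. As a simplified stand-in it uses the classic simple tree matching dynamic program to compare sibling subtrees, and it treats a node as a list container when all of its consecutive children are structurally similar. Every name here (Node, TreeBuilder, find_lists, extract_rows), the 0.8 threshold, and the sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # tags with no end tag

class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children, self.text = [], ""

class TreeBuilder(HTMLParser):
    """Builds a minimal DOM-like tree; assumes reasonably well-formed HTML."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None and self.current.tag == tag:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()

def size(node):
    return 1 + sum(size(child) for child in node.children)

def simple_tree_match(a, b):
    """Number of node pairs in the largest top-down, order-preserving mapping."""
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = max(
                table[i - 1][j],
                table[i][j - 1],
                table[i - 1][j - 1]
                + simple_tree_match(a.children[i - 1], b.children[j - 1]),
            )
    return table[m][n] + 1

def similarity(a, b):
    """Match count normalized by the average subtree size (1.0 = identical shape)."""
    return 2 * simple_tree_match(a, b) / (size(a) + size(b))

def find_lists(node, threshold=0.8, found=None):
    """Recursively collect nodes whose consecutive children look alike."""
    if found is None:
        found = []
    kids = node.children
    if len(kids) >= 2 and all(
        similarity(x, y) >= threshold for x, y in zip(kids, kids[1:])
    ):
        found.append(node)
    for child in kids:
        find_lists(child, threshold, found)
    return found

def extract_rows(html):
    """Turn each item of every detected list into one row of field texts."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    rows = []
    for container in find_lists(builder.root):
        for item in container.children:
            rows.append([field.text for field in item.children])
    return rows

SAMPLE = """<div>
<h1>Catalog</h1>
<ul>
<li><b>Apple</b><span>1.20</span></li>
<li><b>Pear</b><span>0.90</span></li>
<li><b>Plum</b><span>2.10</span></li>
</ul>
</div>"""

print(extract_rows(SAMPLE))
# [['Apple', '1.20'], ['Pear', '0.90'], ['Plum', '2.10']]
```

Comparing subtrees by structure rather than by raw markup strings means the three list items are recognized as one repeating record even though their text differs, which is the kind of advantage over string matching that the paragraph above refers to.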
Features
* Collection of data and images directly from Internet Explorer
* Collection of information from detail pages linked from the catalog
* Automatic processing of multi-page catalogs
* Support for irregular multi-row catalogs interspersed with advertisements
Similar tools
* Automation Anywhere - The Web Extractor is part of a larger automation system
* Easy Web Extract - Standalone application, Windows
* Mozenda - Web-based service
* Newprosoft - Standalone application, includes an Agent, Windows
* OutWit - Standalone application and Firefox extension
* Data Scraping Studio - Standalone application for Windows and Chrome extension
* Diggernaut - Web platform with standalone application for Windows, Linux, and macOS, and Google Chrome extension
Sources
External links
* http://datatoolbar.com/