80legs

80legs is a web crawling service that allows its users to create and run web crawls through its software as a service (SaaS) platform.


History

80legs was created by Computational Crawling, a company based in Houston, Texas. The company launched the private beta of 80legs in April 2009 and publicly launched the service at the DEMOfall 09 conference. At the time of its public launch, 80legs offered customized web crawling and scraping services. It has since added subscription plans and other product offerings.


Technology

80legs is built on top of a distributed grid computing network. This grid consists of approximately 50,000 individual computers, distributed across the world, and uses bandwidth monitoring technology to prevent bandwidth cap overages.

80legs has been criticised by numerous site owners because its crawler can effectively act as a distributed denial-of-service (DDoS) attack and does not obey robots.txt. Because the average webmaster is unaware that 80legs exists, the crawler is usually blocked only after the damage is done: the server has already been overwhelmed, and the responsible party is identified only through a time-consuming in-depth analysis of the log files. Some rulesets for ModSecurity (such as the one from Atomicorp) block all access to the web server from 80legs in order to prevent a DDoS, and WebKnight blocks 80legs by default. Because it is a distributed crawler, 80legs cannot be blocked by IP address; the most effective way to block it is by its user agent, "008", as in the sketch below. Wrecksite, a non-profit online database of maritime wrecks, also blocks 80legs by default.
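As a purely illustrative sketch (not part of the original article), the following Python program uses only the standard library to refuse any request whose User-Agent header contains "008". The handler class, port, and simple substring match are hypothetical choices for this example; in practice such filtering is normally applied at the web server or web application firewall layer, for instance via a ModSecurity rule.

    # Hypothetical sketch: reject requests from the 80legs crawler by matching
    # its reported "008" user agent. Standard library only; a real deployment
    # would enforce this at the web server or WAF instead.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    BLOCKED_AGENT = "008"  # user agent string reported for the 80legs crawler

    class BlockingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            user_agent = self.headers.get("User-Agent", "")
            # Naive substring match for illustration; a production rule would
            # match the agent token more precisely to avoid false positives.
            if BLOCKED_AGENT in user_agent:
                # Refuse the request outright instead of serving any content.
                self.send_error(403, "Crawler blocked")
                return
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"Hello, visitor.\n")

    if __name__ == "__main__":
        HTTPServer(("", 8000), BlockingHandler).serve_forever()

Blocking by user agent works here precisely because IP-based blocking does not: the requests arrive from tens of thousands of different machines, but they all announce the same "008" agent string.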

