Scrapy is a free and open-source web-crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler. It is currently maintained by Zyte (formerly Scrapinghub), a web-scraping development and services company.

Scrapy's project architecture is built around "spiders", which are self-contained crawlers that are given a set of instructions. Following the spirit of other don't repeat yourself (DRY) frameworks, such as Django, it makes it easier to build and scale large crawling projects by allowing developers to reuse their code. Scrapy provides features such as automatic throttling of request rates (AutoThrottle), along with middleware through which proxies and user agents can be configured or rotated. It also provides a web-crawling shell, which developers can use to test their assumptions about a site's behavior.

Well-known companies and products using Scrapy include Lyst, Parse.ly, Sayone Technologies, Sciences Po Medialab, and Data.gov.uk's World Government Data site.


History

Scrapy was born at the London-based web-aggregation and e-commerce company Mydeco, where it was developed and maintained by employees of Mydeco and Insophia (a web-consulting company based in Montevideo, Uruguay). The first public release was in August 2008 under the BSD license, with a milestone 1.0 release following in June 2015. In 2011, Zyte (formerly Scrapinghub) became the new official maintainer.



External links

* Official website
* Scrapy Tutorial Series