Scrapy is a free and open-source web-crawling framework written in Python and developed in Cambuslang. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler. It is currently maintained by Zyte (formerly Scrapinghub), a web-scraping development and services company.
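As a rough illustration of API extraction, the sketch below mimics the shape of a Scrapy callback for a paginated JSON API. It yields one item per record and a follow-up for the next page. In a real spider the callback would receive a Scrapy Response (which offers `response.json()` in recent versions) and would yield a `scrapy.Request` or use `response.follow()`. Here `FollowUp`, `parse_api_page` and the `{"results": ..., "next": ...}` payload layout are hypothetical stand-ins, so the code runs with the standard library alone.

```python
import json
from dataclasses import dataclass
from typing import Iterator, Union


@dataclass
class FollowUp:
    """Hypothetical stand-in for scrapy.Request: a URL to fetch next."""
    url: str


def parse_api_page(body: str) -> Iterator[Union[dict, FollowUp]]:
    """Parse one page of a hypothetical paginated JSON API.

    Assumes the API returns {"results": [...], "next": <url or null>}.
    In a real Scrapy callback the decoded data would come from response.json().
    """
    data = json.loads(body)
    for record in data.get("results", []):
        # Each record becomes a plain dict "item", as a Scrapy callback may yield.
        yield {"id": record["id"], "name": record["name"]}
    if data.get("next"):
        # Schedule the next page instead of fetching it inline.
        yield FollowUp(data["next"])


page = (
    '{"results": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],'
    ' "next": "https://api.example.com/items?page=2"}'
)
for output in parse_api_page(page):
    print(output)
```

Yielding the follow-up instead of fetching it inside the callback mirrors how Scrapy callbacks hand new requests back to the engine, which then decides when to download them.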
Scrapy's project architecture is built around "spiders", which are self-contained crawlers that are given a set of instructions. Following the spirit of other "don't repeat yourself" (DRY) frameworks, such as Django, it makes it easier to build and scale large crawling projects by allowing developers to reuse their code.
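Scrapy's own `Spider` class, scheduler and duplicate filter are far more capable than anything shown here. The following is a standard-library toy model that only mirrors the division of labour: a spider yields items and follow-up URLs, an engine schedules them and skips duplicates, and a subclass reuses the base spider's crawling logic while defining only its extraction rule. All names (`LinkParser`, `ToySpider`, `TitleSpider`, `crawl`, `SITE`) are hypothetical and are not Scrapy's API.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects <a href> targets and the <title> text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


class ToySpider:
    """Hypothetical base spider: follows every link; subclasses decide what to extract."""

    start_urls = []

    def parse(self, url, html):
        parser = LinkParser()
        parser.feed(html)
        parser.close()
        # Items (dicts) come from the subclass-specific extraction rule...
        yield from self.extract(url, parser)
        # ...and follow-up URLs (strings) come from the shared link-following logic.
        for href in parser.links:
            yield urljoin(url, href)

    def extract(self, url, parser):
        return []


class TitleSpider(ToySpider):
    """Reuses ToySpider.parse unchanged and only defines the item to extract."""

    start_urls = ["https://example.test/"]

    def extract(self, url, parser):
        yield {"url": url, "title": parser.title}


def crawl(spider, fetch):
    """Toy engine: queues URLs breadth-first, drops duplicates, collects items."""
    queue = deque(spider.start_urls)
    seen = set(queue)
    items = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # treat missing pages as failed downloads
            continue
        for output in spider.parse(url, html):
            if isinstance(output, dict):
                items.append(output)
            elif output not in seen:
                seen.add(output)
                queue.append(output)
    return items


# An in-memory "site" stands in for the network so the sketch runs offline.
SITE = {
    "https://example.test/": "<title>Home</title><a href='/a'>A</a> <a href='/b'>B</a>",
    "https://example.test/a": "<title>Page A</title><a href='/'>Home</a>",
    "https://example.test/b": "<title>Page B</title><a href='/a'>A</a>",
}

for item in crawl(TitleSpider(), SITE.get):
    print(item)
```

In Scrapy itself, the scheduler and a request-fingerprint duplicate filter play the role of `crawl()`, and yielded items usually flow on to item pipelines rather than into a list.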
The Scrapy framework provides features such as automatic throttling (AutoThrottle) and support for rotating proxies and user agents, which can make a crawler harder for sites to detect. Scrapy also provides a web-crawling shell, which developers can use to test their assumptions about a site’s behavior.
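Scrapy's documentation describes the AutoThrottle extension's throttling rules as follows. The crawl starts from `AUTOTHROTTLE_START_DELAY`. Each response yields a target delay of latency / `AUTOTHROTTLE_TARGET_CONCURRENCY`, and the next delay becomes the average of the previous and target delays. That delay is kept between `DOWNLOAD_DELAY` and `AUTOTHROTTLE_MAX_DELAY`, and a non-200 response is never allowed to lower it. The following is a minimal simulation of those documented rules, not the extension's source code. The setting names are Scrapy's, the values are assumed to match its defaults, and `next_delay` is a hypothetical helper.

```python
# Names mirror Scrapy settings; values are assumed defaults. In a real project
# these would live in settings.py, alongside AUTOTHROTTLE_ENABLED = True.
DOWNLOAD_DELAY = 0.0                   # lower bound on the delay (seconds)
AUTOTHROTTLE_START_DELAY = 5.0         # initial delay (seconds)
AUTOTHROTTLE_MAX_DELAY = 60.0          # upper bound on the delay (seconds)
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0  # desired parallel requests per remote site


def next_delay(current: float, latency: float, status: int) -> float:
    """Return the download delay to use after a response arrives."""
    target = latency / AUTOTHROTTLE_TARGET_CONCURRENCY
    # Average the current delay with the target delay.
    proposed = (current + target) / 2
    # Clamp between DOWNLOAD_DELAY and AUTOTHROTTLE_MAX_DELAY.
    proposed = min(max(proposed, DOWNLOAD_DELAY), AUTOTHROTTLE_MAX_DELAY)
    # Non-200 responses may raise the delay but never lower it.
    if status != 200 and proposed < current:
        return current
    return proposed


delay = AUTOTHROTTLE_START_DELAY
for latency, status in [(1.0, 200), (1.0, 200), (0.2, 503), (4.0, 200)]:
    delay = next_delay(delay, latency, status)
    print(f"latency={latency:.1f}s status={status} -> delay={delay:.2f}s")
```

The simulated delays go 3.00, 2.00, 2.00 (the fast 503 is ignored), then 3.00 once latency rises. In an actual project the extension is switched on with `AUTOTHROTTLE_ENABLED = True`, and `AUTOTHROTTLE_DEBUG = True` makes Scrapy log the throttling statistics for each response.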
Some well-known companies and products using Scrapy are: Lyst, Parse.ly, Sayone Technologies, Sciences Po Medialab, and Data.gov.uk’s World Government Data site.
History
Scrapy originated at the London-based web-aggregation and e-commerce company Mydeco, where it was developed and maintained by employees of Mydeco and Insophia (a web-consulting company based in Montevideo, Uruguay). The first public release was in August 2008 under the BSD license. In 2011, Zyte (then called Scrapinghub) became the new official maintainer, and the milestone 1.0 release followed in June 2015.