Scrapy is a free and open-source web-crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler. It is currently maintained by Zyte (formerly Scrapinghub), a web-scraping development and services company.
Scrapy's project architecture is built around "spiders", which are self-contained crawlers that are given a set of instructions. Following the spirit of other "don't repeat yourself" (DRY) frameworks, such as Django, it makes it easier to build and scale large crawling projects by allowing developers to reuse their code.
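The spider idea can be illustrated with a minimal, standard-library-only sketch. This is not Scrapy's API: the `Spider` base class, its `crawl` method, the `TitleSpider` subclass, the `FAKE_SITE` dictionary, and the `example.test` URLs are all hypothetical names chosen for illustration. The sketch only assumes that a spider declares where to start and how to parse a page, while the shared crawl loop lives in one reusable place.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "website" so the sketch runs without network access.
FAKE_SITE = {
    "http://example.test/": '<title>Home</title><a href="/a">A</a><a href="/b">B</a>',
    "http://example.test/a": '<title>Page A</title><a href="/">Home</a>',
    "http://example.test/b": '<title>Page B</title><a href="/a">A</a>',
}


class _PageParser(HTMLParser):
    """Collects the <title> text and all link targets of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


class Spider:
    """Reusable crawl loop; subclasses supply only start_urls and parse()."""

    start_urls = []

    def parse(self, url, html):
        """Yield dicts (scraped items) or strings (URLs to follow)."""
        raise NotImplementedError

    def crawl(self, fetch=FAKE_SITE.get):
        seen, queue, items = set(), list(self.start_urls), []
        while queue:
            url = queue.pop(0)  # breadth-first order
            if url in seen:
                continue  # never visit the same page twice
            seen.add(url)
            html = fetch(url)
            if html is None:
                continue  # unknown page: skip it
            for result in self.parse(url, html):
                if isinstance(result, str):
                    queue.append(result)  # a URL to follow
                else:
                    items.append(result)  # a scraped item
        return items


class TitleSpider(Spider):
    """The spider's 'instructions': start at the home page, record every title."""

    start_urls = ["http://example.test/"]

    def parse(self, url, html):
        page = _PageParser()
        page.feed(html)
        yield {"url": url, "title": page.title}
        for href in page.links:
            yield urljoin(url, href)  # resolve relative links


items = TitleSpider().crawl()
```

A second spider would reuse `Spider.crawl` unchanged and override only `start_urls` and `parse`, which is the kind of code reuse the paragraph above describes. In Scrapy itself, the comparable pieces are a `scrapy.Spider` subclass with a `name`, `start_urls`, and a `parse` callback, with scheduling and downloading handled by the framework.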
Well-known companies and products using Scrapy include Lyst, Parse.ly, Sayone Technologies, Sciences Po Medialab, and Data.gov.uk's World Government Data site.
History
Scrapy was born at London-based web-aggregation and e-commerce company Mydeco, where it was developed and maintained by employees of Mydeco and Insophia (a web-consulting company based in Montevideo, Uruguay). The first public release was in August 2008 under the BSD license. In 2011, Zyte (formerly Scrapinghub) became the new official maintainer, and the milestone 1.0 release followed in June 2015.