Archiveteam

Archive Team is a group dedicated to digital preservation and web archiving that was co-founded by Jason Scott in 2009. Its primary focus is the copying and preservation of content housed by at-risk online services. Its projects include the partial or complete preservation of services such as GeoCities, Yahoo! Video, Google Video, Friendster, FortuneCity, TwitPic, SoundCloud, and the "Aaron Swartz Memorial JSTOR Liberator". Archive Team also archives URL shortener services and wikis on a regular basis. The content archived by Archive Team is usually made available in the Wayback Machine, which is the recommended way of accessing it. According to Jason Scott, "Archive Team was started out of anger and a feeling of powerlessness, this feeling that we were letting companies decide for us what was going to survive and what was going to die." Scott continues, "it's not our job to figure out what's valuable, to figure out what's meaningful. We work by three virtues: rage, paranoia, and kleptomania."


Warrior/Tracker system

Archive Team is composed of a loose community of independent contributors. Their archival process makes use of a "Warrior", a virtual machine environment. Individuals run the Warrior on their desktop computers to download content without requiring technical expertise. Work is coordinated by a centrally managed Tracker, which communicates with Warriors and allocates items to them. The Tracker also monitors user upload activity and displays a leaderboard.
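
The division of labour between Tracker and Warriors can be illustrated with a small sketch. This is not Archive Team's actual tracker code: the `Tracker` class, its method names, and the item and user names are all hypothetical, and the sketch only assumes what the text above states, namely that a central component hands out items, records uploads, and ranks users.

```python
from collections import Counter, deque


class Tracker:
    """Toy central tracker (hypothetical): hands out items and tallies uploads."""

    def __init__(self, items):
        self.todo = deque(items)   # items not yet handed out
        self.claimed = {}          # item -> user currently working on it
        self.uploaded = Counter()  # user -> total bytes uploaded

    def claim(self, user):
        """Give the next unclaimed item to a Warrior, or None if none remain."""
        if not self.todo:
            return None
        item = self.todo.popleft()
        self.claimed[item] = user
        return item

    def report_upload(self, user, item, size_bytes):
        """Record a finished item; reject reports for items the user does not hold."""
        if self.claimed.get(item) != user:
            raise ValueError(f"{item!r} is not claimed by {user!r}")
        del self.claimed[item]
        self.uploaded[user] += size_bytes

    def leaderboard(self):
        """Return (user, bytes) pairs, largest uploader first."""
        return self.uploaded.most_common()


# Two simulated Warriors working through three made-up items.
tracker = Tracker(["item-1", "item-2", "item-3"])
first = tracker.claim("alice")
second = tracker.claim("bob")
third = tracker.claim("alice")
tracker.report_upload("alice", first, 500)
tracker.report_upload("bob", second, 1200)
tracker.report_upload("alice", third, 300)
board = tracker.leaderboard()
print(board)
```

Keeping the queue and the tallies in one place is what lets each Warrior stay simple: it only needs to ask for an item, download it, and report back.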


Warrior Projects

There are several long-running Warrior projects:
* Imgur: The image host Imgur updated its terms of service on April 19, 2023. This update focused on removing old, unused, and inactive content that is not tied to a user account, along with NSFW content.
* Blogger: In May 2023, Google announced that inactive accounts would be deleted starting on 2023-12-01 across its platform, including Blogger blogs.
* Reddit: Reddit Inc. has banned communities that generate bad PR for it, and it restricted access to APIs and data on June 19, 2023.
* Russian invasion of Ukraine: Archiving various .ua sites in the wake of the Russian government's invasion.
* Telegram: Archiving public messages in various newsworthy or otherwise notable Telegram channels.
* GitHub: When it was bought by Microsoft in 2018, many archivists and users were worried the site would become more restrictive. This project archives the UI parts of GitHub and the code of each repository.
* MediaFire: On 2020-12-18, users reported receiving emails from MediaFire stating that it planned to classify accounts as abandoned if they failed to meet certain criteria, starting in January.
* Coronavirus Outbreak: Documenting and preserving data, events, and impacts of COVID-19 on society.
* YouTube: Saving metadata, thumbnails, comments and selected videos. Videos and channels are limited to channels that may be deleted (because the company went bankrupt, the channel owner died, or YouTube is banning certain content) and channels related to world events and politics.
* Wikiteam: Saving wiki XML dumps.
* Urlteam: Saving URL shorteners.
* URLs: Archiving URLs from various sources.

The largest project on ArchiveTeam is URLs, with over 10 petabytes archived.


ArchiveBot

ArchiveBot is a web archiving system operated by Archive Team for conducting curated crawls of websites. Controlled through an IRC channel, ArchiveBot allows volunteers to submit URLs for archiving, typically in response to site shutdowns, policy changes, or other events threatening online data. Jobs are processed by a network of worker systems known as pipelines, which crawl and save content in the WARC (Web ARChive) format. Volunteers monitor active crawls (jobs) via a public dashboard and may apply ignore rules to handle problematic areas of websites, such as calendars, infinite scroll, or session-based content, that can disrupt recursive crawling. The results of ArchiveBot crawls are uploaded to the Internet Archive and are typically accessible through the Wayback Machine, where they can be viewed by the public. ArchiveBot has been used to preserve a wide range of content, including user-generated platforms, news outlets, and government websites.
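
The effect of ignore rules on a recursive crawl can be sketched as follows. This is an illustration rather than ArchiveBot's own code: it assumes ignore rules are regular expressions matched against each discovered URL, and the `crawl_order` function, the `example.org` link graph, and the two patterns are made up for the example.

```python
import re
from collections import deque


def crawl_order(start_url, links, ignore_patterns):
    """Breadth-first walk over a link graph, skipping URLs that match an ignore rule."""
    ignores = [re.compile(p) for p in ignore_patterns]
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for nxt in links.get(url, []):
            if nxt in seen:
                continue  # already queued or fetched
            if any(rx.search(nxt) for rx in ignores):
                continue  # an ignore rule matches: never enqueue this URL
            seen.add(nxt)
            queue.append(nxt)
    return visited


# Hypothetical site: a calendar that links from month to month, plus a
# session-based URL. On a real site the calendar chain could be unbounded.
start = "https://example.org/"
links = {
    "https://example.org/": [
        "https://example.org/about",
        "https://example.org/calendar?month=2024-01",
        "https://example.org/news?sessionid=abc123",
    ],
    "https://example.org/calendar?month=2024-01": [
        "https://example.org/calendar?month=2024-02",
    ],
    "https://example.org/about": ["https://example.org/"],
}
rules = [r"/calendar\?month=", r"[?&]sessionid="]
visited = crawl_order(start, links, rules)
print(visited)
```

Because a matching URL is never enqueued, nothing reachable only through it is fetched either, which is why a single rule is enough to cut off an entire calendar chain.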


See also

* Anna's Archive
* Digital Dark Age
* Digital hoarding
* Flashpoint Archive
* Internet Archive
* List of digital preservation initiatives
* Wayback Machine


External links

* Archive Team collection at Internet Archive
* ArchiveTeam subreddit at reddit.com