Internet Archive

The Internet Archive is an American non-profit organization founded in 1996 by Brewster Kahle that runs a digital library website, archive.org. It provides free access to collections of digitized media including websites, software applications, music, audiovisual, and print materials. The Archive also advocates a free and open Internet. Its stated mission is to provide "universal access to all knowledge". The Internet Archive allows the public to upload digital material to, and download it from, its data cluster, but the bulk of its data is collected automatically by its web crawlers, which work to preserve as much of the public web as possible. Its web archive, the Wayback Machine, contains hundreds of billions of web captures. The Archive also oversees numerous book digitization projects, collectively one of the world's largest book digitization efforts.


History

Brewster Kahle founded the Archive in May 1996, around the same time that he began the for-profit web crawling company Alexa Internet. The earliest known archived page on the site was saved on May 10, 1996, at 2:42 pm UTC (7:42 am PDT). By October of that year, the Internet Archive had begun to archive and preserve the World Wide Web in large amounts. The archived content became more easily available to the general public in 2001, through the Wayback Machine.

In late 1999, the Archive expanded its collections beyond the web archive, beginning with the Prelinger Archives. Now, the Internet Archive includes texts, audio, moving images, and software. It hosts a number of other projects: the NASA Images Archive, the contract crawling service Archive-It, and the wiki-editable library catalog and book information site Open Library. Soon after that, the Archive began working to provide specialized services relating to the information access needs of the print-disabled; publicly accessible books were made available in a protected Digital Accessible Information System (DAISY) format.

In August 2012, the Archive announced that it had added BitTorrent to its file download options for more than 1.3 million existing files and for all newly uploaded files. This method is the fastest means of downloading media from the Archive, as files are served from two Archive data centers in addition to other torrent clients that have downloaded the files and continue to serve them.

On November 6, 2013, the Internet Archive's headquarters in San Francisco's Richmond District caught fire, destroying equipment and damaging some nearby apartments. According to the Archive, it lost a side-building housing one of its 30 scanning centers; cameras, lights, and scanning equipment worth hundreds of thousands of dollars; and "maybe 20 boxes of books and film, some irreplaceable, most already digitized, and some replaceable". The nonprofit Archive sought donations to cover the estimated $600,000 in damage.

An overhaul of the site was launched as a beta in November 2014, and the legacy layout was removed in March 2016. In November 2016, Kahle announced that the Internet Archive was building the Internet Archive of Canada, a copy of the Archive to be based somewhere in Canada. The announcement received widespread coverage because it implied that the decision to build a backup archive in a foreign country was prompted by the upcoming presidency of Donald Trump.

Beginning in 2017, OCLC and the Internet Archive have collaborated to make the Archive's records of digitized books available in WorldCat.

Since 2018, the Internet Archive visual arts residency, organized by Amir Saber Esfahani and Andrew McClintock, has helped connect artists with the Archive's more than 48 petabytes of digitized materials. Over the course of the yearlong residency, visual artists create a body of work that culminates in an exhibition. The hope is to connect digital history with the arts and create something for future generations to appreciate online or off. Previous artists in residence include Taravat Talepasand, Whitney Lynn, and Jenny Odell.

The Internet Archive acquires most materials from donations, such as hundreds of thousands of 78 rpm discs from Boston Public Library in 2017, a donation of 250,000 books from Trent University in 2018, and the entire collection of Marygrove College's library after it closed in 2020. All material is then digitized and retained in digital storage, while a digital copy is returned to the original holder. The Internet Archive's copy, if not in the public domain, is lent to patrons worldwide one at a time under the controlled digital lending (CDL) theory of the first-sale doctrine.

On June 1, 2020, four large publishing houses (Hachette Book Group, Penguin Random House, HarperCollins, and John Wiley) filed a lawsuit against the Internet Archive before the United States District Court for the Southern District of New York, claiming that the Internet Archive's practice of controlled digital lending constituted copyright infringement. On March 25, 2023, the court found in favor of the publishers. The negotiated judgment of August 11, 2023, barred the Internet Archive from digitally lending books for which electronic copies are on sale.

Also on August 11, 2023, the music industry giants Universal Music Group, Sony Music, and Concord (together with their respective labels Capitol Records, Arista Records, and CMGI Recorded Music Assets) sued the Internet Archive before the same court over the Internet Archive's Great 78 Project, seeking $621 million in damages for alleged copyright infringement.

In September 2024, Google and the Internet Archive announced a collaboration under which links to the Wayback Machine would be included in the "more about this page" menu in Google Search. This collaboration effectively replaced Google's own Google Cache service, which it had retired earlier that year.


Cyberattacks

During the week of May 27, 2024, the Internet Archive suffered a series of distributed denial of service (DDoS) attacks that made its services unavailable intermittently, sometimes for hours at a time, over a period of several days. The attack was claimed on May 28 by a hacker group called SN_BLACKMETA, with possible links to Anonymous Sudan. The incident drew a comparison with the 2023 British Library cyberattack, which affected the UK Web Archive.

Beginning October 9, 2024, the Internet Archive's team, including archivist Jason Scott and security researcher Scott Helme, confirmed DDoS attacks, site defacement, and a data breach. The purported hacktivist group SN_BLACKMETA again claimed responsibility. A pop-up on the defaced site claimed that there had been a "catastrophic" security breach, stating "Have you ever felt like the Internet Archive runs on sticks and is constantly on the verge of suffering a catastrophic security breach? It just happened. See 31 million of you on HIBP!" It was reported that about 31 million user accounts were affected, compromised in a file called "ia_users.sql" dated September 28, 2024. The attackers stole users' email addresses and Bcrypt-hashed passwords.

On October 11, Kahle said that the data was safe and that service would return to normal "in days, not weeks." On October 13, the Wayback Machine was restored in a read-only format, while archiving of web pages was temporarily disabled. On October 14, Brewster Kahle said "[T]he [Wayback Machine] volume is back to normal: 1,500 requests per second". As of October 15, 2024, the website was still mostly offline for "prioritizing keeping data safe at the expense of service availability."

On October 20, threat actors used stolen, unrotated API tokens to breach the Internet Archive's Zendesk email support platform; they also claimed responsibility for the other breaches, yet stated that SN_BLACKMETA was behind only the DDoS attacks. On October 21, the Internet Archive went back online in a read-only manner. On October 22, all Internet Archive services temporarily went offline; later that same day, only the Wayback Machine, Archive-It, and blog.archive.org resumed. On October 23, archive.org, the Wayback Machine, Archive-It, and the Open Library services all resumed, but some features, such as logging in, remained unavailable; staff said these would return within a day or two. On October 25, the login feature was made available, and the site has remained active since.


Operations

The Archive is a 501(c)(3) nonprofit operating in the United States. In 2019, it had an annual budget of $37 million, derived from revenue from its Web crawling services, various partnerships, grants, donations, and the Kahle-Austin Foundation. The Internet Archive also runs periodic funding campaigns. For instance, a December 2019 campaign had a goal of reaching $6 million in donations. Its website servers run the Ubuntu operating system.

The Archive is headquartered in San Francisco, California. From 1996 to 2009, its headquarters were in the Presidio of San Francisco, a former U.S. military base. Since 2009, its headquarters have been at 300 Funston Avenue in San Francisco, a former Christian Science Church. At one time, most of its staff worked in its book-scanning centers; as of 2019, scanning is performed by 100 paid operators worldwide. The Archive also has data centers in three Californian cities: San Francisco, Redwood City, and Richmond. To reduce the risk of data loss, the Archive creates copies of parts of its collection at more distant locations, including the Bibliotheca Alexandrina in Egypt and a facility in Amsterdam.

In 2016, the Internet Archive began work on a decentralized prototype of the digital library. Beginning in 2020, content from the Internet Archive has been stored in Filecoin. By October 2023, one petabyte of data had been uploaded to the Filecoin network.

The Archive is a member of the International Internet Preservation Consortium and was officially designated as a library by the state of California in 2007.


Web archiving


Wayback Machine

The Wayback Machine is a service that allows archives of the World Wide Web to be searched and accessed. It can be used to see what previous versions of web sites looked like or to visit web sites that no longer exist. The Wayback Machine was created as a joint effort between Alexa Internet (owned by Amazon.com) and the Internet Archive. Hundreds of billions of web sites and their associated data (images, source code, documents, etc.) are saved in a database. At one point, the Internet Archive held over 866 billion web pages, more than 42.5 million print materials, 13 million videos, 3 million TV news reports, 1.2 million software programs, 14 million audio files, 5 million images, and 272,660 concerts in its Wayback Machine.
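
As a rough illustration of how an archived version of a page is addressed, the sketch below builds a Wayback Machine address from a page URL and a point in time. The address pattern (a 14-digit YYYYMMDDhhmmss timestamp followed by the original URL) is an assumption drawn from the service's public addresses rather than from the text above, and the function name `wayback_url` is hypothetical. The service generally serves the capture closest to the requested timestamp, so the exact capture shown may differ.

```python
from datetime import datetime

# Assumed public address prefix of the Wayback Machine (not stated in the article).
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(url: str, when: datetime) -> str:
    """Return a Wayback Machine address for `url` near the moment `when`.

    The timestamp is formatted as 14 digits: YYYYMMDDhhmmss.
    """
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{url}"


# Example: ask for a capture near May 10, 1996, 2:42 pm.
print(wayback_url("http://example.com/", datetime(1996, 5, 10, 14, 42)))
```

Only the address is constructed here; nothing is fetched, so the sketch does not show whether a capture exists for a given page.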


Archive-It

Created in early 2006, Archive-It is a web archiving subscription service that allows institutions and individuals to build and preserve collections of digital content and create digital archives. Archive-It allows users to customize which web content is captured or excluded when preserving material for cultural heritage reasons. Through a web application, Archive-It partners can harvest, catalog, manage, browse, search, and view their archived collections. The archived web sites are full-text searchable within seven days of capture.

Content collected through Archive-It is captured and stored as a WARC file. A primary and a back-up copy are stored at the Internet Archive data centers. A copy of the WARC file can be given to subscribing partner institutions for geo-redundant preservation and storage in line with their own best-practice standards. Periodically, the data captured through Archive-It is indexed into the Internet Archive's general archive.

At one point, Archive-It had more than 275 partner institutions in 46 U.S. states and 16 countries that had captured more than 7.4 billion URLs for more than 2,444 public collections. Archive-It partners are universities and college libraries, state archives, federal institutions, museums, law libraries, and cultural organizations, including the Electronic Literature Organization, North Carolina State Archives and Library, Stanford University, Columbia University, American University in Cairo, Georgetown Law Library, and many others.


Internet Archive Scholar

In September 2020, Internet Archive announced a new initiative to archive and preserve open access academic journals, called Internet Archive Scholar. Its full-text search index includes over 25 million research articles and other scholarly documents preserved in the Internet Archive. The collection spans from digitized copies of eighteenth-century journals through the latest open access conference proceedings and pre-prints crawled from the World Wide Web.


General Index

In 2021, the Internet Archive announced the initial version of the General Index, a publicly available index to a collection of 107 million academic journal articles.


Items and collections

The Archive stores files inside so-called items, which are similar to directories in that they can contain multiple files, but which can also carry additional metadata, such as a description and tags, that makes them more searchable. Some file types can be previewed directly on the site, whereas others have to be downloaded in order to be opened. If an item contains multiple multimedia files, the website generates a playlist for video or audio files, or a slide show for pictures. If an item contains at least one video or picture, the Archive generates a preview thumbnail that can be seen on collection pages and in searches. Items can contain mixed data, such as music files with an album cover picture, in which case the picture is used as the thumbnail. Staff members of the Internet Archive organize items by placing them into so-called collections, which are pages listing multiple items.
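
The item model described above can be pictured with a small data-structure sketch. It is only an illustration under stated assumptions, not the Archive's implementation: the class name `Item`, the helper `kind_of`, and the extension table are all hypothetical, and the rule "prefer a picture, otherwise a video" is one reading of the thumbnail behaviour described in the text.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical extension table; how the Archive really detects file types is not described here.
KINDS = {
    ".mp3": "audio", ".flac": "audio",
    ".mp4": "video", ".ogv": "video",
    ".jpg": "image", ".png": "image",
}


def kind_of(filename: str) -> str:
    """Classify a file by its extension, falling back to 'other'."""
    return KINDS.get(PurePosixPath(filename).suffix.lower(), "other")


@dataclass
class Item:
    """A directory-like container of files plus searchable metadata."""
    identifier: str
    files: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    def playlist(self) -> list:
        """Audio and video files, in stored order."""
        return [f for f in self.files if kind_of(f) in ("audio", "video")]

    def thumbnail_source(self) -> Optional[str]:
        """First picture if any, else first video, else None."""
        for wanted in ("image", "video"):
            for f in self.files:
                if kind_of(f) == wanted:
                    return f
        return None


album = Item("example-album", ["cover.jpg", "01.mp3", "02.mp3"],
             {"description": "Example album", "tags": ["music"]})
print(album.playlist())
print(album.thumbnail_source())
```

In this sketch a collection would simply be a list of such items; the text above does not say how collections are stored, so that part is left out.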


Book collections


Text collection

The scanning performed by the Internet Archive is financially supported by libraries and foundations. When the collection held approximately 1 million texts, it was larger than 500 terabytes, which included raw camera images, cropped and skewed images, PDFs, and raw OCR data. Later, the Internet Archive was operating 33 scanning centers in five countries, digitizing about 1,000 books a day for a total of more than 2 million books, in a total collection of 4.4 million books, including material digitized by others and fed into the Internet Archive; at that time, users were performing more than 15 million downloads per month.

The material digitized by others includes more than 300,000 books that were contributed to the collection, between about 2006 and 2008, by Microsoft through its Live Search Books project, which also included financial support and scanning equipment directly donated to the Internet Archive. On May 23, 2008, Microsoft announced it would be ending its Live Search Books project and would no longer be scanning books, donating its remaining scanning equipment to its former partners.

Around October 2007, Archive users began uploading public domain books from Google Book Search. The Archive's collection came to hold more than 900,000 Google-digitized books; the books are identical to the copies found on Google, except without the Google watermarks, and are available for unrestricted use and download. Brewster Kahle revealed in 2013 that this archival effort was coordinated by Aaron Swartz, who, with a "bunch of friends", downloaded the public domain books from Google slowly enough and from enough computers to stay within Google's restrictions. They did this to ensure public access to the public domain. The Archive ensured the items were attributed and linked back to Google, which never complained, while libraries "grumbled". According to Kahle, this is an example of Swartz's "genius" to work on what could give the most to the public good for millions of people. (Brewster Kahle, "Aaron Swartz memorial at the Internet Archive", 2013-01-24, via The well-prepared mind, via S.I.Lex.)
In addition to books, the Archive offers free and anonymous public access to more than four million court opinions, legal briefs, or exhibits uploaded from the United States Federal Courts' PACER electronic document system via the RECAP web browser plugin. These documents had been kept behind a federal court paywall. On the Archive, they had been accessed by more than six million people by 2013. The Archive's BookReader web app, built into its website, has features such as single-page, two-page, and thumbnail modes; fullscreen mode; page zooming of high-resolution images; and flip page animation.

In October 2024, the Internet Archive agreed to accept the paper copies of 400,000 uncatalogued dissertations from the Leiden University Library, from the period 1851–2004, that the library wanted to dispose of. The university had received them from foreign universities as part of a dissertation exchange program that had begun with its foundation in 1575, continuing for nearly 430 years. The Archive plans to digitize them and make them accessible online. The original full collection included theses by Niels Bohr, Marie Curie, Émile Durkheim, Albert Einstein, Otto Hahn, Carl Jung, J. Robert Oppenheimer, Max Planck, Luigi Pirandello, Gustav Stresemann, and Max Weber.


Open Library

The Open Library is another project of the Internet Archive. The project seeks to include a web page for every book ever published: it holds 25 million catalog records of editions. It also seeks to be a web-accessible public library: it contains the full texts of approximately 1,600,000 public domain books (out of the more than five million from the main texts collection), as well as in-print and in-copyright books, many of which are fully readable, downloadable, and full-text searchable. After a free registration on the website, it offers two-week loans of e-books through its controlled digital lending program for over 647,784 books not in the public domain, in partnership with over 1,000 libraries from six countries. Open Library is a free and open-source software project, with its source code freely available on GitHub.

The Open Library faces objections from some authors and the Society of Authors, who hold that the project is distributing books without authorization and is thus in violation of copyright laws. Four major publishers initiated a copyright infringement lawsuit against the Internet Archive in June 2020 to stop the Open Library project.


Digitizing sponsors for books

Many large institutional sponsors have helped the Internet Archive provide millions of scanned publications (text items). Some sponsors that have digitized large quantities of texts include the University of Toronto's Robarts Library, University of Alberta Libraries, University of Ottawa, Library of Congress, Boston Library Consortium member libraries, Boston Public Library, Princeton Theological Seminary Library, and many others. In 2017, the MIT Press authorized the Internet Archive to digitize and lend books from the press's backlist, with financial support from the Arcadia Fund. A year later, the Internet Archive received further funding from the Arcadia Fund to invite some other university presses to partner with the Internet Archive to digitize books, a project called "Unlocking University Press Books". The Library of Congress created numerous Handle System identifiers that pointed to free digitized books in the Internet Archive. The Internet Archive and Open Library are listed on the Library of Congress website as a source of e-books.


Media collections

In addition to web archives, the Internet Archive maintains extensive collections of digital media that are attested by the uploader to be in the public domain in the United States or released under a license that allows redistribution, such as Creative Commons licenses. Media are organized into collections by media type (moving images, audio, text, etc.), and into sub-collections by various criteria. Each of the main collections includes a "Community" sub-collection (formerly named "Open Source") where general contributions by the public are stored.


Audio


Audio Archive

The Audio Archive includes music, audiobooks, news broadcasts, old time radio shows, podcasts, and a wide variety of other audio files. There are more than 15,000,000 free digital recordings in the collection. The subcollections include audio books and poetry, podcasts, non-English audio, and many others. The sound collections are curated by B. George, director of the ARChive of Contemporary Music.


Digital Library of Amateur Radio and Communications

The Digital Library of Amateur Radio and Communications is a project to preserve recordings of amateur radio transmissions, with funding from the Amateur Radio Digital Communications foundation.


Live Music Archive

The Live Music Archive sub-collection includes more than 170,000 concert recordings from independent musicians, as well as more established artists and musical ensembles with permissive rules about recording their concerts, such as the Grateful Dead, and more recently, The Smashing Pumpkins. Also, Jordan Zevon has allowed the Internet Archive to host a definitive collection of his father Warren Zevon's concert recordings. The Zevon collection ranges from 1976 to 2001 and contains 126 concerts including 1,137 songs.


The Great 78 Project

The Great 78 Project aims to digitize 250,000 78 rpm singles (500,000 songs) from the period between 1880 and 1960, donated by various collectors and institutions. It has been developed in collaboration with the ARChive of Contemporary Music and George Blood Audio, which is responsible for the audio digitization.


Netlabels

The Archive has a collection of freely distributable music that is streamed and available for download via its Netlabels service. The music in this collection generally comes from the Creative Commons-licensed catalogs of virtual record labels.


Images collection

This collection contains more than 3.5 million items. Its sub-collections include the Cover Art Archive, Metropolitan Museum of Art – Gallery Images, NASA Images, the Occupy Wall Street Flickr Archive, and USGS Maps.


Cover Art Archive

The Cover Art Archive is a joint project between the Internet Archive and MusicBrainz, whose goal is to make cover art images available on the Internet. This collection contains more than 1,400,000 items.


Metropolitan Museum of Art images

The images of this collection are from the Metropolitan Museum of Art. This collection contains more than 140,000 items.


NASA Images

The NASA Images archive was created through a Space Act Agreement between the Internet Archive and NASA to bring public access to NASA's image, video, and audio collections in a single, searchable resource. The Internet Archive NASA Images team worked closely with all of the NASA centers to keep adding to the ever-growing collection. The nasaimages.org site launched in July 2008 and had more than 100,000 items online at the end of its hosting in 2012.


Occupy Wall Street Flickr archive

This collection contains Creative Commons-licensed photographs from Flickr related to the Occupy Wall Street movement. This collection contains more than 15,000 items.


USGS Maps

This collection contains more than 59,000 items from the Libre Map Project.


Machinima Archive

One of the sub-collections of the Internet Archive's Video Archive is the Machinima Archive. This small section hosts many Machinima videos. Machinima is a digital artform in which computer games, game engines, or software engines are used in a sandbox-like mode to create motion pictures, recreate plays, or even publish presentations or keynotes. The archive collects a range of Machinima films from internet publishers such as Rooster Teeth and Machinima.com as well as independent producers. The sub-collection is a collaborative effort among the Internet Archive, the How They Got Game research project at Stanford University, the Academy of Machinima Arts and Sciences, and Machinima.com.


Microfilm collection

This collection contains approximately 160,000 microfilmed items from a variety of libraries including the University of Chicago Libraries, University of Illinois at Urbana-Champaign, University of Alberta, Allen County Public Library, and National Technical Information Service.


Moving image collection

The Internet Archive holds a collection of approximately 3,863 feature films. Additionally, the Internet Archive's Moving Image collection includes newsreels, classic cartoons, pro- and anti-war propaganda, The Video Cellar Collection, Skip Elsheimer's "A.V. Geeks" collection, early television, and ephemeral material from the Prelinger Archives, such as advertising, educational, and industrial films, as well as amateur and home movie collections. Subcategories of this collection include:

* IA's Brick Films collection, which contains stop-motion animation filmed with Lego bricks, some of which are "remakes" of feature films.
* IA's Election 2004 collection, a non-partisan public resource for sharing video materials related to the 2004 United States presidential election.
* IA's FedFlix collection, Joint Venture NTIS-1832 between the National Technical Information Service and Public.Resource.Org, which features "the best movies of the United States Government, from training films to history, from our national parks to the U.S. Fire Academy and the Postal Inspectors".
* IA's Independent News collection, which includes sub-collections such as the Internet Archive's World At War competition from 2001, in which contestants created short films demonstrating "why access to history matters". Among its most-downloaded video files are eyewitness recordings of the devastating 2004 Indian Ocean earthquake.
* IA's September 11 Television Archive, which contains archival footage from the world's major television networks of the terrorist attacks of September 11, 2001, as they unfolded on live television.


Open Educational Resources

Open Educational Resources is a digital collection at archive.org. This collection contains hundreds of free courses, video lectures, and supplemental materials from universities in the United States and China. The contributors of this collection are ArsDigita University, Hewlett Foundation, MIT, Monterey Institute, and Naropa University.


TV News Search & Borrow

In September 2012, the Internet Archive launched the TV News Search & Borrow service for searching U.S. national news programs. The service is built on closed captioning transcripts and allows users to search and stream 30-second video clips. Upon launch, the service contained "350,000 news programs collected over 3 years from national U.S. networks and stations in San Francisco and Washington D.C." According to Kahle, the service was inspired by the Vanderbilt Television News Archive, a similar library of televised network news programs. In contrast to Vanderbilt, which limits access to streaming video to individuals associated with subscribing colleges and universities, the TV News Search & Borrow allows open access to its streaming video clips. In 2013, the Archive received an additional donation of "approximately 40,000 well-organized tapes" from the estate of a Philadelphia woman, Marion Stokes. Stokes "had recorded more than 35 years of TV news in Philadelphia and Boston with her VHS and Betamax machines."


Miscellaneous collections

The Brooklyn Museum collection contains approximately 3,000 items from the Brooklyn Museum. In December 2020, the film research library of Lillian Michelson was donated to the Archive.


Other services and endeavors


Physical media

Voicing a strong reaction to the idea of books simply being thrown away, and inspired by the Svalbard Global Seed Vault, Kahle now envisions collecting one copy of every book ever published. "We're not going to get there, but that's our goal", he said. Alongside the books, Kahle plans to store the Internet Archive's old servers, which were replaced in 2010.


Software

The Internet Archive has "the largest collection of historical software online in the world", spanning 50 years of computer history in terabytes of computer magazines and journals, books, shareware discs, FTP sites, video games, etc. The Internet Archive has created an archive of what it describes as "vintage software", as a way to preserve it. The project advocated an exemption from the United States Digital Millennium Copyright Act to permit it to bypass copy protection, which the United States Copyright Office approved in 2003 for a period of three years. The Archive does not offer the software for download, as the exemption is solely "for the purpose of preservation or archival reproduction of published digital works by a library or archive." The Library of Congress renewed the exemption in 2006, and in 2009 indefinitely extended it pending further rulemakings. The Library reiterated the exemption as a "Final Rule" with no expiration date in 2010. In 2013, the Internet Archive began to make select video games playable in the browser via MESS, for instance the Atari 2600 game ''E.T. the Extra-Terrestrial''. Since December 23, 2014, the Internet Archive has presented, via browser-based DOSBox emulation, thousands of DOS/PC games for "scholarship and research purposes only". In November 2020, the Archive introduced a new emulator for Adobe Flash called Ruffle, and began archiving Flash animations and games ahead of the December 31, 2020, end-of-life for the Flash plugin across all computer systems.


Table Top Scribe System

A combined hardware and software system has been developed for safely digitizing content.


Credit Union

From 2012 to November 2015, the Internet Archive operated the Internet Archive Federal Credit Union (IAFCU), a federal credit union based in New Brunswick, New Jersey. The IAFCU faced restrictions from the National Credit Union Administration, which severely limited the IAFCU's loan portfolio, as well as concerns over serving Bitcoin firms. At the time of its dissolution, it consisted of 395 members and was worth $2.5 million.


Decentralization

Since 2019, the Internet Archive has organized Decentralized Web Camp (DWeb Camp), an annual camp that brings together a diverse global community of contributors in a natural setting. The camp aims to tackle real-world challenges facing the web and co-create decentralized technologies for a better internet. It also aims to foster collaboration, learning, and fun while promoting principles of trust, human agency, mutual respect, and ecological awareness.


Wayforward Machine

On September 30, 2021, as a part of its 25th anniversary celebration, the Internet Archive launched the "Wayforward Machine", a satirical, fictional website covered with pop-ups asking for personal information. The site was intended to depict a fictional dystopian timeline of real-world events leading to such a future, such as the repeal of Section 230 of the United States Code in 2022 and the introduction of advertising implants in 2041.


Ceramic archivists collection

The Great Room of the Internet Archive features a collection of more than 100 ceramic figures representing employees of the Internet Archive, with the 100th statue immortalizing Aaron Swartz. This collection, inspired by the statues of the Xian warriors in China, was commissioned by Brewster Kahle, sculpted by Nuala Creed, and as of 2014, is ongoing.


Artists in residence

The Internet Archive visual arts residency, organized by Amir Saber Esfahani, is designed to connect emerging and mid-career artists with the Archive's millions of collections and to show what is possible when open access to information intersects with the arts. During this one-year residency, selected artists develop a body of work that responds to and utilizes the Archive's collections in their own practice.
* 2024–2025 Residency Artist: Swilk
* 2021–2022 Residency Artist: Casey Gray
* 2019 Residency Artists: Caleb Duarte, Whitney Lynn, and Jeffrey Alan Scudder
* 2018 Residency Artists: Mieke Marple, Chris Sollars, and Taravat Talepasand
* 2017 Residency Artists: Laura Kim, Jeremiah Jenkins, and Jenny Odell


Controversies, legal disputes, and activism


Opposition to National security letters, bills and settlements

On May 8, 2008, it was revealed that the Internet Archive had successfully challenged an FBI national security letter asking for logs on an undisclosed user. On November 28, 2016, it was revealed that the Archive had successfully challenged a second FBI national security letter, which had asked for logs on another undisclosed user. The Internet Archive blacked out its web site for 12 hours on January 18, 2012, in protest of the Stop Online Piracy Act and the PROTECT IP Act bills, two pieces of legislation in the United States Congress that it argued would "negatively affect the ecosystem of web publishing that led to the emergence of the Internet Archive". This occurred in conjunction with the English Wikipedia blackout, as well as numerous other protests across the Internet. The Internet Archive is a member of the Open Book Alliance, which has been among the most outspoken critics of the Google Book Settlement. The Archive advocates an alternative digital library project.


Hosting of disputed media

On October 9, 2016, the Internet Archive was temporarily blocked in
Turkey
after it was used (amongst other file hosting services) by hackers to host 17 GB of leaked government emails. Because the Internet Archive only lightly moderates uploads, it includes resources that may be valued by extremists and the site may be used by them to evade block listing. In February 2018, the Counter Extremism Project said that the Archive hosted terrorist videos, including the beheading of Alan Henning, and had declined to respond to requests about the videos. In May 2018, a report published by the cyber-security firm Flashpoint stated that the
Islamic State
was using the Internet Archive to share its propaganda. Chris Butler, from the Internet Archive, responded that they regularly spoke to the US and EU governments about sharing information on terrorism. In April 2019,
Europol
, acting on a referral from French police, asked the Internet Archive to remove 550 sites of "terrorist propaganda". The Archive rejected the request, saying that the reports were wrong about the content they pointed to, or were too broad for the organization to comply with. On July 14, 2021, the Internet Archive held a joint "Referral Action Day" with Europol to target terrorist videos. A 2021 article said that
jihadists regularly used the Internet Archive for "dead drops" of terrorist videos. In January 2022, a former UCLA
lecturer's 800-page manifesto, containing racist ideas and threats against UCLA staff, was uploaded to the Internet Archive. The manifesto was removed by the Internet Archive after a week, amid discussion about whether archivists should preserve such documents. A 2022 paper found "an alarming volume of terrorist, extremist, and racist material on the Internet Archive". A 2023 paper reported that neo-Nazis collect links to online, publicly available resources to be shared with new recruits. As the Internet Archive hosts uploaded texts that are not allowed on other websites, Nazi and neo-Nazi books in the Archive (e.g., ''The Turner Diaries'') frequently appear on these lists. These lists also feature older, public domain material created when white supremacist views were more mainstream.


2020 National Emergency Library

In the midst of the COVID-19 pandemic, which closed many schools, universities, and libraries, the Archive announced on March 24, 2020, that it was creating the National Emergency Library by removing the lending restrictions it had in place for 1.4 million digitized books in its Open Library, while still limiting the number of books users could check out and enforcing their return. Normally, the site would only allow one digital loan for each physical copy of the book it held, by use of an encrypted file that would become unusable after the lending period was completed. The Library would remain in this form until at least June 30, 2020, or until the US national emergency was over, whichever came later. At launch, the Internet Archive allowed authors and rightholders to submit opt-out requests for their works to be omitted from the National Emergency Library. The Internet Archive said the National Emergency Library addressed an "unprecedented global and immediate need for access to reading and research material" due to the closures of physical libraries worldwide. They justified the move in a number of ways. Legally, they said they were promoting access to those inaccessible resources, which they claimed was an exercise in fair use principles. The Archive continued implementing their controlled digital lending
policy that predated the National Emergency Library, meaning they still encrypted the lent copies and it was no easier for users to create new copies of the books than before. An ultimate determination of whether or not the National Emergency Library constituted fair use could only be made by a court. Morally, they also pointed out that the Internet Archive was a registered library like any other, that they either paid for the books themselves or received them as donations, and that lending through libraries predated copyright restrictions. The Archive had already been criticized by authors and publishers for its prior lending approach, and upon announcement of the National Emergency Library, authors, publishers, and groups representing both took further issue with the Archive and its Open Library project, equating the move to copyright infringement and digital piracy, and accusing the Archive of using the COVID-19 pandemic as a reason to push the boundaries of copyright. After the works of some of these authors were ridiculed in responses, the Internet Archive's Jason Scott requested that supporters of the National Emergency Library not denigrate anyone's books: "I realize there's strong debate and disagreement here, but books are life-giving and life-changing and these writers made them."


Access blocking in Indonesia

On May 27, 2025, the Ministry of Communication and Digital Affairs of Indonesia (Kominfo) blocked access to the Internet Archive in Indonesia. Alexander Sabar, the Director of the Supervision of Digital Space (Komdigi), stated that the reason was the presence of pornography and online gambling on the site. He denied a rumor that the block was motivated by a desire to rewrite or hide history. He also acknowledged the importance of the Internet Archive and said that the blocking was temporary and would be rescinded once the offending content was removed, adding that the ministry had blocked the site only after the Internet Archive did not respond to requests.


Copyright issues

In November 2005, free downloads of Grateful Dead concerts were removed from the site, following what seemed to be disagreements between some of the former band members. John Perry Barlow identified Bob Weir, Mickey Hart, and Bill Kreutzmann as the instigators of the change, according to an article in ''The New York Times''. Phil Lesh, a founding member of the band, commented on the change in a November 30, 2005, posting to his personal web site. A November 30 forum post from Brewster Kahle summarized what appeared to be the compromise reached among the band members. Audience recordings could be downloaded or streamed, but soundboard recordings were to be available for streaming only. Concerts have since been re-added. In February 2016, Internet Archive users began archiving digital copies of ''
Nintendo Power'', Nintendo's official magazine for its games and products, which ran from 1988 to 2012. The first 140 issues had been collected before Nintendo had the archive removed on August 8, 2016. In response to the take-down, Nintendo told gaming website ''Polygon'', "[Nintendo] must protect our own characters, trademarks and other content. The unapproved use of Nintendo's intellectual property can weaken our ability to protect and preserve it, or to possibly use it for new projects". In August 2017, the
Department of Telecommunications of the Government of India blocked the Internet Archive along with other file-sharing websites, in accordance with two court orders issued by the Madras High Court, citing piracy concerns after copies of two Bollywood films were allegedly shared via the service. The HTTP version of the Archive was blocked but it remained accessible using the HTTPS protocol. In 2023, the Internet Archive became a popular site for Indians to watch the first episode of ''India: The Modi Question'', a BBC documentary released on January 17 and banned in India by January 20. The video was reported to have been removed by the Archive on January 23. The Internet Archive then stated, on January 27, that they had removed the video in response to a BBC request under the Digital Millennium Copyright Act.


Book publishers' lawsuit

The operation of the National Emergency Library was part of a lawsuit filed against the Internet Archive in June 2020 by four major book publishers (Hachette, HarperCollins, John Wiley & Sons, and Penguin Random House), challenging the validity of the controlled digital lending program under copyright law. In response to the lawsuit, the Internet Archive closed the National Emergency Library on June 16, 2020, rather than on the planned June 30, 2020. The plaintiffs, supported by the Copyright Alliance, claimed in their lawsuit that the Internet Archive's actions constituted "willful mass copyright infringement." On March 24, 2023, Judge Koeltl ruled against the Internet Archive, saying the National Emergency Library concept was not fair use, so the Archive had infringed the publishers' copyrights by lending out the books without the waitlist restriction. An agreement was then reached for the Internet Archive to pay an undisclosed amount to the publishers. The Internet Archive appealed the ruling. On September 4, 2024, the U.S. Court of Appeals for the Second Circuit upheld the district court's ruling, calling the Internet Archive's argument that they were shielded by fair use doctrine "unpersuasive".


Music publishers' lawsuit

In August 2023, the music industry corporations Universal Music Group (UMG), Sony Music and Concord sued the Internet Archive over its Great 78 Project, asserting the project was engaged in copyright infringement. The Great 78 Project stores digitized versions of pre-1972 songs and albums from 78 rpm phonograph records, for "the preservation, research and discovery of 78rpm records." The project had started in 2016, when pre-1972 recordings had not been protected by copyright; in 2018, the U.S. Congress passed the Music Modernization Act
(MMA) which enabled legal remedies for unauthorised use of pre-1972 recordings until 2067, thus effectively covering them with copyright. UMG and Sony had been the two largest companies in this sector for more than a decade, with respective market shares of 31.8% and 22.1% in 2023. Concord was a rapidly expanding music business closely partnered with UMG since its transformation into Concord Music Group in 2004 and backed since at least 2000 by
J.P. Morgan. It was the first music company to perform an asset-backed securitization, led by Apollo Global Management
, in December 2022. Its assets consisted of over 1 million copyrights to music older than 18 months. According to its CEO Bob Valentine, Concord derived about 85% of its revenue "from catalog, rather than newly-developed, music". As Valentine stated in his first interview, "The phenomenon of artists' IP has never been more liquid; it is now a real and proven asset class. Investment bankers are focused on it, financiers are financing it, and then there's entities like us, that know how to buy rights, but also know how to manage them and have the relationships to do so." The share of catalog music in total album equivalent consumption in the United States rose from 62.8% to 72.6% between 2019 and 2023. The publishers are seeking statutory damages for nearly 4,142 songs named in the suit, with a maximum possible fine of $621 million. The Internet Archive has argued that digitizing the original recordings for preservation falls within the doctrine of "fair use" given their primitive sound quality, that the number of downloads is so small it has almost no impact on the publishers' revenue, and that over 95% of the collection is not readily available anywhere else. The plaintiffs said in response, "if ever there were a theory of fair use invented for litigation, this is it." According to a legal source at
Mayer Brown Mayer Brown is a global white-shoe law firm, founded in Chicago, Illinois, United States. It has offices in 27 cities throughout the Americas, Asia, Europe, and the Middle East, with its largest offices being in Chicago, Washington, D.C., New ...
, the music publishers' case could be challenged as
unconstitutional In constitutional law, constitutionality is said to be the condition of acting in accordance with an applicable constitution; "Webster On Line" the status of a law, a procedure, or an act's accordance with the laws or set forth in the applic ...
, since the granting of copyright to pre-1972 works in the MMA only benefitted record companies without having a systemic effect.


See also

* List of online image archives
* Public domain music


Similar projects

* archive.today
* Internet Memory Foundation
* LibriVox
* National Digital Information Infrastructure and Preservation Program (NDIIPP)
* National Digital Library Program (NDLP)
* Project Gutenberg
* UK Government Web Archive at The National Archives (United Kingdom)
* UK Web Archive
* WebCite


Other

* Anna's Archive
* Archive Team
* Digital dark age
* Digital preservation
* Heritrix
* Library Genesis
* Link rot
* Memory hole
* PetaBox
* Search engine cache


External links

* Internet Archive Scholar