
The LOCKSS ("Lots of Copies Keep Stuff Safe") project, under the auspices of Stanford University, is a peer-to-peer network that develops and supports an open-source system allowing libraries to collect, preserve and provide their readers with access to material published on the Web. Its main goal is digital preservation. The system attempts to replicate for Web content the way libraries preserve material published on paper. It was originally designed for scholarly journals, but is now also used for a range of other materials. Examples include the SOLINET project to preserve theses and dissertations at eight universities, US government documents, and the MetaArchive Cooperative program preserving at-risk digital archival collections, including Electronic Theses and Dissertations (ETDs), newspapers, photograph collections, and audio-visual collections. A similar project called CLOCKSS (Controlled LOCKSS) "is a tax-exempt, 501(c)3, not-for-profit organization, governed by a Board of Directors made up of librarians and publishers." CLOCKSS runs on LOCKSS technology.


Problem

Traditionally, academic libraries have retained issues of scholarly journals, either individually or collaboratively, providing their readers access to the content received even after the publisher has ceased publication or the subscription has been canceled. In the digital age, libraries often subscribe to journals that are available only digitally over the Internet. Although convenient for patron access, the model for digital subscriptions does not allow libraries to retain a copy of the journal. If the publisher ceases to publish, if the library cancels the subscription, or even if the publisher's website is down for the day, the content that has been paid for is no longer available.


Methods

The LOCKSS system allows a library, with permission from the publisher, to collect, preserve and disseminate to its patrons a copy of the materials to which it has subscribed, as well as open access material (perhaps published under a Creative Commons license). Each library's system collects a copy using a specialized web crawler that verifies that the publisher has granted suitable permission. The system is format-agnostic, collecting whatever formats the publisher delivers via HTTP.

Libraries which have collected the same material cooperate in a peer-to-peer network to ensure its preservation. Peers in the network vote on cryptographic hashes computed over the preserved content together with a nonce; a peer that is outvoted regards its copy as damaged and repairs it from the publisher or other peers. The LOCKSS license used by most publishers allows a library's readers access to its own copy, but does not allow similar access to other libraries or unaffiliated readers; the system does not support file sharing. On request, a library may supply another library with content to effect a repair, but only if the requesting library has previously proved that it had a good copy by voting with the majority. If the reader's browser no longer supports the format in which the copy was collected, a format migration process can convert it to a current format. These limits on the use that may be made of preserved copies of copyright material have been effective in persuading copyright owners to grant the necessary permission.

The LOCKSS approach of selective collection with permission from the publisher, distributed storage, and restricted dissemination contrasts with, for example, the Internet Archive's approach of omnivorous collection without permission from the publisher, centralized storage, and unrestricted dissemination. The LOCKSS system is far smaller, but it can preserve subscription materials to which the Internet Archive has no access. Since each library administers its own LOCKSS peer and maintains its own copy of preserved material, and since there are libraries doing so worldwide, the system provides a much higher degree of replication than is usual in a fault-tolerant system. The voting process makes use of this high degree of replication to eliminate the need for backups to off-line media, and to provide robust defenses against attacks aimed at corrupting preserved content.
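
The permission check performed by the crawler might look like the following minimal sketch. It assumes that the publisher signals permission by placing a fixed statement on a page that the crawler fetches before collecting anything; the statement text and the names `PERMISSION_STATEMENT`, `has_permission` and `may_collect` are hypothetical and are not the actual LOCKSS crawler interface.

```python
import urllib.request

# Hypothetical permission statement; the real wording is defined by the LOCKSS project.
PERMISSION_STATEMENT = "LOCKSS system has permission to collect, preserve, and serve this content"


def has_permission(page_html: str) -> bool:
    """Return True if the page text contains the (hypothetical) permission statement."""
    # Collapse whitespace and ignore case so line breaks in the HTML do not hide the statement.
    normalized = " ".join(page_html.split()).lower()
    return PERMISSION_STATEMENT.lower() in normalized


def may_collect(permission_url: str) -> bool:
    """Fetch the publisher's permission page over HTTP and check it before crawling."""
    with urllib.request.urlopen(permission_url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return has_permission(response.read().decode(charset, errors="replace"))
```

A crawler built along these lines would call `may_collect` once per collection and refuse to proceed when it returns False, which keeps collection tied to the publisher's explicit consent.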

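The voting and repair rules described above can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function, a single poll that sees every peer's copy, and a simple majority count with no handling of ties; the names `vote_hash`, `run_poll` and `Peer` are hypothetical. The actual LOCKSS protocol is considerably more involved, for example in how it chooses which peers to poll and how it treats inconclusive results.

```python
from __future__ import annotations

import hashlib
import os
from collections import Counter


def vote_hash(nonce: bytes, content: bytes) -> str:
    # Hashing a fresh nonce together with the content means a peer that kept only
    # an old digest, rather than the content itself, cannot produce a matching vote.
    return hashlib.sha256(nonce + content).hexdigest()


def run_poll(copies: dict[str, bytes]) -> tuple[str, set[str], set[str]]:
    """Hypothetical poll over one unit of content held by several peers.

    Returns the winning hash, the peers that voted with the majority,
    and the outvoted peers, which should treat their copies as damaged.
    """
    nonce = os.urandom(16)  # fresh random challenge for this poll only
    votes = {peer: vote_hash(nonce, content) for peer, content in copies.items()}
    winner, _count = Counter(votes.values()).most_common(1)[0]
    agreed = {peer for peer, digest in votes.items() if digest == winner}
    return winner, agreed, set(votes) - agreed


class Peer:
    """Hypothetical peer that remembers which peers have agreed with it in past polls."""

    def __init__(self, name: str, content: bytes) -> None:
        self.name = name
        self.content = content
        self.known_good: set[str] = set()

    def record_poll(self, agreed: set[str]) -> None:
        # Peers that voted with this peer's majority proved they held a good copy.
        if self.name in agreed:
            self.known_good |= agreed - {self.name}

    def serve_repair(self, requester: str) -> bytes | None:
        # Supply repair content only to a peer that previously proved it had a good copy;
        # anyone else gets nothing, so repairs cannot be used for file sharing.
        return self.content if requester in self.known_good else None
```

Keeping the repair decision local to each peer, based only on polls it took part in, mirrors the rule that a library supplies content only to libraries that have already demonstrated a good copy.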

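Format migration can be sketched in the same spirit. This minimal sketch assumes that conversion happens when a reader requests a page, driven by the MIME types listed in the browser's HTTP `Accept` header, so that the preserved original is never modified; the converter registry, the `text/x-legacy` format, and all function names are hypothetical.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical converter type: takes the preserved bytes, returns converted bytes.
Converter = Callable[[bytes], bytes]

# Hypothetical registry keyed by (stored MIME type, target MIME type).
CONVERTERS: dict[tuple[str, str], Converter] = {}


def register(source: str, target: str) -> Callable[[Converter], Converter]:
    """Decorator that records a converter from one MIME type to another."""
    def wrap(fn: Converter) -> Converter:
        CONVERTERS[(source, target)] = fn
        return fn
    return wrap


def accepted_types(accept_header: str) -> list[str]:
    """Parse an Accept header into MIME types, ignoring q-values and wildcards for brevity."""
    return [part.split(";")[0].strip() for part in accept_header.split(",") if part.strip()]


def serve(stored_type: str, body: bytes, accept_header: str) -> tuple[str, bytes]:
    """Return (MIME type, body) for a reader, migrating the format only if needed."""
    accepted = accepted_types(accept_header)
    if stored_type in accepted:
        return stored_type, body  # browser can render the preserved format directly
    for target in accepted:  # try the browser's formats in the order it listed them
        converter = CONVERTERS.get((stored_type, target))
        if converter is not None:
            return target, converter(body)  # convert a copy; the stored bytes are untouched
    raise LookupError(f"no migration path from {stored_type}")


@register("text/x-legacy", "text/plain")  # hypothetical legacy format
def legacy_to_plain(body: bytes) -> bytes:
    # Illustrative conversion only: old carriage-return line endings to newlines.
    return body.replace(b"\r", b"\n")
```

Because the stored bytes are left as collected, a better converter could be substituted later without re-collecting the content.

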
Importance

In addition to preserving access, libraries have traditionally made it difficult to rewrite or suppress printed material. The existence of an indeterminate but large number of identical copies on a somewhat tamper-resistant medium under many independent administrations meant that attempts to alter or remove all copies of a published work would likely both fail and be detected. Web publishing, which is based on a single copy under a single administration, provides none of these safeguards against subversion, and therefore lends itself to rewriting history. By preserving many copies under diverse administration, by automatically auditing the copies at intervals against each other (and, in the future, against the publisher's copy), and by alerting libraries when changes are detected, the LOCKSS system attempts to restore many of these safeguards in the now-digital world of publishing.


Implementation

Before implementing a LOCKSS system, some questions need to be considered carefully to make sure the content is verified, evaluated, and auditable by users. The user must ask questions such as "What are your procedures?", "What are your methods?", "How is this system evaluated?", and "What is your disaster preparedness program?" These questions will enable the user to evaluate the system, create a successful maintenance plan for their materials, and ensure that the system is backed by a carefully evaluated support structure. The source code for the entire LOCKSS system carries BSD-style open-source licenses and is available from GitHub. LOCKSS is a trademark of Stanford University.


See also

* Clock of the Long Now
* Digital library
* Digital preservation
* National Digital Information Infrastructure and Preservation Program
* Portico (service)


External links

* Official website: http://www.lockss.org/
* LOCKSS (YouTube video, June 2007)
* LOCKSS Part I: Why Libraries Should Care About LOCKSS (YouTube video, December 2007)
* CLOCKSS: "Controlled LOCKSS", a federated global (rather than local) LOCKSS archive