
An Archival Resource Key (ARK) is a multi-purpose URL suited to being a persistent identifier for information objects of any type. It is widely used by libraries, data centers, archives, museums, publishers, and government agencies to provide reliable references to scholarly, scientific, and cultural objects. In 2019 it was registered as a Uniform Resource Identifier (URI).

A URL that is an ARK is distinguished by the label ark: after the URL's hostname. The label sets the expectation that, when submitted to a web browser, the URL terminated by '?' returns a brief metadata record, and the URL terminated by '??' returns metadata that includes a commitment statement from the current service provider. The ARK and its inflections ('?' and '??') thus provide access to three facets of a provider's ability to provide persistence.

Implicit in the design of the ARK scheme is that persistence is purely a matter of service and not a property of a naming syntax. Moreover, a "persistent identifier" cannot be born persistent; an identifier from any scheme can only be proved persistent over time. The inflections provide information with which to judge an identifier's likelihood of persistence.

ARKs can be maintained and resolved locally using open source software such as Noid (Nice Opaque Identifiers) or via services such as EZID. Most implementations are decentralized, and no fees are charged for the right to assign ARKs. Some implementations choose to publish ARKs via the centralized N2T (Name-to-Thing) resolver.
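The '?' and '??' inflections described in this introduction can be illustrated with a minimal Python sketch. The function name `ark_requests`, the dictionary keys, and the example hostname and identifier are all made up for illustration; the sketch only builds the three URLs and does not contact any resolver.

```python
def ark_requests(ark_url: str) -> dict:
    """Build the three request URLs for one ARK (illustrative helper).

    Any trailing '?' characters are stripped first, so the function
    accepts either the plain ARK or an already inflected form.
    """
    base = ark_url.rstrip("?")
    return {
        "object": base,                 # the object or a surrogate for it
        "brief_metadata": base + "?",   # brief metadata record
        "commitment": base + "??",      # metadata with commitment statement
    }


if __name__ == "__main__":
    # Hypothetical ARK; the hostname, NAAN, and name are placeholders.
    for facet, url in ark_requests("https://example.org/ark:/12345/x6np1wh8k").items():
        print(facet, url)
```

A client could pass each of the three URLs to an HTTP library of its choice; what the server returns depends on the provider.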


History

Throughout the 1990s, the Internet Engineering Task Force (IETF) and other organizations developed standards for persistent identifiers for web resources, including URN, PURL, Handle, and DOI. In each of these standards, indirect identifiers would resolve to URLs, which themselves changed over time. Many believed that such systems would contribute to the persistence of web resources over time.

In 2001, John Kunze of the University of California and R. P. Channing Rodgers of the United States National Library of Medicine released the first draft of “The ARK Persistent Identifier Scheme,” designed in response to the needs of their two organizations, as an IETF working document. In explaining their motivations for creating a new system, Kunze later wrote that “each [persistent identifier] system had specific problems.” In contrast to the decentralized structure of the web, with many independent publishers, Handle and DOI were related centralized systems which charged for inclusion; they were “antithetical,” according to Kunze, “to an implicit principle that Internet standards must not endorse control by any one entity, over access to the networked resources of another entity.” URNs were free but lacked a resolver discovery service, and, wrote Kunze, “it seemed to me that the IETF community lost interest in creating a whole new Internet indirection infrastructure that would add little to existing web and DNS mechanisms, especially in light of the small part that indirection plays in keeping links from breaking.”

In contrast to these other systems, the ARK scheme proposed that “persistence is purely a matter of service, … neither inherent in an object nor conferred on it by a particular naming syntax.” The most an identifier could do to solve the problem of persistence, then, was to indicate an organization’s commitment. Accordingly, in the ARK standard, identifiers would refer not only to a web resource, but also to “a promise of stewardship” and metadata about the resource. If a web server was queried with an ARK, it should return the resource itself or some surrogate for it, such as “a table of contents instead of a large complex document.” If a question mark was appended to the ARK, though, it should return a description (metadata) instead, which “must at minimum answer the who, what, when, and where questions concerning an expression of the object.” (The scheme also included a guide to Electronic Resource Citations, a simple format for structuring this metadata.) If two question marks were appended, the server should return the provider’s policies regarding “object persistence, object naming, object fragment addressing, and operational service support.”

The California Digital Library began using ARKs in 2002 and released the Noid (Nice Opaque IDentifiers) software for managing ARKs and other identifiers in 2004. Other early adopters of ARKs included Portico, the Internet Archive, and the Bibliothèque nationale de France, the first of several francophone institutions to adopt the scheme. In 2018, the California Digital Library and DuraSpace announced a collaboration, initially named ARKs-in-the-Open and then the ARK Alliance, to build an international community around ARKs and their use in open scholarship. By 2021, over 800 institutions had registered to use ARKs.


Structure

https://NMA/ark:/NAAN/Name[Qualifier]
* NAAN: Name Assigning Authority Number, the mandatory unique identifier of the organization that originally named the object
* NMA: Name Mapping Authority, the optional and replaceable hostname of an organization that currently provides service for the object
* Qualifier: optional string that extends the base ARK to support access to individual hierarchical subcomponents of an object, and to variants (versions, languages, formats) of components

A complete NAAN registry is maintained by the ARK Alliance and replicated at the Bibliothèque nationale de France and the US National Library of Medicine. It contained 530 entries in June 2018, 633 in July 2020, and 754 in April 2021.
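As a rough illustration of the structure above, the following Python sketch splits an ARK into its NMA, NAAN, Name, and Qualifier. It is not a reference parser: the names `Ark` and `parse_ark` are hypothetical, the example identifier is made up, and the rule that the Qualifier begins at the first "/" or "." after the Name is an assumption of this sketch, not something the text above states.

```python
import re
from typing import NamedTuple, Optional


class Ark(NamedTuple):
    nma: Optional[str]   # hostname of the current service provider, if given
    naan: str            # Name Assigning Authority Number
    name: str            # name assigned by that authority
    qualifier: str       # optional subcomponent/variant suffix ("" if absent)


# Assumption: the Qualifier starts at the first "/" or "." after the Name.
_ARK_RE = re.compile(
    r"^(?:https?://(?P<nma>[^/]+)/)?"   # optional NMA hostname
    r"ark:/(?P<naan>[^/]+)/"            # "ark:" label and NAAN
    r"(?P<name>[^/.?]+)"                # Name
    r"(?P<qualifier>[^?]*)"             # optional Qualifier
    r"\??\??$"                          # tolerate '?' / '??' inflections
)


def parse_ark(text: str) -> Ark:
    """Split an ARK string into its parts, or raise ValueError."""
    match = _ARK_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognizable ARK: {text!r}")
    return Ark(match["nma"], match["naan"], match["name"], match["qualifier"])


if __name__ == "__main__":
    # Placeholder values, not a registered NAAN or a real object.
    print(parse_ark("https://example.org/ark:/12345/x6np1wh8k/c3.pdf"))
```

Because the NMA is optional and replaceable, the sketch returns `None` for it when the string starts directly with the ark: label.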


Application

ARKs may be assigned to anything digital, physical, or abstract. Below are examples, as reported (2020) to the ARK Alliance by the linked organizations.
* genealogical records (8 billion), FamilySearch
* publisher content (100 million), Portico
* scientific records (22 million), INIST
* scanned texts (20 million), Internet Archive
* bibliographic records (15 million), BnF main catalog
* museum specimens (11 million going on 100 million), Smithsonian
* public health documents, many from legal discovery (15 million), UCSF IDL
* digitized documents and objects (5 million), BnF Gallica
* historical persons, families, and organizations (4 million), SNACC
* finding aids and special collections (4 million), Merritt
* resource maps (1.5 million), RMap Hub
* educational resources (1.1 million), University of Utah
* fine art (483,000), Louvre museum
* historic maps (334,000), Princeton University Libraries
* vocabulary terms (9,000), PeriodO, YAMZ


Generic Services

Three generic ARK services have been defined. They are described below in protocol-independent terms, and they may be delivered by any suitable method that current or future technology allows.


Access Service (access, location)

* Returns (a copy of) the object or a redirect to the same, although a sensible object proxy may be substituted (for instance a table of contents instead of a large document).
* May also return a discriminated list of alternate object locators.
* If access is denied, returns an explanation of the object's current (perhaps permanent) inaccessibility.


Policy Service (permanence, naming, etc.)

* Returns declarations of policy and support commitments for given ARKs.
* Declarations are returned in either a structured metadata format or a human readable text format; sometimes one format may serve both purposes.
* Policy subareas may be addressed in separate requests, but the following areas should be covered:
** object permanence,
** object naming,
** object fragment addressing, and
** operational service support.


Description Service

* Returns a description of the object. Descriptions are returned in either a structured metadata format or a human readable text format; sometimes one format may serve both purposes.
* A description must at a minimum answer the who, what, when, and where questions concerning an expression of the object.
* Standalone descriptions should be accompanied by the modification date and source of the description itself.
* May also return discriminated lists of ARKs that are related to the given ARK.
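
The minimum requirement on a description (answering who, what, when, and where) can be checked mechanically. The sketch below is one possible way to do so in Python; it assumes the description has already been parsed into a dictionary keyed by lowercase "who", "what", "when", and "where", which is a convention chosen here for illustration and not a format defined above. The function name and sample record are likewise hypothetical.

```python
REQUIRED_QUESTIONS = ("who", "what", "when", "where")


def missing_description_fields(record: dict) -> list:
    """Return the required questions that the record leaves unanswered.

    A question counts as unanswered when its key is absent, its value
    is None, or its value is empty or whitespace only.
    """
    missing = []
    for key in REQUIRED_QUESTIONS:
        value = record.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Placeholder record; the values are invented for the example.
    sample = {"who": "Example Author", "what": "Example Title", "when": ""}
    print(missing_description_fields(sample))
```

An empty result means the record meets the stated minimum; it says nothing about whether the answers themselves are accurate.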


See also

* Persistent identifier
* Digital object identifier (DOI)
* Handle System (Handle)
* Persistent uniform resource locator (PURL)
* Uniform resource name (URN)
* Info URI scheme


External links


* ARK (Archival Resource Key)
* ARK Alliance
* Towards Electronic Persistence Using ARK Identifiers
* The ARK Identifier Scheme, Internet Engineering Task Force
* The “ark” URI scheme (specification of ARK as URI)
* Name-to-Thing Resolver
* Noid (Nice Opaque Identifiers) open source software
* EZID identifier manager