
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a protocol developed for harvesting metadata descriptions of records in an archive so that services can be built using metadata from many archives. An implementation of OAI-PMH must support representing metadata in Dublin Core, but may also support additional representations. The protocol is usually just referred to as the OAI Protocol. OAI-PMH uses XML over HTTP. Version 2.0 of the protocol was released in 2002; the document was last updated in 2015. It is released under a Creative Commons BY-SA license.
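
To make the Dublin Core requirement concrete, the following is a minimal sketch of reading the Dublin Core payload of a single harvested record with Python's standard library. The sample XML is hand-written for illustration (the title, creators, and date are invented), and the helper name `dublin_core_fields` is hypothetical; only the namespace URIs are taken from the published schemas.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set used inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical metadata payload of one record (invented values).
SAMPLE_DC = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Preprint</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:creator>Roe, Richard</dc:creator>
  <dc:date>2024-01-01</dc:date>
</oai_dc:dc>
"""


def dublin_core_fields(xml_text):
    """Collect Dublin Core elements into a dict of lists (elements may repeat)."""
    root = ET.fromstring(xml_text.strip())
    fields = {}
    for child in root:
        # ElementTree tags look like "{namespace}localname".
        namespace, _, local_name = child.tag[1:].partition("}")
        if namespace == DC_NS:
            fields.setdefault(local_name, []).append((child.text or "").strip())
    return fields


fields = dublin_core_fields(SAMPLE_DC)
print(fields["creator"])  # ['Doe, Jane', 'Roe, Richard']
```

Values are kept as lists because every Dublin Core element is optional and repeatable, so a record may carry several creators or none at all.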


History

In the late 1990s, Herbert Van de Sompel (Ghent University) was working with researchers and librarians at Los Alamos National Laboratory (US) and called a meeting to address interoperability problems among e-print servers and digital repositories. The meeting was held in Santa Fe, New Mexico, in October 1999. A key outcome of the meeting was the definition of an interface that allowed e-print servers to expose metadata for the papers they held in a structured fashion, so that other repositories could identify and copy papers of interest. This interface/protocol was named the "Santa Fe Convention".

Several workshops were held in 2000, at the ACM Digital Libraries conference, at the 1st ACM/IEEE-CS Joint Conference on Digital Libraries, and elsewhere, to share the ideas from the Santa Fe Convention. The workshops showed that the problems faced by the e-print community were shared by libraries, museums, journal publishers, and others who needed to share distributed resources. To address these needs, the Coalition for Networked Information and the Digital Library Federation provided funding to establish an Open Archives Initiative (OAI) secretariat managed by Herbert Van de Sompel and Carl Lagoze. The OAI held a meeting at Cornell University (Ithaca, New York) in September 2000 to improve the interface developed at the Santa Fe Convention. The specifications were then refined over e-mail.

OAI-PMH version 1.0 was introduced to the public in January 2001 at a workshop in Washington, D.C., and at another in February in Berlin, Germany. Subsequent modifications to the XML standard by the W3C required minor modifications to OAI-PMH, resulting in version 1.1. The current version, 2.0, was released in June 2002. It contained several technical changes and enhancements and is not backward compatible.

Since 2001, CERN, later in collaboration with the University of Geneva, has organized bi-annual OAI workshops, which over time have grown to cover most aspects of open science.


Uses

Some commercial search engines use OAI-PMH to acquire more resources. Google initially included support for OAI-PMH when launching sitemaps, but in May 2008 decided to support only the standard XML Sitemaps format. In 2004, Yahoo! acquired content from OAIster (University of Michigan) that had been obtained through metadata harvesting with OAI-PMH. Wikimedia uses an OAI-PMH repository to provide feeds of Wikipedia and related site updates for search engines and other bulk analysis/republishing endeavors. When thousands of files are harvested every day, OAI-PMH can reduce network traffic and other resource usage through incremental harvesting, in which only records changed since the previous harvest are requested. NASA's Mercury metadata search system uses OAI-PMH to index thousands of metadata records from the Global Change Master Directory (GCMD) every day. The mod_oai project uses OAI-PMH to expose content that is accessible from Apache Web servers to web crawlers. OAI-PMH has later also been applied to the sharing of scientific data.
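
The incremental harvesting mentioned above can be sketched as follows. This is an illustration under stated assumptions, not a reference harvester: the response is a hand-written sample trimmed to record headers, the repository address and identifiers are invented, and the function name `changes_since_last_harvest` is hypothetical. The policy of reusing the newest datestamp seen as the next `from` value is one simple choice among several.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical ListRecords response, trimmed to headers (invented values).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2024-01-03T00:00:00Z</responseDate>
  <request verb="ListRecords">https://repository.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2024-01-01T10:00:00Z</datestamp>
      </header>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:repository.example.org:2</identifier>
        <datestamp>2024-01-02T08:30:00Z</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def changes_since_last_harvest(response_xml):
    """Return (updated_ids, deleted_ids, latest_datestamp) for one response."""
    root = ET.fromstring(response_xml)
    updated, deleted, latest = [], [], None
    for header in root.iterfind(".//oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", namespaces=OAI_NS)
        # Repositories that track deletions mark the header as deleted.
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            updated.append(identifier)
        # UTC datestamps of equal granularity compare correctly as strings.
        if latest is None or datestamp > latest:
            latest = datestamp
    return updated, deleted, latest


updated, deleted, next_from = changes_since_last_harvest(SAMPLE_RESPONSE)
print(next_from)  # 2024-01-02T08:30:00Z
```

A harvester would store `next_from` and send it as the `from` argument of its next request, so that unchanged records are never transferred again.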


Software

OAI-PMH is based on a client–server architecture, in which "harvesters" request information on updated records from "repositories". Requests for data can be based on a datestamp range, and can be restricted to named sets defined by the provider. Data providers are required to provide XML metadata in Dublin Core format, and may also provide it in other XML formats.

A number of software systems support OAI-PMH, including Fedora, EThOS from the British Library, GNU EPrints from the University of Southampton, Open Journal Systems from the Public Knowledge Project, Desire2Learn, DSpace from MIT, HyperJournal from the University of Pisa, Digibib from Digibis, MyCoRe, Koha, Primo, DigiTool, Rosetta and MetaLib from Ex Libris, ArchivalWare from PTFS, DOOR from the eLab in Lugano, Switzerland, panFMP from the PANGAEA data library, SimpleDL from Roaring Development, and jOAI from the National Center for Atmospheric Research.
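
The datestamp-range and set-based requests described at the start of this section can be sketched as a URL builder. The `verb`, `metadataPrefix`, `from`, `until`, and `set` arguments are those defined by OAI-PMH 2.0; the base URL, the set name `physics`, and the helper name `build_list_records_url` are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, until_date=None, set_spec=None):
    """Build a selective-harvesting ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # Optional arguments are sent only when the harvester supplies them.
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint and set name.
url = build_list_records_url(
    "https://repository.example.org/oai",
    from_date="2024-01-01",
    until_date="2024-01-31",
    set_spec="physics",
)
print(url)
# https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2024-01-01&until=2024-01-31&set=physics
```

The default prefix `oai_dc` requests the Dublin Core representation that every repository must offer; a harvester wanting a richer format would pass a different prefix advertised by the repository.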


Archives

A number of large archives support the protocol, including arXiv and the CERN Document Server.


See also

* Data format management
* Digital curation
* Digital preservation
* File format
* Dublin Core, an ISO metadata standard
* National Digital Information Infrastructure and Preservation Program (NDIIPP)
* National Digital Library Program (NDLP)
* Metadata Encoding and Transmission Standard (METS), maintained by the Library of Congress
* Preservation Metadata: Implementation Strategies (PREMIS)
* LOCKSS
* Search as a service
* Web archiving
* Object Reuse and Exchange (OAI-ORE)


External links


* Suleyman Demirel University Open Archives Harvester
* Library of Congress, National Digital Information Infrastructure and Preservation Program (http://www.digitalpreservation.gov/)
* Library of Congress, Web Capture