PRONOM technical registry

PRONOM (Public Record Office and Nôm 喃) is a web-based technical registry to support digital preservation services, developed by The National Archives of the United Kingdom. PRONOM was the first and remains, to date, the only operational public file format registry in the world, although the "Magic File" repository of the file command has served this role in a less formal capacity for two decades. Other projects to develop technical registries, including the UK Digital Curation Centre's Representation Information Registry and the Global Digital Format Registry project at Harvard University, are now in progress.

PRONOM's origins lie in a requirement for reliable technical information about the electronic records held by The National Archives. By definition, electronic records are not inherently human-readable: file formats encode information into a form which can only be processed and rendered comprehensible by very specific technological environments. The accessibility of that information is therefore highly vulnerable to technological obsolescence. Technical information about the structure of those file formats, and the software and hardware environments required to support them, is therefore a prerequisite for any digital preservation regime. PRONOM was developed to provide this function, initially as an internal resource for National Archives staff, and subsequently as a public, web-based resource.


Development

The first version of PRONOM was developed in March 2002 by The National Archives' digital preservation department, led by Adrian Brown. PRONOM 2 was released in December 2002, and provided support for the development of multi-lingual versions of the registry. The web-enabling of PRONOM (PRONOM 3) in February 2004 represented the starting point for the development of PRONOM as a major online resource for the international digital preservation community. PRONOM 4, released in October 2005, included a significant reworking of the underlying data model to allow the capture of detailed technical information on file formats and to support future interoperability with other planned registry systems. It also saw the release of the DROID software for automatic file format identification. The latest version, PRONOM 5, was a relatively minor update to support improvements to DROID and was released in 2006.

A much more substantial update is planned for 2007, which will include the exposure of core PRONOM functions through web services interfaces. This work forms part of the Seamless Flow programme to position The National Archives to receive and manage future government records in electronic formats. In future, PRONOM may participate as a node in the planned Global Digital Format Registry project.

The National Archives won the 2007 Digital Preservation Award, sponsored by the Digital Preservation Coalition, for its work on PRONOM and DROID.


Services

The core technical registry supports a number of specific services. The PRONOM registry provides a searchable web database of technical information about file formats, and about the software tools and technical environments required to access them. Users can search for formats and software using a variety of criteria, such as format or software name and file extension. PRONOM also holds information about support periods for software products, and can be queried on this basis. In addition to on-screen viewing, registry information can be exported in XML, CSV and printer-friendly formats. The PRONOM website also allows users to submit new information for inclusion in the registry.
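The search-and-export workflow described above can be illustrated with a minimal Python sketch. It does not reflect PRONOM's actual data model or interfaces: the `FormatRecord` class, the sample records, and the functions `search_by_extension` and `export_csv` are all hypothetical names invented for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FormatRecord:
    """Hypothetical, much-simplified stand-in for a registry entry."""
    name: str
    version: str
    extensions: Tuple[str, ...] = ()


# Invented sample data for illustration; not an extract from PRONOM.
RECORDS = [
    FormatRecord("Example Text Format", "1.0", ("txt", "text")),
    FormatRecord("Example Image Format", "2.1", ("img",)),
    FormatRecord("Example Text Format", "2.0", ("txt",)),
]


def search_by_extension(records: List[FormatRecord], extension: str) -> List[FormatRecord]:
    """Return the records that list the given extension (case-insensitive)."""
    wanted = extension.lower().lstrip(".")
    return [r for r in records if wanted in (e.lower() for e in r.extensions)]


def export_csv(records: List[FormatRecord]) -> str:
    """Serialise records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "version", "extensions"])
    for record in records:
        writer.writerow([record.name, record.version, ";".join(record.extensions)])
    return buffer.getvalue()


if __name__ == "__main__":
    print(export_csv(search_by_extension(RECORDS, ".TXT")))
```

A single extension can match several records, which is one reason an extension alone is a weak identifier for a format.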


The PRONOM Persistent Unique Identifier (PUID) scheme

The PRONOM Persistent Unique Identifier (PUID) is an extensible scheme of persistent, unique and unambiguous identifiers for records in the PRONOM registry. Such identifiers are fundamental to the exchange and management of digital objects: they allow human or automated user agents to identify unambiguously, and to share that identification of, the representation information required to support access to an object. This is by virtue both of the inherent uniqueness of the identifier, and of its binding to a definitive description of the representation information in a registry such as PRONOM.

At present, the PUID scheme is limited to one particular class of representation information: the format in which a digital object is encoded. Formats were considered a particular priority for such a scheme, as no existing, universally applicable system provides for this. Unix magic numbers and Macintosh data forks do provide some of this functionality, but the same is not true within DOS or Microsoft Windows environments. The three-character file extension is neither standardised nor unique, and is interpreted differently by different environments. Equally, the IANA MIME-type scheme does not provide sufficient granularity or coverage to satisfy the requirements for unique identifiers. The PUID scheme has been developed for the single purpose of providing such identifiers.

The scheme has been adopted as the recommended encoding scheme for describing file formats in the latest version of the ''UK e-Government Metadata Standard''. The scheme is designed to be extensible, and may be expanded in future to include other classes of representation information in PRONOM, such as compression methods, character encoding schemes, and operating systems.

PUIDs can be expressed as Uniform Resource Identifiers using the info:pronom/ namespace, details of which are available from the info URI registry. Neither the PUID scheme, nor its expression as an info URI, supports any inherent dereferencing mechanism, i.e. a PUID does not resolve to a Uniform Resource Locator. However, The National Archives is planning to develop a range of services to expose PRONOM registry content, including a resolution service for PUIDs.
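As a minimal sketch of the info URI expression described above, the following Python converts between a PUID and its info:pronom/ form. The text here does not spell out the PUID syntax, so the pattern used (a label of `fmt` or `x-fmt`, a slash, and a number, as in `fmt/18`) is an assumption, and the function names are hypothetical. In keeping with the scheme, nothing here attempts to resolve an identifier to a URL.

```python
import re
from typing import Tuple

INFO_PREFIX = "info:pronom/"

# Assumed PUID shape: "fmt" or "x-fmt", a slash, then a decimal number.
PUID_PATTERN = re.compile(r"(x-fmt|fmt)/([0-9]+)")


def parse_puid(puid: str) -> Tuple[str, int]:
    """Split a PUID into (label, number); raise ValueError if malformed."""
    match = PUID_PATTERN.fullmatch(puid)
    if match is None:
        raise ValueError(f"not a recognised PUID: {puid!r}")
    return match.group(1), int(match.group(2))


def puid_to_info_uri(puid: str) -> str:
    """Express a PUID as an info URI in the info:pronom/ namespace."""
    parse_puid(puid)  # validate before building the URI
    return INFO_PREFIX + puid


def info_uri_to_puid(uri: str) -> str:
    """Recover the PUID from an info:pronom/ URI."""
    if not uri.startswith(INFO_PREFIX):
        raise ValueError(f"not an info:pronom/ URI: {uri!r}")
    puid = uri[len(INFO_PREFIX):]
    parse_puid(puid)
    return puid


if __name__ == "__main__":
    print(puid_to_info_uri("fmt/18"))
    print(info_uri_to_puid("info:pronom/x-fmt/111"))
```

Because the identifier is opaque and stable, two systems can compare these strings directly to agree on a format without consulting the registry.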


DROID

DROID (Digital Record Object Identification) is a software tool developed by The National Archives to perform automated batch identification of file formats. It is one of a planned series of tools utilising PRONOM to provide specific digital preservation services.

DROID uses internal (byte sequence) and external (file extension) signatures to identify and report the specific file format versions of digital files. These signatures are stored in an XML signature file, generated from information recorded in the PRONOM technical registry. New and updated signatures are regularly added to PRONOM, and DROID can be configured to automatically download updated signature files from the PRONOM website via web services.

DROID allows files and folders to be selected from a file system for identification. After the identification process has been run, the results can be output in XML, CSV or printer-friendly formats.

DROID is a platform-independent Java tool. It includes a documented, public API, and can be invoked from both GUI and command line interfaces.
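The distinction between internal and external signatures can be illustrated with a short Python sketch. It is not DROID's algorithm or signature file format: the `Signature` class and `identify` function are hypothetical, each internal signature is reduced to a fixed byte prefix at the start of the file, and preferring a byte-sequence match over an extension match is an assumption made for this example.

```python
import os
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Signature:
    """Hypothetical, simplified signature for one format family."""
    format_name: str
    magic: bytes                  # internal signature: expected leading bytes
    extensions: Tuple[str, ...]   # external signature: file extensions


# Well-known leading byte sequences, used here purely as illustration.
SIGNATURES = [
    Signature("PDF", b"%PDF-", ("pdf",)),
    Signature("PNG", b"\x89PNG\r\n\x1a\n", ("png",)),
    Signature("GIF", b"GIF8", ("gif",)),
]


def identify(filename: str, data: bytes) -> Tuple[Optional[str], str]:
    """Return (format name, basis of the match) for one file."""
    # Try the internal signatures first (assumed to be the stronger evidence).
    for sig in SIGNATURES:
        if data.startswith(sig.magic):
            return sig.format_name, "byte sequence"
    # Fall back to the external signature, i.e. the file extension.
    extension = os.path.splitext(filename)[1].lower().lstrip(".")
    for sig in SIGNATURES:
        if extension in sig.extensions:
            return sig.format_name, "extension"
    return None, "unidentified"


if __name__ == "__main__":
    print(identify("report.bin", b"%PDF-1.4\n"))
    print(identify("image.png", b""))
```

Reporting the basis of each match lets a user see when an identification rests only on the extension, which is the weaker form of evidence.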


Future services

Proposed future services include format risk assessments and preservation planning, and the automated generation of migration pathways for converting between formats.


See also

* Digital curation
* Digital preservation
* File format
* File (command)




External links


* PRONOM technical registry
* info:pronom/ namespace registration
* Global Digital Format Registry project