A machine-readable document is a document whose content can be readily processed by computers. Such documents are distinguished from more general machine-readable data by virtue of having further structure to provide the necessary context to support the business processes for which they are created.
Definition
Data without context is meaningless and lacks the four essential characteristics of trustworthy business records specified in ISO 15489 Information and documentation – Records management:
* Reliability
* Authenticity
* Integrity
* Usability
The vast bulk of information is unstructured data and, from a business perspective, that means it is "immature", i.e., Level 1 (chaotic) of the Capability Maturity Model. Such immaturity fosters inefficiency, diminishes quality, and limits effectiveness. Unstructured information is also ill-suited for records management functions, provides inadequate evidence for legal purposes, drives up the cost of discovery in litigation, and makes access and usage needlessly cumbersome in routine, ongoing business processes.
There are at least four aspects to machine-readability:
* First, words or phrases should be discretely delineated (tagged) so that computer software and/or hardware logic can be applied to them as individual conceptual elements.
* Second, the semantics of each element should be specified so that computers can help human beings achieve a common understanding of their meanings and potential usages.
* Third, if the relationships among the individual elements are also specified, computers can automatically apply inferences to them, thereby further relieving human beings of the burden of trying to understand them, particularly for purposes of inquiry, discovery, and analysis.
* Fourth, if the structures of the documents in which the elements occur are also specified, human understanding is further enhanced and the data becomes more reliable for legal and business-quality purposes.
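The four aspects above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sample document, the element names (`Plan`, `Goal`, `Objective`, `Name`), the `supports` attribute, and the `SEMANTICS` and `STRUCTURE` tables are hypothetical and are not drawn from any particular standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical plan document; the element names are illustrative only.
PLAN_XML = """
<Plan>
  <Name>Records Modernization Plan</Name>
  <Goal id="g1">
    <Name>Manage records electronically</Name>
    <Objective id="o1" supports="g1">
      <Name>Publish reports in machine-readable format</Name>
    </Objective>
  </Goal>
</Plan>
"""

# Aspect 2: an explicit, shared meaning for each tag (assumed definitions).
SEMANTICS = {
    "Plan": "A document describing intended outcomes",
    "Goal": "A long-term result the organization seeks",
    "Objective": "A measurable step toward a goal",
    "Name": "A short human-readable label",
}

# Aspect 4: the expected document structure, as allowed child tags per parent.
STRUCTURE = {
    "Plan": {"Name", "Goal"},
    "Goal": {"Name", "Objective"},
    "Objective": {"Name"},
    "Name": set(),
}


def structure_errors(element):
    """Return (parent, child) tag pairs that violate STRUCTURE."""
    errors = []
    for child in element:
        if child.tag not in STRUCTURE.get(element.tag, set()):
            errors.append((element.tag, child.tag))
        errors.extend(structure_errors(child))
    return errors


root = ET.fromstring(PLAN_XML)

# Aspect 1: each tagged element can be addressed individually.
objective_names = [o.findtext("Name") for o in root.iter("Objective")]

# Aspect 3: explicit relationships let software follow links between elements.
goals_by_id = {g.get("id"): g for g in root.iter("Goal")}
supported = {
    o.get("id"): goals_by_id[o.get("supports")].findtext("Name")
    for o in root.iter("Objective")
}

# Tags used in the document that have no agreed meaning.
undefined_tags = {e.tag for e in root.iter()} - set(SEMANTICS)
```

In practice the semantics and structure would normally be declared in a published schema rather than in ad hoc tables like these.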
As early as 1983, the U.S. Government Accountability Office (GAO) began emphasizing the benefits of machine-readable information. Even earlier, in 1981, GAO began reporting on the problem of inadequate record-keeping practices in the U.S. federal government. Such deficiencies are not unique to government, and advances in information technology mean that most information is now "born digital" and thus potentially far more easily managed by automated means. However, in testimony to Congress in 2010, GAO highlighted problems with managing electronic records, and as recently as 2015, GAO continued to report inadequacies in the performance of Executive Branch agencies in meeting records management requirements. Moreover, more than a decade after a major and formerly highly respected auditing firm, Arthur Andersen, met its demise due to a records destruction scandal, record-keeping practices became a central issue in the 2016 Presidential election.
On January 4, 2011, President Obama signed H.R. 2142, the Government Performance and Results Act (GPRA) Modernization Act of 2010 (GPRAMA), into law as P.L. 111-352. Section 10 of GPRAMA requires U.S. federal agencies to publish their strategic and performance plans and reports in searchable, machine-readable format. Additionally, in 2013, he issued Executive Order 13642, "Making Open and Machine Readable the New Default for Government Information", which applies to government information in general.
On July 28, 2016, the Office of Management and Budget (OMB) followed up by including in the revised issuance of Circular A-130 direction for agencies to use open, machine-readable formats, and to publish "public information online in a manner that promotes analysis and reuse for the widest possible range of purposes", meaning that the information is both publicly accessible and machine-readable. On January 14, 2019, President Trump signed into law H.R. 4174, the OPEN Government Data Act (OGDA), which codifies in law the requirement for agencies to make their public data assets available in machine-readable format. On June 28, 2019, in Circular A-11, OMB expressed intent to begin complying with section 10 of GPRAMA.
In support of such policy direction, technological advancement is enabling more efficient and effective management and use of machine-readable electronic records. Document-oriented databases have been developed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Extensible Markup Language (XML) is a World Wide Web Consortium (W3C) Recommendation setting forth rules for encoding documents in a format that is both human-readable and machine-readable. Many XML editor tools have been developed, and most, if not all, major information technology applications support XML to a greater or lesser degree. The fact that XML itself is an open, standard, machine-readable format makes it relatively easy for application developers to do so.
The W3C's accompanying XML Schema (XSD) Recommendation specifies how to formally describe the elements in an XML document. With respect to the specification of XML schemas, the Organization for the Advancement of Structured Information Standards (OASIS) is a leading standards-developing organization. However, many technical developers prefer to work with JSON. To define the structure of JSON data for validation, documentation, and interaction control, JSON Schema was developed by the Internet Engineering Task Force (IETF).
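As a rough illustration of what schema-based validation provides, the sketch below checks JSON data against a schema written with a small subset of JSON Schema keywords (`type`, `required`, `properties`). The `validate` function is a hand-rolled, hypothetical helper and not a conforming JSON Schema validator; the schema and the sample records are likewise assumptions made for the example.

```python
import json

# A schema using only a small subset of JSON Schema keywords.
SCHEMA = {
    "type": "object",
    "required": ["title", "year"],
    "properties": {
        "title": {"type": "string"},
        "year": {"type": "integer"},
    },
}

# Mapping from the schema type names handled here to Python types.
TYPES = {"object": dict, "string": str, "integer": int, "array": list}


def validate(value, schema, path="$"):
    """Return a list of human-readable errors; an empty list means valid."""
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(value, TYPES[expected])
        # bool is a subclass of int in Python, but it is not a JSON integer.
        if expected == "integer" and isinstance(value, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}: missing required property '{name}'")
        for name, subschema in schema.get("properties", {}).items():
            if name in value:
                errors.extend(validate(value[name], subschema, f"{path}.{name}"))
    return errors


good = json.loads('{"title": "Annual Performance Report", "year": 2019}')
bad = json.loads('{"title": "Annual Performance Report", "year": "2019"}')

good_errors = validate(good, SCHEMA)
bad_errors = validate(bad, SCHEMA)
```

A production system would more plausibly rely on an established validator library than on a helper like this; the point is only that a declared structure lets software reject malformed documents automatically.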
The Portable Document Format (PDF) is a file format used to present documents in a manner independent of application software, hardware, and operating systems. Each PDF file encapsulates a complete description of the presentation of the document, including the text, fonts, graphics, and other information needed to display it. PDF/A is an ISO-standardized version of the PDF specialized for use in the archiving and long-term preservation of electronic documents. PDF/A-3 allows embedding of other file formats, including XML, into PDF/A conforming documents, thus potentially providing the best of both human- and machine-readability. The W3C's XSL-FO (XSL Formatting Objects) markup language is commonly used to generate PDF files.
Metadata, data about data, can be used to organize electronic resources, provide digital identification, and support the archiving and preservation of resources. In well-structured, machine-readable electronic records, the content can be repurposed as both data and metadata. In the context of electronic record-keeping systems, the terms "management" and "metadata" are virtually synonymous. Given proper metadata, records management functions can be automated, thereby reducing the risk of spoliation of evidence and other fraudulent manipulations of records. Moreover, such records can be used to automate the process of auditing data maintained in databases, thereby reducing the risk of single points of failure associated with the Machiavellian concept of a single source of truth.
Blockchains make it possible to create and maintain continuously growing lists of records secured from tampering and revision. A key feature is that every node in a decentralized system has a copy of the blockchain, so there is no single point of failure subject to manipulation and fraud.
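The tamper-evidence property can be sketched with a simple hash chain, in which each block stores the hash of its predecessor. This is a simplified illustration under stated assumptions: it keeps a single in-memory chain, omits the distribution across nodes and the consensus mechanism of a real blockchain, and the function names and sample records are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(record, previous_hash):
    """Hash a record together with the hash of the preceding block."""
    payload = json.dumps({"record": record, "prev": previous_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Link records into a list of blocks, each pointing at its predecessor."""
    chain = []
    previous_hash = GENESIS_HASH
    for record in records:
        current = block_hash(record, previous_hash)
        chain.append({"record": record, "prev": previous_hash, "hash": current})
        previous_hash = current
    return chain


def is_intact(chain):
    """Recompute every hash and link; any edited record breaks the check."""
    previous_hash = GENESIS_HASH
    for block in chain:
        if block["prev"] != previous_hash:
            return False
        if block["hash"] != block_hash(block["record"], block["prev"]):
            return False
        previous_hash = block["hash"]
    return True


chain = build_chain(["record created", "record amended", "record archived"])

# Simulate a revision of an earlier record in a copy of the chain.
tampered = [dict(block) for block in chain]
tampered[1]["record"] = "record destroyed"
```

Here `is_intact(chain)` holds for the original chain, while the altered copy fails verification because the stored hash of the second block no longer matches its content.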
See also
* Budapest Declaration on Machine Readable Travel Documents
* Comparison of XML editors
* Four corners (law)
* Integrity, and particularly Data integrity
* Linked data
* Machine-readable passport
* Markup language
* Open data
* Reliability (statistics), Data integrity, Reliability (computer networking), and Reliability (research methods)
* Strategy Markup Language (StratML)
* Structured document
* Tag (metadata)
* Universal Business Language (UBL)
* XBRL (eXtensible Business Reporting Language)
External links
OMB M-13-13 Open Data Policy: Managing Information as an Asset, which requires agencies to use open, machine-readable, data format standards
January 2005, which outlines the characteristics of trustworthy records.
Driving a Stake in the Heart of the Capone Consultancy Method of Records Management: Best Practices for Correcting Non-Records Non-Policy Nonsense March 9, 2015
* The U.S. Code, which includes the term "machine-readable" over 50 times as of September 10, 2016