A machine-readable document is a document whose content can be readily processed by computers. Such documents are distinguished from machine-readable data by virtue of having sufficient structure to provide the necessary context to support the business processes for which they are created.


Definition

Data without context is meaningless and lacks the four essential characteristics of trustworthy business records specified in ISO 15489 (Information and documentation -- Records management):

* Reliability
* Authenticity
* Integrity
* Usability

The vast bulk of information is unstructured data and, from a business perspective, that means it is "immature", i.e., at Level 1 (chaotic) of the Capability Maturity Model. Such immaturity fosters inefficiency, diminishes quality, and limits effectiveness. Unstructured information is also ill-suited for records management functions, provides inadequate evidence for legal purposes, drives up the cost of discovery in litigation, and makes access and usage needlessly cumbersome in routine, ongoing business processes.

There are at least four aspects to machine-readability:

* First, words or phrases should be discretely delineated (tagged) so that computer software and/or hardware logic can be applied to them as individual conceptual elements.
* Second, the semantics of each element should be specified so that computers can help human beings achieve a common understanding of their meanings and potential usages.
* Third, if the relationships among the individual elements are also specified, computers can automatically apply inferences to them, thereby further relieving human beings of the burden of trying to understand them, particularly for purposes of inquiry, discovery, and analysis.
* Fourth, if the structures of the documents in which the elements occur are also specified, human understanding is further enhanced and the data becomes more reliable for legal and business-quality purposes.

As early as 1983, the U.S. Government Accountability Office (GAO) began emphasizing the benefits of machine-readable information. Even earlier, in 1981, GAO began reporting on the problem of inadequate record-keeping practices in the U.S. federal government. Such deficiencies are not unique to government, and advances in information technology mean that most information is now "born digital" and thus potentially far more easily managed by automated means. However, in testimony to Congress in 2010, GAO highlighted problems with managing electronic records, and as recently as 2015, GAO continued to report inadequacies in the performance of Executive Branch agencies in meeting records management requirements. Moreover, more than a decade after a major and formerly highly respected auditing firm, Arthur Andersen, met its demise due to a records destruction scandal, record-keeping practices became a central issue in the 2016 presidential election.

On January 4, 2011, President Obama signed H.R. 2142, the Government Performance and Results Act (GPRA) Modernization Act of 2010 (GPRAMA), into law as P.L. 111-352. Section 10 of GPRAMA requires U.S. federal agencies to publish their strategic and performance plans and reports in searchable, machine-readable format. Additionally, in 2013, he issued Executive Order 13642, Making Open and Machine Readable the New Default for Government Information, which applies to government information in general. On July 28, 2016, the Office of Management and Budget (OMB) followed up by including in the revised issuance of Circular A-130 direction for agencies to use open, machine-readable formats, and to publish "public information online in a manner that promotes analysis and reuse for the widest possible range of purposes", meaning that the information is both publicly accessible and machine-readable. On January 14, 2019, President Trump signed into law H.R. 4174, which includes the OPEN Government Data Act (OGDA) and codifies in law the requirement for agencies to make their public data assets available in machine-readable format. On June 28, 2019, in Circular A-11, OMB expressed intent to begin complying with section 10 of GPRAMA.

In support of such policy direction, technological advancement is enabling more efficient and effective management and use of machine-readable electronic records. Document-oriented databases have been developed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Extensible Markup Language (XML) is a World Wide Web Consortium (W3C) Recommendation setting forth rules for encoding documents in a format that is both human-readable and machine-readable. Many XML editor tools have been developed, and most, if not all, major information technology applications support XML to greater or lesser degrees. Because XML itself is an open, standard, machine-readable format, it is relatively easy for application developers to support. The W3C's accompanying XML Schema (XSD) Recommendation specifies how to formally describe the elements in an XML document. With respect to the specification of XML schemas, the Organization for the Advancement of Structured Information Standards (OASIS) is a leading standards-developing organization. However, many technical developers prefer to work with JSON, and JSON Schema was developed by the Internet Engineering Task Force (IETF) to define the structure of JSON data for validation, documentation, and interaction control.

The Portable Document Format (PDF) is a file format used to present documents in a manner independent of application software, hardware, and operating systems. Each PDF file encapsulates a complete description of the presentation of the document, including the text, fonts, graphics, and other information needed to display it.
PDF/A is an ISO-standardized version of PDF specialized for use in the archiving and long-term preservation of electronic documents. PDF/A-3 allows embedding of other file formats, including XML, into PDF/A-conforming documents, thus potentially providing the best of both human- and machine-readability. The W3C's XSL-FO (XSL Formatting Objects) markup language is commonly used to generate PDF files.
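
The four aspects of machine-readability described earlier (tagged elements, named semantics, explicit relationships, and a defined document structure) can be illustrated with a small XML document read by a program. The following is a minimal sketch in Python, not a reference implementation: the element names (`plan`, `goal`, `objective`), the `supports` attribute, and the function `extract_objectives` are hypothetical and are not drawn from any published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical plan document; element and attribute names are illustrative only.
# Each phrase is tagged, the tag names carry its meaning, the "supports"
# attribute states a relationship, and the nesting gives the document structure.
PLAN_XML = """
<plan id="plan-1">
  <goal id="g1">
    <name>Improve records management</name>
    <objective id="o1" supports="g1">
      <name>Publish reports in machine-readable format</name>
    </objective>
  </goal>
</plan>
"""


def extract_objectives(xml_text):
    """Return (objective name, name of the goal it supports) pairs."""
    root = ET.fromstring(xml_text)
    # Index goals by their id so that relationships can be resolved by lookup.
    goals = {g.get("id"): g.findtext("name") for g in root.iter("goal")}
    return [
        (o.findtext("name"), goals.get(o.get("supports")))
        for o in root.iter("objective")
    ]


if __name__ == "__main__":
    for objective, goal in extract_objectives(PLAN_XML):
        print(f"{objective!r} supports {goal!r}")
```

Because the relationship between objective and goal is stated in the markup, the program can resolve it by lookup rather than by interpreting free text. The same content presented only as page layout would leave that inference to a human reader.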
Metadata, data about data, can be used to organize electronic resources, provide digital identification, and support the archiving and preservation of resources. In well-structured, machine-readable electronic records, the content can be repurposed as both data and metadata. In the context of electronic record-keeping systems, the terms "management" and "metadata" are virtually synonymous. Given proper metadata, records management functions can be automated, thereby reducing the risk of spoliation of evidence and other fraudulent manipulations of records. Moreover, such records can be used to automate the process of auditing data maintained in databases, thereby reducing the risk of single points of failure associated with the Machiavellian concept of a single source of truth.

Blockchain is a relatively new technology for maintaining continuously growing lists of records secured from tampering and revision. A key feature is that every node in a decentralized system has a copy of the blockchain, so there is no single point of failure subject to manipulation and fraud.
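
The tamper-evidence property mentioned above can be sketched with a simple hash chain, in which each entry's hash depends on the previous entry's hash. This is only an illustration under stated assumptions: it keeps a single local copy, with none of the replication or consensus among nodes that characterizes an actual blockchain, and the function names `chain_records` and `verify_chain` and the record fields are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # Placeholder "previous hash" for the first entry.


def chain_records(records):
    """Link each record to its predecessor by hashing (previous hash + content)."""
    chain, prev_hash = [], GENESIS_HASH
    for record in records:
        # sort_keys gives a stable serialization, so equal records hash equally.
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        chain.append({"record": record, "prev_hash": prev_hash, "hash": digest})
        prev_hash = digest
    return chain


def verify_chain(chain):
    """Return True only if every entry's hash and back-link are still consistent."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        payload = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    chain = chain_records([
        {"id": 1, "title": "Meeting minutes"},
        {"id": 2, "title": "Employment contract"},
    ])
    print(verify_chain(chain))   # Chain is intact.
    chain[0]["record"]["title"] = "Altered minutes"
    print(verify_chain(chain))   # The alteration is detected.
```

Altering any earlier record changes its recomputed hash, so the stored hashes no longer match and verification fails. In a decentralized system, comparing copies held by independent nodes additionally guards against someone rewriting the whole chain.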


See also

* Budapest Declaration on Machine Readable Travel Documents
* Comparison of XML editors
* Four corners (law)
* Integrity and particularly Data integrity
* Linked data
* Machine-readable passport
* Markup language
* Open data
* Reliability (statistics), Data integrity, Reliability (computer networking), and Reliability (research methods)
* Strategy Markup Language (StratML)
* Structured document
* Tag (metadata)
* Universal Business Language (UBL)
* XBRL (eXtensible Business Reporting Language)



External links


* OMB M-13-13, Open Data Policy: Managing Information as an Asset, which requires agencies to use open, machine-readable data format standards
* January 2005, which outlines the characteristics of trustworthy records.
* Driving a Stake in the Heart of the Capone Consultancy Method of Records Management: Best Practices for Correcting Non-Records Non-Policy Nonsense, March 9, 2015
* The U.S. Code, which includes the term "machine-readable" over 50 times as of September 10, 2016