HOME

TheInfoList



OR:

Machine-readable data, or computer-readable data, is
data In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted ...
in a format that can be processed by a
computer A computer is a machine that can be programmed to Execution (computing), carry out sequences of arithmetic or logical operations (computation) automatically. Modern digital electronic computers can perform generic sets of operations known as C ...
. Machine-readable data must be
structured data A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities. For instance, a data model may specify that the data element representing a car be c ...
. Attempts to create machine-readable data occurred as early as the 1960s. At the same time that seminal developments in machine-reading and natural-language processing were releasing (like Weizenbaum's
ELIZA ELIZA is an early natural language processing computer program created from 1964 to 1966 at the MIT Artificial Intelligence Laboratory by Joseph Weizenbaum. Created to demonstrate the superficiality of communication between humans and machines, E ...
), people were anticipating the success of machine-readable functionality and attempting to create machine-readable documents. One such example was musicologist
Nancy B. Reich Nancy Bassen Reich (July 3, 1924 in New York City - January 31, 2019 in Ossining, NY) was an American musicologist, most renowned for her 1985 biography of Clara Schumann. Biography She attended the High School of Music and Art, where she playe ...
's creation of a machine-readable catalog of composer William Jay Sydeman's works in 1966. In the United States, the OPEN Government Data Act of 14 January 2019 defines machine-readable data as "data in a format that can be easily processed by a computer without human intervention while ensuring no semantic meaning is lost." The law directs U.S. federal agencies to publish public data in such a manner, ensuring that "any public data asset of the agency is machine-readable". Machine-readable data may be classified into two groups: human-readable data that is marked up so that it can also be read by machines (e.g.
microformat Microformats (μF) are a set of defined HTML classes created to serve as consistent and descriptive metadata about an element, designating it as representing a certain type of data (such as contact information, geographic coordinates, events ...
s, RDFa,
HTML The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScri ...
), and
data file A data file is a computer file which stores data to be used by a computer application or system, including input and output data. A data file usually does not contain instructions or code to be executed (that is, a computer program). Most of the ...
formats intended principally for processing by machines ( CSV, RDF,
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable ...
,
JSON JSON (JavaScript Object Notation, pronounced ; also ) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other ser ...
). These formats are only machine readable if the data contained within them is formally structured; exporting a CSV file from a badly structured spreadsheet does not meet the definition. ''Machine readable'' is not synonymous with ''digitally accessible''. A digitally accessible document may be online, making it easier for humans to access via computers, but its content is much harder to extract, transform, and process via computer programming logic if it is not machine-readable.
Extensible Markup Language Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. T ...
(XML) is designed to be both human- and machine-readable, and Extensible Stylesheet Language Transformation (XSLT) is used to improve the presentation of the data for human readability. For example, XSLT can be used to automatically render XML in
Portable Document Format Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. ...
(
PDF Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. ...
). Machine-readable data can be automatically transformed for human-readability but, generally speaking, the reverse is not true. For purposes of implementation of the
Government Performance and Results Act The Government Performance and Results Act of 1993 (GPRA) () is a United States law enacted in 1993,Congress, U. S., and An Act. "Government Performance and Results Act of 1993." In ''103rd Congress. Congressional Record''. 1993. one of a series o ...
(GPRA) Modernization Act, the
Office of Management and Budget The Office of Management and Budget (OMB) is the largest office within the Executive Office of the President of the United States (EOP). OMB's most prominent function is to produce the president's budget, but it also examines agency programs, pol ...
(OMB) defines "machine readable format" as follows: "Format in a standard computer language (not English text) that can be read automatically by a web browser or computer system. (e.g.; xml). Traditional word processing documents and portable document format (PDF) files are easily read by humans but typically are difficult for machines to interpret. Other formats such as extensible markup language (
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable ...
), (
JSON JSON (JavaScript Object Notation, pronounced ; also ) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other ser ...
), or spreadsheets with header columns that can be exported as comma separated values (CSV) are machine readable formats. As HTML is a structural markup language, discreetly labeling parts of the document, computers are able to gather document components to assemble tables of contents, outlines, literature search bibliographies, etc. It is possible to make traditional word processing documents and other formats machine readable but the documents must include enhanced structural elements."OMB Circular A-11, Part 6
, Preparation, Submission, and Execution of the Budget


See also

*
Open data Open data is data that is openly accessible, exploitable, editable and shared by anyone for any purpose. Open data is licensed under an open license. The goals of the open data movement are similar to those of other "open(-source)" movements ...
*
Linked data In computing, linked data (often capitalized as Linked Data) is structured data which is interlinked with other data so it becomes more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but ...
*
Machine-readable document A machine-readable document is a document whose content can be readily processed by computers. Such documents are distinguished from machine-readable data by virtue of having sufficient structure to provide the necessary context to support the bu ...
*
Human-readable medium A human-readable medium or human-readable format is any encoding of data or information that can be naturally read by humans. In computing, ''human-readable'' data is often encoded as ASCII or Unicode text, rather than as binary data. In most ...
* Semantic Web


References

Data management {{Comp-stub