metadata

Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including:
* Descriptive metadata – the descriptive information about a resource. It is used for discovery and identification. It includes elements such as title, abstract, author, and keywords.
* Structural metadata – metadata about containers of data that indicates how compound objects are put together, for example, how pages are ordered to form chapters. It describes the types, versions, relationships, and other characteristics of digital materials.
* Administrative metadata – the information to help manage a resource, like resource type, permissions, and when and how it was created.
* Reference metadata – the information about the contents and quality of statistical data.
* Statistical metadata – also called process data, may describe processes that collect, process, or produce statistical data.
* Legal metadata – provides information about the creator, copyright holder, and public licensing, if provided.
Metadata is not strictly bounded to one of these categories, as it can describe a piece of data in many other ways.
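As a rough illustration of how one resource can carry several of these kinds of metadata at once, the following Python sketch groups them into a single record. The class name `ResourceMetadata`, its field names, and all example values are assumptions made for this sketch and are not part of any standard.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceMetadata:
    """Metadata for one resource, grouped by category (hypothetical layout)."""
    descriptive: dict = field(default_factory=dict)     # title, author, keywords
    structural: dict = field(default_factory=dict)      # how the parts fit together
    administrative: dict = field(default_factory=dict)  # resource type, permissions, creation
    legal: dict = field(default_factory=dict)           # creator, copyright holder, licence

    def categories_present(self) -> list[str]:
        """Return the names of the categories that actually hold values."""
        return [name for name, values in vars(self).items() if values]


# Example values are invented for illustration.
book = ResourceMetadata(
    descriptive={"title": "An Example Book", "author": "A. Author", "keywords": ["example"]},
    structural={"chapters": {"1": [1, 2, 3], "2": [4, 5]}},  # chapter -> ordered page numbers
    administrative={"resource_type": "text", "created": "2020-01-01"},
)

print(book.categories_present())
```

The record deliberately leaves `legal` empty to show that a resource need not carry every category.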


History

Metadata has various purposes. It can help users find relevant information and discover resources. It can also help organize electronic resources, provide digital identification, and archive and preserve resources. Metadata allows users to access resources by "allowing resources to be found by relevant criteria, identifying resources, bringing similar resources together, distinguishing dissimilar resources, and giving location information". Metadata of telecommunication activities including Internet traffic is very widely collected by various national governmental organizations. This data is used for the purposes of traffic analysis and can be used for mass surveillance.

Metadata was traditionally used in the card catalogs of libraries until the 1980s, when libraries converted their catalog data to digital databases. In the 2000s, as data and information were increasingly stored digitally, this digital data was described using metadata standards. The first description of "meta data" for computer systems is purportedly noted by MIT's Center for International Studies experts David Griffel and Stuart McIntosh in 1967: "In summary then, we have statements in an object language about subject descriptions of data and token codes for the data. We also have statements in a meta language describing the data relationships and transformations, and ought/is relations between norm and data."

Unique metadata standards exist for different disciplines (e.g., museum collections, digital audio files, and websites). Describing the contents and context of data or data files increases their usefulness. For example, a web page may include metadata specifying what software language the page is written in (e.g., HTML), what tools were used to create it, what subjects the page is about, and where to find more information about the subject. This metadata can automatically improve the reader's experience and make it easier for users to find the web page online. A CD may include metadata providing information about the musicians, singers, and songwriters whose work appears on the disc. In many countries, government organizations routinely store metadata about emails, telephone calls, web pages, video traffic, IP connections, and cell phone locations.


Definition

Metadata means "data about data". Metadata is defined as the data providing information about one or more aspects of the data; it is used to summarize basic information about data that can make tracking and working with specific data easier. Some examples include:
* Means of creation of the data
* Purpose of the data
* Time and date of creation
* Creator or author of the data
* Location on a computer network where the data was created
* Standards used
* File size
* Data quality
* Source of the data
* Process used to create the data
For example, a digital image may include metadata that describes the size of the image, its color depth, resolution, when it was created, the shutter speed, and other data. A text document's metadata may contain information about how long the document is, who the author is, when the document was written, and a short summary of the document. Metadata within web pages can also contain descriptions of page content, as well as key words linked to the content. These links are often called "Metatags", which were used as the primary factor in determining order for a web search until the late 1990s. The reliance on metatags in web searches decreased in the late 1990s because of "keyword stuffing", whereby metatags were largely misused to trick search engines into thinking some websites had more relevance in the search than they really did.

Metadata can be stored and managed in a database, often called a metadata registry or metadata repository. However, without context and a point of reference, it might be impossible to identify metadata just by looking at it. For example: by itself, a database containing several numbers, all 13 digits long, could be the results of calculations or a list of numbers to plug into an equation; without any other context, the numbers themselves can be perceived as the data. But if given the context that this database is a log of a book collection, those 13-digit numbers may now be identified as information that refers to the book, but is not itself the information within the book.

The term "metadata" was coined in 1968 by Philip Bagley, in his book "Extension of Programming Language Concepts", where it is clear that he uses the term in the ISO 11179 "traditional" sense, which is "structural metadata", i.e. "data about the containers of data", rather than the alternative sense "content about individual instances of data content" or metacontent, the type of data usually found in library catalogs. Since then the fields of information management, information science, information technology, librarianship, and GIS have widely adopted the term. In these fields, the word ''metadata'' is defined as "data about data". While this is the generally accepted definition, various disciplines have adopted their own more specific explanations and uses of the term.

''Slate'' reported in 2013 that the United States government's interpretation of "metadata" could be broad, and might include message content such as the subject lines of emails.
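The point about 13-digit numbers needing context can be sketched in Python. The text does not name the numbering scheme; this sketch assumes, purely for illustration, that the book log stores ISBN-13 identifiers, whose final digit is a check digit (weights alternate 1 and 3, and the weighted sum must be divisible by 10). The function name and sample values are hypothetical.

```python
def is_valid_isbn13(number: str) -> bool:
    """Return True if the string passes the ISBN-13 check-digit rule."""
    if len(number) != 13 or not number.isdigit():
        return False
    # Digits in even positions (0-based) get weight 1, odd positions weight 3.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(number))
    return total % 10 == 0


# Without context, these are only two 13-digit numbers.
values = ["9780306406157", "1234567890123"]

# Given the context "this column is a book log", each value can be tested
# as a candidate identifier that refers to a book rather than being book content.
readable_as_isbn = {value: is_valid_isbn13(value) for value in values}
print(readable_as_isbn)
```

The check does not prove that a number is metadata; it only shows that an interpretation becomes testable once the context is known.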


Types

While the application of metadata is manifold, covering a large variety of fields, there are specialized and well-accepted models to specify types of metadata. Bretherton & Singley (1994) distinguish between two distinct classes: structural/control metadata and guide metadata. ''Structural metadata'' describes the structure of database objects such as tables, columns, keys and indexes. ''Guide metadata'' helps humans find specific items and is usually expressed as a set of keywords in a natural language. According to Ralph Kimball, metadata can be divided into three categories: ''technical metadata'' (or internal metadata), ''business metadata'' (or external metadata), and ''process metadata''.

NISO distinguishes three types of metadata: descriptive, structural, and administrative. ''Descriptive metadata'' is typically used for discovery and identification, as information to search and locate an object, such as title, authors, subjects, keywords, and publisher. ''Structural metadata'' describes how the components of an object are organized. An example of structural metadata would be how pages are ordered to form chapters of a book. Finally, ''administrative metadata'' gives information to help manage the source. Administrative metadata refers to the technical information, such as file type, or when and how the file was created. Two sub-types of administrative metadata are rights management metadata and preservation metadata. ''Rights management metadata'' explains intellectual property rights, while ''preservation metadata'' contains information to preserve and save a resource.

Statistical data repositories have their own requirements for metadata in order to describe not only the source and quality of the data but also what statistical processes were used to create the data, which is of particular importance to the statistical community in order to both validate and improve the process of statistical data production.

An additional type of metadata beginning to be more developed is ''accessibility metadata''. Accessibility metadata is not a new concept to libraries; however, advances in universal design have raised its profile. Projects like Cloud4All and GPII identified the lack of common terminologies and models to describe the needs and preferences of users and information that fits those needs as a major gap in providing universal access solutions. Those types of information are accessibility metadata. Schema.org has incorporated several accessibility properties based on IMS Global Access for All Information Model Data Element Specification. The wiki page WebSchemas/Accessibility lists several properties and their values. While the efforts to describe and standardize the varied accessibility needs of information seekers are beginning to become more robust, their adoption into established metadata schemas has not been as developed. For example, while Dublin Core (DC)'s "audience" and MARC 21's "reading level" could be used to identify resources suitable for users with dyslexia and DC's "format" could be used to identify resources available in braille, audio, or large print formats, there is more work to be done.
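To make accessibility metadata more concrete, here is a minimal Python sketch that builds a JSON-LD style description using a few of the accessibility properties published in the Schema.org vocabulary (`accessMode`, `accessibilityFeature`, `accessibilityHazard`). The resource, the chosen values, and the helper `supports` are assumptions for illustration; consult the Schema.org documentation for the authoritative list of properties and values.

```python
import json

# Hypothetical resource description. Property names follow Schema.org's
# accessibility vocabulary; the values are example choices.
record = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "An Example Book",
    "accessMode": ["textual", "visual"],
    "accessibilityFeature": ["alternativeText", "largePrint"],
    "accessibilityHazard": ["noFlashingHazard"],
}


def supports(rec: dict, feature: str) -> bool:
    """Return True if the record declares the given accessibility feature."""
    return feature in rec.get("accessibilityFeature", [])


print(json.dumps(record, indent=2))
print("large print available:", supports(record, "largePrint"))
```

A discovery service could use a check like `supports` to filter a catalog for resources that meet a reader's stated needs.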


Structures

Metadata (metacontent) or, more correctly, the vocabularies used to assemble metadata (metacontent) statements, is typically structured according to a standardized concept using a well-defined metadata scheme, including metadata standards and metadata models. Tools such as controlled vocabularies, taxonomies, thesauri, data dictionaries, and metadata registries can be used to apply further standardization to the metadata. Structural metadata commonality is also of paramount importance in data model development and in database design.


Syntax

Metadata (metacontent) syntax refers to the rules created to structure the fields or elements of metadata (metacontent). A single metadata scheme may be expressed in a number of different markup or programming languages, each of which requires a different syntax. For example, Dublin Core may be expressed in plain text, HTML, XML, and RDF.

A common example of (guide) metacontent is the bibliographic classification, the subject, the Dewey Decimal class number. There is always an implied statement in any "classification" of some object. To classify an object as, for example, Dewey class number 514 (Topology) (i.e. books having the number 514 on their spine), the implied statement is: "<book><subject heading><514>". This is a subject-predicate-object triple, or more importantly, a class-attribute-value triple. The first two elements of the triple (class, attribute) are pieces of some structural metadata having a defined semantic. The third element is a value, preferably from some controlled vocabulary, some reference (master) data. The combination of the metadata and master data elements results in a statement which is a metacontent statement, i.e. "metacontent = metadata + master data". All of these elements can be thought of as "vocabulary". Both metadata and master data are vocabularies that can be assembled into metacontent statements. There are many sources of these vocabularies, both meta and master data: UML, EDIFACT, XSD, Dewey/UDC/LoC, SKOS, ISO-25964, Pantone, Linnaean Binomial Nomenclature, etc.

Using controlled vocabularies for the components of metacontent statements, whether for indexing or finding, is endorsed by ISO 25964: "If both the indexer and the searcher are guided to choose the same term for the same concept, then relevant documents will be retrieved." This is particularly relevant when considering search engines of the internet, such as Google. The process indexes pages and then matches text strings using its complex algorithm; there is no intelligence or "inferencing" occurring, just the illusion thereof.
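The idea that one scheme can be written in several syntaxes can be sketched in Python. The example below renders the same small record, using three Dublin Core element names, as plain text, as HTML `meta` tags (following the common `DC.` prefix convention), and as XML in the Dublin Core elements namespace. The record values, the wrapper element `metadata`, and the function names are assumptions for this sketch. The final lines express the Dewey example as a class-attribute-value triple, with `book` and `subject heading` chosen as illustrative names.

```python
import xml.etree.ElementTree as ET
from html import escape
from xml.sax.saxutils import escape as xml_escape

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core elements namespace

# Hypothetical record: the same content will be serialized three ways.
record = {"title": "Topology", "creator": "A. Author", "subject": "514"}


def as_text(rec: dict) -> str:
    """Plain text: one 'name: value' pair per line."""
    return "\n".join(f"{name}: {value}" for name, value in rec.items())


def as_html(rec: dict) -> str:
    """HTML: one <meta> tag per element, named with a 'DC.' prefix."""
    return "\n".join(
        f'<meta name="DC.{name}" content="{escape(value, quote=True)}">'
        for name, value in rec.items()
    )


def as_xml(rec: dict) -> str:
    """XML: one namespaced child element per Dublin Core element."""
    body = "".join(f"<dc:{n}>{xml_escape(v)}</dc:{n}>" for n, v in rec.items())
    return f'<metadata xmlns:dc="{DC_NS}">{body}</metadata>'


print(as_text(record))
print(as_html(record))
print(as_xml(record))

# The XML form parses back to the same values.
parsed = ET.fromstring(as_xml(record))
print(parsed.find(f"{{{DC_NS}}}title").text)

# Class-attribute-value triple implied by classifying a book under Dewey 514.
triple = ("book", "subject heading", record["subject"])
print(triple)
```

The three outputs differ only in syntax; the element names and values, which carry the meaning, are identical in each.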


Hierarchical, linear, and planar schemata

Metadata schemata can be hierarchical in nature, where relationships exist between metadata elements and elements are nested so that parent-child relationships exist between the elements. An example of a hierarchical metadata schema is the IEEE LOM schema, in which metadata elements may belong to a parent metadata element. Metadata schemata can also be one-dimensional, or linear, where each element is completely discrete from other elements and classified according to one dimension only. An example of a linear metadata schema is the Dublin Core schema, which is one-dimensional. Metadata schemata are often two-dimensional, or planar, where each element is completely discrete from other elements but classified according to two orthogonal dimensions.
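The three shapes can be compared with plain Python dictionaries. This is only a sketch: the element names are loosely modelled on Dublin Core and IEEE LOM but are not taken from the normative schemas, and the two planar dimensions (element and language) are an assumption chosen for illustration.

```python
# Linear: every element stands alone, classified along one dimension.
linear = {"title": "An Example Course", "creator": "A. Author", "language": "en"}

# Hierarchical: elements are nested, so parent-child relationships exist.
hierarchical = {
    "general": {"title": "An Example Course", "language": "en"},
    "lifeCycle": {"contribute": {"role": "author", "entity": "A. Author"}},
}

# Planar: discrete elements classified along two orthogonal dimensions,
# here (element, language).
planar = {
    ("title", "en"): "An Example Course",
    ("title", "fr"): "Un cours d'exemple",
}


def flatten(node: dict, prefix: str = "") -> dict:
    """Collapse a hierarchical record into dotted paths, losing the nesting."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


print(flatten(hierarchical))
print(planar[("title", "fr")])
```

Flattening shows the trade-off: a hierarchical record can always be reduced to a linear one, but the parent-child relationships then survive only as naming conventions in the keys.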


Granularity

The degree to which the data or metadata is structured is referred to as "granularity". "Granularity" refers to how much detail is provided. Metadata with a high granularity allows for deeper, more detailed, and more structured information and enables a greater level of technical manipulation. A lower level of granularity means that metadata can be created for considerably lower costs but will not provide as detailed information. Granularity affects not only the cost of creating and capturing metadata but, even more, the cost of maintaining it. As soon as the metadata structures become outdated, so too does access to the data they refer to. Hence the choice of granularity must take into account the effort to create the metadata as well as the effort to maintain it.


Hypermapping

In all cases where the metadata schemata exceed the planar depiction, some type of hypermapping is required to enable display and view of metadata according to chosen aspect and to serve special views. Hypermapping frequently applies to layering of geographical and geological information overlays.


Standards

International standards apply to metadata. Much work is being accomplished in the national and international standards communities, especially ANSI (American National Standards Institute) and ISO (International Organization for Standardization), to reach a consensus on standardizing metadata and registries. The core metadata registry standard is ISO/IEC 11179 Metadata Registries (MDR); the framework for the standard is described in ISO/IEC 11179-1:2004. A new edition of Part 1 is in its final stage for publication in 2015 or early 2016. It has been revised to align with the current edition of Part 3, ISO/IEC 11179-3:2013, which extends the MDR to support the registration of Concept Systems (see ISO/IEC 11179). This standard specifies a schema for recording both the meaning and technical structure of the data for unambiguous usage by humans and computers. The ISO/IEC 11179 standard refers to metadata as information objects about data, or "data about data". In ISO/IEC 11179 Part 3, the information objects are data about Data Elements, Value Domains, and other reusable semantic and representational information objects that describe the meaning and technical details of a data item. This standard also prescribes the details for a metadata registry, and for registering and administering the information objects within a Metadata Registry. ISO/IEC 11179 Part 3 also has provisions for describing compound structures that are derivations of other data elements, for example through calculations, collections of one or more data elements, or other forms of derived data. While this standard describes itself originally as a "data element" registry, its purpose is to support describing and registering metadata content independently of any particular application, lending the descriptions to being discovered and reused by humans or computers in developing new applications, databases, or for analysis of data collected in accordance with the registered metadata content. This standard has become the general basis for other kinds of metadata registries, reusing and extending the registration and administration portion of the standard.

The geospatial community has a tradition of specialized geospatial metadata standards, particularly building on traditions of map- and image-libraries and catalogs. Formal metadata is usually essential for geospatial data, as common text-processing approaches are not applicable.

The Dublin Core metadata terms are a set of vocabulary terms that can be used to describe resources for the purposes of discovery. The original set of 15 classic metadata terms, known as the Dublin Core Metadata Element Set, is endorsed in the following standards documents:
* IETF RFC 5013
* ISO Standard 15836-2009
* NISO Standard Z39.85
The W3C Data Catalog Vocabulary (DCAT) is an RDF vocabulary that supplements Dublin Core with classes for Dataset, Data Service, Catalog, and Catalog Record. DCAT also uses elements from FOAF, PROV-O, and OWL-Time. DCAT provides an RDF model to support the typical structure of a catalog that contains records, each describing a dataset or service.

Although not a standard, Microformat (also mentioned in the section metadata on the internet below) is a web-based approach to semantic markup which seeks to re-use existing HTML/XHTML tags to convey metadata. Microformat follows XHTML and HTML standards but is not a standard in itself. One advocate of microformats, Tantek Çelik, characterized a problem with alternative approaches:


Use


Photographs

Metadata may be written into a digital photo file that will identify who owns it, copyright and contact information, what brand or model of camera created the file, along with exposure information (shutter speed, f-stop, etc.) and descriptive information, such as keywords about the photo, making the file or image searchable on a computer and/or the Internet. Some metadata is created by the camera, such as color space, color channels, exposure time, and aperture (EXIF), while some is input by the photographer and/or software after downloading to a computer. Most digital cameras write metadata about the model number, shutter speed, etc., and some enable you to edit it; this functionality has been available on most Nikon DSLRs since the Nikon D3, on most new Canon cameras since the Canon EOS 7D, and on most Pentax DSLRs since the Pentax K-3. Metadata can be used to make organizing in post-production easier with the use of key-wording. Filters can be used to analyze a specific set of photographs and create selections on criteria like rating or capture time. On devices with geolocation capabilities like GPS (smartphones in particular), the location the photo was taken from may also be included.

Photographic metadata standards are governed by the organizations that develop them. They include, but are not limited to:
* IPTC Information Interchange Model IIM (International Press Telecommunications Council)
* IPTC Core Schema for XMP
* XMP – Extensible Metadata Platform (an ISO standard)
* Exif – Exchangeable image file format, maintained by CIPA (Camera & Imaging Products Association) and published by JEITA (Japan Electronics and Information Technology Industries Association)
* Dublin Core (Dublin Core Metadata Initiative – DCMI)
* PLUS (Picture Licensing Universal System)
* VRA Core (Visual Resource Association)
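The use of filters on rating, capture time, or keywords can be sketched in Python. The sketch works on already-extracted metadata held in memory and does not read real image files; the `PhotoMetadata` class, its fields, and the sample values are assumptions made for illustration rather than fields defined by Exif, IPTC, or XMP.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PhotoMetadata:
    """A few metadata fields for one photo (hypothetical layout)."""
    filename: str
    camera_model: str    # typically written by the camera
    captured: datetime   # typically written by the camera
    rating: int          # typically added later by the photographer
    keywords: tuple      # typically added later by the photographer


photos = [
    PhotoMetadata("a.jpg", "Camera A", datetime(2023, 5, 1, 9, 30), 5, ("harbour", "sunrise")),
    PhotoMetadata("b.jpg", "Camera A", datetime(2023, 5, 1, 18, 0), 2, ("harbour",)),
    PhotoMetadata("c.jpg", "Camera B", datetime(2023, 6, 12, 12, 15), 4, ("portrait",)),
]


def select(items, min_rating: int = 0, since: Optional[datetime] = None,
           keyword: Optional[str] = None) -> list:
    """Return photos matching every criterion that was supplied."""
    result = []
    for photo in items:
        if photo.rating < min_rating:
            continue
        if since is not None and photo.captured < since:
            continue
        if keyword is not None and keyword not in photo.keywords:
            continue
        result.append(photo)
    return result


print([p.filename for p in select(photos, min_rating=4)])
print([p.filename for p in select(photos, keyword="harbour", since=datetime(2023, 5, 1, 12, 0))])
```

The selection never opens the image data itself, which is what makes metadata-based organizing fast for large collections.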


Telecommunications

Information on the times, origins and destinations of phone calls, electronic messages, instant messages, and other modes of telecommunication, as opposed to message content, is another form of metadata. Bulk collection of this call detail record metadata by intelligence agencies has proven controversial after disclosures by Edward Snowden of the fact that certain intelligence agencies such as the NSA had been (and perhaps still are) keeping online metadata on millions of internet users for up to a year, regardless of whether or not they ever were persons of interest to the agency.
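As a rough illustration of the distinction between call metadata and message content, the sketch below models a record that holds only the time, origin, destination, and duration of a call. The `CallRecord` class, its field names, and the sample phone numbers are hypothetical and are not taken from any real call detail record format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CallRecord:
    """Hypothetical call-metadata record: describes a call, not what was said."""
    origin: str
    destination: str
    started_at: datetime
    duration: timedelta


def contacts_of(records, number):
    """Return the set of numbers that exchanged calls with `number`."""
    contacts = set()
    for record in records:
        if record.origin == number:
            contacts.add(record.destination)
        elif record.destination == number:
            contacts.add(record.origin)
    return contacts


# Invented sample data for illustration only.
records = [
    CallRecord("555-0100", "555-0199", datetime(2013, 6, 1, 9, 30), timedelta(minutes=4)),
    CallRecord("555-0142", "555-0100", datetime(2013, 6, 1, 18, 5), timedelta(minutes=12)),
    CallRecord("555-0142", "555-0177", datetime(2013, 6, 2, 8, 0), timedelta(seconds=45)),
]

print(sorted(contacts_of(records, "555-0100")))
```

Even without any message content, such records are enough to reconstruct who communicated with whom and when, which is one reason bulk collection of this kind of metadata is contentious.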


Video

Metadata is particularly useful in video, where information about its contents (such as transcripts of conversations and text descriptions of its scenes) is not directly understandable by a computer, but where an efficient search of the content is desirable. This is particularly useful in video applications such as Automatic Number Plate Recognition and Vehicle Recognition Identification software, wherein license plate data is saved and used to create reports and alerts. Video metadata is derived from two sources: (1) operationally gathered metadata, that is, information about the content produced, such as the type of equipment, software, date, and location; (2) human-authored metadata, created to improve search engine visibility, discoverability, and audience engagement, and to provide advertising opportunities to video publishers. Today most professional video editing software has access to metadata. Avid's MetaSync and Adobe's Bridge are two prime examples of this.


Geospatial metadata

Geospatial metadata relates to Geographic Information Systems (GIS) files, maps, images, and other data that is location-based. Metadata is used in GIS to document the characteristics and attributes of geographic data, such as database files and data that is developed within a GIS. It includes details such as who developed the data, when it was collected, how it was processed, and what formats it is available in, and so provides the context needed to use the data effectively.


Creation

Metadata can be created either by automated information processing or by manual work. Elementary metadata captured by computers can include information about when an object was created, who created it, when it was last updated, file size, and file extension. In this context an ''object'' refers to any of the following: * A physical item such as a book, CD, DVD, a paper map, chair, table, flower pot, etc. * An electronic file such as a digital image, digital photo, electronic document, program file, database table, etc. A metadata engine collects, stores and analyzes information about data and metadata (data about data) in use within a domain.
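The elementary, automatically captured metadata described above can be read directly from a file system. The following is a minimal sketch using only the Python standard library; the function name `elementary_metadata` and the sample file are illustrative assumptions. Creation time and file owner are deliberately omitted because they are not reported consistently across operating systems.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def elementary_metadata(path):
    """Collect basic metadata that the file system records automatically."""
    p = Path(path)
    info = p.stat()
    return {
        "name": p.name,
        "extension": p.suffix,
        "size_bytes": info.st_size,
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


# Create a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.txt"
    sample.write_text("hello metadata", encoding="utf-8")
    meta = elementary_metadata(sample)

print(meta)
```

None of the returned values describe what the file says; they describe the file itself, which is the sense in which they are metadata.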


Data virtualization

Data virtualization emerged in the 2000s as the new software technology to complete the virtualization "stack" in the enterprise. Metadata is used in data virtualization servers, which are enterprise infrastructure components alongside database and application servers. Metadata in these servers is saved in a persistent repository and describes business objects in various enterprise systems and applications. Structural metadata commonality is also important to support data virtualization.


Statistics and census services

Standardization and harmonization work has brought advantages to industry efforts to build metadata systems in the statistical community. Several metadata guidelines and standards such as the European Statistics Code of Practice and ISO 17369:2013 (Statistical Data and Metadata Exchange or SDMX) provide key principles for how businesses, government bodies, and other entities should manage statistical data and metadata. Entities such as Eurostat, the European System of Central Banks, and the U.S. Environmental Protection Agency have implemented these and other such standards and guidelines with the goal of improving "efficiency when managing statistical business processes".


Library and information science

Metadata has been used in various ways as a means of cataloging items in libraries in both digital and analog formats. Such data helps classify, aggregate, identify, and locate a particular book, DVD, magazine, or any object a library might hold in its collection. Until the 1980s, many library catalogs used 3x5 inch cards in file drawers to display a book's title, author, subject matter, and an abbreviated alphanumeric string (call number) which indicated the physical location of the book within the library's shelves. The Dewey Decimal System employed by libraries for the classification of library materials by subject is an early example of metadata usage. Each card in the early paper catalog recorded the title, author, and subject of the item it described, along with a number indicating where to find that item. Beginning in the 1980s and 1990s, many libraries replaced these paper file cards with computer databases. These computer databases make it much easier and faster for users to do keyword searches. Another form of older metadata collection is the use by the US Census Bureau of what is known as the "Long Form". The Long Form asks questions that are used to create demographic data to find patterns of distribution.
Libraries employ metadata in library catalogues, most commonly as part of an Integrated Library Management System (ILMS). Metadata is obtained by cataloging resources such as books, periodicals, DVDs, web pages or digital images. This data is stored in the ILMS using the MARC metadata standard. The purpose is to direct patrons to the physical or electronic location of the items or areas they seek, as well as to provide a description of the items in question. More recent and specialized instances of library metadata include the establishment of digital libraries, including e-print repositories and digital image libraries. While often based on library principles, the focus on non-librarian use, especially in providing metadata, means they do not follow traditional or common cataloging approaches. Given the custom nature of included materials, metadata fields are often specially created, e.g. taxonomic classification fields, location fields, keywords, or copyright statements. Standard file information such as file size and format is usually included automatically. Library operation has for decades been a key topic in efforts toward
international standardization. Standards for metadata in digital libraries include Dublin Core, METS, MODS, DDI, DOI, URN, PREMIS schema, EML, and OAI-PMH. Leading libraries in the world give hints on their metadata standards strategies. The use and creation of metadata in library and information science also include scientific publications:


In science

Metadata for scientific publications is often created by journal publishers and citation databases such as PubMed and Web of Science. The data contained within manuscripts or accompanying them as supplementary material is less often subject to metadata creation, though it may be submitted to, for example, biomedical databases after publication. The original authors and database curators then become responsible for metadata creation, with the assistance of automated processes. Comprehensive metadata for all experimental data is the foundation of the FAIR Guiding Principles, or the standards for ensuring research data are findable, accessible, interoperable, and reusable. Such metadata can then be utilized, complemented, and made accessible in useful ways. OpenAlex is a free online index of over 200 million scientific documents that integrates and provides metadata such as sources, citations, author information, scientific fields, and research topics. Its API and open source website can be used for metascience, scientometrics, and novel tools that query this semantic web of papers. Another project under development,
Scholia, uses the metadata of scientific publications for various visualizations and aggregation features, such as providing a simple user interface summarizing literature about a specific feature of the SARS-CoV-2 virus using Wikidata's "main subject" property. In research labor, transparent metadata about authors' contributions to works has been proposed, e.g. the role played in the production of the paper, the level of contribution and the responsibilities. Moreover, various metadata about scientific outputs can be created or complemented: for instance, scite.ai attempts to track and link citations of papers as 'Supporting', 'Mentioning' or 'Contrasting' the study. Other examples include developments of alternative metrics (which, beyond providing help for assessment and findability, also aggregate many of the public discussions about a scientific paper on social media such as Reddit, citations on Wikipedia, and reports about the study in the news media) and a call for showing whether or not the original findings have been confirmed or could be reproduced.


In museums

Metadata in a museum context is the information that trained cultural documentation specialists, such as archivists, librarians, museum registrars and curators, create to index, structure, describe, identify, or otherwise specify works of art, architecture, cultural objects and their images. Descriptive metadata is most commonly used in museum contexts for object identification and resource recovery purposes.


Usage

Metadata is developed and applied within collecting institutions and museums in order to: * Facilitate resource discovery and execute search queries. * Create digital archives that store information relating to various aspects of museum collections and cultural objects, and serve archival and managerial purposes. * Provide public audiences access to cultural objects through publishing digital content online.


Standards

Many museums and cultural heritage centers recognize that given the diversity of artworks and cultural objects, no single model or standard suffices to describe and catalog cultural works. For example, a sculpted Indigenous artifact could be classified as an artwork, an archaeological artifact, or an Indigenous heritage item. The early stages of standardization in archiving, description and cataloging within the museum community began in the late 1990s with the development of standards such as Categories for the Description of Works of Art (CDWA), Spectrum, CIDOC Conceptual Reference Model (CRM), Cataloging Cultural Objects (CCO) and the CDWA Lite XML schema. These standards use HTML and XML markup languages for machine processing, publication and implementation. The Anglo-American Cataloguing Rules (AACR), originally developed for characterizing books, have also been applied to cultural objects, works of art and architecture. Standards such as the CCO are integrated within a museum's Collections Management System (CMS), a database through which museums are able to manage their collections, acquisitions, loans and conservation. Scholars and professionals in the field note that the "quickly evolving landscape of standards and technologies" creates challenges for cultural documentarians, specifically non-technically trained professionals. Most collecting institutions and museums use a relational database to categorize cultural works and their images. Relational databases and metadata work to document and describe the complex relationships amongst cultural objects and multi-faceted works of art, as well as between objects and places, people, and artistic movements. Relational database structures are also beneficial within collecting institutions and museums because they allow archivists to make a clear distinction between cultural objects and their images; an unclear distinction could lead to confusing and inaccurate searches.
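The distinction between a cultural object and its images can be sketched with a small relational schema. The example below uses Python's built-in `sqlite3` module; the table names, column names, and sample rows are hypothetical and do not follow CDWA, CCO, or any particular collections management system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table describes the object itself; a second describes images of it.
conn.executescript(
    """
    CREATE TABLE cultural_object (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        maker TEXT
    );
    CREATE TABLE object_image (
        id           INTEGER PRIMARY KEY,
        object_id    INTEGER NOT NULL REFERENCES cultural_object(id),
        file_name    TEXT NOT NULL,
        photographer TEXT
    );
    """
)

# Invented sample data for illustration only.
conn.execute(
    "INSERT INTO cultural_object (id, title, maker) VALUES (?, ?, ?)",
    (1, "Carved mask", "Unknown maker"),
)
conn.executemany(
    "INSERT INTO object_image (object_id, file_name, photographer) VALUES (?, ?, ?)",
    [
        (1, "mask_front.tif", "Staff photographer"),
        (1, "mask_side.tif", "Staff photographer"),
    ],
)

# A join retrieves every image of an object without conflating the two records.
rows = conn.execute(
    """
    SELECT o.title, i.file_name
    FROM cultural_object AS o
    JOIN object_image AS i ON i.object_id = o.id
    ORDER BY i.file_name
    """
).fetchall()

print(rows)
```

Because the maker of the object and the photographer of the image live in separate tables, a search for works by a given maker does not accidentally return records about photographs, and vice versa.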


Cultural objects and artworks

An object's materiality, function, and purpose, as well as the size (e.g., measurements, such as height, width, weight), storage requirements (e.g., climate-controlled environment), and focus of the museum and collection, influence the descriptive depth of the data attributed to the object by cultural documentarians. The established institutional cataloging practices, goals, and expertise of cultural documentarians and database structure also influence the information ascribed to cultural objects and the ways in which cultural objects are categorized. Additionally, museums often employ standardized commercial collection management software that prescribes and limits the ways in which archivists can describe artworks and cultural objects. Collecting institutions and museums also use controlled vocabularies to describe cultural objects and artworks in their collections. Getty Vocabularies and the Library of Congress Controlled Vocabularies are reputable within the museum community and are recommended by CCO standards. Museums are encouraged to use controlled vocabularies that are contextual and relevant to their collections and enhance the functionality of their digital information systems. Controlled vocabularies are beneficial within databases because they provide a high level of consistency, improving resource retrieval. Metadata structures, including controlled vocabularies, reflect the ontologies of the systems from which they were created. Often the processes through which cultural objects are described and categorized through metadata in museums do not reflect the perspectives of the maker communities.
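The consistency benefit of a controlled vocabulary can be illustrated with a small lookup that maps free-text variants onto a single preferred term. The vocabulary below is invented for the example and is not drawn from the Getty Vocabularies or the Library of Congress; `PREFERRED_TERMS` and `normalize_term` are hypothetical names.

```python
# Hypothetical mapping of variant terms to one preferred term each.
PREFERRED_TERMS = {
    "oil painting": "oil paintings",
    "oil paintings": "oil paintings",
    "oil on canvas": "oil paintings",
    "bust": "busts",
    "busts": "busts",
}


def normalize_term(term):
    """Return the preferred term for `term`, or None if it is not in the vocabulary."""
    key = " ".join(term.lower().split())  # ignore case and extra whitespace
    return PREFERRED_TERMS.get(key)


for entry in ["  Oil  Painting ", "oil on canvas", "Bust", "tapestry"]:
    print(repr(entry), "->", normalize_term(entry))
```

Records catalogued as "Oil Painting" and "oil on canvas" end up under the same term and are retrieved by the same query, while an unrecognized term such as "tapestry" is flagged (here, as `None`) rather than silently stored. The choice of which terms are preferred is itself an editorial decision, which is how such structures come to reflect the outlook of the system that created them.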


Museums and the Internet

Metadata has been instrumental in the creation of digital information systems and archives within museums and has made it easier for museums to publish digital content online. This has enabled audiences who might not have had access to cultural objects due to geographic or economic barriers to have access to them. In the 2000s, as more museums adopted archival standards and created intricate databases, discussions about Linked Data between museum databases arose in the museum, archival, and library science communities. Collection Management Systems (CMS) and Digital Asset Management tools can be local or shared systems. Digital Humanities scholars note many benefits of interoperability between museum databases and collections, while also acknowledging the difficulties of achieving such interoperability.


Law


United States

Problems involving metadata in litigation in the United States are becoming widespread. Courts have looked at various questions involving metadata, including the discoverability of metadata by parties. The Federal Rules of Civil Procedure have specific rules for discovery of electronically stored information, and subsequent case law applying those rules has elucidated the litigant's duty to produce metadata when litigating in federal court. In October 2009, the Arizona Supreme Court ruled that metadata records are public record. Document metadata have proven particularly important in legal environments in which litigation has requested metadata, which can include sensitive information detrimental to a certain party in court. Using
metadata removal tools to "clean" or redact documents can mitigate the risks of unwittingly sending sensitive data. This process partially (see data remanence) protects law firms from potentially damaging leaks of sensitive data through electronic discovery. Opinion polls have shown that 45% of Americans are "not at all confident" in the ability of social media sites to ensure their personal data is secure and 40% say that social media sites should not be able to store any information on individuals. 76% of Americans say that they are not confident that the information advertising agencies collect on them is secure and 50% say that online advertising agencies should not be allowed to record any of their information at all.


Australia

In Australia, the need to strengthen national security has resulted in the introduction of a new metadata storage law. Under this law, both security and policing agencies are allowed to access up to two years of an individual's metadata, with the aim of making it easier to prevent terrorist attacks and serious crimes.


In legislation

Legislative metadata has been the subject of some discussion in law.gov forums such as workshops held by the Legal Information Institute at the Cornell Law School on 22 and 23 March 2010. The documentation for these forums is titled "Suggested metadata practices for legislation and regulations". A handful of key points have been outlined by these discussions, section headings of which are listed as follows: * General Considerations * Document Structure * Document Contents * Metadata (elements of) * Layering * Point-in-time versus post-hoc


In healthcare

Australian medical research pioneered the definition of metadata for applications in health care. That approach offers the first recognized attempt to adhere to international standards in medical sciences instead of defining a proprietary standard under the World Health Organization (WHO) umbrella. The medical community has not yet approved of the need to follow metadata standards, despite research that supported these standards.


In biomedical research

Research studies in the fields of biomedicine and molecular biology frequently yield large quantities of data, including results of genome or meta-genome sequencing, proteomics data, and even notes or plans created during the course of research itself. Each data type involves its own variety of metadata and the processes necessary to produce these metadata. General metadata standards, such as ISA-Tab, allow researchers to create and exchange experimental metadata in consistent formats. Specific experimental approaches frequently have their own metadata standards and systems: metadata standards for mass spectrometry include mzML and SPLASH, while XML-based standards such as PDBML and SRA XML serve as standards for macromolecular structure and sequencing data, respectively. The products of biomedical research are generally realized as peer-reviewed manuscripts and these publications are yet another source of data.


Data warehousing

A data warehouse (DW) is a repository of an organization's electronically stored data. Data warehouses are designed to manage and store the data. Data warehouses differ from business intelligence (BI) systems because BI systems are designed to use data to create reports and analyze the information, to provide strategic guidance to management. Metadata is an important tool in how data is stored in data warehouses. The purpose of a data warehouse is to house standardized, structured, consistent, integrated, correct, "cleaned" and timely data, extracted from various operational systems in an organization. The extracted data are integrated in the data warehouse environment to provide an enterprise-wide perspective. Data are structured in a way to serve the reporting and analytic requirements. The design of structural metadata commonality using a data modeling method such as entity-relationship model diagramming is important in any data warehouse development effort. Such diagrams detail metadata on each piece of data in the data warehouse. An essential component of a data warehouse/business intelligence system is the metadata and tools to manage and retrieve the metadata. Ralph Kimball describes metadata as the DNA of the data warehouse as metadata defines the elements of the data warehouse and how they work together. Kimball et al. refer to three main categories of metadata: technical metadata, business metadata and process metadata. Technical metadata is primarily
definition A definition is a statement of the meaning of a term (a word, phrase, or other set of symbols). Definitions can be classified into two large categories: intensional definitions (which try to give the sense of a term), and extensional definiti ...
al, while business metadata and process metadata is primarily
descriptive In the study of language, description or descriptive linguistics is the work of objectively analyzing and describing how language is actually used (or how it was used in the past) by a speech community. François & Ponsonnet (2013). All a ...
. The categories sometimes overlap. * Technical metadata defines the objects and processes in a DW/BI system, as seen from a technical point of view. The technical metadata includes the system metadata, which defines the data structures such as tables, fields, data types, indexes, and partitions in the relational engine, as well as databases, dimensions, measures, and data mining models. Technical metadata defines the data model and the way it is displayed for the users, with the reports, schedules, distribution lists, and user security rights. * Business metadata is content from the data warehouse described in more user-friendly terms. The business metadata tells you what data you have, where they come from, what they mean and what their relationship is to other data in the data warehouse. Business metadata may also serve as documentation for the DW/BI system. Users who browse the data warehouse are primarily viewing the business metadata. * Process metadata is used to describe the results of various operations in the data warehouse. Within the ETL process, all key data from tasks is logged on execution. This includes start time, end time, CPU seconds used, disk reads, disk writes, and rows processed. When troubleshooting the ETL or query process, this sort of data becomes valuable. Process metadata is the fact measurement when building and using a DW/BI system. Some organizations make a living out of collecting and selling this sort of data to companies – in that case, the process metadata becomes the business metadata for the fact and dimension tables. Collecting process metadata is in the interest of business people who can use the data to identify the users of their products, which products they are using, and what level of service they are receiving.
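The process metadata described above (start time, end time, CPU seconds, rows processed) can be illustrated with a minimal Python sketch. The names `TaskRunMetadata` and `run_with_process_metadata` are hypothetical and chosen only for illustration; a real ETL tool would record these values in its own logging tables, and disk reads and writes are omitted here because the standard library does not expose them portably.

```python
import time
from dataclasses import asdict, dataclass


@dataclass
class TaskRunMetadata:
    """Process metadata for one ETL task run (field names are illustrative)."""
    task_name: str
    start_time: float      # wall-clock start, seconds since the epoch
    end_time: float        # wall-clock end, seconds since the epoch
    cpu_seconds: float     # CPU time consumed by this process during the run
    rows_processed: int


def run_with_process_metadata(task_name, task, rows):
    """Apply `task` to every row and return (results, process metadata)."""
    start_wall = time.time()
    start_cpu = time.process_time()
    results = [task(row) for row in rows]
    metadata = TaskRunMetadata(
        task_name=task_name,
        start_time=start_wall,
        end_time=time.time(),
        cpu_seconds=time.process_time() - start_cpu,
        rows_processed=len(results),
    )
    return results, metadata


if __name__ == "__main__":
    cleaned, meta = run_with_process_metadata(
        "trim_names", str.strip, ["  Ada ", "Grace  ", " Edsger"]
    )
    print(cleaned)
    print(asdict(meta))
```

The record describes the run rather than the rows themselves, which is what makes it useful when troubleshooting a slow or failed load.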


On the Internet

The HTML format used to define web pages allows for the inclusion of a variety of types of metadata, from basic descriptive text, dates and keywords to more advanced metadata schemes such as the Dublin Core, e-GMS, and AGLS standards. Pages and files can also be geotagged with coordinates, categorized or tagged, including collaboratively, as with folksonomies.

When media has identifiers set, or when such identifiers can be generated, information such as file tags and descriptions can be pulled or scraped from the Internet, for example about movies. Various online databases are aggregated and provide metadata for various data. The collaboratively built Wikidata has identifiers not just for media but also for abstract concepts, various objects, and other entities, which can be looked up by humans and machines to retrieve useful information and to link knowledge in other knowledge bases and databases.

Metadata may be included in the page's header or in a separate file. Microformats allow metadata to be added to on-page data in a way that regular web users do not see, but that computers, web crawlers and search engines can readily access. Many search engines are cautious about using metadata in their ranking algorithms because of exploitation of metadata and the practice of search engine optimization (SEO) to improve rankings. See the Meta element article for further discussion. This cautious attitude may be justified because, according to Doctorow, people do not exercise care and diligence when creating their own metadata, and because metadata is part of a competitive environment in which it is used to promote the metadata creators' own purposes. Studies show that search engines respond to web pages with metadata implementations, and Google has an announcement on its site showing the meta tags that its search engine understands. Enterprise search startup Swiftype recognizes metadata as a relevance signal that webmasters can implement for their website-specific search engine, even releasing its own extension, known as Meta Tags 2.
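As a rough illustration of how a crawler might read metadata from a page's header, the following Python sketch collects `<meta name="..." content="...">` pairs using the standard library's `html.parser`. The sample page, the `MetaTagCollector` class and the `extract_meta` helper are made up for this example, and the `DC.title` name only follows the common convention for embedding Dublin Core in HTML; real crawlers handle many more cases (such as `property` and `http-equiv` attributes).

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects name/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name] = content


def extract_meta(html_text):
    """Return a dict of the named meta elements found in `html_text`."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    return collector.meta


# Hypothetical page used only for demonstration.
SAMPLE_PAGE = """<html><head>
<title>Example</title>
<meta name="description" content="A hypothetical page about metadata.">
<meta name="keywords" content="metadata, example">
<meta name="DC.title" content="Example page">
</head><body><p>Visible text.</p></body></html>"""

if __name__ == "__main__":
    for name, content in extract_meta(SAMPLE_PAGE).items():
        print(f"{name}: {content}")
```

None of these values is rendered in the page body, which is why such metadata is invisible to ordinary readers but trivially available to software.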


In the broadcast industry

In the broadcast industry, metadata is linked to audio and video broadcast media to:

* ''identify'' the media: clip or playlist names, duration, timecode, etc.
* ''describe'' the content: notes regarding the quality of video content, rating, description (for example, during a sport event, keywords like ''goal'' or ''red card'' will be associated with some clips)
* ''classify'' media: metadata allows producers to sort the media or to find video content easily and quickly (a TV news program could urgently need some archive content for a subject). For example, the BBC has a large subject classification system, Lonclass, a customized version of the more general-purpose Universal Decimal Classification.

This metadata can be linked to the video media thanks to the video servers. Most major broadcast sporting events like the FIFA World Cup or the Olympic Games use this metadata to distribute their video content to TV stations through keywords. It is often the host broadcaster who is in charge of organizing metadata through its ''International Broadcast Centre'' and its video servers. This metadata is recorded with the images and entered by metadata operators (''loggers''), who associate, live, the metadata available in ''metadata grids'' through software (such as Multicam (LSM) or IPDirector, used during the FIFA World Cup or Olympic Games).


Geospatial

Metadata that describes geographic objects in electronic storage or format (such as datasets, maps, features, or documents with a geospatial component) has a history dating back to at least 1994 (refer to the MIT Library page on FGDC Metadata). This class of metadata is described more fully in the geospatial metadata article.


Ecological and environmental

Ecological and environmental metadata is intended to document the "who, what, when, where, why, and how" of data collection for a particular study. This typically means which organization or institution collected the data, what type of data was collected, on which date(s) the data was collected, the rationale for the data collection, and the methodology used. Metadata should be generated in a format commonly used by the most relevant science community, such as Darwin Core, Ecological Metadata Language, or Dublin Core. Metadata editing tools exist to facilitate metadata generation (e.g. Metavist, Mercury, Morpho). Metadata should describe the provenance of the data (where they originated, as well as any transformations the data underwent) and how to give credit for (cite) the data products.


Digital music

When first released in 1982, Compact Discs only contained a Table Of Contents (TOC) with the number of tracks on the disc and their length in samples. Fourteen years later, in 1996, a revision of the CD Red Book standard added CD-Text to carry additional metadata, but CD-Text was not widely adopted. Shortly thereafter, it became common for personal computers to retrieve metadata from external sources (e.g. CDDB, Gracenote) based on the TOC.

Digital audio files superseded music formats such as cassette tapes and CDs in the 2000s. Digital audio files could be labeled with more information than could be contained in just the file name. That descriptive information is called the audio tag or, more generally, audio metadata. Computer programs specializing in adding or modifying this information are called tag editors. Metadata can be used to name, describe, catalog, and indicate ownership or copyright for a digital audio file, and its presence makes it much easier to locate a specific audio file within a group, typically through use of a search engine that accesses the metadata. As different digital audio formats were developed, attempts were made to standardize a specific location within the digital files where this information could be stored.

As a result, almost all digital audio formats, including mp3, broadcast wav, and AIFF files, have similar standardized locations that can be populated with metadata. The metadata for compressed and uncompressed digital music is often encoded in the ID3 tag. Common tagging libraries such as TagLib support MP3, Ogg Vorbis, FLAC, MPC, Speex, WavPack, TrueAudio, WAV, AIFF, MP4, and ASF file formats.
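To make the idea of a "standardized location" concrete, the sketch below reads the older ID3v1 tag, which occupies the final 128 bytes of a file: the marker `TAG`, then fixed-width title, artist, album, year and comment fields, and a one-byte genre code. This is only a minimal illustration; the helper names are hypothetical, the later ID3v2 format is structured differently and sits at the start of the file, and a library such as TagLib would normally be used instead.

```python
import struct

ID3V1_SIZE = 128
# "TAG" marker, title, artist, album, year, comment, genre code
ID3V1_LAYOUT = "<3s30s30s30s4s30sB"


def _text(raw):
    """Decode a fixed-width field, dropping NUL padding and spaces."""
    return raw.split(b"\x00", 1)[0].decode("latin-1").strip()


def read_id3v1(data):
    """Return the ID3v1 fields found at the end of `data`, or None."""
    if len(data) < ID3V1_SIZE:
        return None
    fields = struct.unpack(ID3V1_LAYOUT, data[-ID3V1_SIZE:])
    marker, title, artist, album, year, _comment, genre = fields
    if marker != b"TAG":
        return None
    return {
        "title": _text(title),
        "artist": _text(artist),
        "album": _text(album),
        "year": _text(year),
        "genre_code": genre,
    }


def build_id3v1(title, artist, album, year, genre_code=255):
    """Build a 128-byte ID3v1 block (struct pads short fields with NULs)."""
    return struct.pack(
        ID3V1_LAYOUT,
        b"TAG",
        title.encode("latin-1"),
        artist.encode("latin-1"),
        album.encode("latin-1"),
        year.encode("latin-1"),
        b"",
        genre_code,
    )


if __name__ == "__main__":
    # Stand-in for an audio file: arbitrary bytes followed by the tag.
    fake_file = b"\x00" * 64 + build_id3v1("Song", "Artist", "Album", "1999")
    print(read_id3v1(fake_file))
```

Because the tag lives at a fixed offset from the end of the file, a reader can find it without decoding any of the audio data.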


Cloud applications

With the availability of cloud applications, which include those to add metadata to content, metadata is increasingly available over the Internet.


Administration and management


Storage

Metadata can be stored either ''internally'', in the same file or structure as the data (this is also called ''embedded metadata''), or ''externally'', in a separate file or field from the described data. A data repository typically stores the metadata ''detached'' from the data but can be designed to support embedded metadata approaches. Each option has advantages and disadvantages:

* Internal storage means metadata always travels as part of the data they describe; thus, metadata is always available with the data, and can be manipulated locally. This method creates redundancy (precluding normalization), and does not allow managing all of a system's metadata in one place. It arguably increases consistency, since the metadata is readily changed whenever the data is changed.
* External storage allows collocating metadata for all the contents, for example in a database, for more efficient searching and management. Redundancy can be avoided by normalizing the metadata's organization. In this approach, metadata can be united with the content when information is transferred, for example in streaming media; or can be referenced (for example, as a web link) from the transferred content. On the downside, the division of the metadata from the data content, especially in standalone files that refer to their source metadata elsewhere, increases the opportunities for misalignments between the two, as changes to either may not be reflected in the other.

Metadata can be stored in either human-readable or binary form. Storing metadata in a human-readable format such as XML can be useful because users can understand and edit it without specialized tools. However, text-based formats are rarely optimized for storage capacity, communication time, or processing speed. A binary metadata format enables efficiency in all these respects, but requires special software to convert the binary information into human-readable content.
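The trade-off between human-readable and binary storage can be sketched in a few lines of Python. The record layout here (a duration in milliseconds plus a width and height) and all function names are assumptions made for the example, not part of any standard.

```python
import struct
import xml.etree.ElementTree as ET

FIELDS = ("duration_ms", "width", "height")
# Assumed binary layout: one 32-bit and two 16-bit unsigned integers.
RECORD_FORMAT = "<IHH"


def to_xml(duration_ms, width, height):
    """Serialize the record as human-readable XML (returned as bytes)."""
    root = ET.Element("metadata")
    for tag, value in zip(FIELDS, (duration_ms, width, height)):
        ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root)


def from_xml(payload):
    """Parse the XML form back into a (duration_ms, width, height) tuple."""
    root = ET.fromstring(payload)
    return tuple(int(root.find(tag).text) for tag in FIELDS)


def to_binary(duration_ms, width, height):
    """Serialize the record as a compact fixed-size binary structure."""
    return struct.pack(RECORD_FORMAT, duration_ms, width, height)


def from_binary(payload):
    """Decode the binary form; the layout must be known in advance."""
    return struct.unpack(RECORD_FORMAT, payload)


if __name__ == "__main__":
    record = (215000, 1920, 1080)
    as_xml = to_xml(*record)
    as_binary = to_binary(*record)
    print(as_xml.decode("ascii"), len(as_xml), "bytes")
    print(as_binary.hex(), len(as_binary), "bytes")
```

The XML form can be read and edited in any text editor, while the binary form is much smaller but meaningless without software that knows `RECORD_FORMAT`.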


Database management

Each relational database system has its own mechanisms for storing metadata. Examples of relational-database metadata include:

* Tables of all tables in a database, their names, sizes, and number of rows in each table.
* Tables of columns in each database, what tables they are used in, and the type of data stored in each column.

In database terminology, this set of metadata is referred to as the catalog. The SQL standard specifies a uniform means to access the catalog, called the information schema, but not all databases implement it, even if they implement other aspects of the SQL standard. For an example of database-specific metadata access methods, see Oracle metadata. Programmatic access to metadata is possible using APIs such as JDBC or SchemaCrawler.
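As a small illustration of database-specific catalog access, the following Python sketch uses SQLite, which does not provide the standard information schema but exposes its catalog through the `sqlite_master` table and the `PRAGMA table_info` statement. The `author` and `book` tables and the `describe_schema` helper are invented for the example.

```python
import sqlite3


def describe_schema(connection):
    """Return {table name: [(column name, declared type), ...]}."""
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    schema = {}
    for (table_name,) in tables:
        # PRAGMA statements cannot take bound parameters, so quote the name.
        quoted = '"' + table_name.replace('"', '""') + '"'
        columns = connection.execute(f"PRAGMA table_info({quoted})").fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        schema[table_name] = [(column[1], column[2]) for column in columns]
    return schema


def make_example_database():
    """Create an in-memory database with two hypothetical tables."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
    connection.execute(
        "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER)"
    )
    return connection


if __name__ == "__main__":
    for table, columns in describe_schema(make_example_database()).items():
        print(table, columns)
```

The queries return information about the tables rather than their rows, which is the distinction between the catalog and the data it describes.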


In popular culture

One of the first satirical examinations of the concept of metadata as we understand it today is American science fiction author Hal Draper's short story, ''MS Fnd in a Lbry'' (1961). Here, the knowledge of all mankind is condensed into an object the size of a desk drawer; however, the magnitude of the metadata (e.g. catalog of catalogs of..., as well as indexes and histories) eventually leads to dire yet humorous consequences for the human race. As a cautionary tale, the story prefigures the modern consequences of allowing metadata to become more important than the real data it is concerned with, and the risks inherent in that eventuality.


See also

* Agris: International Information System for the Agricultural Sciences and Technology
* Bibliographic record
* Classification scheme
* Crosswalk (metadata)
* DataONE
* Data Dictionary (aka metadata repository)
* Dublin Core
* Folksonomy
* GEOMS – Generic Earth Observation Metadata Standard
* Geospatial metadata
* IPDirector
* ISO/IEC 11179
* Knowledge tag
* The medium is the message
* Mercury: Metadata Search System
* Meta element
* Metadata Access Point Interface
* Metadata discovery
* Metadata facility for Java
* Metadata from Wikiversity
* Metadata publishing
* Metadata registry
* Metamathematics
* METAFOR Common Metadata for Climate Modelling Digital Repositories
* Microcontent
* Microformat
* Multicam (LSM)
* Observations and Measurements
* Ontology (computer science)
* Official statistics
* Paratext
* Preservation Metadata
* SDMX
* Semantic Web
* SGML
* The Metadata Company
* Universal Data Element Framework
* Vocabulary OneSource
* XSD


External links


* ''Understanding Metadata: What is metadata, and what is it for?'', NISO, 2017
* "A Guardian guide to your metadata" – ''The Guardian'', Wednesday 12 June 2013.
* Metacrap: Putting the torch to 7 straw-men of the meta-utopia – Cory Doctorow's opinion on the limitations of metadata on the Internet, 2001
* DataONE Investigator Toolkit
* ''Journal of Library Metadata'', Routledge, Taylor & Francis Group, ISSN 1937-5034
* ''International Journal of Metadata, Semantics and Ontologies'' (''IJMSO''), Inderscience Publishers, ISSN 1744-263X
* LPR Standards (PDF), Department of Homeland Security (October 2012)