Linked Data

In computing, linked data (often capitalized as Linked Data) is structured data which is interlinked with other data so it becomes more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages only for human readers, it extends them to share information in a way that can be read automatically by computers. Part of the vision of linked data is for the Internet to become a global database. Tim Berners-Lee, director of the World Wide Web Consortium (W3C), coined the term in a 2006 design note about the Semantic Web project. Linked data may also be open data, in which case it is usually described as Linked Open Data.


Principles

In his 2006 "Linked Data" note, Tim Berners-Lee outlined four principles of linked data, paraphrased along the following lines:
1. Uniform Resource Identifiers (URIs) should be used to name and identify individual things.
2. HTTP URIs should be used to allow these things to be looked up, interpreted, and subsequently "dereferenced".
3. Useful information about what a name identifies should be provided through open standards such as RDF, SPARQL, etc.
4. When publishing data on the Web, other things should be referred to using their HTTP URI-based names.

Tim Berners-Lee later restated these principles at a 2009 TED conference, again paraphrased along the following lines:
1. All conceptual things should have a name starting with HTTP.
2. Looking up an HTTP name should return useful data about the thing in question in a standard format.
3. Anything else that that same thing has a relationship with through its data should also be given a name beginning with HTTP.
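
The principles can be illustrated with a small sketch. The following Python snippet is a toy model rather than a real client: the `example.org` URIs, the `FAKE_WEB` table, and the helper functions are all hypothetical, and an in-memory dictionary stands in for actual HTTP lookups.

```python
# Toy model of the linked data principles.
# FAKE_WEB stands in for the Web: each HTTP URI (principles 1 and 2) maps to
# the triples a server might return when that URI is looked up (principle 3).
# All URIs below are hypothetical examples.
FAKE_WEB = {
    "http://example.org/people/alice": [
        ("http://example.org/people/alice", "http://example.org/vocab/name", "Alice"),
        # Principle 4: related things are referred to by their own HTTP URIs.
        ("http://example.org/people/alice", "http://example.org/vocab/knows",
         "http://example.org/people/bob"),
    ],
    "http://example.org/people/bob": [
        ("http://example.org/people/bob", "http://example.org/vocab/name", "Bob"),
    ],
}


def dereference(uri):
    """Return the triples describing `uri`, or an empty list if unknown."""
    return FAKE_WEB.get(uri, [])


def linked_uris(uri):
    """Return the HTTP URIs that the description of `uri` links to."""
    return [
        obj
        for _subject, _predicate, obj in dereference(uri)
        if obj.startswith("http://") or obj.startswith("https://")
    ]


def crawl(start_uri):
    """Follow links breadth-first from `start_uri`, collecting each URI once."""
    seen = []
    queue = [start_uri]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        queue.extend(linked_uris(uri))
    return seen
```

Calling `crawl("http://example.org/people/alice")` visits Alice's URI first and then Bob's, because Alice's description links to Bob by his HTTP URI. A real client would replace `dereference` with an HTTP request.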


Components

* URIs
* HTTP
* Structured data using controlled vocabulary terms and dataset definitions expressed in Resource Description Framework serialization formats such as RDFa, RDF/XML, N3, Turtle, or JSON-LD
* Linked Data Platform
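
As an illustration of one of the serialization formats listed above, the sketch below builds a small JSON-LD document with Python's standard `json` module. The `example.org` identifiers, the `ex` vocabulary, and the `make_jsonld` helper are placeholders invented for this example; only the `@context`, `@id`, and `@type` keywords come from JSON-LD itself.

```python
import json


def make_jsonld(subject_uri, name, homepage_uri):
    """Build a JSON-LD description of one resource as a plain dict.

    The vocabulary under http://example.org/vocab/ is hypothetical.
    """
    return {
        "@context": {
            "ex": "http://example.org/vocab/",
            # Map the short key "name" to a full property URI.
            "name": "ex:name",
            # Declare that "homepage" values are URIs, not plain strings.
            "homepage": {"@id": "ex:homepage", "@type": "@id"},
        },
        "@id": subject_uri,
        "name": name,
        "homepage": homepage_uri,
    }


document = make_jsonld(
    "http://example.org/people/alice",
    "Alice",
    "http://example.org/~alice/",
)
serialized = json.dumps(document, indent=2)
```

The result is ordinary JSON, so existing JSON tooling can read it, while the `@context` tells a linked data processor which URIs the short keys stand for.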


Linked open data

Linked open data are linked data that are open data. Tim Berners-Lee gives the clearest definition of linked open data by distinguishing it from linked data. Large linked open data sets include DBpedia, Wikibase, Wikidata and Open Icecat.


5-star linked open data

Tim Berners-Lee has suggested a 5-star scheme for grading the quality of open data on the web, for which the highest ranking is Linked Open Data:
* 1 star: data is openly available in some format.
* 2 stars: data is available in a structured format, such as Microsoft Excel file format (.xls).
* 3 stars: data is available in a non-proprietary structured format, such as comma-separated values (.csv).
* 4 stars: data follows W3C standards, like using RDF and employing URIs.
* 5 stars: all of the above, plus links to other Linked Open Data sources.
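
Because each level builds on the ones below it, the scheme can be read as a cumulative checklist. The sketch below is one simplified way to express that; the `Dataset` fields and the `star_rating` function are hypothetical names chosen for this example, not part of the scheme.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Yes/no answers to the five criteria, in order (hypothetical model)."""
    openly_available: bool
    structured: bool
    non_proprietary_format: bool
    uses_w3c_standards: bool
    links_to_other_lod: bool


def star_rating(dataset):
    """Count how many consecutive criteria are met, starting from 1 star."""
    criteria = [
        dataset.openly_available,
        dataset.structured,
        dataset.non_proprietary_format,
        dataset.uses_w3c_standards,
        dataset.links_to_other_lod,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break  # a higher level cannot be reached without the lower ones
        stars += 1
    return stars


# An openly published CSV file: structured and non-proprietary, but no RDF.
open_csv = Dataset(True, True, True, False, False)
# An openly published spreadsheet in a proprietary format.
open_xls = Dataset(True, True, False, False, False)
```

Under these assumptions `star_rating(open_csv)` evaluates to 3 and `star_rating(open_xls)` to 2, matching the examples in the list above.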


History

The term "linked open data" has been in use since at least February 2007, when the "Linking Open Data" mailing list was created. The mailing list was initially hosted by the SIMILE project at the Massachusetts Institute of Technology.


Linking Open Data community project

The goal of the W3C Semantic Web Education and Outreach group's Linking Open Data community project is to extend the Web with a data commons by publishing various open datasets as RDF on the Web and by setting RDF links between data items from different data sources. In October 2007, datasets consisted of over two billion RDF triples, which were interlinked by over two million RDF links. By September 2011 this had grown to 31 billion RDF triples, interlinked by around 504 million RDF links. A detailed statistical breakdown was published in 2014.
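
To make the distinction between RDF triples and RDF links concrete, the sketch below counts the triples whose subject and object are URIs from different data sources. Treating the URI host as the data source is a simplifying assumption made for this example, and the `example.org` and `example.com` URIs are hypothetical; the `owl:sameAs` property URI is the one defined by the OWL vocabulary.

```python
from urllib.parse import urlparse

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def host_of(term):
    """Return the host of an HTTP(S) URI, or None for literals."""
    parsed = urlparse(term)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return parsed.netloc
    return None


def count_rdf_links(triples):
    """Count triples that connect URIs hosted by two different sources."""
    links = 0
    for subject, _predicate, obj in triples:
        subject_host = host_of(subject)
        object_host = host_of(obj)
        if subject_host and object_host and subject_host != object_host:
            links += 1
    return links


TRIPLES = [
    # A literal value: a triple, but not a link.
    ("http://data.example.org/city/berlin",
     "http://data.example.org/vocab/label", "Berlin"),
    # A link into another (hypothetical) dataset.
    ("http://data.example.org/city/berlin",
     OWL_SAME_AS, "http://geo.example.com/place/berlin"),
    # A link within the same dataset: not counted as an RDF link here.
    ("http://data.example.org/city/berlin",
     "http://data.example.org/vocab/country",
     "http://data.example.org/country/germany"),
]
```

For this sample, `len(TRIPLES)` is 3 while `count_rdf_links(TRIPLES)` is 1, which mirrors how the project statistics report triples and links as separate figures.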


European Union projects

Several European Union projects involve linked data. These include the Linked Open Data Around the Clock (LATC) project, the PlanetData project, the DaPaaS (Data-and-Platform-as-a-Service) project, and the Linked Open Data 2 (LOD2) project. Data linking is one of the main goals of the EU Open Data Portal, which makes available thousands of datasets for anyone to reuse and link.


Ontologies

Ontologies are formal descriptions of data structures. Some of the better known ontologies are:
* FOAF – an ontology describing persons, their properties and relationships
* UMBEL – a lightweight reference structure of subject concept classes and their relationships derived from OpenCyc, which can act as binding classes to external data; also has links to 1.5 million named entities from DBpedia and YAGO
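
As a rough illustration of how an ontology such as FOAF is used, the sketch below describes two people with the FOAF terms `Person`, `name`, and `knows`, stored as plain Python tuples. The people, their `example.org` URIs, and the helper functions are hypothetical; a real application would more likely use an RDF library than a set of tuples.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical identifiers for two people.
ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"

# Each entry is a (subject, predicate, object) triple.
GRAPH = {
    (ALICE, RDF_TYPE, FOAF + "Person"),
    (ALICE, FOAF + "name", "Alice"),
    (ALICE, FOAF + "knows", BOB),
    (BOB, RDF_TYPE, FOAF + "Person"),
    (BOB, FOAF + "name", "Bob"),
}


def objects(graph, subject, predicate):
    """Return all objects for a subject/predicate pair, sorted for stability."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def names_of_acquaintances(graph, person):
    """Follow foaf:knows from `person` and collect each friend's foaf:name."""
    names = []
    for friend in objects(graph, person, FOAF + "knows"):
        names.extend(objects(graph, friend, FOAF + "name"))
    return names
```

Because both descriptions share the same vocabulary, `names_of_acquaintances(GRAPH, ALICE)` can combine them without any dataset-specific mapping.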


Datasets

* DBpedia – a dataset containing extracted data from Wikipedia; it contains about 3.4 million concepts described by 1 billion triples, including abstracts in 11 different languages
* GeoNames – provides RDF descriptions of geographical features worldwide
* Wikidata – a collaboratively created linked dataset that acts as central storage for the structured data of its Wikimedia Foundation sibling projects
* Global Research Identifier Database (''GRID'') – an international database of institutions engaged in academic research, with relationships. GRID models two types of relationships: a parent-child relationship that defines a subordinate association, and a related relationship that describes other associations
* KnowWhereGraph – an integrated knowledge graph of 12 billion triples and 30 data layers at the intersection between humans and their environment, built using Semantic Web and Linked Data technologies
* Open Icecat – a multilingual open catalogue containing product datasheets, related digital assets and usage statistics
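
Datasets such as DBpedia are typically queried with SPARQL. The sketch below only builds a request and does not send it. It assumes DBpedia's public endpoint is at `https://dbpedia.org/sparql` and uses the `query` parameter and `Accept` header from the SPARQL Protocol; the query and the helper name `build_sparql_request` are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint URL for DBpedia's public SPARQL service.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query: English labels of one resource.
QUERY = """
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Linked_data>
      <http://www.w3.org/2000/01/rdf-schema#label> ?label .
  FILTER (lang(?label) = "en")
}
"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL query request using HTTP GET."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(DBPEDIA_ENDPOINT, QUERY)
```

Passing `request` to `urllib.request.urlopen` would send it; that step is left out here so the example does not depend on network access or on the endpoint being available.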


Dataset instance and class relationships

Clickable diagrams that show the individual datasets and their relationships within the DBpedia-spawned LOD cloud are available.


See also

* American Art Collaborative – consortium of US art museums committed to establishing a critical mass of linked open data on American art
* Authority control – about ''controlled headings'' in library catalogs
* Citation analysis – for citations between scholarly articles
* Hyperdata
* Network model – an older type of database management system
* Open data
* Schema.org
* VoID – Vocabulary of Interlinked Datasets
* Web Ontology Language


Further reading

* Ahmet Soylu, Felix Mödritscher, and Patrick De Causmaecker. 2012. “Ubiquitous Web Navigation through Harvesting Embedded Semantic Data: A Mobile Scenario.” Integrated Computer-Aided Engineering 19 (1): 93–109.
* Linked Data: Evolving the Web into a Global Data Space (2011) by Tom Heath and Christian Bizer, Synthesis Lectures on the Semantic Web: Theory and Technology, Morgan & Claypool
* How to Publish Linked Data on the Web by Chris Bizer, Richard Cyganiak and Tom Heath, Linked Data Tutorial at Freie Universität Berlin, Germany, 27 July 2007.
* The Web Turns 20: Linked Data Gives People Power, part 1 of 4, by Mark Fischetti, ''Scientific American'', 23 October 2010
* Linked Data Is Merely More Data – Prateek Jain, Pascal Hitzler, Peter Z. Yeh, Kunal Verma, and Amit P. Sheth. In: Dan Brickley, Vinay K. Chaudhri, Harry Halpin, and Deborah McGuinness: ''Linked Data Meets Artificial Intelligence''. Technical Report SS-10-07, AAAI Press, Menlo Park, California, 2010, pp. 82–86.
* Moving beyond sameAs with PLATO: Partonomy detection for Linked Data – Prateek Jain, Pascal Hitzler, Kunal Verma, Peter Z. Yeh, Amit Sheth. In: Proceedings of the 23rd ACM Hypertext and Social Media conference (HT 2012), Milwaukee, WI, USA, June 25–28, 2012.
* Freitas, André, Edward Curry, João Gabriel Oliveira, and Sean O’Riain. 2012. “Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches, and Trends.” IEEE Internet Computing 16 (1): 24–33.
* Interlinking Open Data on the Web – Chris Bizer, Tom Heath, Danny Ayers, Yves Raimond. In Proceedings Poster Track, ESWC2007, Innsbruck, Austria
* Ontology Alignment for Linked Open Data – Prateek Jain, Pascal Hitzler, Amit Sheth, Kunal Verma, Peter Z. Yeh. In proceedings of the 9th International Semantic Web Conference, ISWC 2010, Shanghai, China
* Linked open drug data for pharmaceutical research and development – J Cheminform. 2011; 3: 19. Samwald, Jentzsch, Bouton, Kallesøe, Willighagen, Hajagos, Marshall, Prud'hommeaux, Hassenzadeh, Pichler, and Stephens (May 2011)
* Interview with Sören Auer, head of the LOD2 project about the continuation of LOD2 in 2011, June 2011
* Linked Open Data: The Essentials – Florian Bauer and Martin Kaltenböck (January 2012)
* The Flap of a Butterfly Wing – semanticweb.com, Richard Wallis (February 2012)


External links


* LinkedData at the W3C Wiki
* LinkedData.org
* OpenLink Software white papers