
In
knowledge representation and reasoning
Knowledge representation (KR) aims to model information in a structured manner to formally represent it as knowledge in knowledge-based systems whereas knowledge representation and reasoning (KRR, KR&R, or KR²) also aims to understand, reason, and ...
, a knowledge graph is a
knowledge base
In computer science, a knowledge base (KB) is a set of sentences, each sentence given in a knowledge representation language, with interfaces to tell new sentences and to ask questions about what is known, where either of these interfaces migh ...
that uses a
graph
Graph may refer to:
Mathematics
*Graph (discrete mathematics), a structure made of vertices and edges
**Graph theory, the study of such graphs and their properties
*Graph (topology), a topological space resembling a graph in the sense of discret ...
-structured
data model
A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities. For instance, a data model may specify that the data element representing a car be ...
or
topology
Topology (from the Greek language, Greek words , and ) is the branch of mathematics concerned with the properties of a Mathematical object, geometric object that are preserved under Continuous function, continuous Deformation theory, deformat ...
to represent and operate on
data
Data ( , ) are a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted for ...
. Knowledge graphs are often used to store interlinked descriptions of
entities objects, events, situations or abstract concepts while also encoding the free-form
semantics
Semantics is the study of linguistic Meaning (philosophy), meaning. It examines what meaning is, how words get their meaning, and how the meaning of a complex expression depends on its parts. Part of this process involves the distinction betwee ...
or relationships underlying these entities.
Since the development of the
Semantic Web
The Semantic Web, sometimes known as Web 3.0, is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). The goal of the Semantic Web is to make Internet data machine-readable.
To enable the encoding o ...
, knowledge graphs have often been associated with
linked open data projects, focusing on the connections between
concept
A concept is an abstract idea that serves as a foundation for more concrete principles, thoughts, and beliefs.
Concepts play an important role in all aspects of cognition. As such, concepts are studied within such disciplines as linguistics, ...
s and entities.
They are also historically associated with and used by
search engine
A search engine is a software system that provides hyperlinks to web pages, and other relevant information on World Wide Web, the Web in response to a user's web query, query. The user enters a query in a web browser or a mobile app, and the sea ...
s such as
Google
Google LLC (, ) is an American multinational corporation and technology company focusing on online advertising, search engine technology, cloud computing, computer software, quantum computing, e-commerce, consumer electronics, and artificial ...
,
Bing
Bing most often refers to:
* Bing Crosby (1903–1977), American singer
* Microsoft Bing, a web search engine
Bing may also refer to:
Food and drink
* Bing (bread), a Chinese flatbread
* Bing (soft drink), a UK brand
* Bing cherry, a varie ...
,
Yext and
Yahoo
Yahoo (, styled yahoo''!'' in its logo) is an American web portal that provides the search engine Yahoo Search and related services including My Yahoo, Yahoo Mail, Yahoo News, Yahoo Finance, Yahoo Sports, y!entertainment, yahoo!life, an ...
;
knowledge-engines and question-answering services such as
WolframAlpha, Apple's
Siri
Siri ( , backronym: Speech Interpretation and Recognition Interface) is a digital assistant purchased, developed, and popularized by Apple Inc., which is included in the iOS, iPadOS, watchOS, macOS, Apple TV, audioOS, and visionOS operating sys ...
, and Amazon
Alexa; and
social network
A social network is a social structure consisting of a set of social actors (such as individuals or organizations), networks of Dyad (sociology), dyadic ties, and other Social relation, social interactions between actors. The social network per ...
s such as
LinkedIn
LinkedIn () is an American business and employment-oriented Social networking service, social network. It was launched on May 5, 2003 by Reid Hoffman and Eric Ly. Since December 2016, LinkedIn has been a wholly owned subsidiary of Microsoft. ...
and
Facebook
Facebook is a social media and social networking service owned by the American technology conglomerate Meta Platforms, Meta. Created in 2004 by Mark Zuckerberg with four other Harvard College students and roommates, Eduardo Saverin, Andre ...
.
Recent developments in data science and machine learning, particularly in graph neural networks and representation learning and also in machine learning, have broadened the scope of knowledge graphs beyond their traditional use in search engines and recommender systems. They are increasingly used in scientific research, with notable applications in fields such as genomics, proteomics, and systems biology.
History
The term was coined as early as 1972 by the Austrian
linguist
Linguistics is the scientific study of language. The areas of linguistic analysis are syntax (rules governing the structure of sentences), semantics (meaning), Morphology (linguistics), morphology (structure of words), phonetics (speech sounds ...
Edgar W. Schneider, in a discussion of how to build modular instructional systems for courses. In the late 1980s, the
University of Groningen
The University of Groningen (abbreviated as UG; , abbreviated as RUG) is a Public university#Continental Europe, public research university of more than 30,000 students in the city of Groningen (city), Groningen, Netherlands. Founded in 1614, th ...
and
University of Twente
The University of Twente ( ; Abbreviation, abbr. ) is a Public university, public technical university located in Enschede, Netherlands.
The university has been placed in the top 170 universities in the world by multiple central ranking tables. ...
jointly began a project called Knowledge Graphs, focusing on the design of
semantic network
A semantic network, or frame network is a knowledge base that represents semantic relations between concepts in a network. This is often used as a form of knowledge representation. It is a directed or undirected graph consisting of vertices, ...
s with edges restricted to a limited set of relations, to facilitate
algebras on the graph. In subsequent decades, the distinction between semantic networks and knowledge graphs was blurred.
Some early knowledge graphs were topic-specific. In 1985,
Wordnet
WordNet is a lexical database of semantic relations between words that links words into semantic relations including synonyms, hyponyms, and meronyms. The synonyms are grouped into ''synsets'' with short definitions and usage examples. It can thu ...
was founded, capturing semantic relationships between words and meanings an application of this idea to language itself. In 2005, Marc Wirk founded
Geonames
GeoNames (or GeoNames.org) is a user-editable geographical database available and accessible through various web services, under a Creative Commons attribution license. The project was founded in late 2005.
The GeoNames dataset differs from, b ...
to capture relationships between different geographic names and locales and associated entities. In 1998 Andrew Edmonds of Science in Finance Ltd in the UK created a system called ThinkBase that offered
fuzzy-logic based reasoning in a graphical context. ThinkBase LLC
In 2007, both
DBpedia
DBpedia (from "DB" for "database") is a project aiming to extract structured content from the information created in the Wikipedia project. This structured information is made available on the World Wide Web using OpenLink Virtuoso. DBpedia a ...
and
Freebase were founded as graph-based knowledge
repositories for general-purpose knowledge. DBpedia focused exclusively on data extracted from Wikipedia, while Freebase also included a range of public datasets. Neither described themselves as a 'knowledge graph' but developed and described related concepts.
In 2012, Google introduced their
Knowledge Graph
In knowledge representation and reasoning, a knowledge graph is a knowledge base that uses a Graph (discrete mathematics), graph-structured data model or topology to represent and operate on data. Knowledge graphs are often used to store interl ...
,
building on DBpedia and Freebase among other sources. They later incorporated
RDFa
RDFa or Resource Description Framework in Attributes is a W3C Recommendation that adds a set of attribute-level extensions to HTML, XHTML and various XML-based document types for embedding rich metadata within web documents. The Resource Descript ...
,
Microdata,
JSON-LD
JSON-LD (JavaScript Object Notation for Linked Data) is a method of encoding linked data using JSON. One goal for JSON-LD was to require as little effort as possible from developers to transform their existing JSON to JSON-LD. JSON-LD allows data ...
content extracted from indexed web pages, including the ''
CIA World Factbook
''The World Factbook'', also known as the ''CIA World Factbook'', is a reference resource produced by the United States' Central Intelligence Agency (CIA) with almanac-style information about the countries of the world. The official print ve ...
'',
Wikidata
Wikidata is a collaboratively edited multilingual knowledge graph hosted by the Wikimedia Foundation. It is a common source of open data that Wikimedia projects such as Wikipedia, and anyone else, are able to use under the CC0 public domain ...
, and
Wikipedia
Wikipedia is a free content, free Online content, online encyclopedia that is written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and the wiki software MediaWiki. Founded by Jimmy Wales and La ...
.
Entity and relationship types associated with this knowledge graph have been further organized using terms from the
schema.org vocabulary. The Google Knowledge Graph became a successful complement to string-based search within Google, and its popularity online brought the term into more common use.
Since then, several large multinationals have advertised their knowledge graphs use, further popularising the term. These include Facebook, LinkedIn,
Airbnb
Airbnb, Inc. ( , an abbreviation of its original name, "Air Bed and Breakfast") is an American company operating an online marketplace for short-and-long-term homestays, experiences and services in various countries and regions. It acts as a ...
,
Microsoft
Microsoft Corporation is an American multinational corporation and technology company, technology conglomerate headquartered in Redmond, Washington. Founded in 1975, the company became influential in the History of personal computers#The ear ...
,
Amazon
Amazon most often refers to:
* Amazon River, in South America
* Amazon rainforest, a rainforest covering most of the Amazon basin
* Amazon (company), an American multinational technology company
* Amazons, a tribe of female warriors in Greek myth ...
,
Uber
Uber Technologies, Inc. is an American multinational transportation company that provides Ridesharing company, ride-hailing services, courier services, food delivery, and freight transport. It is headquartered in San Francisco, California, a ...
and
eBay
eBay Inc. ( , often stylized as ebay) is an American multinational e-commerce company based in San Jose, California, that allows users to buy or view items via retail sales through online marketplaces and websites in 190 markets worldwide. ...
.
In 2019,
IEEE
The Institute of Electrical and Electronics Engineers (IEEE) is an American 501(c)(3) organization, 501(c)(3) public charity professional organization for electrical engineering, electronics engineering, and other related disciplines.
The IEEE ...
combined its annual international conferences on "Big Knowledge" and "Data Mining and Intelligent Computing" into the International Conference on Knowledge Graph.
Definitions
There is no single commonly accepted definition of a knowledge graph. Most definitions view the topic through a Semantic Web lens and include these features:
*''Flexible relations among knowledge in topical domains'': A knowledge graph (i) defines
abstract class
In object-oriented programming, a class defines the shared aspects of objects created from the class. The capabilities of a class differ between programming languages, but generally the shared aspects consist of state ( variables) and behavior ( ...
es and relations of entities in a schema, (ii) mainly describes real world entities and their interrelations, organized in a graph, (iii) allows for potentially interrelating arbitrary entities with each other, and (iv) covers various topical domains.
* ''General structure'': A network of entities, their semantic types, properties, and relationships. To represent properties, categorical or numerical values are often used.
* ''Supporting reasoning over inferred ontologies'': A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge.
There are, however, many knowledge graph representations for which some of these features are not relevant. For those knowledge graphs, this simpler definition may be more useful:
* A digital structure that represents knowledge as concepts and the relationships between them (facts). A knowledge graph can include an ontology that allows both humans and machines to understand and reason about its contents.
Implementations
In addition to the above examples, the term has been used to describe open knowledge projects such as
YAGO and Wikidata; federations like the Linked Open Data cloud; a range of commercial search tools, including Yahoo's semantic search assistant Spark, Google's
Knowledge Graph
In knowledge representation and reasoning, a knowledge graph is a knowledge base that uses a Graph (discrete mathematics), graph-structured data model or topology to represent and operate on data. Knowledge graphs are often used to store interl ...
, and Microsoft's Satori; and the LinkedIn and Facebook entity graphs.
The term is also used in the context of
note-taking software applications that allow a user to build a
personal knowledge graph.
The popularization of knowledge graphs and their accompanying methods have led to the development of graph databases such as Neo4j, GraphDB and
AgensGraph. These graph databases allow users to easily store data as entities and their interrelationships, and facilitate operations such as data reasoning, node embedding, and ontology development on knowledge bases.
In contrast, virtual knowledge graphs do not store information in specialized databases. They rely on an underlying relational database or data lake to answer queries on the graph. Such a virtual knowledge graph system must be properly configured in order to answer the queries correctly. This specific configuration is done through a set of mappings that define the relationship between the elements of the data source and the structure and ontology of the virtual knowledge graph.
Using a knowledge graph for reasoning over data
A knowledge graph formally represents semantics by describing entities and their relationships. Knowledge graphs may make use of
ontologies
In information science, an ontology encompasses a representation, formal naming, and definitions of the categories, properties, and relations between the concepts, data, or entities that pertain to one, many, or all domains of discourse. More ...
as a schema layer. By doing this, they allow
logical inference for retrieving
implicit knowledge rather than only allowing queries requesting explicit knowledge.
In order to allow the use of knowledge graphs in various machine learning tasks, several methods for deriving latent feature representations of entities and relations have been devised. These knowledge graph embeddings allow them to be connected to machine learning methods that require feature vectors like
word embedding
In natural language processing, a word embedding is a representation of a word. The embedding is used in text analysis. Typically, the representation is a real-valued vector that encodes the meaning of the word in such a way that the words that ...
s. This can complement other estimates of conceptual similarity.
Models for generating useful knowledge graph embeddings are commonly the domain of graph neural networks (GNNs). GNNs are deep learning architectures that comprise edges and nodes, which correspond well to the entities and relationships of knowledge graphs. The topology and data structures afforded by GNNs provides a convenient domain for semi-supervised learning, wherein the network is trained to predict the value of a node embedding (provided a group of adjacent nodes and their edges) or edge (provided a pair of nodes). These tasks serve as fundamental abstractions for more complex tasks such as knowledge graph reasoning and alignment.
Entity alignment

As new knowledge graphs are produced across a variety of fields and contexts, the same entity will inevitably be represented in multiple graphs. However, because no single standard for the construction or representation of knowledge graph exists, resolving which entities from disparate graphs correspond to the same real world subject is a non-trivial task. This task is known as ''knowledge graph entity alignment'', and is an active area of research.
Strategies for entity alignment generally seek to identify similar substructures, semantic relationships, shared attributes, or combinations of all three between two distinct knowledge graphs. Entity alignment methods use these structural similarities between generally non-isomorphic graphs to predict which nodes corresponds to the same entity.
The recent successes of large language models (LLMs), in particular their effectiveness at producing syntactically meaningful embeddings, has spurred the use of LLMs in the task of entity alignment.
As the amount of data stored in knowledge graphs grows, developing dependable methods for knowledge graph entity alignment becomes an increasingly crucial step in the integration and cohesion of knowledge graph data.
See also
*
*
*
*
*
*
*
*
*
*
*
Wikibase- Mediawiki Software extensions for creating knowledge bases
*
Wikidata
Wikidata is a collaboratively edited multilingual knowledge graph hosted by the Wikimedia Foundation. It is a common source of open data that Wikimedia projects such as Wikipedia, and anyone else, are able to use under the CC0 public domain ...
- Free Knowledge Database Project
*
References
External links
*
{{Authority control
Ontology (information science)
Formal semantics (natural language)
Information science