Wikidata is a collaboratively edited multilingual knowledge graph hosted by the Wikimedia Foundation. It is a common source of open data that Wikimedia projects such as Wikipedia, and anyone else, can use under the CC0 public domain license. Wikidata is a wiki powered by the MediaWiki software and by the set of knowledge graph MediaWiki extensions known as Wikibase.


Concept

Wikidata is a document-oriented database, focused on items, which represent any kind of topic, concept, or object. Each item is allocated a unique, persistent identifier, a positive integer prefixed with the upper-case letter Q, known as a "QID". This enables the basic information required to identify the topic that the item covers to be translated without favouring any language. Item labels need not be unique. For example, there are two items named "Elvis Presley": one represents the American singer and actor, and the other represents his self-titled album. However, the combination of a label and its description must be unique. To avoid ambiguity, an item's unique identifier (''QID'') is therefore linked to this combination.
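The identifier and uniqueness rules described above can be sketched in a few lines. This is an illustrative model only, not Wikidata's actual implementation: the regular expression assumes "positive integer" means no leading zero, `LabelIndex` is a hypothetical helper, and the QIDs in the usage note are made-up placeholders.

```python
import re

# Assumption: a QID is "Q" followed by a positive integer without leading zeros.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def is_valid_qid(value: str) -> bool:
    """Return True if value has the shape of a QID, such as "Q42"."""
    return QID_PATTERN.fullmatch(value) is not None


class LabelIndex:
    """Hypothetical index for one language: labels may repeat,
    but each (label, description) pair may belong to only one item."""

    def __init__(self):
        self._owner_by_pair = {}

    def register(self, qid: str, label: str, description: str) -> None:
        if not is_valid_qid(qid):
            raise ValueError(f"not a QID: {qid!r}")
        pair = (label, description)
        owner = self._owner_by_pair.get(pair)
        if owner is not None and owner != qid:
            raise ValueError(f"{pair!r} is already used by {owner}")
        self._owner_by_pair[pair] = qid
```

Registering "Elvis Presley" twice with different descriptions (say, under the placeholder QIDs `Q1001` and `Q1002`) succeeds, while a third item reusing an existing label and description pair is rejected.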


Main parts

Fundamentally, an item consists of:
* Obligatorily, an identifier (the QID), related to a label and a description.
* Optionally, multiple aliases and some number of statements (and their properties and values).
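The split between obligatory and optional parts can be pictured as a small record type. The sketch below is a simplified assumption for illustration (Python 3.9+): the class and field names are invented here, statements are reduced to a mapping from property name to values, and the QID is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Obligatory parts: the identifier with its label and description.
    qid: str
    label: str
    description: str
    # Optional parts: default to empty collections.
    aliases: list[str] = field(default_factory=list)
    statements: dict[str, list[str]] = field(default_factory=dict)


# Placeholder QID, chosen only for illustration.
milk = Item(qid="Q1001", label="milk", description="white liquid produced by mammals")
milk.aliases.append("cow's milk")
milk.statements.setdefault("color", []).append("white")
```

Using `default_factory` gives every item its own empty alias list and statement mapping, so adding an alias to one item never leaks into another.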


Statements

Statements are how any information known about an item is recorded in Wikidata. Formally, they consist of key–value pairs, which match a ''property'' (such as "author", or "publication date") with one or more entity ''values'' (such as "Sir Arthur Conan Doyle" or "1902"). For example, the informal English statement "milk is white" would be encoded by a statement pairing the property "color" with the value "white" under the item "milk". Statements may map a property to more than one value. For example, the "occupation" property for Marie Curie could be linked with the values "physicist" and "chemist", to reflect the fact that she engaged in both occupations. Values may take on many types including other Wikidata items, strings, numbers, or media files. Properties prescribe what types of values they may be paired with. For example, some properties may only be paired with values of type "URL". Optionally, ''qualifiers'' can be used to refine the meaning of a statement by providing additional information. For example, a "population" statement could be modified with a qualifier such as "as of 2011". Values in the statements may also be annotated with ''references'', pointing to a source backing up the statement's content. As with statements, all qualifiers and references are property–value pairs.
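A statement with its qualifiers and references might be modelled as follows. This is a minimal sketch under stated assumptions (Python 3.9+): the class names, the property names "point in time" and "stated in", the population figure, and the source name are all illustrative placeholders rather than real Wikidata data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PropertyValue:
    """One property-value pair; also used for qualifiers and references."""
    prop: str
    value: object


@dataclass
class Statement:
    main: PropertyValue
    qualifiers: list[PropertyValue] = field(default_factory=list)
    references: list[PropertyValue] = field(default_factory=list)


def values_for(statements: list[Statement], prop: str) -> list[object]:
    """Collect every value recorded for a property, in insertion order."""
    return [s.main.value for s in statements if s.main.prop == prop]


# A property mapped to more than one value.
curie = [
    Statement(PropertyValue("occupation", "physicist")),
    Statement(PropertyValue("occupation", "chemist")),
]

# A qualified and referenced statement; the number and source are placeholders.
population = Statement(
    main=PropertyValue("population", 1000),
    qualifiers=[PropertyValue("point in time", 2011)],
    references=[PropertyValue("stated in", "hypothetical census report")],
)
```

Reusing one pair type for the main claim, the qualifiers, and the references mirrors the point that all three are property–value pairs.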


Properties

Each property has a numeric identifier prefixed with a capital P and a page on Wikidata with optional label, description, aliases, and statements. As such, there are properties with the sole purpose of describing other properties. Properties may also define more complex rules about their intended usage, termed ''constraints''. For example, the "capital" property includes a "single value constraint", reflecting the reality that (typically) territories have only one capital city. Constraints are treated as testing alerts and hints, rather than inviolable rules. Before a new property is created, it needs to undergo a discussion process.
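Because constraints act as hints rather than hard rules, a checker would report violations instead of rejecting the data. The sketch below illustrates a single value constraint in that spirit; the function name, the tuple representation, and the city names are assumptions made for this example, not Wikidata's constraint system.

```python
from collections import Counter


def single_value_warnings(statements, constrained_props):
    """Return one warning per constrained property that has several values.

    statements: list of (property, value) tuples for one item.
    constrained_props: set of property names that should have a single value.
    """
    counts = Counter(prop for prop, _ in statements)
    return [
        f"{prop}: expected one value, found {counts[prop]}"
        for prop in sorted(constrained_props)
        if counts[prop] > 1
    ]


# Placeholder data: a territory recorded with two capitals.
territory = [("capital", "City A"), ("capital", "City B"), ("population", 1000)]
warnings = single_value_warnings(territory, {"capital"})
```

The function returns messages and leaves the data untouched, so an editor can decide whether the second value is an error or a legitimate exception.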


Lexemes

In linguistics, a lexeme is a unit of lexical meaning. Similarly, Wikidata's ''lexemes'' are items with a structure that makes them more suitable to store lexicographical data. Besides storing the language to which the lexeme refers, they have a section for ''forms'' and a section for ''senses''.
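The lexeme layout can be sketched as a record with a language, a list of forms, and a list of senses. The text above only names those three parts; the `lemma`, `features`, and `gloss` fields below are assumptions added to make the example usable (Python 3.9+).

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    representation: str                                  # written form, e.g. "ran"
    features: list[str] = field(default_factory=list)    # e.g. ["past tense"]


@dataclass
class Sense:
    gloss: str                                           # short definition of one meaning


@dataclass
class Lexeme:
    lemma: str
    language: str
    forms: list[Form] = field(default_factory=list)
    senses: list[Sense] = field(default_factory=list)

    def forms_with(self, feature: str) -> list[str]:
        """Return the representations of all forms carrying a feature."""
        return [f.representation for f in self.forms if feature in f.features]


run = Lexeme(
    lemma="run",
    language="English",
    forms=[
        Form("run", ["infinitive"]),
        Form("ran", ["past tense"]),
        Form("running", ["present participle"]),
    ],
    senses=[Sense("to move quickly on foot")],
)
```

Keeping forms and senses in separate lists reflects that inflection (how the word is written) and meaning (what it denotes) vary independently.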


EntitySchemas

In January 2019, development started on a new extension for MediaWiki to enable storing Shape Expressions in a separate namespace. This extension has since been installed on Wikidata and enables contributors to use Shape Expressions for validating and describing Resource Description Framework data in items and lexemes. Any item or lexeme on Wikidata can be validated against an Entity Schema, which makes it an important tool for quality assurance.


Development

The creation of the project was funded by donations from the Allen Institute for Artificial Intelligence, the Gordon and Betty Moore Foundation, and Google, Inc., totaling €1.3 million. The development of the project is mainly driven by Wikimedia Deutschland under the management of Lydia Pintscher, and was originally split into three phases:
1. Centralising interlanguage links – links between Wikipedia articles about the same topic in different languages.
2. Providing a central place for infobox data for all Wikipedias.
3. Creating and updating list articles based on data in Wikidata and linking to other Wikimedia sister projects, including Meta-Wiki and Wikidata itself (interwiki links).


Initial rollout

Wikidata was launched on 29 October 2012 and was the first new project of the Wikimedia Foundation since 2006. At this time, only the centralization of language links was available. This enabled items to be created and filled with basic information: a label (a name or title), aliases (alternative terms for the label), a description, and links to articles about the topic in all the various language editions of Wikipedia (interwikipedia links). Historically, a Wikipedia article would include a list of interlanguage links (links to articles on the same topic in other editions of Wikipedia, if they existed). Wikidata was originally a self-contained repository of interlanguage links. Wikipedia language editions were still not able to access Wikidata, so they needed to continue to maintain their own lists of interlanguage links. On 14 January 2013, the Hungarian Wikipedia became the first to enable the provision of interlanguage links via Wikidata. This functionality was extended to the Hebrew and Italian Wikipedias on 30 January, to the English Wikipedia on 13 February and to all other Wikipedias on 6 March. After no consensus was reached over a proposal to restrict the removal of language links from the English Wikipedia, they were automatically removed by bots. On 23 September 2013, interlanguage links went live on Wikimedia Commons.


Statements and data access

On 4 February 2013, statements were introduced to Wikidata entries. The possible values for properties were initially limited to two data types (items and images on Wikimedia Commons), with more data types (such as coordinates and dates) to follow later. The first new type, string, was deployed on 6 March. The ability for the various language editions of Wikipedia to access data from Wikidata was rolled out progressively between 27 March and 25 April 2013. On 16 September 2015, Wikidata began allowing so-called ''arbitrary access'', or access from a given article of a Wikipedia to the statements on Wikidata items not directly connected to it. For example, it became possible to read data about Germany from the Berlin article, which was not feasible before. On 27 April 2016, arbitrary access was activated on Wikimedia Commons. According to a 2020 study, a large proportion of the data on Wikidata consists of entries imported en masse from other databases by Internet bots, which helps to "break down the walls" of data silos.


Query service and other improvements

On 7 September 2015, the Wikimedia Foundation announced the release of the Wikidata Query Service, which lets users run queries on the data contained in Wikidata. The service uses SPARQL as the query language and Blazegraph as its triplestore and graph database. As of November 2018, there were at least 26 different tools that allowed querying the data in different ways.
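As a rough illustration of querying with SPARQL, the sketch below builds a request URL for the public query endpoint and, optionally, fetches the results. Several details are assumptions not stated in the text above: the endpoint URL, the `format=json` parameter, the identifiers `P31` ("instance of") and `Q146` ("house cat") borrowed from the service's familiar cat example, and the hypothetical User-Agent string.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: public endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Ten items that are an instance of (P31) house cat (Q146), with English labels.
CATS_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
"""


def build_query_url(sparql: str) -> str:
    """Encode a SPARQL query as a GET URL that asks for JSON results."""
    params = urlencode({"query": sparql.strip(), "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"


def run_query(sparql: str) -> list[str]:
    """Fetch the query results and return the item labels (needs network access)."""
    request = Request(
        build_query_url(sparql),
        headers={"User-Agent": "example-sketch/0.1 (hypothetical contact)"},
    )
    with urlopen(request, timeout=30) as response:
        data = json.load(response)
    return [row["itemLabel"]["value"] for row in data["results"]["bindings"]]
```

Only `build_query_url` runs without a network connection; `run_query(CATS_QUERY)` is left for the reader to call, since results depend on the live data.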


Logo

The bars on the logo contain the word "WIKI" encoded in Morse code. The logo was created by Arun Ganesh and selected through community decision.
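The encoding behind the bars can be reproduced with the standard Morse code for the three letters involved. The helper below is only an illustration; how dots and dashes map to bar widths in the actual logo is not covered here.

```python
# International Morse code for the letters needed to spell "WIKI".
MORSE = {"W": ".--", "I": "..", "K": "-.-"}


def to_morse(word: str) -> str:
    """Encode a word letter by letter, separating letters with spaces."""
    return " ".join(MORSE[letter] for letter in word.upper())


encoded = to_morse("WIKI")  # ".-- .. -.- .."
```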


Reception

In November 2014, Wikidata received the Open Data Publisher Award from the Open Data Institute "for sheer scale, and built-in openness". In December 2014, Google announced that it would shut down Freebase in favor of Wikidata. Wikidata information was used in 58.4% of all English Wikipedia articles, mostly for external identifiers or coordinate locations. In aggregate, data from Wikidata is shown in 64% of all Wikipedias' pages, 93% of all Wikivoyage articles, 34% of all Wikiquotes' pages, 32% of all Wikisources' pages, and 27% of Wikimedia Commons' pages. Usage in other Wikimedia Foundation projects is a testimonial. Wikidata's data was visualized by at least 20 other external tools, and over 300 papers have been published about Wikidata. Wikidata's structured dataset has been used by virtual assistants such as Apple's Siri and Amazon Alexa.


Applications

* The Mwnci extension can import data from Wikidata to LibreOffice Calc spreadsheets.
* As of October 2019, there were discussions about using QID items in relation to what is being called QID emoji.
* Wiki Explorer – an Android application to discover things around you and make micro-edits to Wikidata.
* KDE Itinerary – a privacy-conscious open-source travel assistant that uses data from Wikidata.
* Google originally started a frame semantic parser project that aims to parse the information on Wikipedia and transfer it into Wikidata by coming up with relevant statements using artificial intelligence.

A systematic literature review of the uses of Wikidata in research was carried out in 2019.


See also

* Abstract Wikipedia
* BabelNet
* DBpedia
* Semantic MediaWiki
* Wikibase


Further reading

* Claudia Müller-Birn, Benjamin Karran, Janette Lehmann, Markus Luczak-Rösch: "Peer-production system or collaborative ontology development effort: What is Wikidata?" In: OpenSym 2015 – Conference on Open Collaboration, San Francisco, US, 19–21 Aug 2015 (preprint).


External links

* Videos: WikidataCon on ''media.ccc.de''