DBpedia (from "DB" for "database") is a project aiming to extract structured content from the information created in the Wikipedia project. This structured information is made available on the World Wide Web. DBpedia allows users to semantically query relationships and properties of Wikipedia resources, including links to other related datasets. In 2008, Tim Berners-Lee described DBpedia as one of the most famous parts of the decentralized Linked Data effort.


Background

The project was started by people at the Free University of Berlin and Leipzig University (see ''DBpedia: A Nucleus for a Web of Open Data'') in collaboration with OpenLink Software, and is now maintained by people at the University of Mannheim and Leipzig University. The first publicly available dataset was published in 2007. The data is made available under free licences (CC-BY-SA), allowing others to reuse the dataset; it does not, however, use an open data license to waive the sui generis database rights.

Wikipedia articles consist mostly of free text, but also include structured information embedded in the articles, such as "infobox" tables (the pull-out panels that appear in the top right of the default view of many Wikipedia articles, or at the start of the mobile versions), categorization information, images, geo-coordinates and links to external Web pages. This structured information is extracted and put in a uniform dataset which can be queried.


Dataset

The 2016-04 release of the DBpedia data set describes 6.0 million entities, out of which 5.2 million are classified in a consistent ontology, including 1.5 million persons, 810,000 places, 135,000 music albums, 106,000 films, 20,000 video games, 275,000 organizations, 301,000 species and 5,000 diseases. DBpedia uses the Resource Description Framework (RDF) to represent extracted information and consists of 9.5 billion RDF triples, of which 1.3 billion were extracted from the English edition of Wikipedia and 5.0 billion from other language editions.

From this data set, information spread across multiple pages can be extracted. For example, book authorship can be put together from pages about the work, or the author.

One of the challenges in extracting information from Wikipedia is that the same concepts can be expressed using different parameters in infobox and other templates; a person's place of birth, for example, may be recorded under two differently named parameters. Because of this, queries about where people were born would have to search for both of these properties in order to get more complete results. As a result, the DBpedia Mapping Language has been developed to help in mapping these properties to an ontology while reducing the number of synonyms. Due to the large diversity of infoboxes and properties in use on Wikipedia, the process of developing and improving these mappings has been opened to public contributions.

Version 2014 was released in September 2014. A main change since previous versions was the way abstract texts were extracted. Specifically, running a local mirror of Wikipedia and retrieving rendered abstracts from it made extracted texts considerably cleaner. Also, a new data set extracted from Wikimedia Commons was introduced.

As of June 2021, DBpedia contains over 850 million triples.
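
The synonym problem described above can be illustrated with a small Python sketch. It is not the DBpedia Mapping Language or the project's extraction code; it only shows the idea of folding several infobox parameter names into one property and emitting (subject, property, value) triples. The parameter names `birthplace` and `placeofbirth`, the property name `birthPlace`, and the sample infoboxes are all hypothetical.

```python
import re

# Hypothetical mapping from infobox parameter names to one ontology property.
# Both parameter names and the property name are illustrative assumptions.
PARAMETER_TO_PROPERTY = {
    "birthplace": "birthPlace",
    "placeofbirth": "birthPlace",
}

# Matches wikitext infobox lines of the form "| key = value".
INFOBOX_LINE = re.compile(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$")


def infobox_to_triples(subject, infobox_text):
    """Turn mapped '| key = value' lines into (subject, property, value) triples."""
    triples = []
    for line in infobox_text.splitlines():
        match = INFOBOX_LINE.match(line.strip())
        if not match:
            continue  # skip template braces and anything that is not key = value
        key, value = match.group(1).lower(), match.group(2)
        prop = PARAMETER_TO_PROPERTY.get(key)
        if prop is not None:  # unmapped parameters are ignored in this sketch
            triples.append((subject, prop, value))
    return triples


# Two made-up infoboxes that record the same concept under different names.
person_a = """{{Infobox person
| name = Person A
| birthplace = Leipzig
}}"""
person_b = """{{Infobox person
| name = Person B
| placeofbirth = Berlin
}}"""

triples = infobox_to_triples("Person_A", person_a) + infobox_to_triples("Person_B", person_b)
```

After this normalisation, a query about where people were born only has to look at one property instead of two.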


Examples

DBpedia extracts factual information from Wikipedia pages, allowing users to find answers to questions where the information is spread across multiple Wikipedia articles. Data is accessed using SPARQL, an SQL-like query language for RDF. For example, suppose one were interested in the Japanese ''shōjo'' manga series ''Tokyo Mew Mew'' and wanted to find the genres of other works written by its illustrator, Mia Ikumi. DBpedia combines information from Wikipedia's entries on ''Tokyo Mew Mew'', Mia Ikumi and on works such as ''Super Doll Licca-chan'' and ''Koi Cupid''. Since DBpedia normalises information into a single database, the following query can be asked without needing to know exactly which entry carries each fragment of information, and will list related genres:

PREFIX dbprop:
PREFIX db:
SELECT ?who, ?WORK, ?genre WHERE
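
To illustrate the kind of join such a query performs, here is a minimal Python sketch over an in-memory list of triples. It is not SPARQL and does not contact DBpedia. The property names `creator` and `genre`, the genre values, and the choice to treat genre as optional are assumptions made for the example; only the work and person names come from the text above.

```python
# Toy triples as (subject, property, object). The property names and the
# genre values are placeholders, not DBpedia's actual vocabulary or data.
TRIPLES = [
    ("Tokyo_Mew_Mew", "creator", "Mia_Ikumi"),
    ("Super_Doll_Licca-chan", "creator", "Mia_Ikumi"),
    ("Koi_Cupid", "creator", "Mia_Ikumi"),
    ("Super_Doll_Licca-chan", "genre", "Genre_A"),
    ("Koi_Cupid", "genre", "Genre_B"),
]


def objects(subject, prop):
    """All objects o such that (subject, prop, o) is in the data."""
    return [o for s, p, o in TRIPLES if s == subject and p == prop]


def subjects(prop, obj):
    """All subjects s such that (s, prop, obj) is in the data."""
    return [s for s, p, o in TRIPLES if p == prop and o == obj]


def related_genres(series):
    """Return (who, work, genre) rows; genre is None when none is recorded."""
    rows = []
    for who in objects(series, "creator"):        # who made the series?
        for work in subjects("creator", who):     # what else did they make?
            for genre in objects(work, "genre") or [None]:
                rows.append((who, work, genre))
    return rows


rows = related_genres("Tokyo_Mew_Mew")
```

The caller names only the series; the creator and the other works are found by matching shared values across triples, which is why it does not matter which Wikipedia entry each fact originally came from.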


Use cases

DBpedia has a broad scope of entities covering different areas of human knowledge. This makes it a natural hub for connecting datasets, where external datasets could link to its concepts. The DBpedia dataset is interlinked on the RDF level with various other Open Data datasets on the Web. This enables applications to enrich DBpedia data with data from these datasets. There are more than 45 million interlinks between DBpedia and external datasets including: Freebase, OpenCyc, UMBEL, GeoNames, MusicBrainz, CIA World Fact Book, DBLP, Project Gutenberg, DBtune Jamendo, Eurostat, UniProt, Bio2RDF, and US Census data. The Thomson Reuters initiative OpenCalais, the Linked Open Data project of ''The New York Times'', the Zemanta API and DBpedia Spotlight also include links to DBpedia. The BBC uses DBpedia to help organize its content. Faviki uses DBpedia for semantic tagging. Samsung also includes DBpedia in its "Knowledge Sharing Platform".

Such a rich source of structured cross-domain knowledge is fertile ground for Artificial Intelligence systems. DBpedia was used as one of the knowledge sources in IBM Watson's ''Jeopardy!''-winning system.

Amazon provides a DBpedia ''Public Data Set'' that can be integrated into Amazon Web Services applications.

Data about creators from DBpedia can be used for enriching artworks' sales observations.

The crowdsourcing software company Ushahidi built a prototype of its software that leveraged DBpedia to perform semantic annotations on citizen-generated reports. The prototype incorporated the "YODIE" (Yet another Open Data Information Extraction system) service developed by the University of Sheffield, which uses DBpedia to perform the annotations. The goal for Ushahidi was to improve the speed and facility with which incoming reports could be validated and managed.


DBpedia Spotlight

DBpedia Spotlight is a tool for annotating mentions of DBpedia resources in text. This allows linking unstructured information sources to the Linked Open Data cloud through DBpedia. DBpedia Spotlight performs named entity extraction, including entity detection and name resolution (in other words, disambiguation). It can also be used for named entity recognition and other information extraction tasks. DBpedia Spotlight aims to be customizable for many use cases. Instead of focusing on a few entity types, the project strives to support the annotation of all 3.5 million entities and concepts from more than 320 classes in DBpedia. The project started in June 2010 at the Web Based Systems Group at the Free University of Berlin.

DBpedia Spotlight is publicly available as a web service for testing and as a Java/Scala API licensed via the Apache License. The DBpedia Spotlight distribution includes a jQuery plugin that allows developers to annotate pages anywhere on the Web by adding one line to their page. Clients are also available in Java or PHP. The tool handles various languages through its demo page and web services. Internationalization is supported for any language that has a Wikipedia edition.
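
As a rough sketch of how a client might call the web service, the Python snippet below builds (but does not send) an annotation request using only the standard library. The endpoint URL, the `en` language segment, and the `text` and `confidence` query parameters are assumptions based on the public demo service and are not given in the text above; check them against the current Spotlight documentation before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint of the public demo service ("en" selects the language).
# This URL is an assumption of the sketch; a self-hosted instance would differ.
SPOTLIGHT_ANNOTATE_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_annotate_request(text, confidence=0.5):
    """Build a GET request asking the service to annotate `text`.

    `text` and `confidence` are assumed parameter names; the request is
    only constructed here, never sent.
    """
    query = urlencode({"text": text, "confidence": confidence})
    return Request(
        f"{SPOTLIGHT_ANNOTATE_URL}?{query}",
        headers={"Accept": "application/json"},  # ask for a JSON response
    )


request = build_annotate_request("Berlin is the capital of Germany.")
```

Passing `request` to `urllib.request.urlopen` would perform the call. The shape of the response is not shown here because it depends on the service version.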


Archivo ontology database

From 2020, the DBpedia project provides a regularly updated database of web‑accessible ontologies written in the OWL ontology language. Archivo also provides a four-star rating scheme for the ontologies it scrapes, based on accessibility, quality, and related fitness‑for‑use criteria. For instance, SHACL compliance for graph‑based data is evaluated when appropriate. Ontologies should also contain metadata about their characteristics and specify a public license describing their terms‑of‑use. The Archivo database contains 1368 entries.


History

DBpedia was initiated in 2007 by Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak and Zachary Ives.


See also

* BabelNet
* Semantic MediaWiki
* Wikidata

