An infobox is a digital or physical table used to collect and present a subset of information about its subject, such as a document. It is a structured document containing a set of attribute–value pairs, and in Wikipedia represents a summary of information about the subject of an article. In this way, infoboxes are comparable to data tables in some aspects. When presented within the larger document it summarizes, an infobox is often presented in a sidebar format. An infobox may be implemented in another document by transcluding it into that document and specifying some or all of the attribute–value pairs associated with that infobox, known as parameterization.
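
As a rough illustration of parameterization, the sketch below models an infobox template as a list of accepted attribute names and renders only the attributes a document supplies. The names `TEMPLATE_FIELDS` and `render_infobox`, and the field names `place_of_origin` and `main_ingredient`, are hypothetical and do not correspond to any particular system's API.

```python
# Hypothetical template: the attribute names an infobox of this kind accepts.
TEMPLATE_FIELDS = ["name", "type", "place_of_origin", "main_ingredient"]


def render_infobox(fields, values):
    """Return plain-text rows for the attributes that were supplied.

    Attributes missing from `values` are skipped, mirroring the idea that a
    document may specify only some of the attribute-value pairs.
    """
    unknown = set(values) - set(fields)
    if unknown:
        # The attribute name must match one defined by the template.
        raise KeyError(f"unknown attributes: {sorted(unknown)}")
    return [f"{name}: {values[name]}" for name in fields if name in values]


rows = render_infobox(TEMPLATE_FIELDS, {"name": "Crostata", "type": "Tart"})
print("\n".join(rows))
```

Because the row order comes from the template rather than from the caller, changing `TEMPLATE_FIELDS` changes the layout of every rendered infobox without touching the supplied values.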


Wikipedia

An infobox may be used to summarize the information of an article on Wikipedia. Infoboxes are used on similar articles to ensure consistency of presentation by using a common format. Originally, infoboxes (and templates in general) were used for page layout purposes.

An infobox may be transcluded into an article by specifying the value for some or all of its parameters. The parameter name used must be the same as that specified in the infobox template, but any value may be associated with it. The name is delimited from the value by an equals sign. The parameter name may be regarded as an attribute of the article's subject.

On Wikipedia, an infobox is transcluded into an article by enclosing its name and attribute–value pairs within a double set of braces. The MediaWiki software on which Wikipedia operates then parses the document, and the infobox and other templates are processed by a template processor. This is a template engine which produces a web document and a style sheet used for presentation of the document. This enables the design of the infobox to be separated from the content it manipulates; that is, the design of the template may be updated without affecting the information within it, and the new design will automatically propagate to all articles that transclude the infobox.

Usually, infoboxes are formatted to appear in the top-right corner of a Wikipedia article in the desktop view, or at the top in the mobile view. Placement of an infobox within the wikitext of an article is important for accessibility. A best practice is to place infoboxes after ''disambiguation'' templates (those that direct readers to articles about topics with similar names) and maintenance templates (such as that marking an article as unreferenced), but before all other content.
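
The double-brace syntax with name and value separated by an equals sign can be sketched with a small parser. This is a minimal sketch, not how MediaWiki itself parses wikitext: the function name `parse_infobox` is hypothetical, the sample call is illustrative, and the code assumes a flat template call with no nested templates or piped links.

```python
def parse_infobox(wikitext):
    """Parse a flat ``{{Name | key = value | ...}}`` call into (name, params).

    Simplifying assumption: no nested templates or piped links, so splitting
    on "|" is safe. Real wikitext needs a proper parser.
    """
    text = wikitext.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template transclusion")
    parts = [p.strip() for p in text[2:-2].split("|")]
    name, params = parts[0], {}
    for part in parts[1:]:
        if "=" not in part:
            continue  # ignore positional (unnamed) parameters in this sketch
        key, value = part.split("=", 1)  # split on the first equals sign only
        params[key.strip()] = value.strip()
    return name, params


sample = """{{Infobox food
| name = Crostata
| type = Tart
}}"""
name, params = parse_infobox(sample)
print(name, params)
```

Splitting on only the first equals sign keeps values that themselves contain `=` intact, which matters for parameters holding URLs or formulas.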

Baeza-Yates and King say that some editors find templates such as infoboxes complicated, as the template may hide text about a property or resource that the editor wishes to change; this is exacerbated by chained templates, that is, templates transcluded within other templates.

As of August 2009, English Wikipedia used about infobox templates that collectively used more than attributes. Since then, many have been merged to reduce redundancy. As of June 2013, there were at least transclusions of the parent Infobox template, used by some, but not all, infoboxes, on articles.

The name of an infobox is typically "Infobox [genre]"; however, widely used infoboxes may be assigned shorter names, such as "taxobox" for taxonomy.


Machine learning

About 44.2% of Wikipedia articles contained an infobox in 2008, and about 33% in 2010. Automated semantic knowledge extraction using machine learning algorithms is used to "extract machine-processable information at a relatively low complexity cost". However, the low coverage makes extraction more difficult, though this can be partially overcome by complementing article data with that in the categories in which the article is included.

The French Wikipedia initiated the project ''Infobox Version 2'' in May 2011. The project is hosted on the French Wikipedia page Infobox/V2.

Knowledge obtained by machine learning can be used to improve an article, such as by using automated software suggestions to editors for adding infobox data. The iPopulator project created a system to add a value to an article's infobox parameter via automated parsing of the text of that article.

DBpedia uses structured content extracted from infoboxes by machine learning algorithms to create a resource of linked data in the Semantic Web; it has been described by Tim Berners-Lee as "one of the more famous" components of the linked data project.

Machine extraction creates a triple consisting of a subject, predicate or relation, and object. Each attribute–value pair of the infobox is used to create an RDF statement using an ontology. This is facilitated by the narrower gap between Wikipedia and an ontology than exists between unstructured or free text and an ontology. The semantic relationship between the subject and object is established by the predicate. For example, the triple ("crostata", type, "tart") indicates that a crostata is a type of tart. The article's topic is used as the subject, the parameter name is used as the predicate, and the parameter's value as the object. Each type of infobox is mapped to an ontology class, and each property (parameter) within an infobox is mapped to an ontology property. These mappings are used when parsing a Wikipedia article to extract data.
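
The mapping from infobox parameters to triples can be sketched as follows. This is an illustrative sketch under stated assumptions, not DBpedia's actual extraction code: `CLASS_MAP`, `PROPERTY_MAP`, `extract_triples`, and the `instanceOf` predicate are hypothetical names, and the mappings are made up for the crostata example.

```python
# Hypothetical mappings from infobox type to ontology class, and from
# infobox parameter to ontology property.
CLASS_MAP = {"Infobox food": "Food"}
PROPERTY_MAP = {"type": "type", "country": "country"}


def extract_triples(subject, infobox_name, params):
    """Turn infobox parameters into (subject, predicate, object) triples."""
    triples = []
    ontology_class = CLASS_MAP.get(infobox_name)
    if ontology_class is not None:
        # The infobox type determines the ontology class of the subject.
        triples.append((subject, "instanceOf", ontology_class))
    for key, value in params.items():
        predicate = PROPERTY_MAP.get(key)
        if predicate is None or not value:
            continue  # skip unmapped or empty parameters
        triples.append((subject, predicate, value))
    return triples


triples = extract_triples(
    "crostata", "Infobox food", {"type": "tart", "caption": "A crostata"}
)
for triple in triples:
    print(triple)
```

Parameters without a mapping, such as `caption` here, produce no triple, which is one reason extraction quality depends on how complete the mappings are.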



Further reading

* Wu, Fei; Hoffmann, Raphael; Weld, Daniel S. (2008). "Information extraction from Wikipedia: moving down the long tail". Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery. pp. 731–739. doi:10.1145/1401890.1401978. ISBN 9781605581934. S2CID 7781746.