Controlled vocabularies provide a way to organize knowledge for subsequent retrieval. They are used in subject indexing schemes, subject headings, thesauri, taxonomies and other knowledge organization systems. Controlled vocabulary schemes mandate the use of predefined, authorised terms that have been preselected by the designers of the schemes, in contrast to natural language vocabularies, which have no such restriction.
In library and information science
In library and information science, controlled vocabulary is a carefully selected list of words and phrases, which are used to tag units of information (document or work) so that they may be more easily retrieved by a search. Controlled vocabularies solve the problems of homographs, synonyms and polysemes by a bijection between concepts and authorized terms. In short, controlled vocabularies reduce the ambiguity inherent in normal human languages, where the same concept can be given different names, and ensure consistency.
For example, in the Library of Congress Subject Headings (a subject heading system that uses a controlled vocabulary), authorized terms, in this case subject headings, have to be chosen to handle choices between variant spellings of the same word (American versus British), choices between scientific and popular terms (''cockroach'' versus ''Periplaneta americana''), and choices between synonyms (''automobile'' versus ''car''), among other difficult issues.
Choices of authorized terms are based on the principles of ''user warrant'' (what terms users are likely to use), ''literary warrant'' (what terms are generally used in the literature and documents), and ''structural warrant'' (terms chosen by considering the structure and scope of the controlled vocabulary).
Controlled vocabularies also typically handle the problem of homographs with qualifiers. For example, the term ''pool'' has to be qualified to refer to either ''swimming pool'' or the game ''pool'' to ensure that each authorized term or heading refers to only one concept.
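The mapping from variant terms to a single authorized term, and the use of qualifiers for homographs, can be sketched in a few lines of Python. This is only an illustration: the dictionaries, the qualified headings such as "pool (swimming)", and the `authorize` function are hypothetical and are not taken from LCSH or any real vocabulary.

```python
# Illustrative data only; these are not real subject headings.
# Every variant (synonym or spelling) maps to exactly one authorized term.
VARIANT_TO_AUTHORIZED = {
    "car": "automobile",
    "automobile": "automobile",
    "colour": "color",
    "color": "color",
    "swimming pool": "pool (swimming)",
}

# Homographs cannot be resolved without a qualifier, so each one is listed
# with the qualified headings the indexer has to choose between.
AMBIGUOUS = {
    "pool": ["pool (game)", "pool (swimming)"],
}


def authorize(term: str) -> str:
    """Return the single authorized term for a natural-language term."""
    key = term.strip().lower()
    if key in AMBIGUOUS:
        options = ", ".join(AMBIGUOUS[key])
        raise ValueError(f"'{term}' is ambiguous; choose one of: {options}")
    if key not in VARIANT_TO_AUTHORIZED:
        raise KeyError(f"'{term}' is not in the controlled vocabulary")
    return VARIANT_TO_AUTHORIZED[key]


print(authorize("car"))     # automobile
print(authorize("Colour"))  # color
```

Calling `authorize("pool")` raises a `ValueError` in this sketch, which mirrors the requirement that an unqualified homograph never serves as a heading on its own.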
Types used in libraries
There are two main kinds of controlled vocabulary tools used in libraries: subject headings and thesauri. While the differences between the two are diminishing, there are still some minor differences.
Historically, subject headings were designed by catalogers to describe books in library catalogs, while thesauri were used by indexers to apply index terms to documents and articles. Subject headings tend to be broader in scope, describing whole books, while thesauri tend to be more specialized, covering very specific disciplines. Also, because of the card catalog system, subject headings tend to have terms that are in indirect order (though with the rise of automated systems this is being removed), while thesaurus terms are always in direct order. Subject headings also tend to use more pre-coordination of terms, such that the designer of the controlled vocabulary will combine various concepts together to form one authorized subject heading (e.g., children and terrorism), while thesauri tend to use singular direct terms. Lastly, thesauri list not only equivalent terms but also narrower, broader and related terms among various authorized and non-authorized terms, while historically most subject headings did not.
For example, the Library of Congress Subject Headings itself did not have much syndetic structure until 1943, and it was not until 1985 that it began to adopt the thesaurus-type terms "Broader term" and "Narrower term".
The terms are chosen and organized by trained professionals (including librarians and information scientists) who possess expertise in the subject area. Controlled vocabulary terms can accurately describe what a given document is actually about, even if the terms themselves do not occur within the document's text. Well known subject heading systems include the Library of Congress system, MeSH, and Sears. Well known thesauri include the Art and Architecture Thesaurus and the ERIC Thesaurus.
Choosing authorized terms is a tricky business. Besides the areas already considered above, the designer has to consider the specificity of the term chosen, whether to use direct entry, inter-consistency and the stability of the language. Lastly, the amount of pre-coordination (in which case the degree of enumeration versus synthesis becomes an issue) and post-coordination in the system is another important issue.
Controlled vocabulary elements (terms/phrases) employed as tags to aid in the content identification process of documents or other information system entities (e.g. DBMS, Web Services) qualify as metadata.
Indexing languages
There are three main types of indexing languages.
* Controlled indexing language – only approved terms can be used by the indexer to describe the document
* Natural language indexing language – any term from the document in question can be used to describe the document
* Free indexing language – any term (not only from the document) can be used to describe the document
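The difference between the three types comes down to which terms an indexer is allowed to assign. The sketch below illustrates that rule under stated assumptions: the approved term list, the sample sentence and the `may_assign` function are hypothetical, and the natural language case is simplified to single-word terms.

```python
import re

# Hypothetical approved list for the controlled case.
APPROVED_TERMS = {"association football", "rugby union"}


def document_words(text: str) -> set:
    """Lowercased words that occur in the document text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def may_assign(term: str, text: str, language: str) -> bool:
    """Check whether an indexer may assign `term` under an indexing language."""
    if language == "controlled":
        # Only approved terms may be used.
        return term.lower() in APPROVED_TERMS
    if language == "natural":
        # Any term from the document itself (single words in this sketch).
        return term.lower() in document_words(text)
    if language == "free":
        # Any term at all, whether or not it occurs in the document.
        return True
    raise ValueError(f"unknown indexing language: {language}")


doc = "The match ended in a draw after extra time."
for language in ("controlled", "natural", "free"):
    print(language, may_assign("soccer", doc, language))
```

With this sample sentence, ''soccer'' is rejected under the controlled and natural language rules and accepted only under the free rule, while ''association football'' is accepted under the controlled rule even though it never appears in the text.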
When indexing a document, the indexer also has to choose the level of indexing exhaustivity, the level of detail in which the document is described. For example, using low indexing exhaustivity, minor aspects of the work will not be described with index terms. In general the higher the indexing exhaustivity, the more terms indexed for each document.
In recent years free text search as a means of access to documents has become popular. This involves using natural language indexing with the indexing exhaustivity set to maximum (every word in the text is ''indexed''). Many studies have been done to compare the efficiency and effectiveness of free text searches against documents that have been indexed by experts using a few well chosen controlled vocabulary descriptors.
Advantages
Controlled vocabularies are often claimed to improve the accuracy of free text searching, for example by reducing the number of irrelevant items in the retrieval list. These irrelevant items (false positives) are often caused by the inherent ambiguity of natural language. Take the English word ''football'' for example. ''Football'' is the name given to a number of different team sports. Worldwide the most popular of these team sports is association football, which also happens to be called ''soccer'' in several countries. The word ''football'' is also applied to rugby football (rugby union and rugby league), American football, Australian rules football, Gaelic football, and Canadian football. A search for ''football'' therefore will retrieve documents that are about several completely different sports. Controlled vocabulary solves this problem by tagging the documents in such a way that the ambiguities are eliminated.
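The ''football'' case can be made concrete with a toy collection. The documents, their tags and the two search functions below are invented for illustration; they are a minimal sketch of the contrast between free text matching and tag matching, not a description of how any particular retrieval system works.

```python
# Invented sample collection: each document has free text plus
# controlled vocabulary tags assigned by a (hypothetical) indexer.
DOCUMENTS = [
    {"id": 1, "text": "The football final went to penalties.",
     "tags": {"association football"}},
    {"id": 2, "text": "Football fans packed the stadium for the Super Bowl.",
     "tags": {"American football"}},
    {"id": 3, "text": "The soccer club signed a new striker.",
     "tags": {"association football"}},
    {"id": 4, "text": "A football grand final was played in Melbourne.",
     "tags": {"Australian rules football"}},
]


def free_text_search(query: str) -> list:
    """Return ids of documents whose text contains the query string."""
    q = query.lower()
    return [d["id"] for d in DOCUMENTS if q in d["text"].lower()]


def tag_search(term: str) -> list:
    """Return ids of documents tagged with the given authorized term."""
    return [d["id"] for d in DOCUMENTS if term in d["tags"]]


print(free_text_search("football"))        # [1, 2, 4]
print(tag_search("association football"))  # [1, 3]
```

In this toy collection the free text search mixes three different sports and also misses document 3, which uses the synonym ''soccer'', while the tag search returns exactly the two documents about association football.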
Compared to free text searching, the use of a controlled vocabulary can dramatically increase the performance of an information retrieval system, if performance is measured by precision (the percentage of documents in the retrieval list that are actually