An index term, subject term, subject heading, descriptor, or keyword, in information retrieval, is a term that captures the essence of the topic of a document. Index terms make up a controlled vocabulary for use in bibliographic records. They are an integral part of bibliographic control, which is the function by which libraries collect, organize and disseminate documents. They are used as keywords to retrieve documents in an information system, for instance, a catalog or a search engine. Tags are a popular form of keyword on the web; they are directly visible and can be assigned by non-experts. Index terms can consist of a word, phrase, or alphanumerical term. They are created by analyzing the document either manually with subject indexing or automatically with automatic indexing or more sophisticated methods of keyword extraction. Index terms can either come from a controlled vocabulary or be freely assigned.

Keywords are stored in a search index. Common words like articles (a, an, the) and conjunctions (and, or, but) are not treated as keywords because indexing them is inefficient. Almost every English-language site on the Internet contains the article "''the''", so it makes no sense to search for it. The most popular search engine, Google, removed stop words such as "the" and "a" from its indexes for several years, but then re-introduced them, making certain types of precise search possible again.

The term "descriptor" was coined by Calvin Mooers in 1948. It is used in particular for a preferred term from a thesaurus.

The Simple Knowledge Organization System language (SKOS) provides a way to express index terms with Resource Description Framework for use in the context of the Semantic Web.
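The stop-word filtering and search index described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop list, the function names `extract_keywords` and `build_index`, and the sample documents are all hypothetical, and a production search engine would handle tokenization and storage quite differently.

```python
import re
from collections import defaultdict

# Illustrative stop list only (an assumption); real systems use longer,
# language-specific lists, and there is no single universal list.
STOP_WORDS = {"a", "an", "the", "and", "or", "but"}


def extract_keywords(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_index(documents):
    """Map each keyword to the set of IDs of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for keyword in extract_keywords(text):
            index[keyword].add(doc_id)
    return index


# Hypothetical sample documents.
documents = {
    1: "The cat and the dog",
    2: "A dog chased the ball",
    3: "The Who played a concert",
}
index = build_index(documents)

print(sorted(index["dog"]))  # [1, 2]
print("the" in index)        # False
```

In this sketch a query for "The Who" reduces to the single keyword "who", which suggests why keeping stop words in the index can make some precise searches possible again.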


In web search engines

Most web search engines are designed to search for words anywhere in a document: the title, the body, and so on. A keyword can therefore be any term that exists within the document. However, priority is given to words that occur in the title, words that recur numerous times, and words that are explicitly assigned as keywords within the document's markup. Index terms can be further refined using Boolean operators such as "AND", "OR", and "NOT". "AND" is normally unnecessary, as most search engines infer it. "OR" returns results that contain one search term, the other, or both. "NOT" eliminates a word or phrase from the search, removing any results that include it. Multiple words can also be enclosed in quotation marks to turn the individual index terms into a specific index ''phrase''. These modifiers and methods all help to refine search terms and improve the accuracy of search results. (CLIO. ''Keyword search''. Columbia University Libraries. Retrieved from http://www.columbia.edu/cu/lweb/help/clio/keyword.html)
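The Boolean operators and phrase search described above map naturally onto set operations over an inverted index. The following is a minimal sketch, not how any particular search engine works: the sample documents and the helper names (`docs_with`, `search_and`, `search_or`, `search_not`, `search_phrase`) are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample documents.
documents = {
    1: "index terms in library catalogs",
    2: "search engine index design",
    3: "library search tips",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Inverted index: word -> set of IDs of the documents that contain it.
index = defaultdict(set)
for doc_id, text in documents.items():
    for word in tokenize(text):
        index[word].add(doc_id)


def docs_with(word):
    """Return the IDs of documents containing the word (empty set if none)."""
    return index.get(word.lower(), set())


def search_and(a, b):
    """Documents containing both terms (set intersection)."""
    return docs_with(a) & docs_with(b)


def search_or(a, b):
    """Documents containing one term, the other, or both (set union)."""
    return docs_with(a) | docs_with(b)


def search_not(a, b):
    """Documents containing a but not b (set difference)."""
    return docs_with(a) - docs_with(b)


def search_phrase(phrase):
    """Documents in which the words of the phrase appear consecutively."""
    words = tokenize(phrase)
    if not words:
        return set()
    # Only documents containing every word can contain the phrase.
    candidates = set.intersection(*(docs_with(w) for w in words))
    n = len(words)
    hits = set()
    for doc_id in candidates:
        tokens = tokenize(documents[doc_id])
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            hits.add(doc_id)
    return hits


print(search_and("library", "search"))  # {3}
print(search_not("index", "library"))   # {2}
print(search_phrase("index design"))    # {2}
print(search_phrase("design index"))    # set()
```

The last two calls show the effect of quotation marks: both words occur in document 2, but only the query that matches their order counts as a phrase hit.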


Author keywords

Author keywords are an integral part of the scholarly literature. Many journals and databases provide access to index terms assigned by the authors of the respective articles. The quality of both indexer-provided and author-provided index terms depends on how qualified the provider is. The quality of these two types of index terms is of research interest, particularly in relation to information retrieval. In general, an author will have difficulty providing indexing terms that characterize his or her document ''relative'' to other documents in the database.


Examples

* Canadian Subject Headings (CSH)
* Library of Congress Subject Headings (LCSH)
* Medical Subject Headings (MeSH)
* Polythematic Structured Subject Heading System (PSH)
* Subject Headings Authority File (SWD)


See also

* Dynamic keyword insertion
* Index (publishing)
* Keyword density
* Search engine optimization
* Subject (documents)
* Tag (metadata)
* Tag cloud


References




External links


* Lardera, Marco and Birger Hjørland. "Keyword". In ISKO Encyclopedia of Knowledge Organization.