
Key Word In Context (KWIC) is the most common format for concordance lines. The term KWIC was coined by Hans Peter Luhn. The system was based on a concept called ''keyword in titles'', which was first proposed for Manchester libraries in 1864 by Andrea Crestadoro.

A KWIC index is formed by sorting and aligning the words within an article title so that each word in the title (except the stop words) can be looked up alphabetically in the index. It was a useful indexing method for technical manuals before computerized full-text search became common. For example, a search query including all of the words in an example definition ("KWIC is an acronym for Key Word In Context, the most common format for concordance lines") and the Wikipedia slogan in English ("the free encyclopedia"), searched against a Wikipedia page, might yield a KWIC index as follows. A KWIC index usually uses a wide layout to display as much 'in context' information as possible (not shown in the following example).

A KWIC index is a special case of a ''permuted index''. This term refers to the fact that it indexes all cyclic permutations of the headings. Books composed of many short sections with their own descriptive headings, most notably collections of manual pages, often ended with a permuted index section, allowing the reader to easily find a section by any word from its heading. This practice, also known as Key Word Out of Context (KWOC), is no longer common.

Figures: keyword alongside context (KWAC); keyword in context (KWIC); keyword out of context (KWOC).
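The rotate-and-sort idea behind a permuted index can be sketched in a few lines of Python. This is a minimal illustration, not Luhn's original procedure or the output format of any particular tool such as ptx. It assumes whitespace tokenization and a small, hypothetical stop list (`STOP_WORDS`). The helper names `kwic_index` and `format_kwic` are also hypothetical.

```python
# Hypothetical stop list for illustration only; real indexes use longer lists.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def kwic_index(titles, stop_words=STOP_WORDS):
    """Build a KWIC (permuted) index: one entry per non-stop word per title.

    Returns (left_context, keyword_and_right_context) pairs in index order.
    """
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in stop_words:
                continue  # stop words get no index entry of their own
            # Cyclic permutation of the title that starts at the keyword.
            rotation = words[i:] + words[:i]
            left = " ".join(words[:i])   # context before the keyword
            right = " ".join(words[i:])  # keyword and the context after it
            entries.append((" ".join(rotation).lower(), left, right))
    # Sorting the lowercased rotations orders entries by keyword first,
    # then by the words that follow it (wrapping around to the title's start).
    entries.sort()
    return [(left, right) for _, left, right in entries]


def format_kwic(entries):
    """Right-align the left context so every keyword starts in the same column."""
    width = max((len(left) for left, _ in entries), default=0)
    return [f"{left:>{width}}  {right}" for left, right in entries]


if __name__ == "__main__":
    for line in format_kwic(kwic_index(["Key Word In Context", "The Free Encyclopedia"])):
        print(line)
```

Run on the two titles above, the entries come out in the order Context, Encyclopedia, Free, Key, Word. Each keyword starts in the same column, with its preceding words pushed to the left. A KWOC-style layout would instead print each keyword as a separate heading above the unrotated title.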


References in literature

''Note: The first reference does not show the KWIC index unless you pay to view the paper. The second reference does not even list the paper at all.''
* David L. Parnas uses a KWIC index as an example of how to perform modular design in his paper ''On the Criteria To Be Used in Decomposing Systems into Modules'', available as an ACM Classic Paper.
* Christopher D. Manning and Hinrich Schütze describe a KWIC index and computer concordancing in section 1.4.5 of their book ''Foundations of Statistical Natural Language Processing'' (Cambridge, Mass.: MIT Press, 1999). They cite an article by H.P. Luhn from 1960, "Key word-in-context index for technical literature (kwic index)".
* According to Rev. Gerard O'Connor, "Most of the concordances produced in recent times and with the aid of computer software use both the KWIC (keyword in context) and KWICn (keyword in center) formats, which lists the keyword, usually highlighted in bold text in a consistent position, within a limited amount of context text, i.e. three or four words of the text prior to the keyword and the same amount of text following. This format is extremely useful in that the keyword is easily identified together with its context. ... The Concordance of the Roman Missal is produced in both the KWIC and KWICn formats and is noteworthy in that each word form is listed as it appears in the text, that is, it is un-lemmatized."
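The keyword-in-center layout described in the quotation can be sketched in Python. It places the keyword in a fixed column with a few words of context on each side. The window size (default four words), the case-insensitive comparison, and the square brackets (standing in for bold text) are all assumptions made for this illustration. The function name `concordance` is hypothetical. Like the concordance quoted above, the sketch matches word forms as they appear and does not lemmatize.

```python
import string


def concordance(text, keyword, window=4):
    """Return keyword-in-center lines for every occurrence of keyword in text.

    Matching compares surface word forms (case-insensitive, surrounding
    punctuation ignored); nothing is lemmatized, so "format" and "formats"
    are treated as different words.
    """
    words = text.split()
    target = keyword.lower()
    hits = []
    for i, word in enumerate(words):
        if word.strip(string.punctuation).lower() != target:
            continue
        left = " ".join(words[max(0, i - window):i])    # up to `window` words before
        right = " ".join(words[i + 1:i + 1 + window])  # up to `window` words after
        hits.append((left, word, right))
    width = max((len(left) for left, _, _ in hits), default=0)
    # Right-align the left context so the bracketed keyword always starts in
    # the same column; the keyword is shown exactly as it appears in the text.
    return [f"{left:>{width}} [{word}] {right}".rstrip() for left, word, right in hits]
```

For the example definition "KWIC is an acronym for Key Word In Context, the most common format for concordance lines." with `window=3`, the two occurrences of "for" are printed one above the other, each bracketed keyword starting in the same column.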


See also

* ptx, a Unix command-line utility producing a permuted index
* Concordancer
* Concordance (publishing)
* Burrows–Wheeler transform
* Hans Peter Luhn
* Suffix tree

