alphabetical ordering

Alphabetical order is a system whereby character strings are placed in order based on the position of the characters in the conventional ordering of an alphabet. It is one of the methods of collation. In mathematics, a lexicographical order is the generalization of the alphabetical order to other data types, such as sequences of numbers or other ordered mathematical objects. When applied to strings or sequences that may contain digits, numbers or more elaborate types of elements, in addition to alphabetical characters, the alphabetical order is generally called a lexicographical order.

To determine which of two strings of characters comes first when arranging in alphabetical order, their first letters are compared. If they differ, then the string whose first letter comes earlier in the alphabet comes before the other string. If the first letters are the same, then the second letters are compared, and so on. If a position is reached where one string has no more letters to compare while the other does, then the first (shorter) string is deemed to come first in alphabetical order. Capital letters (upper case) are generally considered to be identical to their corresponding lower case letters for the purposes of alphabetical ordering, although conventions may be adopted to handle situations where two strings differ ''only'' in capitalization. Various conventions also exist for the handling of strings containing spaces, modified letters (such as those with diacritics), and non-letter characters such as marks of punctuation.

The result of placing a set of words or strings in alphabetical order is that all of the strings beginning with the same letter are grouped together; within that grouping all words beginning with the same two-letter sequence are grouped together; and so on. The system thus tends to maximize the number of common initial letters between adjacent words.
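The comparison procedure described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `comes_before` is hypothetical, the strings are assumed to contain only unaccented basic Latin letters, and upper and lower case are treated as identical.

```python
def comes_before(a: str, b: str) -> bool:
    """Return True if string a precedes string b in simple alphabetical order.

    Assumes unaccented basic Latin letters; case is ignored.
    """
    a, b = a.casefold(), b.casefold()
    # Compare the letters position by position.
    for letter_a, letter_b in zip(a, b):
        if letter_a != letter_b:
            # The first differing position decides the order.
            return letter_a < letter_b
    # No difference found: the string that runs out of letters first wins.
    return len(a) < len(b)


print(comes_before("As", "Aster"))   # True: "As" runs out of letters first
print(comes_before("At", "Aster"))   # False: "t" comes after "s"
```

Spaces, diacritics and punctuation are deliberately left out of this sketch, since the conventions for them vary.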


History

Alphabetical order was first used in the 1st millennium BCE by Northwest Semitic scribes using the abjad system. However, a range of other methods of classifying and ordering material, including geographical, chronological, hierarchical and by category, were preferred over alphabetical order for centuries.

The Bible is dated to the 6th–7th centuries BCE. In the Book of Jeremiah, the prophet utilizes the Atbash substitution cipher, based on alphabetical order. Similarly, biblical authors used acrostics based on the (ordered) Hebrew alphabet.

The first effective use of alphabetical order as a cataloging device among scholars may have been in ancient Alexandria, in the Great Library of Alexandria, which was founded around 300 BCE. The poet and scholar Callimachus, who worked there, is thought to have created the world's first library catalog, known as the Pinakes, with scrolls shelved in alphabetical order of the first letter of authors' names. In the 1st century BCE, Roman writer Varro compiled alphabetic lists of authors and titles. In the 2nd century CE, Sextus Pompeius Festus wrote an encyclopedic epitome of the works of Verrius Flaccus, ''De verborum significatu'', with entries in alphabetic order. In the 3rd century CE, Harpocration wrote a Homeric lexicon alphabetized by all letters. In the 10th century, the author of the ''Suda'' used alphabetic order with phonetic variations.

Alphabetical order as an aid to consultation started to enter the mainstream of Western European intellectual life in the second half of the 12th century, when alphabetical tools were developed to help preachers analyse biblical vocabulary. This led to the compilation of alphabetical concordances of the Bible by the Dominican friars in Paris in the 13th century, under Hugh of Saint Cher. Older reference works such as St. Jerome's ''Interpretations of Hebrew Names'' were alphabetized for ease of consultation. The use of alphabetical order was initially resisted by scholars, who expected their students to master their area of study according to its own rational structures; its success was driven by such tools as Robert Kilwardby's index to the works of St. Augustine, which helped readers access the full original text instead of depending on the compilations of excerpts which had become prominent in 12th-century scholasticism. The adoption of alphabetical order was part of the transition from the primacy of memory to that of written works.

The idea of ordering information by the order of the alphabet also met resistance from the compilers of encyclopaedias in the 12th and 13th centuries, who were all devout churchmen. They preferred to organise their material theologically, in the order of God's creation, starting with ''Deus'' (meaning God).

In 1604 Robert Cawdrey had to explain in ''Table Alphabeticall'', the first monolingual English dictionary, "Nowe if the word, which thou art desirous to finde, begin with (a) then looke in the beginning of this Table, but if with (v) looke towards the end". Although as late as 1803 Samuel Taylor Coleridge condemned encyclopedias with "an arrangement determined by the accident of initial letters", many lists are today based on this principle. Arrangement in alphabetical order can be seen as a force for democratising access to information, as it does not require extensive prior knowledge to find what is needed.


Ordering in the Latin script


Basic order and examples

The standard order of the modern ISO basic Latin alphabet is:
:A-B-C-D-E-F-G-H-I-J-K-L-M-N-O-P-Q-R-S-T-U-V-W-X-Y-Z
An example of straightforward alphabetical ordering follows:
*''As; Aster; Astrolabe; Astronomy; Astrophysics; At; Ataman; Attack; Baa''
Another example:
*''Barnacle; Be; Been; Benefit; Bent''
The above words are ordered alphabetically. ''As'' comes before ''Aster'' because they begin with the same two letters and ''As'' has no more letters after that whereas ''Aster'' does. The next three words come after ''Aster'' because their fourth letter (the first one that differs) is ''r'', which comes after ''e'' (the fourth letter of ''Aster'') in the alphabet. Those words themselves are ordered based on their sixth letters (''l'', ''n'' and ''p'' respectively). Then comes ''At'', which differs from the preceding words in the second letter (''t'' comes after ''s''). ''Ataman'' comes after ''At'' for the same reason that ''Aster'' came after ''As''. ''Attack'' follows ''Ataman'' based on comparison of their third letters, and ''Baa'' comes after all of the others because it has a different first letter.
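The walk-through above can be checked mechanically. The following Python sketch uses a hypothetical helper, `first_difference`, that reports the 1-based position of the first letter at which two words differ; it assumes plain unaccented words and ignores case.

```python
from typing import Optional


def first_difference(a: str, b: str) -> Optional[int]:
    """Return the 1-based position of the first differing letter, or None
    if one word is a prefix of the other (or the words are equal)."""
    for position, (x, y) in enumerate(zip(a.casefold(), b.casefold()), start=1):
        if x != y:
            return position
    return None


words = ["As", "Aster", "Astrolabe", "Astronomy", "Astrophysics",
         "At", "Ataman", "Attack", "Baa"]

# Sorting with a case-insensitive key leaves the example list unchanged.
print(sorted(words, key=str.casefold) == words)   # True

print(first_difference("Aster", "Astrolabe"))     # 4 ("e" versus "r")
print(first_difference("Astrolabe", "Astronomy")) # 6 ("l" versus "n")
print(first_difference("As", "Aster"))            # None: "As" is a prefix
```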


Treatment of multiword strings

When some of the strings being ordered consist of more than one word, i.e., they contain spaces or other separators such as hyphens, then two basic approaches may be taken. In the first approach, all strings are ordered initially according to their first word, as in the sequence:
*''Oak; Oak Hill; Oak Ridge; Oakley Park; Oakley River''
*:where all strings beginning with the separate word ''Oak'' precede all those beginning ''Oakley'', because ''Oak'' precedes ''Oakley'' in alphabetical order.
In the second approach, strings are alphabetized as if they had no spaces, giving the sequence:
*''Oak; Oak Hill; Oakley Park; Oakley River; Oak Ridge''
*:where ''Oak Ridge'' now comes after the ''Oakley'' strings, as it would if it were written "Oakridge".
The second approach is the one usually taken in dictionaries, and it is thus often called ''dictionary order'' by publishers. The first approach has often been used in book indexes, although each publisher traditionally set its own standards for which approach to use therein; there was no ISO standard for book indexes (ISO 999) before 1975.
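The two approaches can be expressed as two different sort keys. The Python sketch below is illustrative only: the key function names are hypothetical, hyphens are treated like spaces, and case is ignored.

```python
def word_by_word_key(s: str) -> list[str]:
    """First approach: compare word by word, so "Oak" sorts before "Oakley"."""
    return [word.casefold() for word in s.replace("-", " ").split()]


def letter_by_letter_key(s: str) -> str:
    """Second approach: compare as if spaces and hyphens were not there."""
    return "".join(ch for ch in s.casefold() if ch.isalnum())


names = ["Oakley River", "Oak Ridge", "Oak", "Oakley Park", "Oak Hill"]

print(sorted(names, key=word_by_word_key))
# ['Oak', 'Oak Hill', 'Oak Ridge', 'Oakley Park', 'Oakley River']

print(sorted(names, key=letter_by_letter_key))
# ['Oak', 'Oak Hill', 'Oakley Park', 'Oakley River', 'Oak Ridge']
```

The word-by-word key relies on Python comparing lists element by element, with a shorter list sorting before any longer list that it prefixes.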


Special cases


Modified letters

In French, modified letters (such as those with diacritics) are treated the same as the base letter for alphabetical ordering purposes. For example, ''rôle'' comes between ''rock'' and ''rose'', as if it were written ''role''. However, languages that use such letters systematically generally have their own ordering rules. See below.
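One way to treat a modified letter as its base letter is to decompose it and discard the combining marks. The sketch below uses Python's standard `unicodedata` module; the helper name `strip_diacritics` is hypothetical, and the approach is a simplification that does not implement the ordering rules of any particular language.

```python
import unicodedata


def strip_diacritics(s: str) -> str:
    """Replace each accented letter by its base letter (e.g. "ô" becomes "o")."""
    decomposed = unicodedata.normalize("NFD", s)
    # Combining marks have a non-zero canonical combining class.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["rose", "rôle", "rock"]

# Plain code-point ordering puts "ô" after every unaccented letter.
print(sorted(words))                        # ['rock', 'rose', 'rôle']

# Ignoring the diacritic places "rôle" as if it were written "role".
print(sorted(words, key=strip_diacritics))  # ['rock', 'rôle', 'rose']
```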


Ordering by surname

In most cultures where family names are written after given names, it is still desired to sort lists of names (as in telephone directories) by family name first. In this case, names need to be reordered to be sorted correctly. For example, Juan Hernandes and Brian O'Leary should be sorted as "Hernandes, Juan" and "O'Leary, Brian" even if they are not written this way. Capturing this rule in a computer collation algorithm is complex, and simple attempts will fail. For example, unless the algorithm has at its disposal an extensive list of family names, there is no way to decide if "Gillian Lucille van der Waal" is "van der Waal, Gillian Lucille", "Waal, Gillian Lucille van der", or even "Lucille van der Waal, Gillian".

Ordering by surname is frequently encountered in academic contexts. Within a single multi-author paper, ordering the authors alphabetically by surname, rather than by other methods such as reverse seniority or subjective degree of contribution to the paper, is seen as a way of "acknowledg[ing] similar contributions" or "avoid[ing] disharmony in collaborating groups". The practice in certain fields of ordering citations in bibliographies by the surnames of their authors has been found to create bias in favour of authors with surnames which appear earlier in the alphabet, while this effect does not appear in fields in which bibliographies are ordered chronologically.
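The difficulty described above can be made concrete. In the Python sketch below, sorting is straightforward when the family name is stored separately, whereas a naive heuristic that takes the last word as the family name silently picks just one of the possible readings. The `Person` type and all function names are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple


class Person(NamedTuple):
    given: str
    family: str


def directory_entry(p: Person) -> str:
    """Format a name as it would appear in a directory."""
    return f"{p.family}, {p.given}"


def surname_key(p: Person) -> tuple[str, str]:
    """Sort by family name first, then by given name."""
    return (p.family.casefold(), p.given.casefold())


def naive_split(full_name: str) -> Person:
    """Naive heuristic: assume the last word is the whole family name."""
    given, _, family = full_name.rpartition(" ")
    return Person(given, family)


people = [Person("Brian", "O'Leary"), Person("Juan", "Hernandes")]
print([directory_entry(p) for p in sorted(people, key=surname_key)])
# ['Hernandes, Juan', "O'Leary, Brian"]

# The heuristic cannot know that "van der" may belong to the family name.
print(directory_entry(naive_split("Gillian Lucille van der Waal")))
# Waal, Gillian Lucille van der
```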


''The'' and other common words

If a phrase begins with a very common word (such as "the", "a" or "an", called articles in grammar), that word is sometimes ignored or moved to the end of the phrase, but this is not always the case. For example, the book "The Shining" might be treated as "Shining", or "Shining, The" and therefore before the book title "Summer of Sam". However, it may also be treated as simply "The Shining" and after "Summer of Sam". Similarly, "A Wrinkle in Time" might be treated as "Wrinkle in Time", "Wrinkle in Time, A", or "A Wrinkle in Time". All three alphabetization methods are fairly easy to create by algorithm, but many programs rely on simple lexicographic ordering instead.

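The three treatments of a leading article can be sketched as follows in Python. The function names are hypothetical, and only the English articles "the", "a" and "an" are recognised.

```python
ARTICLES = ("the", "a", "an")


def split_article(title: str) -> tuple[str, str]:
    """Split a leading article from the rest of the title, if there is one."""
    first, separator, rest = title.partition(" ")
    if separator and first.casefold() in ARTICLES:
        return first, rest
    return "", title


def article_last(title: str) -> str:
    """Move a leading article to the end: "The Shining" -> "Shining, The"."""
    article, rest = split_article(title)
    return f"{rest}, {article}" if article else title


def ignore_article_key(title: str) -> str:
    """Sort key that skips a leading article."""
    return split_article(title)[1].casefold()


titles = ["Summer of Sam", "The Shining"]

print(article_last("A Wrinkle in Time"))       # Wrinkle in Time, A
print(sorted(titles, key=ignore_article_key))  # ['The Shining', 'Summer of Sam']
print(sorted(titles, key=str.casefold))        # ['Summer of Sam', 'The Shining']
```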

''Mac'' prefixes

The prefixes ''M'' and ''Mc'' in Irish and Scottish surnames are abbreviations for ''Mac'' and are sometimes alphabetized as if the spelling is ''Mac'' in full. Thus ''McKinley'' might be listed before ''Mackintosh'' (as it would be if it had been spelled out as "MacKinley"). Since the advent of computer-sorted lists, this type of alphabetization is less frequently encountered, though it is still used in British telephone directories.
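A minimal Python sketch of this convention expands a leading ''Mc'' to ''Mac'' before comparison. The function name `mac_key` is hypothetical, and only the ''Mc'' spelling is handled here.

```python
import re


def mac_key(surname: str) -> str:
    """Sort key that treats a leading "Mc" as if it were spelled "Mac"."""
    return re.sub(r"^Mc", "Mac", surname).casefold()


surnames = ["Mackintosh", "McKinley"]

print(sorted(surnames, key=mac_key))       # ['McKinley', 'Mackintosh']
print(sorted(surnames, key=str.casefold))  # ['Mackintosh', 'McKinley']
```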


''St'' prefix

The prefix ''St'' or ''St.'' is an abbreviation of "Saint", and is traditionally alphabetized as if the spelling is ''Saint'' in full. Thus in a gazetteer ''St John's'' might be listed before ''Salem'' (as it would be if it had been spelled out as "Saint John's"). Since the advent of computer-sorted lists, this type of alphabetization is less frequently encountered, though it is still sometimes used.
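The same idea can be sketched in Python for the ''St'' prefix. The function name `saint_key` is hypothetical; the pattern only matches ''St'' or ''St.'' when it is followed by whitespace, so that names such as "Stanley" are left alone.

```python
import re


def saint_key(name: str) -> str:
    """Sort key that expands a leading "St" or "St." to "Saint"."""
    return re.sub(r"^St\.?\s+", "Saint ", name).casefold()


places = ["Salem", "St John's"]

print(sorted(places, key=saint_key))     # ["St John's", 'Salem']
print(sorted(places, key=str.casefold))  # ['Salem', "St John's"]
```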


Ligatures

Ligatures (two or more letters merged into one symbol) which are not considered distinct letters, such as Æ and Œ in English, are typically collated as if the letters were separate: "æther" and "aether" would be ordered the same relative to all other words. This is true even when the ligature is not purely stylistic, such as in loanwords and brand names. Special rules may need to be adopted to sort strings which vary only by whether two letters are joined by a ligature.
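A small Python sketch of this treatment expands each ligature into its separate letters before comparing. The names `LIGATURES` and `ligature_key` are hypothetical, and the tie-breaking rule shown (falling back to the original spelling) is just one possible choice of "special rule".

```python
# Translation table that expands the ligatures into separate letters.
LIGATURES = str.maketrans({"æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE"})


def ligature_key(s: str) -> str:
    """Sort key that treats "æ" as "ae" and "œ" as "oe"."""
    return s.translate(LIGATURES).casefold()


words = ["after", "æther", "aeon"]
print(sorted(words, key=ligature_key))  # ['aeon', 'æther', 'after']

# Strings that differ only by the ligature need an extra tie-breaker;
# here the original spelling is used as a secondary key.
pair = ["æther", "aether"]
print(sorted(pair, key=lambda s: (ligature_key(s), s)))  # ['aether', 'æther']
```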


Treatment of numerals

When some of the strings contain numerals (or other non-letter characters), various approaches are possible. Sometimes such characters are treated as if they came before or after all the letters of the alphabet. Another method is for numbers to be sorted alphabetically as they would be spelled: for example ''1776'' would be sorted as if spelled out "seventeen seventy-six", and ''24 heures du Mans'' as if spelled "vingt-quatre..." (French for "twenty-four"). When numerals or other symbols are used as special graphical forms of letters, as ''1337'' for leet or the movie ''Seven'' (which was stylised as ''Se7en''), they may be sorted as if they were those letters. Natural sort order orders strings alphabetically, except that multi-digit numbers are treated as a single character and ordered by the value of the number encoded by the digits.

In the case of monarchs and popes, although their numbers are in Roman numerals and resemble letters, they are normally arranged in numerical order: so, for example, even though V comes after I, the Danish king Christian IX comes after his predecessor Christian VIII.
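Natural sort order and the numerical ordering of regnal numbers can both be sketched as sort keys in Python. The function names are hypothetical; the natural key handles only runs of decimal digits, and the Roman numeral conversion assumes a well-formed numeral at the end of the title.

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def natural_key(s: str) -> list:
    """Split into text and digit runs; digit runs compare by numeric value."""
    parts = re.split(r"(\d+)", s.casefold())
    # With a capturing group, odd positions always hold the digit runs.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def roman_to_int(numeral: str) -> int:
    """Convert a well-formed Roman numeral such as "IX" to an integer."""
    values = [ROMAN_VALUES[ch] for ch in numeral.upper()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value placed before a larger one is subtracted (IX = 9).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def regnal_key(title: str) -> tuple[str, int]:
    """Sort by name, then by the numeric value of the trailing numeral."""
    name, _, numeral = title.rpartition(" ")
    return (name.casefold(), roman_to_int(numeral))


chapters = ["Chapter 10", "Chapter 2", "Chapter 1"]
print(sorted(chapters))                   # ['Chapter 1', 'Chapter 10', 'Chapter 2']
print(sorted(chapters, key=natural_key))  # ['Chapter 1', 'Chapter 2', 'Chapter 10']

kings = ["Christian IX", "Christian VIII"]
print(sorted(kings))                  # ['Christian IX', 'Christian VIII']
print(sorted(kings, key=regnal_key))  # ['Christian VIII', 'Christian IX']
```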


Language-specific conventions

Languages which use an
extended Latin alphabet The lists and tables below summarize and compare the letter inventories of some of the Latin-script alphabets. In this article, the scope of the word "alphabet" is broadened to include letters with tone marks, and other diacritics used to represe ...
generally have their own conventions for treatment of the extra letters. Also in some languages certain digraphs are treated as single letters for collation purposes. For example, the 29-letter alphabet of Spanish treats ''ñ'' as a basic letter following ''n'', and formerly treated the digraphs ''ch'' and ''ll'' as basic letters following ''c'' and ''l'', respectively. ''Ch'' and ''ll'' are still considered letters, but are now alphabetized as two-letter combinations. (The new alphabetization rule was issued by the
Royal Spanish Academy The Royal Spanish Academy ( es, Real Academia Española, generally abbreviated as RAE) is Spain's official royal institution with a mission to ensure the stability of the Spanish language. It is based in Madrid, Spain, and is affiliated with ...
in 1994.) On the other hand, the digraph ''rr'' follows ''rqu'' as expected, and did so even before the 1994 alphabetization rule. In a few cases, such as
Arabic Arabic (, ' ; , ' or ) is a Semitic language spoken primarily across the Arab world.Semitic languages: an international handbook / edited by Stefan Weninger; in collaboration with Geoffrey Khan, Michael P. Streck, Janet C. E.Watson; Walter ...
Kiowa Kiowa () people are a Native American tribe and an indigenous people of the Great Plains of the United States. They migrated southward from western Montana into the Rocky Mountains in Colorado in the 17th and 18th centuries,Pritzker 326 and e ...
, the alphabet has been completely reordered. Alphabetization rules applied in various languages are listed below. * In
Arabic Arabic (, ' ; , ' or ) is a Semitic language spoken primarily across the Arab world.Semitic languages: an international handbook / edited by Stefan Weninger; in collaboration with Geoffrey Khan, Michael P. Streck, Janet C. E.Watson; Walter ...
, there are two main orders of the 28 letter alphabet used today. The standard and most commonly used is the , which was coined by the early Arab linguist and features a visual ordering method where for example the letters baa, taa, Θaa ب ت ث are ordered base on shape of baa. The original
abjad An abjad (, ar, أبجد; also abgad) is a writing system in which only consonants are represented, leaving vowel sounds to be inferred by the reader. This contrasts with other alphabets, which provide graphemes for both consonants and vow ...
order, which phonethically resembles that of other
Semitic languages The Semitic languages are a branch of the Afroasiatic language family. They are spoken by more than 330 million people across much of West Asia, the Horn of Africa, and latterly North Africa, Malta, West Africa, Chad, and in large immigrant ...
as well as Latin, is still in use today, usually limited for ordering lists in a document, analogous to
Roman Numerals Roman numerals are a numeral system that originated in ancient Rome and remained the usual way of writing numbers throughout Europe well into the Late Middle Ages. Numbers are written with combinations of letters from the Latin alphabet, ...
. When the ''abjadiyya'' is used in numbering, a unique abstracted way of writing the letters must be used in order to distinguish those letters from three first letter of the sentence as well as from numbers. For example, the Alef "ا" which looks identical to the Hindi numeral one "١", a small oval loop extends clockwise of the letter's bottom, followed by a short tail. Although these characters are rarely used digitally, they have been recognized under
ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
as Arabic Mathematical Alphabet, with ranges from 1EE00 TO 1EEFF. There is a less common order, which is ordered phonetically , starting from the deep throat sound haa to the lip most meem. This ingenious oder was coined by Al-faraheedi. * In
Azerbaijani Azerbaijani may refer to: * Something of, or related to Azerbaijan * Azerbaijanis * Azerbaijani language See also * Azerbaijan (disambiguation) * Azeri (disambiguation) * Azerbaijani cuisine * Culture of Azerbaijan The culture of Azerbaijan ...
, there are eight additional letters to the standard Latin alphabet. Five of them are vowels: i, ı, ö, ü, ə and three are consonants: ç, ş, ğ. The alphabet is the same as the Turkish, with the same sounds written with the same letters, except for three additional letters: q, x and ə for sounds that do not exist in Turkish. Although all the "Turkish letters" are collated in their "normal" alphabetical order like in Turkish, the three extra letters are collated arbitrarily after letters whose sounds approach theirs. So, q is collated just after k, x (pronounced like a German ''ch'') is collated just after h and ə (pronounced roughly like an English short ''a'') is collated just after e. * In Breton, there is no "c", "q", "x" but there are the digraphs "ch" and "c'h", which are collated between "b" and "d". For example: « buzhugenn, chug, c'hoar, daeraouenn » (earthworm, juice, sister, teardrop). * In Czech and Slovak, accented vowels have secondary collating weight – compared to other letters, they are treated as their unaccented forms (in Czech, A-Á, E-É-Ě, I-Í, O-Ó, U-Ú-Ů, Y-Ý, and in Slovak, A-Á-Ä, E-É, I-Í, O-Ó-Ô, U-Ú, Y-Ý), but then they are sorted after the unaccented letters (for example, the correct lexicographic order is baa, baá, báa, báá, bab, báb, bac, bác, bač, báč n Czechand baa, baá, baä, báa, báá, báä, bäa, bäá, bää, bab, báb, bäb, bac, bác, bäc, bač, báč, bäč n Slovak. Accented consonants have primary collating weight and are collated immediately after their unaccented counterparts, with exception of Ď, Ň and Ť (in Czech) and Ď, Ĺ, Ľ, Ň, Ŕ and Ť (in Slovak), which have again secondary weight. CH is considered to be a separate letter and goes between H and I. In Slovak, DZ and are also considered separate letters and are positioned between Ď and E. * In the
Danish and Norwegian alphabets, the same extra vowels as in Swedish (see below) are also present but in a different order and with different glyphs (..., X, Y, Z, Æ, Ø, Å). Also, "Aa" collates as an equivalent to "Å". The Danish alphabet has traditionally seen "W" as a variant of "V", but today "W" is considered a separate letter.
* In Dutch, the combination IJ (representing the ligature Ĳ) was formerly collated as Y (or sometimes as a separate letter: Y < IJ < Z), but is currently mostly collated as two letters (II < IJ < IK). Phone directories are an exception: there IJ is always collated as Y, because in many Dutch family names Y is used where modern spelling would require IJ. Note that a word starting with ij that is written with a capital I is also written with a capital J, for example, the town IJmuiden, the river IJssel and the country IJsland (Iceland).
* In
Esperanto, consonants with circumflex accents (ĉ, ĝ, ĥ, ĵ, ŝ), as well as ŭ (u with breve), are counted as separate letters and collated separately (c, ĉ, d, e, f, g, ĝ, h, ĥ, i, j, ĵ ... s, ŝ, t, u, ŭ, v, z).
* In
Estonian, õ, ä, ö and ü are considered separate letters and collate after w. Letters š, z and ž appear in loanwords and foreign proper names only and follow the letter s in the Estonian alphabet, which otherwise does not differ from the basic Latin alphabet.
* The
Faroese alphabet also has some of the Danish, Norwegian, and Swedish extra letters, namely Æ and Ø. Furthermore, the Faroese alphabet uses the Icelandic eth, which follows the D. Five of the six vowels (A, I, O, U and Y) can take accents and are then considered separate letters. The consonants C, Q, X, W and Z are not found. Therefore, the first five letters are A, Á, B, D and Ð, and the last five are V, Y, Ý, Æ, Ø.
* In Filipino (Tagalog) and other Philippine languages, the letter Ng is treated as a separate letter. It is pronounced as in ''sing'', ''ping-pong'', etc. By itself, it is pronounced ''nang'', but in general Filipino orthography, it is spelled as if it were two separate letters (n and g). Also, letter derivatives (such as Ñ) immediately follow the base letter. Filipino is also written with diacritics, but their use is very rare (except the tilde).
* The Finnish alphabet and collating rules are the same as those of Swedish.
* For
French, the ''last'' accent in a given word determines the order. For example, in French, the following four words would be sorted this way: cote < côte < coté < côté.
* In
German, letters with umlaut (Ä, Ö, Ü) are generally treated just like their non-umlauted versions; ß is always sorted as ss. This makes the alphabetic order Arbeit, Arg, Ärgerlich, Argument, Arm, Assistant, Aßlar, Assoziation. For phone directories and similar lists of names, the umlauts are to be collated like the letter combinations "ae", "oe", "ue" because a number of German surnames appear both with umlaut and in the non-umlauted form with "e" (Müller/Mueller). This makes the alphabetic order Udet, Übelacker, Uell, Ülle, Ueve, Üxküll, Uffenbach.
* The Hungarian vowels have accents, umlauts, and double accents, while consonants are written with single, double (digraphs) or triple (trigraph) characters. In collating, accented vowels are equivalent with their non-accented counterparts and double and triple characters follow their single originals. Hungarian alphabetic order is: A=Á, B, C, Cs, D, Dz, Dzs, E=É, F, G, Gy, H, I=Í, J, K, L, Ly, M, N, Ny, O=Ó, Ö=Ő, P, Q, R, S, Sz, T, Ty, U=Ú, Ü=Ű, V, W, X, Y, Z, Zs. (Before 1984, ''dz'' and ''dzs'' were not considered single letters for collation, but two letters each, d+z and d+zs instead.) Treating digraphs as single letters means that, for example, ''nádcukor'' should precede ''nádcsomó'' (even though ''s'' normally precedes ''u''), since ''c'' precedes ''cs'' in the collation. Difference in vowel length should only be taken into consideration if the two words are otherwise identical (e.g. ''egér, éger''). Spaces and hyphens within phrases are ignored in collation. ''Ch'' also occurs as a digraph in certain words but it is not considered a grapheme in its own right in terms of collation. A particular feature of Hungarian collation is that contracted forms of double di- and trigraphs (such as ''ggy'' from ''gy + gy'' or ''ddzs'' from ''dzs + dzs'') should be collated as if they were written in full (independently of the fact of the contraction and the elements of the di- or trigraphs). For example, ''kaszinó'' should precede ''kassza'' (even though the fourth character ''z'' would normally come after ''s'' in the alphabet), because the fourth "character" (grapheme) of the word ''kassza'' is considered a second ''sz'' (decomposing ''ssz'' into ''sz + sz''), which does follow ''i'' (in ''kaszinó'').
* In Icelandic, Þ is added, and D is followed by Ð. Each vowel (A, E, I, O, U, Y) is followed by its correspondent with
acute: Á, É, Í, Ó, Ú, Ý. There is no Z, so the alphabet ends: ... X, Y, Ý, Þ, Æ, Ö.
** Both letters were also used by Anglo-Saxon scribes, who also used the Runic letter Wynn to represent /w/.
** Þ (called thorn; lowercase þ) is also a Runic letter.
** Ð (called eth; lowercase ð) is the letter D with an added stroke.
* Kiowa is ordered on phonetic principles, like the Brahmic scripts, rather than on the historical Latin order. Vowels come first, then stop consonants ordered from the front to the back of the mouth, and from negative to positive voice-onset time, then the affricates, fricatives, liquids, and nasals:
:: A, AU, E, I, O, U, B, F, P, V, D, J, T, TH, G, C, K, Q, CH, X, S, Z, L, Y, W, H, M, N
* In Lithuanian, specifically Lithuanian letters go after their Latin originals. Another change is that Y comes just before J: ... G, H, I, Į, Y, J, K...
* In Polish, specifically Polish letters derived from the Latin alphabet are collated after their originals: A, Ą, B, C, Ć, D, E, Ę, ..., L, Ł, M, N, Ń, O, Ó, P, ..., S, Ś, T, ..., Z, Ź, Ż. The digraphs for collation purposes are treated as if they were two separate letters.
* In Portuguese, the collating order is just like in English: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z. Digraphs and letters with diacritics are not included in the alphabet.
* In
Romanian, special characters derived from the Latin alphabet are collated after their originals: A, Ă, Â, ..., I, Î, ..., S, Ș, T, Ț, ..., Z.
* In Serbo-Croatian and other related South Slavic languages, the five accented characters and three conjoined characters are sorted after the originals: ..., C, Č, Ć, D, DŽ, Đ, E, ..., L, LJ, M, N, NJ, O, ..., S, Š, T, ..., Z, Ž.
* Spanish treated "CH" and "LL" as single letters until 1994, collating CH as a letter between C and D and LL as a letter between L and M. In 1994 the RAE adopted the more conventional usage, and now LL is collated between LK and LM, and CH between CG and CI. The six characters with diacritics Á, É, Í, Ó, Ú, Ü are treated as the original letters A, E, I, O, U. The only Spanish-specific collating question is Ñ, which is a different letter collated after N.
* In the
Swedish alphabet, there are three extra vowels placed at its end (..., X, Y, Z, Å, Ä, Ö), similar to the Danish and Norwegian alphabet, but with different glyphs and a different collating order. The letter "W" has been treated as a variant of "V", but in the 13th edition of ''Svenska Akademiens ordlista'' (2006) "W" was considered a separate letter.
* In the
Turkish alphabet, there are 6 additional letters: ç, ğ, ı, ö, ş, and ü (but no q, w, and x). They are collated with ç after c, ğ after g, ı ''before'' i, ö after o, ş after s, and ü after u. Originally, when the alphabet was introduced in 1928, ı was collated after i, but the order was changed later so that letters having shapes containing dots, cedillas or other adorning marks always follow the letters with corresponding bare shapes. Note that in Turkish orthography the letter I is the majuscule of dotless ı, whereas İ is the majuscule of dotted i.
* In many
Turkic languages (such as Azeri or the Jaꞑalif orthography for Tatar), there used to be the letter Gha (Ƣƣ), which came between G and H. It is now in disuse.
* In Vietnamese, there are 7 additional letters: ă, â, đ, ê, ô, ơ, ư, while f, j, w, z are absent, even though they are still in some use (for example in Internet addresses and foreign loanwords). "f" is replaced by the combination "ph", and likewise "w" by "qu".
* In
Volapük, ä, ö and ü are counted as separate letters and collated separately (a, ä, b ... o, ö, p ... u, ü, v) while q and w are absent.
* In
Welsh, the digraphs CH, DD, FF, NG, LL, PH, RH, and TH are treated as single letters, and each is listed after the first character of the pair (except for NG which is listed after G), producing the order A, B, C, CH, D, DD, E, F, FF, G, NG, H, and so on. It can sometimes happen, however, that word compounding results in the juxtaposition of two letters which do ''not'' form a digraph. An example is the word LLONGYFARCH (composed from LLON + GYFARCH). This results in such an ordering as, for example, LAWR, LWCUS, LLONG, LLOM, LLONGYFARCH (NG is a digraph in LLONG, but not in LLONGYFARCH). The letter combination R+H (as distinct from the digraph RH) may similarly arise by juxtaposition in compounds, although this tends not to produce any pairs in which misidentification could affect the ordering. For the other potentially confusing letter combinations that may occur – namely, D+D and L+L – a hyphen is used in the spelling (e.g. AD-DAL, CHWIL-LYS).
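Many of the conventions listed above amount to "use a different ordered list of letters, some of which are digraphs". The following Python sketch illustrates that idea under stated assumptions: `make_key` is a hypothetical helper, not part of any library, and the alphabet lists are illustrative. It splits each word by greedy longest match, so it cannot detect compounds such as the Welsh LLONGYFARCH, where N+G is not a digraph; handling those needs a dictionary or manual marking.

```python
def make_key(alphabet):
    """Return a sort key for an ordered list of letters (digraphs allowed).

    Hypothetical helper for illustration only.
    """
    rank = {letter: i for i, letter in enumerate(alphabet)}
    longest = max(len(letter) for letter in alphabet)

    def key(word):
        word = word.lower()
        weights = []
        i = 0
        while i < len(word):
            # Greedy longest match, so that "ch" wins over "c" followed by "h".
            for size in range(longest, 0, -1):
                chunk = word[i:i + size]
                if len(chunk) == size and chunk in rank:
                    weights.append(rank[chunk])
                    i += size
                    break
            else:
                # Unknown symbol: place it after every known letter.
                weights.append(len(alphabet) + ord(word[i]))
                i += 1
        return weights

    return key


# Swedish: Å, Ä, Ö follow Z.
swedish_key = make_key(list("abcdefghijklmnopqrstuvwxyzåäö"))

# Breton (assumed letter list): no c, q, x; "ch" and "c'h" sit between b and d.
breton_key = make_key(
    ["a", "b", "ch", "c'h", "d", "e", "f", "g", "h", "i", "j", "k", "l",
     "m", "n", "o", "p", "r", "s", "t", "u", "v", "w", "y", "z"]
)

swedish_sorted = sorted(["öl", "zebra", "ål", "äpple", "apa"], key=swedish_key)
breton_sorted = sorted(["daeraouenn", "c'hoar", "chug", "buzhugenn"], key=breton_key)
print(swedish_sorted)  # ['apa', 'zebra', 'ål', 'äpple', 'öl']
print(breton_sorted)   # ['buzhugenn', 'chug', "c'hoar", 'daeraouenn']
```

Because the key is a list of integer ranks, Python compares keys position by position, which is the same first-difference rule used for alphabetical order. Rules with secondary weights (as for Czech accented vowels) or expansions (as for German ß) would need a richer key than this.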


Automation

Collation algorithms (in combination with sorting algorithms) are used in computer programming to place strings in alphabetical order. A standard example is the Unicode Collation Algorithm, which can be used to put strings containing any Unicode symbols into (an extension of) alphabetical order. It can be made to conform to most of the language-specific conventions described above by tailoring its default collation table. Several such tailorings are collected in the Common Locale Data Repository.
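To give a feel for how a multi-level collation key works, here is a deliberately simplified Python sketch. It is not the Unicode Collation Algorithm and uses no collation table: the hypothetical function `two_level_key` compares base letters first and uses accents only as a tie-breaker, with an optional "backward" accent comparison as a stand-in for the French rule that the last accent decides.

```python
import unicodedata


def two_level_key(word, backward_accents=False):
    """Simplified two-level key: base letters first, then accents.

    Illustrative only; real collation uses per-character weight tables.
    """
    decomposed = unicodedata.normalize("NFD", word.casefold())
    base, marks = [], []
    for ch in decomposed:
        if unicodedata.combining(ch) and base:
            marks[-1] += ch      # attach the accent to the preceding letter
        else:
            base.append(ch)
            marks.append("")     # no accent seen yet for this letter
    if backward_accents:
        marks.reverse()          # compare accents from the end of the word
    return ("".join(base), tuple(marks))


czech_vowels = sorted(["báb", "bab", "báá", "báa", "baá", "baa"], key=two_level_key)
french = sorted(
    ["côté", "coté", "côte", "cote"],
    key=lambda w: two_level_key(w, backward_accents=True),
)
print(czech_vowels)  # ['baa', 'baá', 'báa', 'báá', 'bab', 'báb']
print(french)        # ['cote', 'côte', 'coté', 'côté']
```

A tailoring, in this picture, changes which differences count at which level: Czech č, for instance, would have to be promoted to a primary difference, which this sketch does not do.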


Similar orderings

The principle behind alphabetical ordering can still be applied in languages that do not strictly speaking use an alphabet – for example, they may be written using a syllabary or abugida – provided the symbols used have an established ordering. For logographic writing systems, such as Chinese hanzi or Japanese kanji, the method of radical-and-stroke sorting is frequently used as a way of defining an ordering on the symbols. Japanese sometimes uses pronunciation order, most commonly with the Gojūon order but sometimes with the older Iroha ordering.

In mathematics, lexicographical order is a means of ordering sequences in a manner analogous to that used to produce alphabetical order.

Some computer applications use a version of alphabetical order that can be achieved using a very simple algorithm, based purely on the ASCII or Unicode codes for characters. This may have non-standard effects such as placing all capital letters before lower-case ones. See ASCIIbetical order.

A rhyming dictionary is based on sorting words in alphabetical order starting from the last to the first letter of the word.
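The last two orderings are easy to reproduce. The Python sketch below (word lists and the `rhyme_key` name are made up for illustration) shows plain code-point sorting, a case-insensitive variant, and a rhyming-dictionary order obtained by comparing words from their last letter backwards.

```python
words = ["banana", "Zebra", "apple", "Cherry"]

# Default string comparison uses code points, so capitals sort first.
by_code_point = sorted(words)

# Case-folding the key removes the capital-before-lower-case effect.
case_insensitive = sorted(words, key=str.casefold)


def rhyme_key(word):
    """Key for rhyming-dictionary order: the word read from its last letter."""
    return word.casefold()[::-1]


rhyming = sorted(["nation", "cat", "station", "hat", "bat"], key=rhyme_key)

print(by_code_point)     # ['Cherry', 'Zebra', 'apple', 'banana']
print(case_insensitive)  # ['apple', 'banana', 'Cherry', 'Zebra']
print(rhyming)           # ['nation', 'station', 'bat', 'cat', 'hat']
```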


See also

* Collation
* Sorting


Further reading

* Chauvin, Yvonne. ''Pratique du classement alphabétique''. 4th ed. Paris: Bordas, 1977.
* Flanders, Judith. ''A Place for Everything: The Curious History of Alphabetical Order''. New York: Basic Books / Hachette Books, 2020.