




The Unicode collation algorithm (UCA) is an algorithm defined in Unicode Technical Standard #10 (UTS #10). It is a customizable method for producing binary keys from strings that represent text in any writing system and language that can be represented with Unicode. These keys can then be compared efficiently, byte by byte, to collate or sort the strings according to the rules of the language, with options for ignoring case, accents, and similar distinctions. UTS #10 also specifies the Default Unicode Collation Element Table (DUCET), a data file that defines a default collation ordering. The DUCET is customizable for different languages, and some such customizations can be found in the Unicode Common Locale Data Repository (CLDR). An open-source implementation of UCA is included with the International Components for Unicode (ICU). ICU supports tailoring, and the collation tailorings from CLDR are included in ICU. The effects of tailoring and many language-specific tailorings are displayed in the online ICU Locale Explorer.
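The idea of turning strings into byte-comparable keys can be illustrated with a small Python sketch. It is a minimal sketch under stated assumptions, not an implementation of the full algorithm. `TOY_TABLE`, its weight values and the `sort_key` helper are hypothetical and stand in for the DUCET, which assigns weights across all of Unicode. Characters missing from the toy table simply raise `KeyError`, and real UCA details such as contractions, variable weighting and implicit weights are omitted. Each character maps to (primary, secondary, tertiary) weights, which roughly encode base letter, accent and case. The key concatenates the non-zero weights level by level, with a zero separator between levels, so plain byte comparison gives the collation order. Truncating the key at a lower `strength` makes comparisons ignore accents and case.

```python
import unicodedata

# Hypothetical toy collation element table: each character maps to a list of
# (primary, secondary, tertiary) weights. The weights are made up for
# illustration; they are NOT the real DUCET values.
TOY_TABLE = {
    "a": [(0x1C47, 0x0020, 0x0002)],
    "A": [(0x1C47, 0x0020, 0x0008)],  # same letter, different case: tertiary differs
    "b": [(0x1C60, 0x0020, 0x0002)],
    "B": [(0x1C60, 0x0020, 0x0008)],
    "e": [(0x1CAA, 0x0020, 0x0002)],
    "E": [(0x1CAA, 0x0020, 0x0008)],
    "\u0301": [(0x0000, 0x0024, 0x0002)],  # combining acute accent: no primary weight
}


def sort_key(text: str, strength: int = 3) -> bytes:
    """Build a UCA-style sort key (hypothetical helper).

    For each level up to `strength`, append every non-zero weight as a 16-bit
    big-endian value, separating levels with 0x0000.
    """
    # Canonical decomposition turns precomposed "é" into "e" + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    elements = [ce for ch in decomposed for ce in TOY_TABLE[ch]]
    key = bytearray()
    for level in range(strength):
        if level > 0:
            key += b"\x00\x00"  # level separator
        for ce in elements:
            weight = ce[level]
            if weight != 0:  # zero weights are ignorable at this level
                key += weight.to_bytes(2, "big")
    return bytes(key)
```

With the default strength of 3, `sorted(["bé", "Be", "be", "ba"], key=sort_key)` orders the words as `ba`, `be`, `Be`, `bé`, because an accent difference (secondary level) outweighs a case difference (tertiary level). With `strength=1`, `be`, `Be` and `bé` all receive the same key, so the comparison ignores both case and accents.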


See also

* Collation
* ISO/IEC 14651
* European ordering rules (EOR)
* Common Locale Data Repository (CLDR)


External links


* Unicode Collation Algorithm: Unicode Technical Standard #10
* Mimer SQL Unicode Collation Charts


Tools


* ICU Locale Explorer (link broken as of 2021-10-10): an online demonstration of the Unicode Collation Algorithm using International Components for Unicode
* An ICU collation demo that is still up as of 2021-10-10
* A sort program that provides an unusual level of flexibility in defining collations and extracting keys