Unicode Collation Algorithm
   HOME

TheInfoList



OR:

The Unicode collation algorithm (UCA) is an algorithm defined in Unicode Technical Report #10, which is a customizable method to produce binary keys from
strings String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian anim ...
representing text in any
writing system A writing system is a method of visually representing verbal communication, based on a script and a set of rules regulating its use. While both writing and speech are useful in conveying messages, writing differs in also being a reliable form ...
and
language Language is a structured system of communication. The structure of a language is its grammar and the free components are its vocabulary. Languages are the primary means by which humans communicate, and may be conveyed through a variety of met ...
that can be represented with
Unicode Unicode, formally The Unicode Standard,The formal version reference is is an information technology Technical standard, standard for the consistent character encoding, encoding, representation, and handling of Character (computing), text expre ...
. These keys can then be efficiently byte-by-byte compared in order to collate or sort them according to the rules of the language, with options for ignoring case, accents, etc. Unicode Technical Report #10 also specifies the ''Default Unicode Collation Element Table'' (DUCET). This data file specifies a default collation ordering. The DUCET is customizable for different languages. Some such customisations can be found in the Unicode
Common Locale Data Repository The Common Locale Data Repository Project, often abbreviated as CLDR, is a project of the Unicode Consortium to provide locale data in XML format for use in computer applications. CLDR contains locale-specific information that an operating syst ...
(CLDR). An open source implementation of UCA is included with the
International Components for Unicode International Components for Unicode (ICU) is an open-source project of mature C/ C++ and Java libraries for Unicode support, software internationalization, and software globalization. ICU is widely portable to many operating systems and environ ...
, ICU. ICU supports tailoring, and the collation tailorings from CLDR are included in ICU. The effects of tailoring and many language-specific tailorings are displayed in the on-line ICU Locale Explorer.


See also

*
Collation Collation is the assembly of written information into a standard order. Many systems of collation are based on numerical order or alphabetical order, or extensions and combinations thereof. Collation is a fundamental element of most office fili ...
* ISO/IEC 14651 * European ordering rules (EOR) *
Common Locale Data Repository The Common Locale Data Repository Project, often abbreviated as CLDR, is a project of the Unicode Consortium to provide locale data in XML format for use in computer applications. CLDR contains locale-specific information that an operating syst ...
(CLDR)


External links


Unicode Collation Algorithm
Unicode Technical Standard #10
Mimer SQL Unicode Collation Charts


Tools


ICU Locale Explorer
ink broken as of 2021-10-10 Ink is a gel, Sol (colloid), sol, or Solution (chemistry), solution that contains at least one colorant, such as a dye or pigment, and is used to color a surface to produce an image, writing, text, or design. Ink is used for drawing or writing w ...
An online demonstration of the Unicode Collation Algorithm using
International Components for Unicode International Components for Unicode (ICU) is an open-source project of mature C/ C++ and Java libraries for Unicode support, software internationalization, and software globalization. ICU is widely portable to many operating systems and environ ...

an ICU collation demo
that's still up as of 2021-10-10

A sort program that provides an unusual level of flexibility in defining collations and extracting keys. String collation algorithms
Collation Collation is the assembly of written information into a standard order. Many systems of collation are based on numerical order or alphabetical order, or extensions and combinations thereof. Collation is a fundamental element of most office fili ...
Collation {{standard-stub