Unicode Collation Algorithm
   HOME
*





Unicode Collation Algorithm
The Unicode collation algorithm (UCA) is an algorithm defined in Unicode Technical Report #10, which is a customizable method to produce binary keys from strings representing text in any writing system and language that can be represented with Unicode. These keys can then be efficiently byte-by-byte compared in order to collate or sort them according to the rules of the language, with options for ignoring case, accents, etc. Unicode Technical Report #10 also specifies the ''Default Unicode Collation Element Table'' (DUCET). This data file specifies a default collation ordering. The DUCET is customizable for different languages. Some such customisations can be found in the Unicode Common Locale Data Repository (CLDR). An open source implementation of UCA is included with the International Components for Unicode, ICU. ICU supports tailoring, and the collation tailorings from CLDR are included in ICU. The effects of tailoring and many language-specific tailorings are displayed in the ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

String (computer Science)
In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). A string is generally considered as a data type and is often implemented as an array data structure of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding. ''String'' may also denote more general arrays or other sequence (or list) data types and structures. Depending on the programming language and precise data type used, a variable declared to be a string may either cause storage in memory to be statically allocated for a predetermined maximum length or employ dynamic allocation to allow it to hold a variable number of elements. When a string appears literally in source code, it is known as a string literal or an anonymous string. In formal languages, which are used in mathematical ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Writing System
A writing system is a method of visually representing verbal communication, based on a script and a set of rules regulating its use. While both writing and speech are useful in conveying messages, writing differs in also being a reliable form of information storage and transfer. Writing systems require shared understanding between writers and readers of the meaning behind the sets of characters that make up a script. Writing is usually recorded onto a durable medium, such as paper or electronic storage, although non-durable methods may also be used, such as writing on a computer display, on a blackboard, in sand, or by skywriting. Reading a text can be accomplished purely in the mind as an internal process, or expressed orally. Writing systems can be placed into broad categories such as alphabets, syllabaries, or logographies, although any particular system may have attributes of more than one category. In the alphabetic category, a standard set of letters represent speech ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Language
Language is a structured system of communication. The structure of a language is its grammar and the free components are its vocabulary. Languages are the primary means by which humans communicate, and may be conveyed through a variety of methods, including spoken, sign, and written language. Many languages, including the most widely-spoken ones, have writing systems that enable sounds or signs to be recorded for later reactivation. Human language is highly variable between cultures and across time. Human languages have the properties of productivity and displacement, and rely on social convention and learning. Estimates of the number of human languages in the world vary between and . Precise estimates depend on an arbitrary distinction (dichotomy) established between languages and dialects. Natural languages are spoken, signed, or both; however, any language can be encoded into secondary media using auditory, visual, or tactile stimuli – for example, writing, whi ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Unicode
Unicode, formally The Unicode Standard,The formal version reference is is an information technology Technical standard, standard for the consistent character encoding, encoding, representation, and handling of Character (computing), text expressed in most of the world's writing systems. The standard, which is maintained by the Unicode Consortium, defines as of the current version (15.0) 149,186 characters covering 161 modern and historic script (Unicode), scripts, as well as symbols, emoji (including in colors), and non-visual control and formatting codes. Unicode's success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including modern operating systems, XML, and most modern programming languages. The Unicode character repertoire is synchronized with Universal Coded Character Set, ISO/IEC 10646, each being code-for-code id ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Collate
Collation is the assembly of written information into a standard order. Many systems of collation are based on numerical order or alphabetical order, or extensions and combinations thereof. Collation is a fundamental element of most office filing systems, library catalogs, and reference books. Collation differs from ''classification'' in that the classes themselves are not necessarily ordered. However, even if the order of the classes is irrelevant, the identifiers of the classes may be members of an ordered set, allowing a sorting algorithm to arrange the items by class. Formally speaking, a collation method typically defines a total order on a set of possible identifiers, called sort keys, which consequently produces a total preorder on the set of items of information (items with the same identifier are not placed in any defined order). A collation algorithm such as the Unicode collation algorithm defines an order through the process of comparing two given character strings ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Common Locale Data Repository
The Common Locale Data Repository Project, often abbreviated as CLDR, is a project of the Unicode Consortium to provide locale data in XML format for use in computer applications. CLDR contains locale-specific information that an operating system will typically provide to applications. Among the types of data that CLDR includes are the following: * Translations for language names * Translations for territory and country names * Translations for currency names, including singular/plural modifications * Translations for weekday, month, era, period of day, in full and abbreviated forms * Translations for time zones and example cities (or similar) for time zones * Translations for calendar fields * Patterns for formatting/parsing dates or times of day * Exemplar sets of characters used for writing the language * Patterns for formatting/parsing numbers * Rules for language-adapted collation * Rules for spelling out numbers as words * Rules for formatting numbers in traditional num ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


International Components For Unicode
International Components for Unicode (ICU) is an open-source project of mature C/C++ and Java libraries for Unicode support, software internationalization, and software globalization. ICU is widely portable to many operating systems and environments. It gives applications the same results on all platforms and between C, C++, and Java software. The ICU project is a technical committee of the Unicode Consortium and sponsored, supported, and used by IBM and many other companies. ICU provides the following services: Unicode text handling, full character properties, and character set conversions; Unicode regular expressions; full Unicode sets; character, word, and line boundaries; language-sensitive collation and searching; normalization, upper and lowercase conversion, and script transliterations; comprehensive locale data and resource bundle architecture via the Common Locale Data Repository (CLDR); multiple calendars and time zones; and rule-based formatting and parsing of dates ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Collation
Collation is the assembly of written information into a standard order. Many systems of collation are based on numerical order or alphabetical order, or extensions and combinations thereof. Collation is a fundamental element of most office filing systems, library catalogs, and reference books. Collation differs from ''classification'' in that the classes themselves are not necessarily ordered. However, even if the order of the classes is irrelevant, the identifiers of the classes may be members of an ordered set, allowing a sorting algorithm to arrange the items by class. Formally speaking, a collation method typically defines a total order on a set of possible identifiers, called sort keys, which consequently produces a total preorder on the set of items of information (items with the same identifier are not placed in any defined order). A collation algorithm such as the Unicode collation algorithm defines an order through the process of comparing two given character string ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


ISO/IEC 14651
'ISO/IEC 14651:2016'', ''Information technology -- International string ordering and comparison -- Method for comparing character strings and description of the common template tailorable ordering'', is an ISO/IEC standard specifying an algorithm that can be used when comparing two strings. This comparison can be used when collating a set of strings. The standard also specifies a datafile specifying the comparison order, the ''Common Tailorable Template'', CTT. The comparison order is supposed to be tailored for different languages (hence the CTT is regarded as a ''template'' and not a default, though the empty tailoring, not changing any weighting, is appropriate in many cases), since different languages have incompatible ordering requirements. One such tailoring is European ordering rules (EOR), which in turn is supposed to be tailored for different European languages. The ''Common Tailorable Template'' (''CTT'') datafile of this ISO/IEC standard is aligned with the ''Default Un ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

European Ordering Rules
The European ordering rules (EOR / EN 13710), define an ordering for strings written in languages that are written with the Latin, Greek and Cyrillic alphabets. The standard covers languages used by the European Union, the European Free Trade Association, and parts of the former Soviet Union. It is a tailoring of the ''Common Tailorable Template'' of ISO/IEC 14651. EOR can in turn be tailored for different (European) languages. But in inter-European contexts, EOR can be used without further tailoring. Method Just as for ISO/IEC 14651, upon which EOR is based, EOR has 4 levels of weights. Level 1 sorts the letters. The following Latin letters are concerned by this level, in order: :a b c d ð e ə ɛ f g h i j k l m n o ɔ p q r s ɯ t u v w x y z þ æ The Greek alphabet has the following order: :α β γ δ ε Ϝ Ϛ ζ η θ ι κ λ μ ν ξ ο π Ϟ ρ σ τ υ φ χ ψ ω Ϡ Cyrillic script has the following order: :а ӑ ӓ ә ӛ ӕ б в г ғ ҕ ґ д ђ ҙ е ӗ ё ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


String Collation Algorithms
String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian animated short * ''Strings'' (2004 film), a film directed by Anders Rønnow Klarlund * ''Strings'' (2011 film), an American dramatic thriller film * ''Strings'' (2012 film), a British film by Rob Savage * ''Bravetown'' (2015 film), an American drama film originally titled ''Strings'' * ''The String'' (2009), a French film Music Instruments * String (music), the flexible element that produces vibrations and sound in string instruments * String instrument, a musical instrument that produces sound through vibrating strings ** List of string instruments * String piano, a pianistic extended technique in which sound is produced by direct manipulation of the strings, rather than striking the piano's keys Types of groups * String band, musical ens ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Unicode Algorithms
Unicode, formally The Unicode Standard,The formal version reference is is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, which is maintained by the Unicode Consortium, defines as of the current version (15.0) 149,186 characters covering 161 modern and historic scripts, as well as symbols, emoji (including in colors), and non-visual control and formatting codes. Unicode's success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including modern operating systems, XML, and most modern programming languages. The Unicode character repertoire is synchronized with ISO/IEC 10646, each being code-for-code identical with the other. ''The Unicode Standard'', however, includes more than just the base code. Alongside the ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]