Unicode Collation Algorithm
__NOTOC__ The Unicode collation algorithm (UCA) is an algorithm defined in Unicode Technical Report #10, which is a customizable method to produce binary keys from strings representing text in any writing system and language that can be represented with Unicode. These keys can then be efficiently compared byte by byte in order to collate or sort them according to the rules of the language, with options for ignoring case, accents, etc. Unicode Technical Report #10 also specifies the ''Default Unicode Collation Element Table'' (DUCET). This data file specifies a default collation ordering. The DUCET is customizable for different languages, and some such customizations can be found in the Unicode Common Locale Data Repository (CLDR). An open source implementation of UCA is included with the International Components for Unicode, ICU. ICU supports tailoring, and the collation tailorings from CLDR are included in ICU. See also * Collation * ISO/IEC 14651 * European ordering rules ( ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
String (computer Science)
In computer programming, a string is traditionally a sequence of character (computing), characters, either as a literal (computer programming), literal constant or as some kind of Variable (computer science), variable. The latter may allow its elements to be Immutable object, mutated and the length changed, or it may be fixed (after creation). A string is often implemented as an array data structure of bytes (or word (computer architecture), words) that stores a sequence of elements, typically characters, using some character encoding. More general, ''string'' may also denote a sequence (or List (abstract data type), list) of data other than just characters. Depending on the programming language and precise data type used, a variable (programming), variable declared to be a string may either cause storage in memory to be statically allocated for a predetermined maximum length or employ dynamic allocation to allow it to hold a variable number of elements. When a string appears lit ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Writing System
A writing system comprises a set of symbols, called a ''script'', as well as the rules by which the script represents a particular language. The earliest writing appeared during the late 4th millennium BC. Throughout history, each independently invented writing system gradually emerged from a system of proto-writing, where a small number of ideographs were used in a manner incapable of fully encoding language, and thus lacking the ability to express a broad range of ideas. Writing systems are generally classified according to how its symbols, called ''graphemes'', relate to units of language. Phonetic writing systemswhich include alphabets and syllabariesuse graphemes that correspond to sounds in the corresponding spoken language. Alphabets use graphemes called ''letter (alphabet), letters'' that generally correspond to spoken phonemes. They are typically divided into three sub-types: ''Pure alphabets'' use letters to represent both consonant and vowel sounds, ''abjads'' gene ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Language
Language is a structured system of communication that consists of grammar and vocabulary. It is the primary means by which humans convey meaning, both in spoken and signed language, signed forms, and may also be conveyed through writing system, writing. Human language is characterized by its cultural and historical diversity, with significant variations observed between cultures and across time. Human languages possess the properties of Productivity (linguistics), productivity and Displacement (linguistics), displacement, which enable the creation of an infinite number of sentences, and the ability to refer to objects, events, and ideas that are not immediately present in the discourse. The use of human language relies on social convention and is acquired through learning. Estimates of the number of human languages in the world vary between and . Precise estimates depend on an arbitrary distinction (dichotomy) established between languages and dialects. Natural languages are ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Unicode
Unicode or ''The Unicode Standard'' or TUS is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 Character (computing), characters and 168 script (Unicode), scripts used in various ordinary, literary, academic, and technical contexts. Unicode has largely supplanted the previous environment of a myriad of incompatible character sets used within different locales and on different computer architectures. The entire repertoire of these sets, plus many additional characters, were merged into the single Unicode set. Unicode is used to encode the vast majority of text on the Internet, including most web pages, and relevant Unicode support has become a common consideration in contemporary software development. Unicode is ultimately capable of encoding more than 1.1 million characters. The Unicode character repertoire is synchronized with Univers ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Collate
Collation is the assembly of written information into a standard order. Many systems of collation are based on numerical order or alphabetical order, or extensions and combinations thereof. Collation is a fundamental element of most office filing systems, library catalogs, and reference books. Collation differs from ''classification'' in that the classes themselves are not necessarily ordered. However, even if the order of the classes is irrelevant, the identifiers of the classes may be members of an ordered set, allowing a sorting algorithm to arrange the items by class. Formally speaking, a collation method typically defines a total order on a set of possible identifiers, called sort keys, which consequently produces a total preorder on the set of items of information (items with the same identifier are not placed in any defined order). A collation algorithm such as the Unicode collation algorithm defines an order through the process of comparing two given character string ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
SIL International
SIL Global (formerly known as the Summer Institute of Linguistics International) is an evangelical Christian nonprofit organization whose main purpose is to study, develop and document languages, especially those that are lesser-known, to expand linguistic knowledge, promote literacy, translate the Christian Bible into local languages, and aid minority language development. Based on its language documentation work, SIL publishes a database, '' Ethnologue'', of its research into the world's languages, and develops and publishes software programs for language documentation, such as FieldWorks Language Explorer (FLEx) and Lexique Pro. Its main offices in the United States are located at the International Linguistics Center in Dallas, Texas. History Early History William Cameron Townsend, a Presbyterian minister, founded the organization in 1934, after undertaking a Christian mission with the Disciples of Christ among the Kaqchikel Maya people in Guatemala in the earl ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Common Locale Data Repository
The Common Locale Data Repository (CLDR) is a project of the Unicode Consortium to provide locale data in XML format for use in computer applications. CLDR contains locale-specific information that an operating system will typically provide to applications. CLDR is written in the Locale Data Markup Language (LDML). CLDR is maintained by a technical committee which includes employees from IBM, Apple, Google, Microsoft, and some government-based organizations. The committee is chaired by John Emmons, of IBM; Mark Davis, of Google, is vice-chair. Details Among the types of data that CLDR includes are the following: * Translations for language names * Translations for territory and country names * Translations for currency names, including singular/plural modifications * Translations for weekday, month, era, period of day, in full and abbreviated forms * Translations for time zones and example cities (or similar) for time zones * Translations for calendar fields * Patterns for f ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
International Components For Unicode
International Components for Unicode (ICU) is an open-source project of mature C/ C++ and Java libraries for Unicode support, software internationalization, and software globalization. ICU is widely portable to many operating systems and environments. It gives applications the same results on all platforms and between C, C++, and Java software. The ICU project is a technical committee of the Unicode Consortium and sponsored, supported, and used by IBM and many other companies. ICU has been included as a standard component with Microsoft Windows since Windows 10 version 1703. ICU provides the following services: Unicode text handling, full character properties, and character set conversions; Unicode regular expressions; full Unicode sets; character, word, and line boundaries; language-sensitive collation and searching; normalization, upper and lowercase conversion, and script transliterations; comprehensive locale data and resource bundle architecture via the Common Loca ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Collation
Collation is the assembly of written information into a standard order. Many systems of collation are based on numerical order or alphabetical order, or extensions and combinations thereof. Collation is a fundamental element of most office filing systems, library catalogs, and reference books. Collation differs from ''classification'' in that the classes themselves are not necessarily ordered. However, even if the order of the classes is irrelevant, the identifiers of the classes may be members of an ordered set, allowing a sorting algorithm to arrange the items by class. Formally speaking, a collation method typically defines a total order on a set of possible identifiers, called sort keys, which consequently produces a total preorder on the set of items of information (items with the same identifier are not placed in any defined order). A collation algorithm such as the Unicode collation algorithm defines an order through the process of comparing two given character s ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
ISO/IEC 14651
'ISO/IEC 14651:2016'', ''Information technology -- International string ordering and comparison -- Method for comparing character strings and description of the common template tailorable ordering'', is an International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) standard specifying an algorithm that can be used when comparing two strings. This comparison can be used when collating a set of strings. The standard also specifies a data file, the Common Tailorable Template (CTT), which outlines the comparison order. This order is meant to be tailored for different languages, making the CTT a template rather than a default. In many cases, however, the empty tailoring—where no weightings are changed—is appropriate, as different languages have incompatible ordering requirements. One such tailoring is European ordering rules (EOR), which in turn is supposed to be tailored for different European languages. The ''Common Tailorable Template'' ( ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
European Ordering Rules
The European ordering rules (EOR / EN 13710) define an ordering for strings written in languages that are written with the Latin alphabet, Latin, Greek alphabet, Greek and Cyrillic script, Cyrillic alphabets. The standard covers languages used by the European Union, the European Free Trade Association, and parts of the former Soviet Union. It is a tailoring of the ''Common Tailorable Template'' of ISO/IEC 14651. EOR can in turn be tailored for different (European) languages. But in inter-European contexts, EOR can be used without further tailoring. Method Just as for ISO/IEC 14651, upon which EOR is based, EOR has 4 levels of weights. Level 1 The first level sorts the letters. The following Latin script, Latin letters are concerned by this level, in order: :a b c d ð e f g h i j k l m n o p q r s t u v w x y z þ The Greek alphabet has the following order: :α β γ δ ε ϝ ϛ ζ η θ ι κ λ μ ν ξ ο π ϟ ρ σ τ υ φ χ ψ ω ϡ Cyrillic script has the following ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
String Collation Algorithms
String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian animated short * ''Strings'' (2004 film), a film directed by Anders Rønnow Klarlund * ''Strings'' (2011 film), an American dramatic thriller film * ''Strings'' (2012 film), a British film by Rob Savage * '' Bravetown'' (2015 film), an American drama film originally titled ''Strings'' * '' The String'' (2009), a French film Music Instruments * String (music), the flexible element that produces vibrations and sound in string instruments * String instrument, a musical instrument that produces sound through vibrating strings ** List of string instruments * String piano, a pianistic extended technique in which sound is produced by direct manipulation of the strings, rather than striking the piano's keys Types of groups * String band, musical ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |