
The European ordering rules (EOR / EN 13710) define an ordering for strings written in languages that use the Latin, Greek and Cyrillic alphabets. The standard covers languages used by the European Union, the European Free Trade Association, and parts of the former Soviet Union. It is a tailoring of the Common Tailorable Template of ISO/IEC 14651. EOR can in turn be tailored for different (European) languages, but in inter-European contexts it can be used without further tailoring.


Method

Like ISO/IEC 14651, on which it is based, EOR has four levels of weights.

Level 1 sorts the letters. The Latin letters concerned by this level are, in order:

a b c d ð e ə ɛ f g h i j k l m n o ɔ p q r s ɯ t u v w x y z þ æ

The Greek alphabet has the following order:

α β γ δ ε Ϝ Ϛ ζ η θ ι κ λ μ ν ξ ο π Ϟ ρ σ τ υ φ χ ψ ω Ϡ

The Cyrillic script has the following order:

а ӑ ӓ ә ӛ ӕ б в г ғ ҕ ґ д ђ ҙ е ӗ ё є є̈ ж ӝ җ ӂ з ӟ з́ ѕ ӡ и ӥ і ї й ј к қ ӄ ҡ ҟ ҝ л љ ꙥ м ꙧ н ң ӊ ҥ њ ӈ о ӧ ŏ ө ӫ ө̆ ѡ ꙍ ҩ п ҧ р с ҫ с́ т ҭ ћ у ў ӱ ӳ ү ұ ф х ҳ ӽ ѯ һ ц ҵ ч ӵ ҷ ӌ ҹ ҽ ҿ џ ш щ ъ ы ӹ ь ѣ э ю ю̆ я я̆ Ӏ ѫ ѭ ѧ ѩ ѱ ѳ ѵ ѷ ҁ ꙟ

The order for the three alphabets is:

1. Latin alphabet
2. Greek alphabet
3. Cyrillic alphabet

The Georgian and Armenian alphabets have not been included in ENV 13710. However, they are covered in CR 14400:2001 "European ordering rules – Ordering for Latin, Greek, Cyrillic, Georgian and Armenian scripts". All scripts encoded in ISO/IEC 10646 and Unicode are covered by ISO/IEC 14651 (and its datafile CTT) as well as by the Unicode collation algorithm (UCA and the associated DUCET), both of which are available at no charge.

Level 2 orders the additions to the letters, such as diacritics and other variations. Letters with diacritical marks are ordered as variants of the base letter, and letters with other modifications are ordered as modifications of their base letter; similar cases are handled the same way. Level 2 defines the following order of diacritics and other modifications:

1. Acute accent (á)
2. Grave accent (à)
3. Breve (ă)
4. Circumflex (â)
5. Caron (š)
6. Ring (å)
7. Diaeresis (ä)
8. Double acute accent (ő)
9. Tilde (ã)
10. Dot (ż)
11. Cedilla (ş)
12. Ogonek (ą)
13. Macron (ā)
14. With stroke through (ø)
15. Modified letter(s) (æ)

Level 3 distinguishes capital from small letters, as in "Polish" and "polish".

Level 4 concerns punctuation and whitespace characters. This level distinguishes "MacDonald" from "Mac Donald", and "its" from "it's".

An optional, and usually omitted, fifth level can distinguish typographical differences, including whether the text is italic, normal or bold.
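The way the levels interact can be illustrated with a small Python sketch. It is not the normative EOR data, only a simplified model of the four levels described above, restricted to the Latin letters and to those diacritics that Unicode decomposes into combining marks (so letters with a stroke, such as ø, are not handled). The names `eor_like_key`, `LEVEL1` and `LEVEL2` are hypothetical. Two details are assumptions that the text above does not settle: small letters are placed before capitals at level 3, and level 4 compares punctuation and whitespace by position and code point.

```python
import unicodedata

# Level 1: order of the Latin letters, as listed in the text above.
LATIN_ORDER = "a b c d ð e ə ɛ f g h i j k l m n o ɔ p q r s ɯ t u v w x y z þ æ".split()
LEVEL1 = {letter: rank for rank, letter in enumerate(LATIN_ORDER)}

# Level 2: order of diacritics, keyed by Unicode combining mark.
# Rank 0 is reserved for "no diacritic".
COMBINING_ORDER = [
    "\u0301",  # acute
    "\u0300",  # grave
    "\u0306",  # breve
    "\u0302",  # circumflex
    "\u030C",  # caron
    "\u030A",  # ring
    "\u0308",  # diaeresis
    "\u030B",  # double acute
    "\u0303",  # tilde
    "\u0307",  # dot above
    "\u0327",  # cedilla
    "\u0328",  # ogonek
    "\u0304",  # macron
]
LEVEL2 = {mark: rank for rank, mark in enumerate(COMBINING_ORDER, start=1)}


def eor_like_key(text):
    """Return a 4-level sort key (simplified model, not the standard's tables)."""
    level1, level2, level3, level4 = [], [], [], []
    last_was_letter = False
    # NFD splits e.g. "á" into "a" followed by a combining acute accent.
    for ch in unicodedata.normalize("NFD", text):
        if ch in LEVEL2 and last_was_letter:
            level2[-1] = LEVEL2[ch]  # the last mark on a letter wins
            continue
        base = ch.lower()
        if base in LEVEL1:
            level1.append(LEVEL1[base])
            level2.append(0)
            # Assumption: small letters (0) sort before capitals (1).
            level3.append(0 if ch == base else 1)
            last_was_letter = True
        else:
            # Assumption: punctuation and whitespace are ignored on levels 1-3
            # and compared on level 4 by position, then by code point.
            level4.append((len(level1), ord(ch)))
            last_was_letter = False
    # Tuples compare left to right, so a later level only breaks ties.
    return (level1, level2, level3, level4)


words = ["Polish", "Mac Donald", "às", "it's", "at", "polish", "ás", "its", "MacDonald", "as"]
print(sorted(words, key=eor_like_key))
# ['as', 'ás', 'às', 'at', 'its', "it's", 'MacDonald', 'Mac Donald', 'polish', 'Polish']
```

Because the key is a tuple compared level by level, "às" still sorts before "at": the accent is only consulted once the base letters are equal. A real implementation would take its weights from the ISO/IEC 14651 or UCA data files rather than from hand-written tables like these.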


See also

* Collation
* Common Locale Data Repository (CLDR)
* Unicode
* Universal Character Set
** DIN 91379: a European Unicode subset (also includes Greek, and Cyrillic for Bulgarian) that uses UTF-8 at interfaces and normalization form C (NFC); a German 2022 standard, mandatory for German authorities and organizations in the exchange of data from 1 November 2024
* UTF-8


References

* Hansson, Roger; Lindgren, Carl Göran; Ljung, Heléne; Lundén, Thomas. Språk och skrift i Europa. SNS Förlag (2004).
* Küster, Marc Wilhelm. Geordnetes Weltbild. Die Tradition des alphabetischen Sortierens von der Keilschrift bis zur EDV. Eine Kulturgeschichte. Niemeyer (2006). Written by the editor of ENV 13710; chapter 17.4 discusses the genesis and the contents of the EOR.


External links


European Ordering Rules
ENV 13710, a "European Pre-Standard"