National Arabic Alphabets

Many scripts in Unicode, such as Arabic, have special orthographic rules that require certain combinations of letterforms to be joined into special ligature forms. In English, the common ampersand (&) developed from a ligature in which the handwritten Latin letters ''e'' and ''t'' (spelling ''et'', Latin for ''and'') were combined. The rules governing ligature formation in Arabic can be quite complex, requiring special script-shaping technologies such as the Arabic Calligraphic Engine by DecoType.

As of Unicode 15.0, the Arabic script is contained in the following blocks:
* Arabic (0600–06FF, 256 characters)
* Arabic Supplement (0750–077F, 48 characters)
* Arabic Extended-B (0870–089F, 41 characters)
* Arabic Extended-A (08A0–08FF, 96 characters)
* Arabic Presentation Forms-A (FB50–FDFF, 631 characters)
* Arabic Presentation Forms-B (FE70–FEFF, 141 characters)
* Rumi Numeral Symbols (10E60–10E7F, 31 characters)
* Arabic Extended-C (10EC0–10EFF, 3 characters)
* Indic Siyaq Numbers (1EC70–1ECBF, 68 characters)
* Ottoman Siyaq Numbers (1ED00–1ED4F, 61 characters)
* Arabic Mathematical Alphabetic Symbols (1EE00–1EEFF, 143 characters)

The basic Arabic range encodes the standard letters and diacritics but does not encode contextual forms (U+0621–U+0652 is based directly on ISO 8859-6); it also includes the most common diacritics and the Arabic-Indic digits. The Arabic Supplement range encodes letter variants mostly used for writing African (non-Arabic) languages. The Arabic Extended-B and Arabic Extended-A ranges encode additional Qur'anic annotations and letter variants used for various non-Arabic languages. The Arabic Presentation Forms-A range encodes contextual forms and ligatures of letter variants needed for Persian, Urdu, Sindhi and Central Asian languages. The Arabic Presentation Forms-B range encodes spacing forms of Arabic diacritics and more contextual letter forms. The presentation forms are included only for compatibility with older standards and are not currently needed for encoding text. The Arabic Mathematical Alphabetic Symbols block encodes characters used in Arabic mathematical expressions. The Indic Siyaq Numbers block contains a specialized subset of Arabic script that was used for accounting in India under the Mughal Empire from the 17th century through the middle of the 20th century. The Ottoman Siyaq Numbers block contains a specialized subset of Arabic script, also known as ''Siyakat'' numbers, used for accounting in Ottoman Turkish documents.
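The block ranges listed above can be turned into a small lookup table. The following is a minimal Python sketch, not part of any standard library: the names `ARABIC_BLOCKS` and `arabic_block` are hypothetical, and the ranges are copied from the list above rather than read from the Unicode Character Database.

```python
# Block ranges as listed above (Unicode 15.0): (first, last, block name).
ARABIC_BLOCKS = [
    (0x0600, 0x06FF, "Arabic"),
    (0x0750, 0x077F, "Arabic Supplement"),
    (0x0870, 0x089F, "Arabic Extended-B"),
    (0x08A0, 0x08FF, "Arabic Extended-A"),
    (0xFB50, 0xFDFF, "Arabic Presentation Forms-A"),
    (0xFE70, 0xFEFF, "Arabic Presentation Forms-B"),
    (0x10E60, 0x10E7F, "Rumi Numeral Symbols"),
    (0x10EC0, 0x10EFF, "Arabic Extended-C"),
    (0x1EC70, 0x1ECBF, "Indic Siyaq Numbers"),
    (0x1ED00, 0x1ED4F, "Ottoman Siyaq Numbers"),
    (0x1EE00, 0x1EEFF, "Arabic Mathematical Alphabetic Symbols"),
]


def arabic_block(ch):
    """Return the name of the Arabic-script block containing ch, or None.

    A block range covers unassigned code points too, so a hit does not
    guarantee that the character is actually assigned.
    """
    cp = ord(ch)
    for first, last, name in ARABIC_BLOCKS:
        if first <= cp <= last:
            return name
    return None
```

For example, `arabic_block("\u0628")` (the letter beh) returns `"Arabic"`, while a Latin letter returns `None`.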


Contextual forms

A demonstration for the basic alphabet used in Modern Standard Arabic:
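The contextual forms of a basic letter can also be looked up programmatically. The sketch below assumes Python's standard `unicodedata` module and relies on the compatibility decompositions of the Arabic Presentation Forms-B block, which are tagged `<isolated>`, `<final>`, `<initial>` or `<medial>`. The helper name `contextual_forms` is hypothetical. Real text rendering does not work this way: a shaping engine selects glyphs from the font, and the presentation-form characters are only used here as a convenient source of data.

```python
import unicodedata

FORM_TAGS = ("<isolated>", "<final>", "<initial>", "<medial>")


def contextual_forms(letter):
    """Map form name -> presentation-form character for one basic letter."""
    forms = {}
    for cp in range(0xFE70, 0xFF00):  # Arabic Presentation Forms-B
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep single-letter mappings such as "<initial> 0628";
        # entries with more code points (ligatures, diacritics) are skipped.
        if (
            len(parts) == 2
            and parts[0] in FORM_TAGS
            and int(parts[1], 16) == ord(letter)
        ):
            forms[parts[0].strip("<>")] = chr(cp)
    return forms


if __name__ == "__main__":
    for form, ch in contextual_forms("\u0628").items():  # the letter beh
        print(form, f"U+{ord(ch):04X}", ch)
```

A letter that joins on both sides, such as beh (U+0628), yields four forms. A letter that only joins to the preceding letter, such as alef (U+0627), yields only the isolated and final forms.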


Punctuation and ornaments

Only the Arabic question mark ⟨؟⟩ and the Arabic comma ⟨،⟩ are used in regular Arabic script typing, and the Arabic comma is often replaced by the Latin-script comma ⟨,⟩.
* U+066D ٭
* U+FD3E ﴾ Arabic ornate left parenthesis
* U+FD3F ﴿ Arabic ornate right parenthesis
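Because Latin-script punctuation often appears in Arabic text, a text-cleaning step may want to map it back to the Arabic characters. The following is a minimal sketch with a hypothetical helper, `to_arabic_punctuation`. It assumes the whole input is Arabic-script text; a real pipeline would need to leave Latin-script runs (URLs, numbers with thousands separators, embedded English) untouched.

```python
ARABIC_COMMA = "\u060C"          # ARABIC COMMA
ARABIC_QUESTION_MARK = "\u061F"  # ARABIC QUESTION MARK

# Translation table from Latin-script punctuation to the Arabic characters.
_LATIN_TO_ARABIC = str.maketrans({",": ARABIC_COMMA, "?": ARABIC_QUESTION_MARK})


def to_arabic_punctuation(text):
    """Replace the Latin comma and question mark with their Arabic counterparts."""
    return text.translate(_LATIN_TO_ARABIC)
```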


Word ligatures

Arabic Presentation Forms-A has a few characters defined as "word ligatures" for terms frequently used in formulaic expressions in Arabic. They are rarely used outside professional liturgical typing; likewise, the Rial sign is normally written out in full rather than with the ligature.
* U+FDF3, as in the phrase الله أكبر
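The word ligatures can be inspected with Python's standard `unicodedata` module. The sketch below is illustrative only. It assumes the word ligatures occupy U+FDF0 through U+FDFD, a range taken from the Unicode character names rather than from the text above, and `word_ligatures` is a hypothetical helper name. For each character it reports the official name and the fully spelled-out letters obtained by NFKC normalization.

```python
import unicodedata


def word_ligatures():
    """Return (code point label, character name, spelled-out text) rows."""
    rows = []
    for cp in range(0xFDF0, 0xFDFE):  # assumed range: U+FDF0..U+FDFD
        ch = chr(cp)
        rows.append(
            (
                f"U+{cp:04X}",
                unicodedata.name(ch),
                # NFKC replaces the ligature with its individual letters.
                unicodedata.normalize("NFKC", ch),
            )
        )
    return rows


if __name__ == "__main__":
    for label, name, spelled in word_ligatures():
        print(label, name, spelled)
```

The spelled-out column shows why the ligatures are optional: each one expands to ordinary letters from the basic Arabic block, which is how the Rial sign is normally written.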


Code blocks


Arabic


Character table


Compact table


Arabic Supplement


Arabic Extended-B


Arabic Extended-A


Arabic Presentation Forms-A

Most of these characters are ligatures that can be composed from the characters in the preceding charts; the exceptions are the bracket-like graphemes ﴾ ﴿. Some are ligatures of common liturgical phrases.


Arabic Presentation Forms-B

These can all be created from the basic chart's characters.
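One way to see this relationship is compatibility normalization, which replaces a presentation form with the basic characters it stands for. The following is a minimal sketch using `unicodedata.normalize` from Python's standard library; the wrapper name `to_basic_letters` is hypothetical. Note that NFKC also rewrites many unrelated compatibility characters (full-width Latin letters, for example), so it may be too aggressive for some pipelines.

```python
import unicodedata


def to_basic_letters(text):
    """Fold presentation forms into basic characters via NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # U+FEFB is the isolated lam-alef ligature from Presentation Forms-B.
    legacy = "\uFEFB"
    folded = to_basic_letters(legacy)
    print([f"U+{ord(c):04X}" for c in folded])
```

Text stored with presentation forms, as some legacy systems produce, becomes searchable and comparable with normally encoded text after this folding step.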


Rumi Numeral Symbols


Arabic Extended-C


Indic Siyaq Numbers


Ottoman Siyaq Numbers


Arabic Mathematical Alphabetic Symbols


References


External links

* Scheherazade (/software.sil.org/Scheherazade) or Scheherazade New (/fonts.google.com/specimen/Scheherazade+New?subset=arabic), an extended Arabic script font designed by SIL International, distributed under the SIL Open Font License (OFL)
* Harmattan (/fonts.google.com/specimen/Harmattan?subset=arabic), an extended Arabic script font designed by SIL International for West Africa, distributed under the SIL Open Font License (OFL)