
The frequency of letters in text has often been studied for use in cryptanalysis, and in frequency analysis in particular. No language has an exact letter frequency distribution, as all writers write slightly differently. As a rule, texts in different languages that use the Arabic script (e.g. Arabic, Ottoman Turkish, Persian and Urdu) do not share the same letter frequencies. Letter frequencies are also relevant to data compression techniques such as Huffman coding.


Arabic letters

The Arabic alphabet consists of 28 primary letters, numbered 1 to 28 in Table 1. The eight modified letters listed in positions 29 to 36 of the same table are used in text in the same way as the primary letters. If these eight modified forms are folded into the primary list on the basis of shape or phonetic similarity, the result is as shown in Table 2. For accurate frequency analysis, each of the 36 letters of Table 1 is counted independently. The ordering of the alphabet shown in the tables is more logical than the one used by the Unicode standard. Although the full set of Arabic characters includes about ten diacritics, as shown in Figure 1, frequency analysis of Arabic characters is concerned only with the frequencies of the alphabet letters shown in Table 2, not with the diacritics.
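The counting procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the method used to build the tables: it assumes that the 36 letters of Table 1 correspond to the Unicode code points U+0621 to U+063A and U+0641 to U+064A, and the `FOLD_MAP` used to merge the eight modified letters into primary letters is a hypothetical mapping, since Table 2 is not reproduced here. The names `ARABIC_LETTERS`, `FOLD_MAP` and `letter_frequencies` are illustrative.

```python
from collections import Counter

# Assumption: the 36 letters of Table 1 are the code points
# U+0621..U+063A (26 letters) and U+0641..U+064A (10 letters).
ARABIC_LETTERS = {chr(cp) for cp in range(0x0621, 0x063B)} | {
    chr(cp) for cp in range(0x0641, 0x064B)
}

# Hypothetical folding of the 8 modified letters into primary letters;
# the actual grouping used in Table 2 may differ.
FOLD_MAP = {
    "\u0621": "\u0627",  # hamza -> alef
    "\u0622": "\u0627",  # alef with madda above -> alef
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
    "\u0624": "\u0648",  # waw with hamza above -> waw
    "\u0626": "\u064A",  # yeh with hamza above -> yeh
    "\u0649": "\u064A",  # alef maksura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
}


def letter_frequencies(text, fold=False):
    """Return relative frequencies of Arabic letters in text.

    Diacritics, tatweel, digits, punctuation and whitespace are ignored
    because they are not members of ARABIC_LETTERS.
    """
    counts = Counter()
    for ch in text:
        if ch not in ARABIC_LETTERS:
            continue
        if fold:
            ch = FOLD_MAP.get(ch, ch)
        counts[ch] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders the letters from most to least frequent.
    return {ch: n / total for ch, n in counts.most_common()}
```

Calling `letter_frequencies(text)` counts all 36 letters independently, while `letter_frequencies(text, fold=True)` reports frequencies over the 28 primary letters under the assumed folding.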


Arabic letter frequency using general sources

The following Arabic sources were used to produce a body of text large enough for frequency statistics to be computed.
* The first seven volumes of the series البداية والنهاية (''The Beginning and The End'') by Ibn Kathir: 2,855 pages, 1,096,047 words, 4,326,031 letters.
* The book الرحيق المختوم (''The Sealed Nectar'') by Almubarakfuri: 284 pages, 134,662 words, 553,740 letters.
* The book تحفة العروسين (''The Masterpiece of the Brides'') by Al-shuri: 239 pages, 66,550 words, 242,361 letters.
Collectively, these sources amount to 3,378 pages, 1,297,259 words, and 5,122,132 letters. The accompanying graph shows the letter frequency distribution for the counted letters.
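The collective totals quoted above can be cross-checked with a few lines of Python. The per-source figures are taken from the list; the average number of letters per word is a derived quantity that the sources list does not state, and the variable names are illustrative.

```python
# (pages, words, letters) for each source, as quoted in the list above.
sources = {
    "The Beginning and The End (vols. 1-7)": (2855, 1_096_047, 4_326_031),
    "The Sealed Nectar": (284, 134_662, 553_740),
    "The Masterpiece of the Brides": (239, 66_550, 242_361),
}

# Sum each column across the three sources.
total_pages, total_words, total_letters = (
    sum(column) for column in zip(*sources.values())
)

# Derived figure (not stated in the text): mean letters per word.
letters_per_word = total_letters / total_words

print(total_pages, total_words, total_letters)
print(round(letters_per_word, 2))
```

The three sums agree with the stated totals of 3,378 pages, 1,297,259 words and 5,122,132 letters, and the corpus averages a little under four letters per word.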


External links


Tools to analyze Arabic text letters and words

A detailed study of Statistical Distributions of Arabic Text Letters