Gaj's Latin alphabet ( sh-Latn-Cyrl, Gajeva latinica, separator=" / ", Гајева латиница}, ), also known as ( sr-Cyrl, абецеда, ) or ( sr-Cyrl, гајица, link=no, ), is the form of the
Latin script
The Latin script, also known as the Roman script, is a writing system based on the letters of the classical Latin alphabet, derived from a form of the Greek alphabet which was in use in the ancient Greek city of Cumae in Magna Graecia. The Gree ...
used for writing all four
standard varieties of
Serbo-Croatian
Serbo-Croatian ( / ), also known as Bosnian-Croatian-Montenegrin-Serbian (BCMS), is a South Slavic language and the primary language of Serbia, Croatia, Bosnia and Herzegovina, and Montenegro. It is a pluricentric language with four mutually i ...
:
Bosnian,
Croatian,
Montenegrin, and
Serbian. It contains 27 individual letters and 3 digraphs. Each letter (including digraphs) represents one
Serbo-Croatian phoneme, yielding a highly
phonemic orthography. It closely corresponds to the
Serbian Cyrillic alphabet
The Serbian Cyrillic alphabet (, ), also known as the Serbian script, (, ), is a standardized variation of the Cyrillic script used to write the Serbian language. It originated in medieval Serbia and was significantly reformed in the 19th cen ...
.
The alphabet was initially devised by Croatian linguist
Ljudevit Gaj
Ljudevit Gaj (; born Ludwig Gay; ; 8 August 1809 – 20 April 1872) was a Croatian linguist, politician, journalist and writer. He was one of the central figures of the pan-Slavist Illyrian movement.
Biography
Origin
He was born in Krapina ( ...
in 1835 during the
Illyrian movement in
ethnically Croatian parts of the
Austrian Empire
The Austrian Empire, officially known as the Empire of Austria, was a Multinational state, multinational European Great Powers, great power from 1804 to 1867, created by proclamation out of the Habsburg monarchy, realms of the Habsburgs. Duri ...
. It was largely based on
Jan Hus
Jan Hus (; ; 1369 – 6 July 1415), sometimes anglicized as John Hus or John Huss, and referred to in historical texts as ''Iohannes Hus'' or ''Johannes Huss'', was a Czechs, Czech theologian and philosopher who became a Church reformer and t ...
's
Czech alphabet
Czech orthography is a system of rules for proper formal writing (orthography) in Czech language, Czech. The earliest form of separate Latin script specifically designed to suit Czech was devised by Czech theologian and church reformist Jan Hus, ...
and was meant to serve as a unified orthography for
three Croat-populated kingdoms within the Austrian Empire at the time, namely
Croatia
Croatia, officially the Republic of Croatia, is a country in Central Europe, Central and Southeast Europe, on the coast of the Adriatic Sea. It borders Slovenia to the northwest, Hungary to the northeast, Serbia to the east, Bosnia and Herze ...
,
Dalmatia
Dalmatia (; ; ) is a historical region located in modern-day Croatia and Montenegro, on the eastern shore of the Adriatic Sea. Through time it formed part of several historical states, most notably the Roman Empire, the Kingdom of Croatia (925 ...
and
Slavonia
Slavonia (; ) is, with Dalmatia, Croatia proper, and Istria County, Istria, one of the four Regions of Croatia, historical regions of Croatia. Located in the Pannonian Plain and taking up the east of the country, it roughly corresponds with f ...
, and their three dialect groups,
Kajkavian
Kajkavian is a South Slavic languages, South Slavic supradialect or language spoken primarily by Croats in much of Central Croatia and Gorski Kotar.
It is part of the South Slavic dialect continuum, being transitional to the supradialects of Č ...
,
Chakavian and
Shtokavian, which historically utilized different spelling rules. The alphabet's final form was defined in the late 19th century.
A
slightly reduced version is used as the alphabet for
Slovene, and a
slightly expanded version is used for modern standard Montenegrin. A modified version is used for the
romanization
In linguistics, romanization is the conversion of text from a different writing system to the Latin script, Roman (Latin) script, or a system for doing so. Methods of romanization include transliteration, for representing written text, and tra ...
of
Macedonian. It further influenced
alphabets of Romani languages that are spoken in
Southeast Europe
Southeast Europe or Southeastern Europe is a geographical sub-region of Europe, consisting primarily of the region of the Balkans, as well as adjacent regions and Archipelago, archipelagos. There are overlapping and conflicting definitions of t ...
, namely
Vlax and
Balkan Romani.
Letters
The alphabet consists of thirty
upper and
lower case
Letter case is the distinction between the letters that are in larger uppercase or capitals (more formally ''majuscule'') and smaller lowercase (more formally '' minuscule'') in the written representation of certain languages. The writing system ...
letters:

Letters are referred to by their name: ''a, be, ce, če, će, de, dže, đe, e, ef, ge, ha, i, je, ka, el, elj, em, en, enj, o, pe, er, es, eš, te, u, ve, ze, že'', or, in the case of consonants, by being appended by
schwa, e.g. . In mathematics, is commonly pronounced ''jot'', as in the
German of Germany.
Foreign letters
Various foreign letters are utilised in orthographically unadapted
loanwords and foreign proper names, such as ''Québec''. Orthographically unadapted spelling of foreign names and some loanwords is standard in Croatia, whereas Serbians prefer to use orthographically adapted spellings. Non-native letters
Q,
W,
X, and
Y appear on the
Serbo-Croatian keyboard. These four letters are usually named as follows: as ''kve'' or ''ku'', as ''duplo ve'' or ''dvostruko ve'', as ''iks'', and as ''ipsilon''.
Digraphs
Digraphs , and are considered to be single letters, and they signify single phonemes. However, they are distinguished from occurences of two such letters that signify two distinct phonemes: ''džep'' (, Cyrillic ''џеп'') uses the digraph, while ''nadživjeti'' (, Cyrillic ''надживјети'', morphological boundary: prefix ''nad-'' + base ''živjeti'') uses two separate letters.
* In dictionaries, ''njegov'' comes after ''novine'', in a separate section after the end of the section; ''bolje'' comes after ''bolnica''; ''nadžak'' (digraph ) comes after ''nadživjeti'' (+ sequence), and so forth.
*If only the initial letter of a word is capitalized, only the first of the two component letters is capitalized: ''Njemačka'' ('
Germany
Germany, officially the Federal Republic of Germany, is a country in Central Europe. It lies between the Baltic Sea and the North Sea to the north and the Alps to the south. Its sixteen States of Germany, constituent states have a total popu ...
'), not ''NJemačka''. Uppercase is used only if the entire word was capitalized: ''NJEMAČKA''. In
Unicode
Unicode or ''The Unicode Standard'' or TUS is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 Char ...
, the form is referred to as ''
titlecase'', as opposed to the uppercase form , representing one of the few cases in which titlecase and uppercase differ.
*In vertical writing (such as on signs), , , are written horizontally, as a unit. For instance, if ''ulje'' ('oil') is written vertically, appears on the second line. In
crossword puzzles, , , each occupy a single square. The word ''mjenjačnica'' ('
bureau de change') is written vertically with on the fourth line, while and appear separately on the first and second lines, respectively, because contains two letters, not one.
*If words are written with a space between each letter (such as on signs), each digraph is written as a unit. For instance: ''U LJ E'', ''M J E NJ A Č N I C A''.
Accent marks
The vowels , , , , , along with the
syllabic consonant
A syllabic consonant or vocalic consonant is a consonant that forms the nucleus of a syllable on its own, like the ''m'', ''n'' and ''l'' in some pronunciations of the English words ''rhythm'', ''button'' and ''awful'', respectively. To represe ...
s and , can take one of 5 accents: the
double grave accent (◌̏) for a short vowel with falling tone, the
inverted breve
Inverse or invert may refer to:
Science and mathematics
* Inverse (logic), a type of conditional sentence which is an immediate inference made from another conditional sentence
* Additive inverse, the inverse of a number that, when added to the ...
(◌̑) for a long vowel with falling tone, the
grave accent
The grave accent () ( or ) is a diacritical mark used to varying degrees in French, Dutch, Portuguese, Italian, Catalan and many other Western European languages as well as for a few unusual uses in English. It is also used in other ...
(◌̀) for a short vowel with rising tone, the
acute accent
The acute accent (), ,
is a diacritic used in many modern written languages with alphabets based on the Latin alphabet, Latin, Cyrillic script, Cyrillic, and Greek alphabet, Greek scripts. For the most commonly encountered uses of the accen ...
(◌́) for long vowel with rising tone, and
macron (◌̄) for a non-tonic long vowel. These diacritic accents are typically used in dictionaries and linguistic publications, and in poetry to denote
metrically correct reading. In ordinary prose they occur when needed to resolve semantic ambiguity between
homographs: ('at') vs. ('code'), ('am') vs. ('alone'). For the same reason, the length of an unaccented syllable can be marked with ⟨◌̄⟩ or
circumflex
The circumflex () is a diacritic in the Latin and Greek scripts that is also used in the written forms of many languages and in various romanization and transcription schemes. It received its English name from "bent around"a translation of ...
⟨◌̂⟩, without accentuating the rest of the word. This is typically used to distinguish homographic nominative singular and genitive plural forms of nouns, where the genitive plural has a long final vowel: ('book' ) vs. or ('books' ).
History
Croatian Latin alphabet before Gaj
In Croatian writing the Latin alphabet became dominant in the 16th century, marginalising the
Cyrillic
The Cyrillic script ( ) is a writing system used for various languages across Eurasia. It is the designated national script in various Slavic, Turkic, Mongolic, Uralic, Caucasian and Iranic-speaking countries in Southeastern Europe, Ea ...
and the
Glagolitic
The Glagolitic script ( , , ''glagolitsa'') is the oldest known Slavic alphabet. It is generally agreed that it was created in the 9th century for the purpose of translating liturgical texts into Old Church Slavonic by Saints Cyril and Methodi ...
alphabets. In the 17th century there coalesced two major orthographic practices for using the Latin alphabet.
Dalmatia
Dalmatia (; ; ) is a historical region located in modern-day Croatia and Montenegro, on the eastern shore of the Adriatic Sea. Through time it formed part of several historical states, most notably the Roman Empire, the Kingdom of Croatia (925 ...
used a system based on the
Italian orthography, whereas the continental
Kaykavian writing was based on
Hungarian. In the 18th century the
Slavonia
Slavonia (; ) is, with Dalmatia, Croatia proper, and Istria County, Istria, one of the four Regions of Croatia, historical regions of Croatia. Located in the Pannonian Plain and taking up the east of the country, it roughly corresponds with f ...
n orthography arose as well, a mixture of the previous two. However, the specifics of the alphabetic systems tended to vary from writer to writer.
In addition to these three widely used systems, multiple individual writers attempted their own reforms of the alphabet. These include
Rajmund Đamanjić (1639), the early 1700s Dubrovnik academy work led by
Đuro Matijašević and
Ignjat Đurđević, as well as the early 1700s ''Lexicon Latino-Illyricum'' by
Pavao Ritter Vitezović.
Gaj's reform and its revisions
The Serbo-Croatian Latin alphabet was mostly designed by
Ljudevit Gaj
Ljudevit Gaj (; born Ludwig Gay; ; 8 August 1809 – 20 April 1872) was a Croatian linguist, politician, journalist and writer. He was one of the central figures of the pan-Slavist Illyrian movement.
Biography
Origin
He was born in Krapina ( ...
, who modelled it after
Czech (č, ž, š) and
Polish (ć), and invented , and , according to similar solutions in
Hungarian (ly, ny and dzs, although dž combinations exist also in Czech (and Polish as dż)). In 1830 in
Buda
Buda (, ) is the part of Budapest, the capital city of Hungary, that lies on the western bank of the Danube. Historically, “Buda” referred only to the royal walled city on Castle Hill (), which was constructed by Béla IV between 1247 and ...
, he published the book ''Kratka osnova horvatsko-slavenskog pravopisanja'' ("Brief basics of the Croatian-Slavonic orthography"), which was the first common Croatian
orthography
An orthography is a set of convention (norm), conventions for writing a language, including norms of spelling, punctuation, Word#Word boundaries, word boundaries, capitalization, hyphenation, and Emphasis (typography), emphasis.
Most national ...
book.
Gaj followed the example of Pavao Ritter Vitezović and the
Czech orthography, making one letter of the Latin script for each sound in the language. Following
Vuk Karadžić's reform of Cyrillic in the early nineteenth century, in the 1830s Ljudevit Gaj did the same for ''latinica'', using the Czech system and producing a one-to-one grapheme-phoneme correlation between the Cyrillic and Latin orthographies, resulting in a parallel system.
In 1878
Đuro Daničić proposed a replacement of the digraphs , , and with single letters: , , and respectively. Of the four, was accepted in
Ivan Broz's 1892 ''Hrvatski pravopis'' ("Croatian Orthography") and it thus became a part of the standard alphabet, though it was not immediately accepted by all writers and publishers. The other three letters remained in use only in certain philological publications. Names of individual people have sometimes retained the pre-''đ'' spelling: ''
Ksaver Šandor Gjalski'' (), ''
Gjuro Szabo'' ().
Correspondence between Cyrillic and Latin alphabets
Each Cyrillic and Latin Serbo-Croatian letter has its exact counterpart in the other alphabet, although Latin digraphs , and correspond to Cyrillic single letters , and . The following table provides the upper and lower case forms of Gaj's Latin alphabet, along with the equivalent forms in the Serbo-Croatian Cyrillic alphabet.
Computing
In the 1990s, there was a general confusion about the proper
character encoding
Character encoding is the process of assigning numbers to graphical character (computing), characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using computers. The numerical v ...
to use to write text in Latin Croatian on computers.
*An attempt was made to apply the 7-bit "
YUSCII", later "CROSCII", which included the five letters with diacritics at the expense of five non-letter characters (
, @), but it was ultimately unsuccessful. Because the ASCII character @ sorts before A, this led to jokes calling it ''žabeceda'' (''žaba''=frog, ''abeceda''=alphabet).
*Other short-lived vendor-specific efforts were also undertaken.
*The
8-bit ISO 8859-2 (Latin-2) standard was developed by ISO.
*
MS-DOS
MS-DOS ( ; acronym for Microsoft Disk Operating System, also known as Microsoft DOS) is an operating system for x86-based personal computers mostly developed by Microsoft. Collectively, MS-DOS, its rebranding as IBM PC DOS, and a few op ...
introduced 8-bit encoding CP852 for Central European languages, disregarding the ISO standard.
*
Microsoft Windows
Windows is a Product lining, product line of Proprietary software, proprietary graphical user interface, graphical operating systems developed and marketed by Microsoft. It is grouped into families and subfamilies that cater to particular sec ...
spread yet another 8-bit encoding called
CP1250, which had a few letters mapped one-to-one with ISO 8859-2, but also had some mapped elsewhere.
*
Apple
An apple is a round, edible fruit produced by an apple tree (''Malus'' spp.). Fruit trees of the orchard or domestic apple (''Malus domestica''), the most widely grown in the genus, are agriculture, cultivated worldwide. The tree originated ...
's
Macintosh Central European encoding does not include the entire Gaj's Latin alphabet. Instead, a separate codepage, called
MacCroatian encoding, is used.
*
EBCDIC
Extended Binary Coded Decimal Interchange Code (EBCDIC; ) is an eight- bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems. It descended from the code used with punched cards and the corresponding si ...
also has a Latin-2 encoding.
The preferred
character encoding
Character encoding is the process of assigning numbers to graphical character (computing), characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using computers. The numerical v ...
for Croatian today is either the
ISO 8859-2, or the
Unicode
Unicode or ''The Unicode Standard'' or TUS is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 Char ...
encoding
UTF-8
UTF-8 is a character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode Transformation Format 8-bit''. Almost every webpage is transmitted as UTF-8.
UTF-8 supports all 1,112,0 ...
(with two bytes or 16 bits necessary to use the letters with diacritics). However, , one can still find programs as well as databases that use
CP1250, CP852 or even CROSCII.
Digraphs , and in their upper case, title case and lower case forms have dedicated Unicode code points as shown in the table below, However, these are included chiefly for backwards
compatibility with legacy encodings which kept a one-to-one correspondence with Cyrillic; modern texts use a sequence of characters.
Usage for Slovene
Since the early 1840s, Gaj's alphabet was increasingly used for
Slovene. In the beginning, it was most commonly used by Slovene authors who treated Slovene as a variant of Serbo-Croatian (such as
Stanko Vraz), but it was later accepted by a large spectrum of Slovene-writing authors. The breakthrough came in 1845, when the Slovene conservative leader
Janez Bleiweis started using Gaj's script in his journal ''
Kmetijske in rokodelske novice'' ("Agricultural and Artisan News"), which was read by a wide public in the countryside. By 1850, Gaj's alphabet (known as ''gajica'' in Slovene) became the only official
Slovene alphabet, replacing three other writing systems that had circulated in the
Slovene Lands since the 1830s: the traditional ''
bohoričica'', named after
Adam Bohorič, who codified it; the ''
dajnčica'', named after
Peter Dajnko; and the ''
metelčica'', named after
Franc Serafin Metelko.
The Slovene version of Gaj's alphabet differs from the Serbo-Croatian one in several ways:
*The Slovene alphabet does not have the characters and ; the sounds they represent do not occur in Slovene.
*In Slovene, the digraphs and are treated as two separate letters and represent separate sounds (the word
polje is pronounced or in Slovene, as opposed to in Serbo-Croatian).
*While the phoneme exists in modern Slovene and is written , it is used in only borrowed words and so and are considered separate letters, not a digraph.
As in Serbo-Croatian, Slovene orthography does not make use of diacritics to mark accent in words in regular writing, but
headwords in dictionaries are given with them to account for
homographs. For instance, letter can be pronounced in four ways (, , and ), and letter in two ( and , though the difference is not
phonemic). Also, it does not reflect consonant voicing assimilation: compare e.g. Slovene and Serbo-Croatian ('junkyard', 'waste').
Usage for Macedonian
Romanization
In linguistics, romanization is the conversion of text from a different writing system to the Latin script, Roman (Latin) script, or a system for doing so. Methods of romanization include transliteration, for representing written text, and tra ...
of
Macedonian is done according to Gaj's Latin alphabet
[Macedonian Latin alphabet, Pravopis na makedonskiot literaturen jazik, B. Vidoeski, T. Dimitrovski, K. Koneski, K. Tošev, R. Ugrinova Skalovska - Prosvetno delo Skopje, 1970, p.99] with slight modification. Gaj's ''ć'' and ''đ'' are not used at all, with ''ḱ'' and ''ǵ'' introduced instead. The rest of the letters of the alphabet are used to represent the equivalent Cyrillic letters. Also, Macedonian uses the letter ''dz'', which is not part of the Serbo-Croatian phonemic inventory. As per the orthography, both ''lj'' and ''ĺ'' are accepted as romanisations of љ and both ''nj'' and ''ń'' for њ. For informal purposes, like texting, most Macedonian speakers will omit the diacritics or use a digraph- and trigraph-based system for ease as there is no Macedonian Latin keyboard supported on most systems. For example, ''š'' becomes ''sh'' or ''s'', and ''dž'' becomes ''dzh'' or ''dz''.
Keyboard layout
The standard Gaj's Latin alphabet
keyboard layout
A keyboard layout is any specific physical, visual, or functional arrangement of the keys, legends, or key-meaning associations (respectively) of a computer keyboard, mobile phone, or other computer-controlled typographic keyboard. Standard keybo ...
for personal computers is as follows:
::
See also
*
Glagolitic alphabet
The Glagolitic script ( , , ''glagolitsa'') is the oldest known Slavic alphabet. It is generally agreed that it was created in the 9th century for the purpose of translating liturgical texts into Old Church Slavonic by Saints Cyril and Methodi ...
*
Yugoslav braille
*
Yugoslav manual alphabet
*
Romanization of Serbian – describes usage not the alphabet
*
Romanization of Montenegrin – describes usage not the alphabet
Notes
References
Sources
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
External links
Omniglot
{{List of writing systems
Latin alphabets
Alphabet
An alphabet is a standard set of letter (alphabet), letters written to represent particular sounds in a spoken language. Specifically, letters largely correspond to phonemes as the smallest sound segments that can distinguish one word from a ...
Alphabet
An alphabet is a standard set of letter (alphabet), letters written to represent particular sounds in a spoken language. Specifically, letters largely correspond to phonemes as the smallest sound segments that can distinguish one word from a ...
Serbo-Croatian language
Slovene alphabet
Writing systems introduced in the 19th century
Bosnian language
1835 introductions