In Unicode, a script is a collection of letters and other written signs used to represent textual information in one or more writing systems. Some scripts support one and only one writing system and language, for example, Armenian. Other scripts support many different writing systems; for example, the Latin script supports English, French, German, Italian, Vietnamese, Latin itself, and several other languages. Some languages make use of multiple alternate writing systems and thus also use several scripts; for example, Turkish was written in the Arabic script before the 20th century but transitioned to the Latin script in the early part of the 20th century. For a list of languages supported by each script, see the list of languages by writing system. More or less complementary to scripts are symbols and Unicode control characters.
The unified diacritical characters and unified punctuation characters frequently have the "common" or "inherited" script property. However, the individual scripts often have their own punctuation and diacritics, so that many scripts include not only letters but also diacritic and other marks, punctuation, numerals and even their own idiosyncratic symbols and space characters.
Unicode 15.0 defines 161 separate scripts, including 94 modern scripts and 67 ancient or historic scripts. More scripts are in the process of being encoded or have been tentatively allocated for encoding in roadmaps (see Roadmaps to Unicode, https://www.unicode.org/roadmaps/).
Definition and classification
When multiple languages make use of the same script, there are frequently some differences, particularly in diacritics and other marks. For example, Swedish and English both use the Latin script. However, Swedish includes the character ''å'' (sometimes called a Swedish ''O''), while English has no such character. Nor does English make use of the diacritic ''combining ring above'' for any character. In general, the languages sharing the same scripts share many of the same characters. Despite these peripheral differences in the Swedish and English writing systems, they are said to use the same Latin script. Thus, the Unicode abstraction of scripts is a basic organizing technique. The differences among different alphabets or writing systems remain and are supported through Unicode’s flexible scripts, combining marks and collation algorithms.
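The relationship between the Swedish letter and the combining ring above can be checked with a short sketch. It assumes Python's standard `unicodedata` module; the helper name `decompose` is hypothetical and introduced only for illustration.

```python
import unicodedata

def decompose(text):
    """Return the Unicode names of the code points in the NFD form of text."""
    # NFD splits a precomposed letter into its base letter and combining marks.
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", text)]

# U+00E5 is the precomposed Swedish letter a with ring above.
print(decompose("\u00e5"))
```

The precomposed letter decomposes into a plain Latin letter followed by the combining mark, which is one way the same script can serve languages with different letter inventories.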
Script versus writing system
''Writing system'' is sometimes treated as a synonym for "script". However, it also can be used as the specific concrete writing system supported by a script. For example, the Vietnamese writing system is supported by the Latin script. A writing system may also cover more than one script; for example, the Japanese writing system makes use of the Han, Hiragana and Katakana scripts.
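As a rough illustration of one writing system drawing on several scripts, the sketch below classifies the characters of a short Japanese string. The range table is a hand-written, approximate excerpt assumed here for illustration; the authoritative and more fine-grained data is in the Unicode Character Database. The names `SCRIPT_RANGES`, `script_of` and `scripts_used` are hypothetical.

```python
# Hand-written, approximate excerpt for illustration only; it is NOT the full
# Unicode script data, which covers many more ranges and exceptions.
SCRIPT_RANGES = [
    (0x3041, 0x3096, "Hiragana"),
    (0x30A1, 0x30FA, "Katakana"),
    (0x4E00, 0x9FFF, "Han"),
]

def script_of(ch):
    """Return the script name for ch, or None if this excerpt does not cover it."""
    cp = ord(ch)
    for first, last, name in SCRIPT_RANGES:
        if first <= cp <= last:
            return name
    return None

def scripts_used(text):
    """Return the sorted list of scripts (from the excerpt) that occur in text."""
    return sorted({s for s in map(script_of, text) if s is not None})

print(scripts_used("漢字とカタカナ"))  # ['Han', 'Hiragana', 'Katakana']
```

A real implementation would load the complete script data rather than a three-row table; the point here is only that a single run of Japanese text mixes characters from three scripts.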
Most writing systems can be broadly divided into several categories: logographic, syllabic, alphabetic (or segmental), abugida, abjad and featural; however, all features of any of these may be found in any given writing system in varying proportions, often making it difficult to purely categorize a system. The term ''complex system'' is sometimes used to describe those where the admixture makes classification problematic.
Unicode supports all of these types of writing systems through its numerous scripts. Unicode also adds further properties to characters to help differentiate the various characters and the ways they behave within Unicode text-processing algorithms.
Special script property values
In addition to explicit or specific script properties, Unicode uses three special values:
;Common: Unicode can assign a character in the UCS to a single script only. However, many characters, namely those that are not part of a formal natural-language writing system or are unified across many writing systems, may be used in more than one script (for example, currency signs, symbols, numerals and punctuation marks). In these cases Unicode defines them as belonging to the "common" script (ISO 15924 code Zyyy).
;Inherited: Many diacritics and non-spacing combining characters may be applied to characters from more than one script. In these cases Unicode assigns them to the "inherited" script (ISO 15924 code Zinh), which means that they have the same script class as the base character with which they combine, and so in different contexts they may be treated as belonging to different scripts. For example, U+0308 COMBINING DIAERESIS may combine either with U+0065 LATIN SMALL LETTER E to create a Latin ''ë'' or with U+0435 CYRILLIC SMALL LETTER IE for the Cyrillic ''ё''. In the former case, it inherits the Latin script of the base character, whereas in the latter case, it inherits the Cyrillic script of the base character.
;Unknown: The value of "unknown" script (ISO 15924 code Zzzz) is given to unassigned, private-use, noncharacter, and surrogate code points.
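The "inherited" behaviour can be sketched with Python's standard `unicodedata` module. That module does not appear to expose the script property itself, so the first word of the character name is used below purely as an informal hint, not as the actual property value. The helper names `compose` and `script_hint` are hypothetical.

```python
import unicodedata

DIAERESIS = "\u0308"  # COMBINING DIAERESIS, a non-spacing mark

def compose(base):
    """Combine a base letter with the diaeresis and normalize to NFC."""
    return unicodedata.normalize("NFC", base + DIAERESIS)

def script_hint(ch):
    """Informal heuristic: first word of the character name (not the Script property)."""
    return unicodedata.name(ch).split()[0]

latin = compose("e")          # base is LATIN SMALL LETTER E
cyrillic = compose("\u0435")  # base is CYRILLIC SMALL LETTER IE

print(unicodedata.name(latin), "|", script_hint(latin))
print(unicodedata.name(cyrillic), "|", script_hint(cyrillic))
```

The same combining mark ends up in a Latin letter in one case and a Cyrillic letter in the other, which is the situation the "inherited" value is meant to describe.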
Character categories within scripts
Unicode provides a general category property for each character. So in addition to belonging to a script, every character also has a general category. Typically, scripts include letter characters: uppercase letters, lowercase letters and modifier letters. A few precomposed ligatures such as Dz (U+01F2) are considered titlecase letters. Such titlecase ligatures are all in the Latin and Greek scripts and are all compatibility characters, and therefore Unicode discourages their use by authors. It is unlikely that new titlecase letters will be added in the future.
Most writing systems do not differentiate between uppercase and lowercase letters. For those scripts, all letters are categorized as "other letter" or "modifier letter". Ideographs such as Unihan ideographs are also categorized as "other letter". A few scripts do differentiate between uppercase and lowercase, however: Latin, Cyrillic, Greek, Armenian, Georgian, and Deseret. Even for these scripts, there are some letters that are neither uppercase nor lowercase.
Scripts can also contain any other general category character such as marks (diacritic and otherwise), numbers (numerals), punctuation, separators (word separators such as spaces), symbols and non-graphical format characters. These are included in a particular script when they are unique to that script. Other such characters are generally unified and included in the punctuation or diacritic blocks. However, the bulk of characters in any script (other than the common and inherited scripts) are letters.
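The general categories described in this section can be inspected with Python's standard `unicodedata.category` function, which returns the two-letter abbreviation for a character. The sample characters and the helper name `categories` are chosen here only for illustration.

```python
import unicodedata

def categories(chars):
    """Map each character to its two-letter general category abbreviation."""
    return {ch: unicodedata.category(ch) for ch in chars}

samples = [
    "A",           # uppercase letter
    "a",           # lowercase letter
    "\u01f2",      # titlecase ligature Dz
    "\u02b0",      # modifier letter small h
    "\u4e2d",      # a CJK ideograph
    "\U00010400",  # a Deseret capital letter
    "\u0308",      # combining diaeresis
    "5",           # decimal digit
    "!",           # punctuation
    " ",           # space separator
    "$",           # currency symbol
]

for ch, cat in categories(samples).items():
    print(f"U+{ord(ch):04X} {cat}")
```

In these abbreviations the first letter gives the major class (L for letters, M for marks, N for numbers, P for punctuation, Z for separators, S for symbols) and the second letter the subclass, for example Lt for titlecase letters and Lo for "other letter".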
List of scripts in Unicode
Unicode defines over a hundred script names (called "Alias" or "Property value alias"), based on the ISO 15924 list.
Unicode uses the "Common" script name for ISO 15924's Zyyy (code for undetermined script), "Inherited" for ISO 15924's Zinh (code for inherited script), and "Unknown" for ISO 15924's Zzzz (code for uncoded script). Some ISO 15924 script codes are not used, among them Zsym (Symbols) and Zmth (Mathematical notation); these are not considered scripts in the Unicode sense.
Missing scripts in Unicode
With each new version of Unicode, new writing systems are added to the international character code. According to a statement by linguist Dr Deborah Anderson of UC Berkeley, there are over 100 writing systems that have not yet been included in Unicode.
According to a list compiled by the Missing Scripts project, run by the University of Applied Sciences Mainz (Germany), the ANRT Nancy (France) and UC Berkeley (USA), there are 294 known writing systems according to the current state of research (January 2022). Of these, 131 have not yet been encoded in Unicode, i.e. cannot yet be used on a computer or mobile phone.
See also
* Latin script in Unicode
* Unicode characters
* Unicode symbols
* Phonemic and phonetic orthography
External links
* Script Encoding Initiative: a project at UC Berkeley, USA, working to get more scripts included in the Unicode standard.
* The World’s Writing Systems: an overview of all 294 known writing systems, each with a typographic reference glyph and its Unicode status.