The Vietnamese language is written with a Latin script with diacritics (accents and tone marks), which requires several accommodations when typing on phones or computers. Software-based input systems, either built into the device or installed as third-party software such as UniKey, make it possible to write Vietnamese on phones and computers. Telex is the oldest input method devised to encode the Vietnamese language with its tones. Other input methods include VNI (based on the number keys) and VIQR. The VNI input method is not to be confused with the VNI code page.
Historically, Vietnamese was also written in chữ Nôm, which today is used mainly for ceremonial and traditional purposes and otherwise remains the domain of historians and philologists. There have been attempts to type chữ Hán and chữ Nôm with existing Vietnamese input methods, but they are not widespread. Vietnamese is sometimes typed without tone marks, which Vietnamese speakers can usually infer from context.
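Typing without tone marks amounts to discarding the diacritics from each letter. The following Python sketch illustrates that reduction using Unicode decomposition; the function name `strip_diacritics` is chosen here for illustration and does not come from any particular input method.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Return text with Vietnamese diacritics removed."""
    # The letter đ has no canonical decomposition, so it is mapped by hand.
    text = text.replace("đ", "d").replace("Đ", "D")
    decomposed = unicodedata.normalize("NFD", text)
    # After decomposition every accent, horn, and tone mark is a separate
    # combining character (general category Mn), which is dropped here.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(strip_diacritics("Tiếng Việt"))
```

The reduction loses information: several distinct words can collapse to the same unmarked spelling, which is why a reader has to rely on context.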
Fonts and character encodings
Vietnamese alphabet
Character encodings
There are as many as 46 character encodings for representing the Vietnamese alphabet. Unicode has become the most popular encoding for many of the world's writing systems, due to its great compatibility and software support. Diacritics may be encoded either as combining characters or as precomposed characters, which are scattered among the Latin Extended-A, Latin Extended-B, and Latin Extended Additional blocks. The Vietnamese đồng symbol is encoded in the Currency Symbols block. Historically, the Vietnamese language used other characters beyond the modern alphabet. The Middle Vietnamese letter B with flourish (ꞗ) is included in the Latin Extended-D block. The apex is not included in Unicode, though an existing combining mark may serve as a rough approximation.
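The difference between precomposed and combining representations can be seen with Python's standard `unicodedata` module. This is a minimal sketch; the helper name `same_text` is an illustrative choice, and the comparison strategy (normalizing both sides to NFC) is one common convention rather than a requirement.

```python
import unicodedata

precomposed = "\u1ec7"  # ệ as one code point from Latin Extended Additional
decomposed = unicodedata.normalize("NFD", precomposed)

# NFD splits the letter into a base "e" plus two combining marks.
print([f"U+{ord(ch):04X}" for ch in decomposed])

def same_text(a: str, b: str) -> bool:
    """Compare two strings regardless of how their diacritics are encoded."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(same_text(precomposed, decomposed))
print(unicodedata.name("\u20ab"))  # the currency symbol mentioned above
```

Software that compares or searches Vietnamese text generally needs some normalization step of this kind, since both forms render identically.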
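Early versions of Unicode assigned the characters U+0340 COMBINING GRAVE TONE MARK and U+0341 COMBINING ACUTE TONE MARK for the purpose of placing these marks beside a circumflex, as is common in Vietnamese typography. These two characters have been deprecated; U+0300 COMBINING GRAVE ACCENT and U+0301 COMBINING ACUTE ACCENT are now used regardless of any present circumflex.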
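In the Unicode Character Database, the deprecated tone marks U+0340 and U+0341 are canonically equivalent to the ordinary grave and acute accents U+0300 and U+0301, so normalization replaces them automatically. A short Python check, offered as an illustration only:

```python
import unicodedata

legacy = "a\u0340"  # "a" followed by the deprecated COMBINING GRAVE TONE MARK
modern = unicodedata.normalize("NFC", legacy)

# Normalization maps the deprecated mark to U+0300 and then composes it
# with the base letter, so the result is the ordinary precomposed letter.
print(f"U+{ord(modern):04X}", unicodedata.name(modern))
```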
For systems that lack support for Unicode, dozens of 8-bit Vietnamese code pages have been designed. The most commonly used of them were VISCII, VSCII (TCVN 5712:1993), VNI, VPS and Windows-1258.
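As an illustration of how an 8-bit code page handles the alphabet, the sketch below uses Python's built-in `cp1258` codec for Windows-1258. That code page has no single byte for a letter such as ệ, so the text must first be arranged as a supported base letter followed by a combining mark. The helper `can_encode` is a name invented for this example.

```python
precomposed = "\u1ec7"        # ệ as a single Unicode code point
cp1258_form = "\u00ea\u0323"  # ê followed by COMBINING DOT BELOW

def can_encode(text: str, codec: str) -> bool:
    """Report whether every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

print(can_encode(precomposed, "cp1258"))
print(can_encode(cp1258_form, "cp1258"))
print(len(cp1258_form.encode("cp1258")))  # one byte per code point
```

A converter targeting this code page therefore needs a decomposition step that differs from the standard Unicode normalization forms, since it keeps the circumflex attached to the base letter while splitting off the tone mark.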
Where ASCII is required, such as when ensuring readability in plain text e-mail, Vietnamese letters are often encoded according to Vietnamese Quoted-Readable (VIQR) or VSCII Mnemonic (VSCII-MNEM), though usage of either variable-width scheme has declined dramatically following the adoption of Unicode on the World Wide Web. For instance, support for all of the above-mentioned 8-bit encodings, with the exception of Windows-1258, was dropped from Mozilla software in 2014.
Many Vietnamese fonts intended for desktop publishing are encoded in VNI or TCVN3 (VSCII). Such fonts are known as "ABC fonts". Popular web browsers lack support for specialty Vietnamese encodings, so any webpage that uses these fonts appears as unintelligible mojibake on systems without them installed.
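Mojibake arises whenever bytes are interpreted under the wrong encoding. The font-encoding case described above cannot be reproduced with standard codecs, so the following Python sketch uses a related and widely seen mismatch instead: UTF-8 bytes read as Windows-1252. The function names `garble` and `repair` are illustrative.

```python
def garble(text: str) -> str:
    """Simulate mojibake: encode as UTF-8, then misread as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")

def repair(garbled: str) -> str:
    """Undo the misreading, assuming no bytes were lost along the way."""
    return garbled.encode("cp1252").decode("utf-8")

broken = garble("Việt")
print(broken)
print(repair(broken))
```

The repair only works while the garbled text is intact; some UTF-8 bytes have no Windows-1252 mapping, in which case the round trip fails and the original cannot be recovered this way.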
Vietnamese often stacks diacritics, so typeface designers must take care to prevent stacked diacritics from colliding with adjacent letters or lines. When a tone mark is used together with another diacritic, offsetting the tone mark to the right preserves consistency and avoids slowing down saccades. In advertising signage and in cursive handwriting, diacritics often take forms unfamiliar to other Latin alphabets. For example, the lowercase letter I retains its tittle in ì, ỉ, ĩ, and í. These nuances are rarely accounted for in computing environments.
Approaches to character encoding
Vietnamese writing requires 134 additional letters (between both cases) besides the 52 already present in ASCII. This exceeds the 128 additional characters available in a conventional extended ASCII encoding. Although this can be solved by using a variable-width encoding (as is done by UTF-8), a number of approaches have been used by other encodings to support Vietnamese without doing so:
* Replace at least six ASCII characters, selected for being uncommon in Vietnamese and/or for being non-invariant in ISO 646 or DEC NRCS (as in VNI for DOS).
* Drop the uppercase letters which are least frequently used, or all uppercase letters with tone marks (as in VSCII-3 (TCVN3)). These letters may still be supplied by means of all-capital fonts.
* Drop forms of the letter Y with tone marks, necessitating use of the letter I in those circumstances. This approach was rejected by the designers of VISCII on the basis that a character encoding should not attempt to settle a spelling reform issue.
* Replace at least six C0 control characters (as in VISCII, VSCII-1 (TCVN1) and VPS).
* Use combining characters, allowing one vowel with accents to be fully represented using a sequence of characters (as in VNI, VSCII-2 (TCVN2), Windows-1258 and ANSEL).
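The figure of 134 additional letters can be reproduced by enumeration: twelve base vowels each take five tone marks, six of the base vowels are themselves outside ASCII, and the letter đ is added, all in two cases. The Python sketch below performs that count; the variable names are illustrative.

```python
import unicodedata

BASE_VOWELS = "aăâeêioôơuưy"
# Combining grave, acute, tilde, hook above, and dot below.
TONE_MARKS = ["\u0300", "\u0301", "\u0303", "\u0309", "\u0323"]

lowercase = {"đ"}
for vowel in BASE_VOWELS:
    if not vowel.isascii():
        lowercase.add(vowel)  # ă, â, ê, ô, ơ, ư carry no tone mark here
    for mark in TONE_MARKS:
        lowercase.add(unicodedata.normalize("NFC", vowel + mark))

letters = lowercase | {ch.upper() for ch in lowercase}

# Every letter is a single precomposed code point.
assert all(len(ch) == 1 for ch in letters)
print(len(lowercase), len(letters))
```

Since 134 exceeds 128 by six, an 8-bit encoding that keeps all of ASCII must free up at least six positions, which is where the figure of "at least six" in the approaches above comes from.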
Font substitution
Many fonts support a subset of the Latin writing system that omits much of the Vietnamese alphabet. Due to the high density of Vietnamese-specific characters in Vietnamese text, Web browsers that implement font substitution reliably produce a ransom note effect when the webpage specifies an inadequate font.
Chữ Nôm
Unicode includes over 10,000 chữ Nôm characters as part of Unicode's repertoire of CJK Unified Ideographs. Of these characters, 10,082 can be found in the CJK Unified Ideographs Extension B block, while the rest are distributed between the CJK Unified Ideographs, CJK Unified Ideographs Extension A, and CJK Unified Ideographs Extension C blocks. A further 1,028 characters, including over 400 characters specific to the Tày language, are encoded in the CJK Unified Ideographs Extension E block. The characters are taken from the Vietnamese standards TCVN 5773:1993 and TCVN 6909:2001, as well as from research by the Han-Nom Research Institute and other groups. All the characters in TCVN 5773:1993 and about 95% of the characters in TCVN 6909:2001 have corresponding code points in Unicode 5.1, though TCVN 5773:1993 itself mapped most of its characters to the Private Use Area of Unicode. Unicode 13.0 added two diacritical characters to the Ideographic Symbols and Punctuation block that were commonly used to indicate borrowed characters in chữ Nôm.
The two most comprehensive chữ Nôm fonts are the Vietnamese Nôm Preservation Foundation's Nôm Na Tống Light and the community-developed HAN NOM A/HAN NOM B, both of which place a large number of unstandardized characters in the Private Use Areas.
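Whether a given character sits in a standardized block or in a Private Use Area can be determined from its code point alone. The Python sketch below hard-codes the ranges of the blocks relevant here; the table `BLOCKS` and the function `block_of` are names chosen for this example, and the table is deliberately incomplete.

```python
# (first code point, last code point, block name)
BLOCKS = [
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xE000, 0xF8FF, "Private Use Area"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0x2A700, 0x2B73F, "CJK Unified Ideographs Extension C"),
    (0x2B820, 0x2CEAF, "CJK Unified Ideographs Extension E"),
    (0xF0000, 0xFFFFF, "Supplementary Private Use Area-A"),
    (0x100000, 0x10FFFF, "Supplementary Private Use Area-B"),
]

def block_of(char: str):
    """Return the block name for a single character, or None if not listed."""
    code_point = ord(char)
    for first, last, name in BLOCKS:
        if first <= code_point <= last:
            return name
    return None

for sample in ("漢", chr(0x20000), "\ue000"):
    print(f"U+{ord(sample):04X}", block_of(sample))
```

Text that relies on Private Use Area code points displays correctly only with the font that defined them, which is the practical drawback of the unstandardized characters mentioned above.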
The Unicode Consortium's Unihan database includes Vietnamese readings of some characters but does not distinguish between Sino-Vietnamese and Nôm readings.
Like other CJKV writing systems, chữ Nôm is traditionally written vertically, from top to bottom and right to left. It may also be annotated using ruby characters, which for Vietnamese are written in chữ Quốc Ngữ.
Text input
A purely physical Vietnamese keyboard would be impractical, due to the sheer number of combinations of letters with one or two diacritics in the alphabet, e.g. á, à, ả, ã, ạ, â, ấ. Instead, Vietnamese input relies on formulaic software-based keyboard layouts, virtual keyboards, or input methods (also known as IMEs).
Keyboard layouts
Vietnamese keyboard layouts rely on dead keys to compose letters with diacritics. Most desktop operating systems include a Vietnamese keyboard layout similar to TCVN 6064:1995, a Vietnamese national standard. Previously, typewriters used an AZERTY-based Vietnamese layout (AĐERTY).
Input methods
The three most common Vietnamese input methods are Telex, VNI, and VIQR. Telex indicates diacritics using letters that are unlikely to appear at the end of a word, while VNI repurposes the number keys or function keys and VIQR repurposes various punctuation marks. The Telex and VIQR conventions originated in an earlier era of telex machines and typewriters, respectively.
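The three conventions differ mainly in which key stands for each tone mark. The Python sketch below tabulates the commonly documented tone keys and applies one to a vowel; the tone labels are written without diacritics for convenience, and the names `TONE_KEYS` and `add_tone` are illustrative rather than taken from any IME.

```python
import unicodedata

# Combining marks for the five written tones.
TONE_MARKS = {
    "sac": "\u0301",    # acute
    "huyen": "\u0300",  # grave
    "hoi": "\u0309",    # hook above
    "nga": "\u0303",    # tilde
    "nang": "\u0323",   # dot below
}

# Key that selects each tone under the three input conventions.
TONE_KEYS = {
    "telex": {"s": "sac", "f": "huyen", "r": "hoi", "x": "nga", "j": "nang"},
    "vni": {"1": "sac", "2": "huyen", "3": "hoi", "4": "nga", "5": "nang"},
    "viqr": {"'": "sac", "`": "huyen", "?": "hoi", "~": "nga", ".": "nang"},
}

def add_tone(vowel: str, key: str, method: str) -> str:
    """Apply the tone selected by key (under the given method) to a vowel."""
    mark = TONE_MARKS[TONE_KEYS[method][key]]
    return unicodedata.normalize("NFC", vowel + mark)

print(add_tone("a", "s", "telex"), add_tone("a", "2", "vni"), add_tone("ê", ".", "viqr"))
```

Each method also defines keys for the circumflex, breve, horn, and đ, which are omitted here to keep the table short.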
Support for these input methods is provided by input method editors (IMEs), which are known in Vietnamese as bộ gõ, literally "peckers" or "percussion" in more general terms. IMEs may be provided by the operating system, installed as a third-party application, installed as a browser extension, or provided by an individual website in the form of a script. Common third-party applications include GoTiengViet, UniKey, VietKey, VPSKeys, WinVNKey, and xvnkb. On Unix-like operating systems, the IBus and SCIM frameworks both support Vietnamese. IME scripts such as AVIM, Mudim, and VietTyping can be found on most Vietnamese message boards, the Vietnamese Wikipedia, and other text-intensive websites. The Vietnamese Web browser Cốc Cốc comes with an input method built-in.
Input methods allow words to be composed in a more flexible order than keyboard layouts allow. With the TCVN 6064:1995 keyboard layout, the keys for a word must be typed in one fixed order. By contrast, most IMEs, whether Telex, VNI, or VIQR, permit the user to insert diacritics at the end of the word. Some IMEs even allow diacritics to be entered before their base letters. Depending on an IME's implementation, it may also be possible to edit an existing word's diacritics without retyping the word.
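The idea of entering a diacritic at the end of a word can be sketched as a small Telex-style converter in Python. This is a deliberately simplified illustration, not the algorithm of any real IME: it recognizes only a single trailing tone key, and its tone-placement rule (the last vowel that already carries a circumflex, breve, or horn, otherwise the first vowel) is an assumption that does not cover every Vietnamese word. All names and sample words are chosen for this example.

```python
import unicodedata

TELEX_TONES = {"s": "\u0301", "f": "\u0300", "r": "\u0309", "x": "\u0303", "j": "\u0323"}
TELEX_LETTERS = {"aa": "â", "aw": "ă", "ee": "ê", "oo": "ô", "ow": "ơ", "uw": "ư", "dd": "đ"}
MODIFIED_VOWELS = "âăêôơư"
ALL_VOWELS = "aeiouy" + MODIFIED_VOWELS

def telex_to_unicode(keys: str) -> str:
    """Convert one lowercase word typed in Telex into Vietnamese text."""
    # Step 1: treat a trailing tone key as applying to the whole word.
    tone_key = keys[-1] if len(keys) > 1 and keys[-1] in TELEX_TONES else ""
    if tone_key:
        keys = keys[:-1]
    # Step 2: replace two-key sequences such as "ee" with one letter.
    letters, i = [], 0
    while i < len(keys):
        pair = keys[i:i + 2]
        if pair in TELEX_LETTERS:
            letters.append(TELEX_LETTERS[pair])
            i += 2
        else:
            letters.append(keys[i])
            i += 1
    # Step 3: attach the tone mark to one vowel (simplified placement rule).
    if tone_key:
        modified = [k for k, ch in enumerate(letters) if ch in MODIFIED_VOWELS]
        plain = [k for k, ch in enumerate(letters) if ch in ALL_VOWELS]
        if modified or plain:
            target = modified[-1] if modified else plain[0]
            letters[target] = unicodedata.normalize(
                "NFC", letters[target] + TELEX_TONES[tone_key])
        else:
            letters.append(tone_key)  # no vowel to mark: keep the key as typed
    return "".join(letters)

for typed in ("vieets", "tieengs", "nguwowif", "ddi"):
    print(typed, "->", telex_to_unicode(typed))
```

Because the tone key is consumed only after the whole word is known, the converter can decide where the mark belongs, which is the flexibility that a dead-key keyboard layout lacks.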
Some virtual keyboards supplement the standard dead keys with dedicated shortcut keys. For example, with the VIQR keyboard built into iOS, it is possible to add a horn to "U" by tapping either the usual modifier key or a dedicated key, which has no analogue on a physical keyboard.
Borrowing a feature common amongst Chinese input methods, some Vietnamese IMEs allow one to skip diacritics altogether: after typing the base letters, the user selects the accented word from a candidate list. In order to provide this autocomplete list, the IME may need to communicate with a Web service. Some IMEs also use candidate lists to allow the user to convert text from the Vietnamese alphabet to chữ Nôm, because there is no one-to-one correspondence between alphabetic words and chữ Nôm characters.
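A candidate list of this kind can be sketched as a lookup from the unmarked spelling to every accented word that reduces to it. In the Python example below, the lexicon and its frequency counts are made up for illustration, and a real IME would likely draw on a much larger dictionary or a remote service; the names `LEXICON`, `build_index`, and `candidates` are likewise hypothetical.

```python
import unicodedata
from collections import defaultdict

# Hypothetical mini-lexicon: word -> invented frequency count.
LEXICON = {"ma": 50, "má": 30, "mà": 80, "mả": 5, "mã": 20, "mạ": 10}

def strip_marks(word: str) -> str:
    """Reduce a word to the base letters a user would type."""
    word = word.replace("đ", "d")
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def build_index(lexicon: dict) -> dict:
    """Group words by unmarked spelling, most frequent first."""
    index = defaultdict(list)
    for word in sorted(lexicon, key=lexicon.get, reverse=True):
        index[strip_marks(word)].append(unicodedata.normalize("NFC", word))
    return dict(index)

def candidates(typed: str, index: dict) -> list:
    """Return the accented candidates for the base letters typed so far."""
    return index.get(typed, [])

INDEX = build_index(LEXICON)
print(candidates("ma", INDEX))
```

Ordering candidates by frequency keeps the most likely word within easy reach; context-sensitive ranking would need a language model rather than a flat count.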
Other considerations
Typical Vietnamese text contains a high proportion of compound words. Compound words are never hyphenated in contemporary usage, so spell checkers are limited to checking individual syllables unless a statistical language model is consulted.
Vietnamese has rigid spelling rules and few exceptions, so text-to-speech engines may avoid dictionary lookups except when encountering a foreign loan word. TTS engines must account for tones, which are essential to the meaning of any Vietnamese word, e.g. má (mother) is a different word from mà (but).
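Because each tone is written with its own diacritic, the tone of a syllable can be read directly from its spelling. The Python sketch below shows one way to do this by decomposing the syllable; the function name `tone_of` and the unaccented tone labels are choices made for this example.

```python
import unicodedata

# Combining mark -> tone label (labels written without diacritics).
TONE_BY_MARK = {
    "\u0300": "huyen",
    "\u0301": "sac",
    "\u0309": "hoi",
    "\u0303": "nga",
    "\u0323": "nang",
}

def tone_of(syllable: str) -> str:
    """Return the tone of a syllable, or "ngang" when no tone mark is present."""
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_BY_MARK:
            return TONE_BY_MARK[ch]
    return "ngang"

for word in ("ma", "má", "mà"):
    print(word, tone_of(word))
```

The circumflex, breve, and horn are ignored here because they change vowel quality rather than tone.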
Internationalized user interfaces are generally unable to use the full complement of Vietnamese pronouns that would be expected in a traditional social setting, even when much is known about the user. Instead, user interfaces typically use generic pronouns, some of which make potentially incorrect assumptions about the user's age and relationship to other users. For example, when a social media platform notifies a user about a younger user, it may refer to the latter in the third person with a pronoun that does not reflect that relationship, leading the user to misinterpret the notification as a reference to someone else.
See also
* Chinese input methods for computers
* Japanese language and computers
* Korean language and computers
External links
* Computing in Vietnamese: Progress & Challenges, 2005 International Macintosh Users Group presentation
* Vietnamese Conversions, online tool for recovering Vietnamese mojibake