Thai Industrial Standard 620-2533, commonly referred to as TIS-620, is the most common character set and character encoding for the Thai language. The standard is published by the Thai Industrial Standards Institute (TISI), an organ of the Ministry of Industry under the Royal Thai Government, and is the sole official standard for encoding Thai in Thailand.
The descriptive name of the standard is "Standard for Thai Character Codes for Computers" (Thai: รหัสสำหรับอักขระไทยที่ใช้กับคอมพิวเตอร์). "2533" refers to year 2533 of the Buddhist Era (1990), the year the present version of the standard was published; a previous revision, TIS 620-2529 (1986), is now obsolete. The code page layout is the same in both editions.
TIS-620 is the IANA preferred charset name for the standard, and the same charset name is also used for ISO/IEC 8859-11 (which adds a no-break space character at 0xA0, a position that is unassigned in TIS-620). When the IANA name is used, the codes are supplemented with the C0 and C1 control codes from ISO/IEC 6429.
Structure
TIS-620 is a conventionally structured extended ASCII national character set that retains full compatibility with 7-bit ASCII and uses the 8-bit range hex A1 to FB for encoding the Thai alphabet. Due to the complex combining nature of Thai vowels and diacritics, TIS-620 is intended for information interchange only, and an additional display engine is required to compose characters correctly.
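The byte layout described above can be sketched as a small classifier. This is only an illustration: the function name `classify_tis620_byte` and its three labels are hypothetical, and the gap at DB-DE is treated as unassigned, following the code chart notes later in the article.

```python
def classify_tis620_byte(value: int) -> str:
    """Classify one byte value according to the TIS-620 layout (illustrative)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a single byte value (0-255)")
    if value <= 0x7F:
        return "ascii"        # the 7-bit ASCII range is kept unchanged
    if 0xA1 <= value <= 0xFB and not 0xDB <= value <= 0xDE:
        return "thai"         # Thai consonants, vowels, marks, digits and signs
    return "unassigned"       # 80-A0, DB-DE and FC-FF carry no TIS-620 character


# Example: an ASCII letter, THAI CHARACTER KO KAI, and a byte in the DB-DE gap.
labels = [classify_tis620_byte(b) for b in b"A\xa1\xdb"]
print(labels)
```

Because every Thai character occupies exactly one byte above 7F, a check like this is enough to separate ASCII from Thai text without any multi-byte state.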
Variants
A nearly identical version of TIS-620 was adopted as ISO/IEC 8859-11 in 2001, the sole difference being that ISO/IEC 8859-11 defines hex A0 as a non-breaking space, while TIS-620 leaves it undefined but reserved. (In practice, this small distinction is usually ignored.)
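The A0 difference can be observed with a short experiment. The sketch below assumes CPython's bundled `tis_620` and `iso8859_11` codecs, which are expected to follow the two standards on this point; other libraries may ignore the distinction, as noted above. The helper `decode_or_none` is a hypothetical name introduced for the example.

```python
def decode_or_none(data: bytes, codec: str):
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


byte_a0 = b"\xa0"
as_iso = decode_or_none(byte_a0, "iso8859_11")  # NO-BREAK SPACE under ISO/IEC 8859-11
as_tis = decode_or_none(byte_a0, "tis_620")     # A0 is unassigned in TIS-620

print(repr(as_iso), repr(as_tis))
```

All other byte values are expected to decode identically under both codecs.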
The ISO/IEC 8859-11 set has also been registered as ISO-IR-166 by Ecma International, but this variation adds explicit escape codes for signaling the beginning and end of Thai character sequences.
The TIS-620 character set ordering has been used essentially as is within Unicode (ISO/IEC 10646) as well. Unicode's Thai block is U+0E00 through U+0E7F, and TIS-620 Thai characters can be converted to UTF-16 simply by subtracting hex A0 from each byte value and prefixing the result with 0E. For example, byte A1 becomes U+0E01.
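The conversion rule above amounts to one addition per byte. The following is a minimal sketch, not a full codec: the function names are hypothetical, ASCII bytes are passed through unchanged, and any byte outside the assigned Thai ranges is rejected.

```python
def tis620_byte_to_code_point(value: int) -> int:
    """Map an assigned TIS-620 Thai byte to its Unicode code point."""
    if not (0xA1 <= value <= 0xDA or 0xDF <= value <= 0xFB):
        raise ValueError(f"0x{value:02X} is not an assigned TIS-620 Thai code")
    return 0x0E00 + (value - 0xA0)   # subtract A0, then prefix with 0E


def tis620_to_text(data: bytes) -> str:
    """Decode TIS-620 bytes: ASCII passes through, Thai bytes are offset."""
    return "".join(
        chr(b) if b < 0x80 else chr(tis620_byte_to_code_point(b))
        for b in data
    )


sample = bytes([0xCA, 0xC7, 0xD1, 0xCA, 0xB4, 0xD5])  # the word "สวัสดี"
text = tis620_to_text(sample)
utf16 = tis620_to_text(b"\xa1").encode("utf-16-be")   # b'\x0e\x01'
print(text, utf16)
```

Encoding the result as big-endian UTF-16 makes the "prefix with 0E" description literal: byte A1 becomes the two bytes 0E 01.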
Character set
Code value 20 is the regular SPACE character. Code values 00-1F, 7F, 80-9F, A0, DB-DE and FC-FF are not assigned to characters by TIS-620.
Code values D1, D4-DA, E7-EE are combining characters.
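The combining ranges listed above can be checked with a simple lookup. The sketch below is illustrative only: the names `COMBINING_RANGES`, `is_tis620_combining` and `count_base_characters` are hypothetical, and counting non-combining bytes is only a rough approximation of how many display cells a string occupies, not a substitute for a real display engine.

```python
# Inclusive byte ranges that TIS-620 treats as combining characters.
COMBINING_RANGES = ((0xD1, 0xD1), (0xD4, 0xDA), (0xE7, 0xEE))


def is_tis620_combining(value: int) -> bool:
    """Return True if the byte is one of the combining codes listed above."""
    return any(low <= value <= high for low, high in COMBINING_RANGES)


def count_base_characters(data: bytes) -> int:
    """Count the bytes that are not combining marks."""
    return sum(1 for b in data if not is_tis620_combining(b))


sample = bytes([0xCA, 0xC7, 0xD1, 0xCA, 0xB4, 0xD5])  # the word "สวัสดี"
print(len(sample), count_base_characters(sample))
```

In this sample, two of the six bytes (D1 and D5) are marks that attach to the preceding consonant, so four base characters remain.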
External links
* Official reference (in Thai)
* Announcement in Royal Gazette of TIS 620-2533 and TIS 620-2529