This article compares Unicode encodings. Two situations are considered: 8-bit-clean environments (which can be assumed), and environments that forbid use of byte values that have the high bit set. Originally such prohibitions were to allow for links that used only seven data bits, but they remain in some standards and so some standard-conforming software must generate messages that comply with the restrictions. Standard Compression Scheme for Unicode (SCSU) and Binary Ordered Compression for Unicode (BOCU) are excluded from the comparison tables because it is difficult to simply quantify their size.
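As a minimal sketch of what the high-bit restriction means in practice, the following C function checks whether a buffer could pass through an environment that forbids bytes with the high bit set. The function name `is_seven_bit_safe` is hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if no byte in buf has its high bit (0x80)
   set, meaning the data fits an environment restricted to seven data bits.
   Returns 0 as soon as a byte with the high bit set is found. */
int is_seven_bit_safe(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] & 0x80)
            return 0;
    }
    return 1;
}
```

Pure ASCII text passes this check, while UTF-8 text containing any non-ASCII character fails it, because every byte of a multi-byte UTF-8 sequence has the high bit set.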


Compatibility issues

A UTF-8 file that contains only ASCII characters is identical to an ASCII file. Legacy programs can generally handle UTF-8 encoded files, even if they contain non-ASCII characters. For instance, the C printf function can print a UTF-8 string, because it looks only for the ASCII '%' character to introduce a format specification and prints all other bytes unchanged, so non-ASCII characters are output unchanged.
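The printf behaviour described above can be sketched with `snprintf`, which shares printf's formatting rules but writes into a buffer so the result can be inspected. The helper name `format_utf8_demo` and the macro `NAIVE_UTF8` are hypothetical. The sketch assumes a common C library in which bytes other than '%' in the format string are copied through as-is.

```c
#include <stdio.h>
#include <string.h>

/* "naïve" encoded in UTF-8: the "ï" (U+00EF) is the two bytes 0xC3 0xAF.
   The literal is split so the hex escapes do not absorb following letters. */
#define NAIVE_UTF8 "na" "\xC3\xAF" "ve"

/* Hypothetical helper: formats a message in which both the format string
   and the %s argument contain non-ASCII UTF-8 bytes.
   Returns the number of bytes written (excluding the terminating NUL),
   or -1 on a formatting error or if the output would not fit. */
int format_utf8_demo(char *out, size_t out_size, int count)
{
    int n = snprintf(out, out_size, NAIVE_UTF8 ": %d %s", count, NAIVE_UTF8);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```

Called with a 64-byte buffer and `count` set to 42, this is expected to produce the 17 bytes of "naïve: 42 naïve", with each 0xC3 0xAF pair intact. Only the `%d` and `%s` specifications are interpreted.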
UTF-16 and UTF-32 are incompatible with ASCII files, and thus require Unicode-aware programs to display, print and manipulate them, even if the file is known to contain only characters in the ASCII subset. Because they contain many zero bytes, the strings cannot be manipulated by normal null-terminated string handling for even simple operations such as copy. Therefore, even on most UTF-16 systems such as Windows