UTF-7
UTF-7 (7- bit Unicode Transformation Format) is an obsolete variable-length character encoding for representing Unicode text using a stream of ASCII characters. It was originally intended to provide a means of encoding Unicode text for use in Internet E-mail messages that was more efficient than the combination of UTF-8 with quoted-printable. UTF-7 (according to its RFC) isn't a " Unicode Transformation Format", as the definition can only encode code points in the BMP (the first 65536 Unicode code points, which does not include emojis and many other characters). However if a UTF-7 translator is to/from UTF-16 then it can (and probably does) encode each surrogate half as though it was a 16-bit code point, and thus can encode all code points. It is unclear if other UTF-7 software (such as translators to UTF-32 or UTF-8) support this. UTF-7 has never been an official standard of the Unicode Consortium. It is known to have security issues, which is why software has been changed ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Modified Base64
In computer programming, Base64 is a group of binary-to-text encoding schemes that transforms binary data into a sequence of printable characters, limited to a set of 64 unique characters. More specifically, the source binary data is taken 6 bits at a time, then this group of 6 bits is mapped to one of 64 unique characters. As with all binary-to-text encoding schemes, Base64 is designed to carry data stored in binary formats across channels that only reliably support text content. Base64 is particularly prevalent on the World Wide Web where one of its uses is the ability to embed image files or other binary assets inside textual assets such as HTML and CSS files. Base64 is also widely used for sending e-mail attachments, because SMTP – in its original form – was designed to transport 7-bit ASCII characters only. Encoding an attachment as Base64 before sending, and then decoding when received, assures older SMTP servers will not interfere with the attachment. Base ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Unicode Transformation Format
Unicode or ''The Unicode Standard'' or TUS is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 characters and 168 scripts used in various ordinary, literary, academic, and technical contexts. Unicode has largely supplanted the previous environment of a myriad of incompatible character sets used within different locales and on different computer architectures. The entire repertoire of these sets, plus many additional characters, were merged into the single Unicode set. Unicode is used to encode the vast majority of text on the Internet, including most web pages, and relevant Unicode support has become a common consideration in contemporary software development. Unicode is ultimately capable of encoding more than 1.1 million characters. The Unicode character repertoire is synchronized with ISO/IEC 10646, each being code-for-code id ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
US-ASCII
ASCII ( ), an acronym for American Standard Code for Information Interchange, is a character encoding standard for representing a particular set of 95 (English language focused) printable character, printable and 33 control character, control characters a total of 128 code points. The set of available punctuation had significant impact on the syntax of computer languages and text markup. ASCII hugely influenced the design of character sets used by modern computers; for example, the first 128 code points of Unicode are the same as ASCII. ASCII encodes each code-point as a value from 0 to 127 storable as a seven-bit integer. Ninety-five code-points are printable, including digits ''0'' to ''9'', lowercase letters ''a'' to ''z'', uppercase letters ''A'' to ''Z'', and commonly used punctuation symbols. For example, the letter is represented as 105 (decimal). Also, ASCII specifies 33 non-printing control codes which originated with ; most of which are now obsolete. The control cha ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Unicode
Unicode or ''The Unicode Standard'' or TUS is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 Character (computing), characters and 168 script (Unicode), scripts used in various ordinary, literary, academic, and technical contexts. Unicode has largely supplanted the previous environment of a myriad of incompatible character sets used within different locales and on different computer architectures. The entire repertoire of these sets, plus many additional characters, were merged into the single Unicode set. Unicode is used to encode the vast majority of text on the Internet, including most web pages, and relevant Unicode support has become a common consideration in contemporary software development. Unicode is ultimately capable of encoding more than 1.1 million characters. The Unicode character repertoire is synchronized with Univers ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
UTF-8
UTF-8 is a character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode Transformation Format 8-bit''. Almost every webpage is transmitted as UTF-8. UTF-8 supports all 1,112,064 valid Unicode code points using a variable-width encoding of one to four one- byte (8-bit) code units. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. It was designed for backward compatibility with ASCII: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so that a UTF-8-encoded file using only those characters is identical to an ASCII file. Most software designed for any extended ASCII can read and write UTF-8, and this results in fewer internationalization issues than any alternative text encoding. UTF-8 is dominant for all countries/languages on the internet, with 99% global ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
HZ-GB-2312
The HZ character encoding is an encoding of GB 2312 that was formerly commonly used in email and USENET postings. It was designed in 1989 by Fung Fung Lee () of Stanford University, and subsequently codified in 1995 into RFC 1843. The HZ, short for ''Hanzi'' (), encoding was invented to facilitate the use of Chinese characters through e-mail, which at that time only allowed 7-bit characters. Therefore, in lieu of standard ISO 2022 escape sequences (as in the case of ISO-2022-JP) or 8-bit characters (as in the case of EUC), the HZ code uses only printable, 7-bit characters to represent Chinese characters. It was also popular in USENET networks, which in the late 1980s and early 1990s, generally did not allow transmission of 8-bit characters or escape characters. History HZ superseded the earlier "zW" encoding, which marked entire lines as being GB 2312 text by beginning them with the characters zW. Structure and use In the HZ encoding system, the character sequences "~" act as e ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
8BITMIME
The Simple Mail Transfer Protocol (SMTP) is an Internet standard communication protocol for electronic mail transmission. Mail servers and other message transfer agents use SMTP to send and receive mail messages. User-level email clients typically use SMTP only for sending messages to a mail server for relaying, and typically submit outgoing email to the mail server on port 465 or 587 per . For retrieving messages, IMAP (which replaced the older POP3) is standard, but proprietary servers also often implement proprietary protocols, e.g., Exchange ActiveSync. SMTP's origins began in 1980, building on concepts implemented on the ARPANET since 1971. It has been updated, modified and extended multiple times. The protocol version in common use today has extensible structure with various extensions for authentication, encryption, binary data transfer, and internationalized email addresses. SMTP servers commonly use the Transmission Control Protocol on port number 25 (between servers) ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Pound Sign
The pound sign () is the currency symbol, symbol for the pound unit of account, unit of Pound sterling, sterling – the currency of the United Kingdom and its associated Crown Dependencies and British Overseas Territories and previously of Kingdom of Great Britain, Great Britain and of the Kingdom of England. The same symbol is used for other currencies called pound (currency), pound, such as the Egyptian pound, Egyptian and Syrian pound, Syrian pounds. The sign may be drawn with one or two bars depending on personal preference, but the Bank of England has used the one-bar style exclusively on banknotes since 1975. In the United States, "pound sign" refers to the symbol (number sign). In Canada, "pound sign" can mean or . Origin The symbol derives from the upper case Latin letter , representing ''libra pondo'', the basic unit of weight in the Roman Empire, which in turn is derived from the Latin word ''libra'', meaning Weighing scale, scales or a balance. The pound became a ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Hyphen-minus
The symbol , known in Unicode as hyphen-minus, is the form of hyphen most commonly used in digital documents. On most keyboards, it is the only character that resembles a minus sign or a dash, so it is also used for these. The name ''hyphen-minus'' derives from the original ASCII standard, where it was called ''hyphen (minus)''. The character is referred to as a ''hyphen'', a ''minus sign'', or a ''dash'' according to the context where it is being used. Description In early typewriters and character encodings, a single key/code was almost always used for hyphen, minus, various dashes, and strikethrough, since they all have a similar appearance. The current Unicode Standard specifies distinct characters for several different dashes, an unambiguous minus sign (sometimes called the ''Unicode minus'') at code point U+2212, an unambiguous hyphen (sometimes called the ''Unicode hyphen'') at U+2010, the hyphen-minus at U+002D and a variety of other hyphen symbols for various uses ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
JIS-Roman
Code page 895 (CCSID 895) is a 7-bit character set and is Japan's national ISO 646 variant. It is the Roman set (first or left half) of the JIS X 0201 (formerly JIS C 6220) Japanese Standard and is variously called Japan 7-Bit Latin, JISCII, JIS Roman, JIS C6220-1969-ro, ISO646-JP or Japanese-Roman. Its ISO-IR registration number is 14. Amongst IBM's code pages, it accompanies code page 896 (half-width katakana), which encodes the Kana set of JIS X 0201 with extensions, and code page 897 which encodes the 8-bit form of JIS X 0201. It is used in Unix-like systems and, when combined with code page 896 and the 2-byte IBM code page 952 and code page 953, makes up the four code-sets of code page 954, one of IBM's versions of EUC-JP. Codepage layout See also * Shift JIS Shift JIS (also SJIS, MIME name Shift_JIS, known as PCK in Solaris contexts) is a character encoding for the Japanese language, originally developed by the Japanese company ASCII Corporation in conjunc ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |