


Figure: The KCharSelect character mapping tool.

Unicode input is the insertion of a specific Unicode character on a computer by a user; it is a common way to input characters not directly supported by a physical keyboard. Unicode characters can be produced either by selecting them from a display or by typing a certain sequence of keys on a physical keyboard. In addition, a character produced by one of these methods in one web page or document can be copied into another. In contrast to ASCII's 128-code character set (which it contains), Unicode encodes well over a hundred thousand graphemes (characters) from almost all of the world's written languages, and many other signs and symbols besides. A Unicode input system must provide for a large repertoire of characters, ideally all valid Unicode code points. This differs from a keyboard layout, which defines keys and their combinations only for a limited number of characters appropriate for a certain locale.


Unicode numbers

Unicode characters are distinguished by code points, which are conventionally represented by "U+" followed by four, five or six hexadecimal digits, for example U+00AE or U+1D310. Characters in the Basic Multilingual Plane (BMP), which contains modern scripts (including many Chinese and Japanese characters) and many symbols, have a 4-digit code. Historic scripts, but also many modern symbols and pictographs (such as emoticons, emojis, playing cards and many CJK characters), have 5-digit codes.
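The notation can be illustrated with a short Python sketch. The helper names `format_code_point`, `plane` and `in_bmp` are made up for this illustration; the only assumption is that a code point's plane is its value divided by 0x10000, so plane 0 is the BMP.

```python
def format_code_point(ch: str) -> str:
    """Return the conventional "U+" notation for a single character."""
    # At least four hexadecimal digits; five or six when the value needs them.
    return f"U+{ord(ch):04X}"


def plane(ch: str) -> int:
    """Return the Unicode plane number of a character (0 is the BMP)."""
    return ord(ch) >> 16


def in_bmp(ch: str) -> bool:
    """True if the character lies in the Basic Multilingual Plane."""
    return plane(ch) == 0


# The two code points used as examples in the text.
for ch in ("\u00AE", "\U0001D310"):
    location = "BMP" if in_bmp(ch) else f"plane {plane(ch)}"
    print(format_code_point(ch), location)
```

A 4-digit result therefore always means a BMP character, while 5-digit and 6-digit results lie in the supplementary planes.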


Availability

An application can display a character only if it can access a font which contains a glyph for the character (Andrew Marcuse, "How to enter Unicode characters in Microsoft Windows", accessed September 13, 2012). Very few fonts have full Unicode coverage; most only contain the glyphs needed to support a few writing systems. However, most modern browsers and other text-processing applications are able to display multilingual content because they perform font substitution, automatically switching to a fallback font when necessary to display characters which are not supported in the current font. Which fonts are used for fallback, and how thorough the Unicode coverage is, varies by software and operating system; some software will search for a suitable glyph in all of the installed fonts, while other software searches only within certain fonts. If an application does not have access to a glyph, the character will usually be shown as the font's ".notdef" glyph, which often appears as an empty box (nicknamed "tofu" after its shape), a box with an X in it, or a box with a question mark in it. Modern implementations use .notdef for unsupported characters, and the replacement character (U+FFFD) only for encoding errors.


Selection from a screen

Many systems provide a way to select Unicode characters visually. ISO/IEC 14755 refers to this as a "screen-selection entry method". Microsoft Windows has provided a Unicode version of the Character Map program, appearing in the consumer edition since XP. This is limited to characters in the Basic Multilingual Plane (BMP). Characters are searchable by Unicode character name, and the table can be limited to a particular code block. More advanced third-party tools of the same type are also available (a notable freeware example is BabelMap, which supports all Unicode characters). On most Linux desktop environments, equivalent tools such as gucharmap (GNOME) or KCharSelect (KDE) are available. Generally these tools let the user copy the selected characters to the clipboard and then paste them into the document, rather than simulating direct typing. It is often practical simply to find the desired character on the web or in another document, and copy and paste it from there.


Decimal input

Some programs running in Microsoft Windows, including recent versions of Microsoft Word and WordPad, can produce characters from their Unicode code points expressed in decimal and entered on the numeric keypad with the Alt key held down. For example, the Euro sign has 20AC as its hexadecimal code point, which is 8364 in decimal, so Alt+8364 will produce the symbol €. Double-struck characters, among others, can be produced from their decimal code points in the same way. Decimal code points in the range 160–255 must be entered with a leading zero (so that the Windows code page is chosen), and furthermore the Windows code page must be set to match Unicode (CP1252 must be used). For example, an Alt code in this range typed with a leading zero yields the character at that code point, but the character produced by the same number without the leading zero depends on the OEM code page, such as code page 437, and may be a different character. In programs in which Alt codes over 255 do not work, the character retrieved usually corresponds to the remainder when the number is divided by 256.

The text editor Vim allows characters to be specified by two-character mnemonics (confusingly called "digraphs" by Vim developers). The installed set can be augmented by custom mnemonics defined for arbitrary code points, specified in decimal. For example, as decimal 9881 is equal to hexadecimal 2699, the command :digraph Gr 9881 associates "Gr" with U+2699 (⚙).

See the HTML section below for the use of decimal code points in HTML.
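The arithmetic behind decimal Alt codes can be checked with a small Python sketch. It does not drive the keyboard; it only models the number handling described above, and the function names are invented for this illustration. The byte value 169 is an arbitrary example chosen from the 160–255 range.

```python
def alt_code_to_char(decimal_code: int) -> str:
    """Character for a decimal code point, as a Unicode-aware program would produce it."""
    return chr(decimal_code)


def truncated_alt_code(decimal_code: int) -> int:
    """Value left when a program keeps only the remainder modulo 256."""
    return decimal_code % 256


# Hexadecimal 20AC and decimal 8364 are the same number: the Euro sign.
euro = 0x20AC
print(euro, alt_code_to_char(euro))

# A program without support for codes above 255 would see this value instead.
print(truncated_alt_code(8364))

# The same byte value means different things in different code pages, which is
# why the leading zero (Windows code page) versus no zero (OEM code page) matters.
byte = bytes([169])
print("CP1252:", byte.decode("cp1252"))
print("CP437: ", byte.decode("cp437"))
```

Decoding the same byte with Python's `cp1252` and `cp437` codecs shows why the two forms of an Alt code in the 160–255 range can produce different characters.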


Hexadecimal input

Clause 5.1 of ISO/IEC 14755 describes a "Basic method" whereby a "beginning sequence" is followed by the hexadecimal representation of the code point and the "ending sequence". Most modern systems have some method to emulate this, sometimes limited to four digits (and thus to the Basic Multilingual Plane).


In Microsoft Windows

Hexadecimal Unicode input can be enabled by adding a string type (REG_SZ) value called EnableHexNumpad to the registry key HKEY_CURRENT_USER\Control Panel\Input Method and assigning the value data 1 to it. Users will need to log off and back in after editing the registry for this input method to start working. (In versions earlier than Vista, users needed to reboot for it to start working.) Unicode characters can then be entered by holding down Alt, typing + on the numeric keypad followed by the hexadecimal code (using the numeric keypad for digits from 0 to 9 and letter keys for A to F), and then releasing Alt. This may not work for 5-digit hexadecimal codes.

If one prefers not to edit the registry or if, as on many laptops, the numeric keypad is unavailable, third-party software such as UnicodeInput can be used. AutoHotkey scripts support substitution of Unicode characters for keystrokes. For example, the command Send {U+2014} will insert an em dash in a text field in the active window.

In some applications (Microsoft Word, WordPad and LibreOffice programs) a simpler method is supported: one first enters the character's code point (between two and six hexadecimal digits), then types Alt+X, which replaces the digits with the Unicode character. For example, entering f1 and then pressing Alt+X will produce the character 'ñ'. Unless it is six hexadecimal digits long, the code must not be preceded by any digit or any of the letters a–f, as they may be treated as part of the code to be converted. For example, entering af1 followed by Alt+X will produce '૱' (U+0AF1), but entering a0000f1 followed by Alt+X will produce 'añ' ('a' followed by character U+00F1). One can generate a desired character by this technique in Word (for example) and then copy and paste it into an application that does not directly support this method.
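The rule that preceding hexadecimal digits are absorbed into the code can be modelled in a few lines of Python. This is a deliberately simplified sketch, not the algorithm any particular application uses: it assumes the conversion takes the longest run of two to six hexadecimal digits immediately before the cursor, and the function name `convert_hex_before_cursor` is hypothetical.

```python
import re

# Longest run of two to six hex digits at the end of the text (assumed rule).
HEX_TAIL = re.compile(r"[0-9A-Fa-f]{2,6}$")


def convert_hex_before_cursor(text: str) -> str:
    """Replace trailing hex digits with the character they encode.

    Simplified model: real applications may apply further rules,
    for example when the value is not a valid code point.
    """
    match = HEX_TAIL.search(text)
    if match is None:
        return text  # nothing to convert
    code_point = int(match.group(), 16)
    if code_point > 0x10FFFF:
        return text  # outside the Unicode range; left unchanged in this sketch
    return text[: match.start()] + chr(code_point)


# The three examples from the text.
for typed in ("f1", "af1", "a0000f1"):
    print(repr(typed), "->", repr(convert_hex_before_cursor(typed)))
```

Under this model, "af1" is read as the single code U+0AF1, whereas padding the code to six digits ("a0000f1") leaves the leading "a" outside the converted run.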


In macOS

Hex input of Unicode must be enabled. In Mac OS 8.5 and later, one can choose the Unicode Hex Input keyboard layout; in OS X 10.10 Yosemite, this can be added in Keyboard → Input Sources. Holding down the Option key, one types the four-digit hexadecimal Unicode code point and the equivalent character appears; one can then release the key.

Characters outside the BMP (the Basic Multilingual Plane) exceed the four-digit limit of the Unicode hex input mechanism but can be entered by using UTF-16 surrogate pairs: holding down the Option key while entering the first surrogate, then the second surrogate, then releasing the Option key.
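The two four-digit surrogates to type can be computed from the code point with the standard UTF-16 arithmetic. The following Python sketch shows the calculation; the function name `to_surrogate_pair` is chosen for this example, and U+1D310 is used because it appears earlier in the article.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP have surrogate pairs")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1D310)
print(f"{high:04X} {low:04X}")

# Cross-check against Python's own UTF-16 encoder.
print("\U0001D310".encode("utf-16-be").hex().upper())
```

Both lines print the same eight hexadecimal digits, the first four being the high surrogate and the last four the low surrogate.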


In X11 (Linux and other Unix variants including Chrome OS)

In many applications one or both of the following methods work to directly input Unicode characters:
* Holding Ctrl+Shift and typing u followed by the hex digits, then releasing Ctrl+Shift.
* Entering Ctrl+Shift+u, releasing, then typing the hex digits and pressing Enter (or Space or even, on some systems, pressing and releasing Shift or Ctrl).
This is supported by GTK and Qt applications, and possibly others. In Chrome OS, this is an operating system function.


In platform-independent applications

* In Emacs, C-x 8 RET or M-x insert-char.
* In LibreOffice 5.1 onwards, the method described above for Windows works.
* In Opera versions that use the Presto layout engine (that is, up to and including version 12.xx), entering the hexadecimal number of the desired symbol or character and then pressing a conversion shortcut (a different shortcut applies on macOS).
* In the Vim editor, in insert mode, the user first types Ctrl+V u (for code points up to 4 hex digits long; Ctrl+V U for longer ones), then types the hexadecimal number of the desired symbol or character, and it will be converted into the symbol. (On Microsoft Windows, Ctrl+Q may be required instead of Ctrl+V; see Vim documentation: gui_w32.)
* In AutoCAD, \U+ followed by the hexadecimal code, or one of three shortcuts: %%c, %%d, %%p.


HTML

In HTML and XML, character codes to be rendered as characters are prefixed by an ampersand and a number sign (&#), and are followed by a semicolon (;). The code point can be either in decimal or in hexadecimal; in the latter case it is preceded by an "x". Leading zeros may be omitted. A number of characters may also be represented by a named entity.

Example: In HTML/XML, the copyright sign © (U+00A9) may be coded as:
* &#169; (decimal code point)
* &#xA9; (hexadecimal code point)
* &copy; (entity name)

This works in many pieces of software that accept HTML markup, such as Thunderbird and Wikipedia editing.
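The equivalence of the three forms can be verified with Python's standard `html` module, as in the sketch below. The helper `numeric_references` is a name invented for this example; `html.unescape` is the standard-library function that decodes character references.

```python
import html
from typing import Tuple


def numeric_references(ch: str) -> Tuple[str, str]:
    """Return the decimal and hexadecimal character references for one character."""
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:X};"


decimal_ref, hex_ref = numeric_references("\u00A9")  # the copyright sign
print(decimal_ref, hex_ref)

# All three spellings, plus one with leading zeros, decode to the same character.
for reference in (decimal_ref, hex_ref, "&#x00A9;", "&copy;"):
    print(reference, "->", html.unescape(reference))
```

Numeric references work for any code point, whereas named entities exist only for a fixed list of characters.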


See also

* ASCII
* Digraph (programming)
* AltGr key
* Compose key

