The zero-width space (U+200B), abbreviated ZWSP, is a non-printing character used in computerized typesetting to indicate word boundaries to text-processing systems in scripts that do not use explicit spacing, or after characters (such as the slash) that are not followed by a visible space but after which there may nevertheless be a line break. It is also used with languages that have no visible space between words, for example Japanese. Normally it produces no visible separation, but it may expand in passages that are fully justified.


Usage

In HTML pages, the zero-width space can be used to mark a potential line break ''without'' hyphenation, as can the HTML element <wbr>; for hyphenated line breaks, a soft hyphen is used. The zero-width space was not supported in some older web browsers. The effect can be seen by comparing a run of words separated only by zero-width spaces with the same run written without them: on browsers that support the character, resizing the window re-breaks the first text only at word boundaries, while the second text is not broken at all.
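As a minimal sketch of marking break opportunities programmatically, the following Python inserts a zero-width space after each slash in a string and then writes it out for HTML as the numeric reference `&#8203;` (the standard decimal reference for U+200B). The function names `add_break_opportunities` and `to_html` and the sample path are hypothetical, chosen only for illustration.

```python
import html

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def add_break_opportunities(text: str, after: str = "/") -> str:
    """Insert a zero-width space after every character found in `after`."""
    out = []
    for ch in text:
        out.append(ch)
        if ch in after:
            out.append(ZWSP)
    return "".join(out)


def to_html(text: str) -> str:
    """Escape text for HTML and spell each ZWSP as a numeric reference,
    so the otherwise invisible character is easy to spot in the markup."""
    return html.escape(text).replace(ZWSP, "&#8203;")


# Hypothetical sample input: a long slash-separated path with no spaces.
path = "usr/local/share/doc"
marked = add_break_opportunities(path)

print(len(path), len(marked))  # 19 22 (three invisible characters added)
print(to_html(marked))         # usr/&#8203;local/&#8203;share/&#8203;doc
```

The marked string looks identical to the original when rendered, which is why the example prints lengths and an escaped form rather than the raw text.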


Prohibited in URLs

ICANN rules prohibit domain names from including non-displayed characters such as the zero-width space, and most browsers prohibit their use within domain names because they can be used to create a homograph attack, in which a malicious URL is visually indistinguishable from a legitimate one.
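To illustrate why such characters are a concern, the sketch below flags invisible format characters in a hostname by checking for Unicode general category `Cf`, which includes U+200B. This is an assumption-laden illustration, not the check that ICANN or any browser actually applies; the function name `invisible_characters` and the sample hostnames are hypothetical.

```python
import unicodedata


def invisible_characters(hostname: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each character whose Unicode general
    category is Cf (format), which covers the zero-width space."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(hostname)
        if unicodedata.category(ch) == "Cf"
    ]


# Hypothetical hostnames: the first hides a ZWSP between "exam" and "ple".
spoofed = "exam\u200bple.com"
legitimate = "example.com"

print(spoofed == legitimate)             # False, although both render alike
print(invisible_characters(spoofed))     # [(4, 'U+200B')]
print(invisible_characters(legitimate))  # []
```

The two strings compare unequal even though they display identically, which is the property a homograph attack relies on.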


Encoding

The zero-width space character is encoded in Unicode as U+200B ZERO WIDTH SPACE, and input in HTML as &ZeroWidthSpace;, &#8203;, or &#x200B;. Contrary to what their names suggest, the character entities &NegativeThickSpace;, &NegativeMediumSpace;, &NegativeThinSpace;, and &NegativeVeryThinSpace; also refer to the zero-width space. The TeX representation is \hskip 0pt; the LaTeX representation is \hspace{0pt}; and the groff representation is \:. Its semantics and HTML implementation are similar to those of the soft hyphen, except that a soft hyphen displays a hyphen character at the point where the line is broken.
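The encodings can be checked with Python's standard library, as in the short sketch below. It prints the code point, character name, and UTF-8 bytes of U+200B, then decodes several HTML character references with `html.unescape` to confirm that each resolves to the same character. The list of references is assembled here for illustration: the numeric forms and `&ZeroWidthSpace;` are standard HTML references for U+200B, and the rest are the entity names given above.

```python
import html
import unicodedata

ZWSP = "\u200b"

# Code point, official character name, and UTF-8 byte sequence.
print(f"U+{ord(ZWSP):04X}", unicodedata.name(ZWSP))  # U+200B ZERO WIDTH SPACE
print(ZWSP.encode("utf-8").hex(" "))                 # e2 80 8b

# HTML character references expected to decode to U+200B.
references = [
    "&#8203;",
    "&#x200B;",
    "&ZeroWidthSpace;",
    "&NegativeThickSpace;",
    "&NegativeMediumSpace;",
    "&NegativeThinSpace;",
    "&NegativeVeryThinSpace;",
]

for ref in references:
    # Prints True for every reference that resolves to the zero-width space.
    print(ref, html.unescape(ref) == ZWSP)
```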


See also

* Hair space
* Whitespace character, including a table comparing various space-like characters
* Word divider
* Word wrapping
* Word joiner (U+2060), as well as ''zero-width no-break space'' (U+FEFF)
* Zero-width joiner (U+200D)
* Zero-width non-joiner (U+200C)


References


Sources

* Unicode Consortium, "Special Areas and Format Characters" (Chapter 16), ''The Unicode Standard'', Version 5.2.
* Victor H. Mair, Yongquan Liu, ''Characters and computers'', IOS Press, 1991.