General Punctuation is a Unicode block containing punctuation, spacing, and formatting characters for use with all scripts and writing systems. Included are the defined-width spaces, joining formats, directional formats, smart quotes, archaic and novel punctuation such as the interrobang, and invisible mathematical operators.
Additional punctuation characters are in the Supplemental Punctuation block and sprinkled in dozens of other Unicode blocks.
Block
Several characters in this block are usually not rendered with a directly visible glyph.
Ten whitespace characters, U+2002 through U+200B (fixed ''en'' or ''1⁄2 em'', ''em'', ''1⁄3 em'', ''1⁄4 em'', ''1⁄6 em'', ''figure'' and ''punctuation space'', variable ''thin'' or ''1⁄5 em'' and ''hair space'', fixed ''zero-width space''), together with U+205F (''math medium'' or ''2⁄9 em space''), differ by horizontal width. U+2000 and U+2001 (''en'' and ''em quad'') are effectively aliases of U+2002 and U+2003, respectively. Another two, U+202F and U+2060 (ill-termed ''word joiner''), are variants of U+2009 or U+2004 and U+200B that prohibit line breaks.
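The relationships described above can be inspected with Python's standard `unicodedata` module. The following is a minimal sketch, not a reference implementation: the names `WIDTH_SPACES`, `QUADS`, `NO_BREAK`, `describe` and `canonical` are hypothetical helpers introduced here for illustration, and the results depend on the Unicode data shipped with the Python interpreter in use.

```python
import unicodedata

# Code points named in the paragraph above (helper names are illustrative).
WIDTH_SPACES = [chr(cp) for cp in range(0x2002, 0x200B + 1)] + ["\u205f"]
QUADS = {"\u2000": "\u2002", "\u2001": "\u2003"}  # quad -> space it aliases
NO_BREAK = ["\u202f", "\u2060"]


def describe(ch):
    """Return (code point label, character name, general category)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def canonical(ch):
    """Apply NFC normalization, which replaces canonical singletons."""
    return unicodedata.normalize("NFC", ch)


if __name__ == "__main__":
    for ch in WIDTH_SPACES + list(QUADS) + NO_BREAK:
        print(*describe(ch))
    # The quads normalize to the corresponding spaces.
    for quad in QUADS:
        print(describe(quad)[0], "->", describe(canonical(quad))[0])
```

Normalization is used here because the aliasing of the quads is expressed in the Unicode data as a canonical equivalence. Note that the same data assigns U+200B and U+2060 the general category Cf (format) rather than Zs (space separator), so category-based whitespace checks may not treat them as spaces.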
Three zero-width characters, U+200B through U+200D (''space'', ''non-joiner'' and ''joiner''), differ in how they affect ligation and shaping of adjacent letters such as contextual forms in Arabic.
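Because these three characters have no visible glyph, they are easy to miss in stored text. The sketch below makes them visible and shows that a string containing one differs from the same string without it. The helper `reveal_zero_width` and the short labels are assumptions made for this example, and the shaping effect itself happens in the text renderer, not in this code.

```python
import unicodedata

# Short labels for the three zero-width characters (labels are illustrative).
ZERO_WIDTH = {"\u200b": "ZWSP", "\u200c": "ZWNJ", "\u200d": "ZWJ"}


def reveal_zero_width(text):
    """Replace each zero-width character with a visible <LABEL> marker."""
    return "".join(f"<{ZERO_WIDTH[ch]}>" if ch in ZERO_WIDTH else ch for ch in text)


# Two Arabic letters BEH in a row: a shaping engine would normally join them.
joined = "\u0628\u0628"
# The same letters with a zero-width non-joiner in between, requesting
# that the renderer not connect them.
unjoined = "\u0628\u200c\u0628"

if __name__ == "__main__":
    for ch in ZERO_WIDTH:
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
    print(reveal_zero_width(unjoined))
    print(joined == unjoined)
```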
Eleven invisible characters—U+200E, U+200F (''left-to-right'' and ''right-to-left mark''), U+202A through U+202E (''embeds, pops'' and ''overrides'') and U+2066 through U+2069 (''isolates'')—control the directionality of text unless higher-level markup overrides them.
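As a rough illustration, the eleven directional controls can be enumerated and their bidirectional classes looked up with `unicodedata.bidirectional`. The helper names `bidi_classes` and `strip_bidi_controls` are hypothetical, and stripping the controls is shown only as one possible way to sanitize text, not as a recommended policy.

```python
import unicodedata

# The eleven directional formatting characters listed above.
BIDI_CONTROLS = (
    ["\u200e", "\u200f"]                               # marks
    + [chr(cp) for cp in range(0x202A, 0x202E + 1)]    # embeds, pop, overrides
    + [chr(cp) for cp in range(0x2066, 0x2069 + 1)]    # isolates
)


def bidi_classes(chars):
    """Map each code point label to its Unicode bidirectional class."""
    return {f"U+{ord(ch):04X}": unicodedata.bidirectional(ch) for ch in chars}


def strip_bidi_controls(text):
    """Return text with all eleven directional controls removed."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


if __name__ == "__main__":
    for label, cls in bidi_classes(BIDI_CONTROLS).items():
        print(label, cls)
```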
There are explicit ''line'' and ''paragraph separators'' at U+2028 and U+2029.
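A small sketch of how these two separators behave in practice: Python's `str.splitlines` treats both U+2028 and U+2029 as line boundaries, while splitting on each character separately preserves the paragraph and line distinction. The sample text and variable names are invented for the example.

```python
import unicodedata

LINE_SEPARATOR = "\u2028"
PARAGRAPH_SEPARATOR = "\u2029"

# Sample text: two lines in the first paragraph, one line in the second.
text = f"first line{LINE_SEPARATOR}second line{PARAGRAPH_SEPARATOR}new paragraph"

# splitlines() breaks on both separators without distinguishing them.
lines = text.splitlines()

# Splitting in two steps keeps the paragraph structure.
paragraphs = [p.split(LINE_SEPARATOR) for p in text.split(PARAGRAPH_SEPARATOR)]

if __name__ == "__main__":
    print(lines)
    print(paragraphs)
    print(unicodedata.category(LINE_SEPARATOR), unicodedata.category(PARAGRAPH_SEPARATOR))
```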
Variation selectors
Starting with Unicode 16 (2024), the block has variation sequences defined for East Asian punctuation positional variants of the curly quotation marks (U+2018, U+2019, U+201C and U+201D). They use U+FE00 (VS01) and U+FE01 (VS02).
The non-fullwidth forms are expected to be separated with a space on one side; the fullwidth forms are not.

In vertical text, the fullwidth forms should display somewhat differently, and even as regular CJK quotation marks 「...」 and 『...』 if the vertical orientation property is set to "Hans".
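A variation sequence is simply a base character followed by a variation selector. The sketch below builds such sequences under two assumptions: that the curly quotation marks meant are U+2018, U+2019, U+201C and U+201D, and that VS01 and VS02 are U+FE00 and U+FE01. It does not say which selector requests which form, and whether a font honors a sequence is outside the scope of the code. The function name `variation_sequences` is hypothetical.

```python
from itertools import product
import unicodedata

# Assumed base characters: the four curly quotation marks.
CURLY_QUOTES = ["\u2018", "\u2019", "\u201c", "\u201d"]
# VS01 and VS02 are the first two variation selectors.
SELECTORS = {"VS01": "\ufe00", "VS02": "\ufe01"}


def variation_sequences():
    """Return {(base label, selector label): base + selector} for all pairs."""
    return {
        (f"U+{ord(quote):04X}", label): quote + selector
        for quote, (label, selector) in product(CURLY_QUOTES, SELECTORS.items())
    }


if __name__ == "__main__":
    for (base, label), seq in variation_sequences().items():
        points = " ".join(f"U+{ord(ch):04X}" for ch in seq)
        print(base, label, points)
```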
Emoji
The General Punctuation block contains two emoji: U+203C and U+2049.
The block has four standardized variants defined to specify emoji-style (U+FE0F VS16) or text presentation (U+FE0E VS15) for the two emoji, both of which default to a text presentation.
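The four standardized variants can be written out as two-character strings, each emoji followed by one of the two selectors. This is a minimal sketch: `presentation_variants` and `base_character` are illustrative names, and how the sequences actually render depends on the font and platform.

```python
import unicodedata

EMOJI = ["\u203c", "\u2049"]   # the two emoji in the block
TEXT_SELECTOR = "\ufe0e"       # VS15, requests text presentation
EMOJI_SELECTOR = "\ufe0f"      # VS16, requests emoji-style presentation


def presentation_variants(ch):
    """Return the text-style and emoji-style sequences for one character."""
    return {"text": ch + TEXT_SELECTOR, "emoji": ch + EMOJI_SELECTOR}


def base_character(sequence):
    """Drop any trailing presentation selector from a sequence."""
    return sequence.rstrip(TEXT_SELECTOR + EMOJI_SELECTOR)


if __name__ == "__main__":
    for ch in EMOJI:
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
        for style, seq in presentation_variants(ch).items():
            print("  ", style, " ".join(f"U+{ord(c):04X}" for c in seq))
```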
History
The following Unicode-related documents record the purpose and process of defining specific characters in the General Punctuation block: