In Unicode, a Private Use Area (PUA) is a range of code points that, by definition, will not be assigned characters by the standard. Three Private Use Areas are defined: one in the Basic Multilingual Plane (U+E000–U+F8FF), and one each in, and nearly covering, planes 15 and 16 (U+F0000–U+FFFFD, U+100000–U+10FFFD). They are intentionally left undefined so that third parties may assign their own characters without conflicting with Unicode Standard assignments. Under the Unicode Stability Policy, the Private Use Areas will remain allocated for that purpose in all future Unicode versions.

Assignments to private-use code points need not be "private" in the sense of strictly internal to an organisation; several organisations have published their assignment schemes. Such publication may include a font that supports the definition (showing the glyphs), and software making use of the private-use characters (e.g., a graphics character for a "print document" function). By definition, multiple private parties may assign different characters to the same code point, with the consequence that a user may see one private character from an installed font where a different one was intended.
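The three ranges can be expressed directly as data. The following Python sketch reports which Private Use Area, if any, contains a given code point; the name `private_use_range` and the table layout are illustrative only, not part of any standard API. The supplementary upper bounds stop at U+FFFFD and U+10FFFD because the last two code points of each plane are noncharacters.

```python
from typing import Optional

# Inclusive bounds of the three Private Use Areas. The supplementary ranges
# end two code points short of the plane boundary, since U+FFFFE/U+FFFFF and
# U+10FFFE/U+10FFFF are noncharacters rather than private-use code points.
PRIVATE_USE_RANGES = (
    ("Private Use Area", 0xE000, 0xF8FF),
    ("Supplementary Private Use Area-A", 0xF0000, 0xFFFFD),
    ("Supplementary Private Use Area-B", 0x100000, 0x10FFFD),
)


def private_use_range(cp: int) -> Optional[str]:
    """Return the name of the Private Use Area containing cp, or None."""
    for name, first, last in PRIVATE_USE_RANGES:
        if first <= cp <= last:
            return name
    return None


if __name__ == "__main__":
    for cp in (0x0041, 0xE000, 0xF8FF, 0xF0000, 0xFFFFE, 0x10FFFD):
        print(f"U+{cp:04X}: {private_use_range(cp)}")
```

Because the Stability Policy fixes these ranges, a hard-coded table like this should not need updating for future Unicode versions.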


Definition

Under the Unicode definition, code points in the Private Use Areas are not noncharacters, reserved, or unassigned. Their General Category is "Other, private use" (Co), and no character names are specified. No representative glyphs are provided, and character semantics are left to private agreement. The Unicode Standard describes them as follows:

"Private-use characters are assigned Unicode code points whose interpretation is not specified by this standard and whose use may be determined by private agreement among cooperating users. These characters are designated for private use and do not have defined, interpretable semantics except by private agreement. ... No charts are provided for private-use characters, as any such characters are, by their very nature, defined only outside the context of this standard."
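These properties can be observed in the Unicode Character Database as shipped with many programming environments. As an illustration, Python's standard `unicodedata` module reports the General Category Co for private-use code points and has no name for them. The database version depends on the Python release, but the private-use ranges themselves do not change between versions.

```python
import unicodedata

# One code point from each Private Use Area, plus an ordinary assigned
# letter for comparison.
samples = [0x0041, 0xE000, 0xF0000, 0x10FFFD]

for cp in samples:
    ch = chr(cp)
    # General Category: "Co" (Other, private use) for PUA code points.
    category = unicodedata.category(ch)
    # No character names are specified for private-use code points, so
    # unicodedata.name() returns the supplied default instead.
    name = unicodedata.name(ch, "<no name>")
    print(f"U+{cp:04X}  {category}  {name}")
```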


Blocks

There are three PUA blocks in Unicode. In the Basic Multilingual Plane (plane 0), the block titled Private Use Area (PUA) has 6,400 code points. Planes 15 and 16 are almost entirely assigned to two further Private Use Areas: Supplementary Private Use Area-A (SPUA-A) and Supplementary Private Use Area-B (SPUA-B). (The last two code points of every plane are defined to be noncharacters; the remaining 65,534 code points of each of planes 15 and 16 are assigned as private-use characters.) In UTF-16, the subset of high surrogates U+DB80..U+DBFF encodes these two planes and no others; these are called High Private Use Surrogates.
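The block sizes and the High Private Use Surrogates can be checked with a short Python sketch. The helper names are illustrative; the computation is the standard UTF-16 surrogate-pair algorithm (subtract 0x10000, then split the 20-bit result into two 10-bit halves), cross-checked here against Python's built-in UTF-16 codec.

```python
def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def plane_of_high_surrogate(high: int) -> int:
    """Plane number (1..16) of every code point whose pair starts with `high`."""
    # Each plane spans 64 high surrogates (64 * 1024 = 65,536 code points).
    return ((high - 0xD800) >> 6) + 1


# Block sizes: 6,400 in the BMP and 65,534 in each of planes 15 and 16.
assert 0xF8FF - 0xE000 + 1 == 6400
assert 0xFFFFD - 0xF0000 + 1 == 65534
assert 0x10FFFD - 0x100000 + 1 == 65534

for cp in (0xF0000, 0xFFFFD, 0x100000, 0x10FFFD):
    high, low = utf16_surrogates(cp)
    # Cross-check against Python's own UTF-16 (big-endian, no BOM) encoder.
    assert chr(cp).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
    print(f"U+{cp:05X} -> {high:04X} {low:04X}")

# High surrogates U+DB80..U+DBFF lead only into planes 15 and 16.
assert {plane_of_high_surrogate(h) for h in range(0xDB80, 0xDC00)} == {15, 16}
```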


History

In Unicode 1.0.0, the Private Use Area extended from U+E800 to U+FDFF; that is, it did not include U+E000..E7FF, but additionally included the U+F900..FDFF range now occupied by CJK Compatibility Ideographs, Alphabetic Presentation Forms and Arabic Presentation Forms-A. This was changed to U+E000..F8FF in Unicode 1.0.1, and remained so in Unicode 1.1. The range U+D800..DFFF, used for UTF-16 surrogates since Unicode 2.0, was unassigned and not part of the Private Use Area in any Unicode 1.x version. Planes E0 (224) through FF (255), and groups 60 (96) through 7F (127) of the Universal Coded Character Set (UCS) (i.e. U+E00000 through U+FFFFFF and U+60000000 through U+7FFFFFFF) were also designated as private use. These ranges were removed when UCS was restricted to the seventeen planes reachable in UTF-16.


Usage


Standardization initiative uses

Many people and institutions have created character collections for the PUA. Some of these private use agreements are published, so other PUA implementers can aim for unused or less-used code points to prevent overlaps. Several characters and scripts previously encoded in private use agreements have since been fully encoded in Unicode, necessitating mappings from the PUA to other Unicode code points. One of the better-known and most broadly implemented PUA agreements is maintained by the ConScript Unicode Registry (CSUR). The CSUR, which is not officially endorsed by or associated with the Unicode Consortium, provides a mapping for constructed scripts, such as Klingon pIqaD and Ferengi script (Star Trek), Tengwar and Cirth (J.R.R. Tolkien's cursive and runic scripts), Alexander Melville Bell's Visible Speech, and Dr. Seuss's alphabet from ''On Beyond Zebra''. The CSUR previously encoded the undeciphered Phaistos Disc characters, as well as the Shavian and Deseret alphabets, all of which have since been accepted for official encoding in Unicode.

Another common PUA agreement is maintained by the Medieval Unicode Font Initiative (MUFI). This project is attempting to support all of the scribal abbreviations, ligatures, precomposed characters, symbols, and alternate letterforms found in medieval texts written in the Latin alphabet. The express purpose of MUFI is to experimentally determine which characters are necessary to represent these texts, and to have those characters officially encoded in Unicode. As of Unicode version 5.1, 152 MUFI characters have been incorporated into the official Unicode encoding.

Some agreed-upon PUA character collections exist in part or whole because the Unicode Consortium is in no hurry to encode them. Some, such as unrepresented languages, are likely to end up encoded in the future. Some unusual cases, such as fictional languages, are outside the usual scope of Unicode but not explicitly ruled out by its principles, and may show up eventually (such as the Star Trek and Tolkien writing systems). In other cases, the proposed encoding violates one or more Unicode principles and hence is unlikely ever to be officially recognized by Unicode; this mostly happens where users want to directly encode alternate forms, ligatures, or base-character-plus-diacritic combinations (such as the TUNE scheme).

* Emoji were originally defined in unused spaces in Shift JIS mobile encodings, with different carriers supporting different emoji characters. Before emoji were added to the Unicode Standard in Unicode 6.0, Google and major Japanese phone carriers each defined their own Private Use Area mappings for emoji. The Japanese carriers defined their encoding schemes in the Basic Multilingual Plane's Private Use Area, whereas Google defined theirs in Supplementary Private Use Area-A.
* GB/T 20542-2006 ("Tibetan Coded Character Set Extension A") and GB/T 22238-2008 ("Tibetan Coded Character Set Extension B") are Chinese national standards that use the PUA to encode precomposed Tibetan ligatures.
* GBK and earlier versions of GB 18030 used the PUA to provisionally encode characters not found in Unicode at the time of publication. In the 2022 version of the standard (GB 18030-2022), these characters are instead mapped to their standard Unicode code points.
* The Institute of the Estonian Language uses the PUA to encode Latin and Cyrillic precomposed characters that have no Unicode encoding.
* The Free Tengwar Font Project uses a mapping, different from that of the ConScript Unicode Registry, that largely follows Michael Everson's 2001-03-07 Tengwar discussion paper but diverges in some details.
* The MARC 21 standard uses the PUA to encode East Asian characters present in MARC-8 that have no Unicode encoding.
* The SIL Corporate PUA encodes characters used in minority languages that have not yet been accepted into Unicode.
* The STIX Fonts project uses the PUA to provide a comprehensive font set of mathematical symbols and alphabets, many of which are now also available in the Supplementary Multilingual Plane (SMP), e.g. in the Mathematical Alphanumeric Symbols block.
* SMuFL uses the PUA to encode new music notation symbols, extending the Musical Symbols Unicode block.
* The Tamil Unicode New Encoding (TUNE) is a proposed scheme for encoding Tamil that overcomes perceived deficiencies in the current Unicode encoding.
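When characters from a private agreement gain standard equivalents, text has to be migrated code point by code point, and the correct table depends on which agreement produced the text. The Python sketch below is a hypothetical illustration: the `MIGRATIONS` table and `migrate` function are invented for this example, and its only entries are the WGL4 and LMBCS ligature mappings described under "Vendor use".

```python
# Hypothetical per-agreement migration tables: private-use code point ->
# standard Unicode code point. Entries follow the WGL4 and LMBCS ligature
# mappings described under "Vendor use"; a real tool would load full tables.
MIGRATIONS = {
    "WGL4": {0xF001: 0xFB01, 0xF002: 0xFB02},   # fi, fl
    "LMBCS": {0xF8FC: 0xFB02, 0xF8FD: 0xFB01},  # fl, fi
}


def migrate(text: str, agreement: str) -> str:
    """Replace private-use characters using the table for `agreement`."""
    # str.translate accepts a dict mapping int code points to int code points;
    # characters without an entry are left unchanged.
    return text.translate(MIGRATIONS[agreement])


sample = "\uf001nd"
print(ascii(migrate(sample, "WGL4")))   # '\ufb01nd'
print(ascii(migrate(sample, "LMBCS")))  # '\uf001nd' (no LMBCS entry for U+F001)
```

The same input gives different results under different tables, which is the practical consequence of multiple parties assigning characters to the same private-use code points.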


Vendor use

Informally, the range U+F000 through U+F8FF is known as the Corporate Use Area. This originates from early versions of Unicode, which defined an "End User Zone" extending from U+E000 upward and a "Corporate Use Zone" extending from U+F8FF downward, with the boundary between the two left undefined.

* The Adobe Glyph List used to use the PUA for some of its glyphs.
* Apple's developer documentation lists a range of 1,280 characters, U+F400–U+F8FF, within the PUA for Apple's use. Of those, only 311 are used, in the range U+F700–U+F8FF, by NeXT (NeXTSTEP and OPENSTEP) and Apple (macOS AppKit).
** One of these is U+F8FF, the Apple logo, which is generally supported by Apple's 8-bit character sets.
* WGL4 uses the PUA (U+F001 and U+F002) to encode duplicates of the ligatures ﬁ (U+FB01) and ﬂ (U+FB02).
* Microsoft's defunct Services for Macintosh feature used U+F001 through U+F029 as replacements for special characters allowed in HFS but forbidden in NTFS, and U+F02A for the Apple logo.
* In old versions of its RichEdit component, Microsoft mapped U+F020–U+F0FF within the PUA to symbol fonts. For any character in this range, RichEdit would show a character from a symbol font instead of the end-user-defined character (EUDC).
* uses U+F8FC–U+F8FE for ⌀ (diameter sign), ± (plus–minus sign) and ° (degree sign) respectively.
* Some fonts place the Windows logo at U+F000.
* The code point U+F000 is a numeral succession starting at 13 or 18 in some video games like ''Agar.io''.
* On Ubuntu, U+E0FF is displayed as the "Circle Of Friends" logo and U+F200 is "ubuntu" in the Ubuntu typeface with a superscripted "Circle Of Friends" (this itself is U+F0FF).
* The 3270 font includes the Debian logo at U+F100.
* In the Linux Libertine font, U+E000 displays Tux, the mascot of Linux.
* The Font Awesome icon font uses the PUA to display various glyphs.
* Powerline, a status line plugin for Vim, uses U+E0A0–U+E0A2 and U+E0B0–U+E0B3 for extra box-drawing characters.
* In the Fira Sans typeface used in Firefox OS, U+E003 is displayed as the Mozilla logo (the dinosaur head).
* The Lotus Multi-Byte Character Set (LMBCS), the encoding and character set used internally by Lotus/IBM products such as Lotus 1-2-3, Symphony, SmartSuite, Notes and Domino, as well as by a number of third-party products such as Microsoft Works, uses some characters (U+F862–U+F89F and U+F8FB–U+F8FE) in the Private Use Area for symbols not defined in Unicode. Of these, U+F8FB is known to be reserved for a crown currency symbol ("Kr"), and U+F8FC and U+F8FD were later mapped to U+FB02 (ﬂ) and U+FB01 (ﬁ) respectively. Additionally, when UTF-16 codes are embedded in LMBCS, the UTF-16 codes corresponding to U+F601 through U+F6FF are substituted for UTF-16 codes that would contain null bytes, since LMBCS is designed not to contain embedded null bytes.
* IBM reserved several code page IDs for PUA code pages: code page 1446 for the generic plane 15, code page 1447 for the generic plane 16, code page 1448 for the generic BMP PUA, code page 1445 (IBM AFP PUA No. 1) for plane 15 with IBM allocations in U+FFF00–U+FFFFD, and code page 1449 (IBM default PUA) for the BMP PUA with IBM allocations in U+F83D–U+F8FF.
* The file system found in Windows uses the U+F000 to U+F0FF block to escape special characters.
* NetApp translates characters in filenames that are allowed on Unix but invalid for SMB clients into PUA characters.
* Twitter's Chirp font provides some additional icons, such as U+E000 (a left down arrow), U+EA00 (the Twitter bird), and U+F8FF (an Apple logo, possibly for compatibility with Apple fonts).
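Several entries above (Services for Macintosh, the Windows file system, NetApp) describe storing file-name characters that one system forbids as private-use code points. The Python sketch below illustrates the general idea under a hypothetical scheme in which a forbidden character c is stored as U+F000 + ord(c); this offset rule and the helper names are assumptions for illustration and do not reproduce the exact table used by any of those products.

```python
# Characters that Windows does not allow in file names; control characters
# (below U+0020) are also escaped.
FORBIDDEN = set('"*/:<>?\\|')
BASE = 0xF000  # hypothetical scheme: forbidden c is stored as U+F000 + ord(c)


def needs_escape(ch: str) -> bool:
    return ch in FORBIDDEN or ord(ch) < 0x20


def escape_name(name: str) -> str:
    """Map forbidden characters into the PUA so the name can be stored."""
    return "".join(chr(BASE + ord(ch)) if needs_escape(ch) else ch for ch in name)


def unescape_name(name: str) -> str:
    """Reverse escape_name for code points that escape_name could produce."""
    out = []
    for ch in name:
        cp = ord(ch)
        if BASE <= cp < BASE + 0x80 and needs_escape(chr(cp - BASE)):
            out.append(chr(cp - BASE))
        else:
            out.append(ch)
    return "".join(out)


print(ascii(escape_name("notes: draft?.txt")))  # 'notes\uf03a draft\uf03f.txt'
```

The weak point is the one this section keeps returning to: if a name already contains, say, U+F03A because some font or agreement uses that code point, `unescape_name` will turn it into a colon, so the round trip is only safe when the escape range is not otherwise in use.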


Private-use characters in other character sets

The concept of reserving specific code points for private use is based on similar earlier usage in other character sets. In particular, many otherwise obsolete characters in East Asian scripts continue to be used in specific names or other situations, and so some character sets for those scripts made allowance for private-use characters (such as the user-defined planes of CNS 11643, or ''gaiji'' in certain Japanese encodings). The Unicode Standard references these uses under the name "End User Character Definition" (EUCD). Additionally, the C1 control block contains two codes intended for private use "control functions" by ECMA-48: 0x91 (PU1) and 0x92 (PU2). Unicode includes these at U+0091 and U+0092 but defines them as control characters (category Cc), not private-use characters (category Co). Encodings that do not have private use areas but have more or less unused areas, such as ISO/IEC 8859 and Shift JIS, have seen uncontrolled variants of these encodings evolve. For Unicode, software companies can use the Private Use Areas for their desired additions.
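Both points can be seen with Python's standard codecs. ISO/IEC 8859-1 (`latin-1`) maps bytes 0x91 and 0x92 straight to the C1 controls U+0091 and U+0092, while Microsoft's Windows-1252 (`cp1252`), which agrees with ISO/IEC 8859-1 on its printable characters, assigns the same bytes to curly quotation marks. Windows-1252 is this sketch's own choice of example variant; the text above does not name a specific one.

```python
import unicodedata

raw = bytes([0x91, 0x92])

# ISO/IEC 8859-1: each byte maps to the code point with the same value, so
# PU1 and PU2 arrive as the C1 controls U+0091 and U+0092 (category Cc).
for ch in raw.decode("latin-1"):
    print(f"latin-1 U+{ord(ch):04X} {unicodedata.category(ch)}")

# Windows-1252 assigns the same byte values, which ISO/IEC 8859-1 leaves to
# C1 controls, to printable quotation marks (U+2018 and U+2019).
for ch in raw.decode("cp1252"):
    print(f"cp1252  U+{ord(ch):04X} {unicodedata.category(ch)}")

# For contrast, a Private Use Area code point has category Co.
print(f"PUA     U+E000 {unicodedata.category(chr(0xE000))}")
```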

