A Unicode font is a computer font that maps glyphs to code points defined in the Unicode Standard. The vast majority of modern computer fonts use Unicode mappings, even those fonts which only include glyphs for a single writing system, or even only support the basic Latin alphabet. The distinction is historic: before Unicode, when most computer systems used only eight-bit bytes, no more than 256 characters (or control codes) could be encoded. This meant that each character repertoire had to have its own code point assignments, and thus a given code point could have multiple meanings. By assigning each character a unique code point, Unicode resolved this issue.
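The ambiguity of eight-bit assignments can be illustrated with a short Python sketch. It decodes one and the same byte value under three legacy encodings; the choice of byte (0xE9) and of these three encodings is only an example, picked because Python ships codecs for them.

```python
import unicodedata

# One byte value, interpreted under three legacy 8-bit encodings.
RAW = bytes([0xE9])

# Python's standard codec names for Western European, Cyrillic and Greek
# 8-bit character sets (an illustrative selection).
LEGACY_ENCODINGS = ["latin-1", "cp1251", "iso8859_7"]


def decode_everywhere(raw, encodings):
    """Return {encoding name: decoded text} for the same raw bytes."""
    return {name: raw.decode(name) for name in encodings}


decoded = decode_everywhere(RAW, LEGACY_ENCODINGS)
for name, text in decoded.items():
    # Each legacy meaning corresponds to its own Unicode code point.
    print(f"{name:10} -> U+{ord(text):04X} {unicodedata.name(text)}")
```

The same byte yields three different characters, whereas each of those characters has exactly one code point in Unicode.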
Fonts which support a wide range of Unicode scripts and Unicode symbols are sometimes referred to as "pan-Unicode fonts". However, because a TrueType font can define at most 65,535 glyphs, no single TrueType font can provide individual glyphs for all defined Unicode characters (154,998 as of version 16.0). This article lists some widely used Unicode fonts (those shipped with an operating system or produced by a well-known commercial font company) that support a comparatively large number and broad range of Unicode characters.
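The arithmetic behind this limit can be checked with a few lines of Python. The sketch assumes the Unicode 16.0 count of 154,998 characters and the simplification that every character needs exactly one glyph; real fonts often need more glyphs than characters, so the result is only a lower bound.

```python
import math

MAX_GLYPHS_PER_FONT = 65_535      # TrueType glyph limit quoted above
UNICODE_16_CHARACTERS = 154_998   # assumption: character count of Unicode 16.0


def min_fonts_needed(characters, glyph_limit=MAX_GLYPHS_PER_FONT):
    """Lower bound on the number of fonts if each character gets one glyph."""
    return math.ceil(characters / glyph_limit)


fonts = min_fonts_needed(UNICODE_16_CHARACTERS)
share = MAX_GLYPHS_PER_FONT / UNICODE_16_CHARACTERS
print(f"at least {fonts} fonts; one font covers at most {share:.1%} of the characters")
```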
Background
The Unicode standard does not itself specify or create any font (typeface), that is, a collection of graphical shapes called glyphs. Rather, it defines each abstract character as a specific number (known as a ''code point'') and also defines the required changes of shape depending on the context the glyph is used in (e.g., combining characters, precomposed characters and letter-diacritic combinations). The choice of font, which governs how the abstract characters in the Universal Coded Character Set (UCS) are converted into a bitmap or vector output that can then be viewed on a screen or printed, is left up to the user. If the chosen font does not contain a glyph for a code point used in the document, the software typically displays a question mark, a box, or some other substitute character.
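The relationship between precomposed characters and combining sequences can be sketched with Python's standard `unicodedata` module. The example character (e with acute accent) is an arbitrary choice, and the helper `code_points` is a name introduced only for this illustration.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
combining = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT, two code points


def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# The two strings differ at the code point level ...
print(code_points(precomposed), code_points(combining))

# ... but normalization converts one representation into the other.
composed = unicodedata.normalize("NFC", combining)
decomposed = unicodedata.normalize("NFD", precomposed)
print(composed == precomposed, decomposed == combining)
```

Both spellings should look the same when rendered, so a font or rendering engine has to handle either the single precomposed glyph or the base letter plus a positioned mark.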
Computer fonts use various techniques to display characters or glyphs. A bitmap font contains a grid of dots known as pixels forming an image of each glyph in each face and size. Outline fonts (also known as vector fonts) use drawing instructions or mathematical formulae to describe each glyph. Stroke fonts use a series of specified lines (for the glyph's border) and additional information to define the ''profile'', or ''size'' and shape of the line in a specific face and size, which together describe the appearance of the glyph.
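As a rough illustration of the bitmap approach, the following Python sketch stores one glyph as rows of bits and prints it as text. The 8×8 shape and the names `GLYPH_A` and `render` are invented for this example and do not come from any real font format.

```python
# Hypothetical 8x8 bitmap for a capital "A": one integer per row,
# one bit per pixel, with the most significant bit as the leftmost pixel.
GLYPH_A = [
    0b00011000,
    0b00100100,
    0b01000010,
    0b01000010,
    0b01111110,
    0b01000010,
    0b01000010,
    0b00000000,
]


def render(rows, width=8, on="#", off="."):
    """Turn rows of bits into lines of text, one character per pixel."""
    lines = []
    for row in rows:
        bits = format(row, f"0{width}b")
        lines.append("".join(on if bit == "1" else off for bit in bits))
    return lines


print("\n".join(render(GLYPH_A)))
```

A bitmap font needs one such grid per glyph and per size, which is why outline fonts, which describe shapes that can be scaled, are preferred for large character sets.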
Fonts also include embedded special orthographic rules that cause certain combinations of letterforms (and alternative symbols for the same letter) to be combined into special ligature forms (mixed characters). Operating systems, web browsers (user agents), and other software that makes extensive use of typography use a font to display text on the screen or in print media, and can be programmed to use those embedded rules. Alternatively, they may use external script-shaping technologies (a rendering technology or "smart font" engine), and they can also be programmed to use either a single large Unicode font or multiple different fonts for different characters or languages.
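The idea of using multiple fonts for different characters can be sketched as a simple fallback chain. This is a minimal Python model under stated assumptions: the font names and coverage sets are hypothetical, and real systems apply far more elaborate matching (language, style, script runs) than a first-match lookup.

```python
# Hypothetical fonts, each modelled as a name plus the set of code points
# it has glyphs for. Real fonts store this mapping in a character map table.
FONTS = [
    ("LatinFont", set(range(0x0020, 0x007F))),
    ("GreekFont", set(range(0x0370, 0x0400))),
]

NO_FONT = None  # signals that a substitute glyph has to be shown


def pick_font(char, fonts=FONTS):
    """Return the name of the first font that covers char, else NO_FONT."""
    code_point = ord(char)
    for name, coverage in fonts:
        if code_point in coverage:
            return name
    return NO_FONT


def plan_runs(text, fonts=FONTS):
    """Pair every character with the font chosen to draw it."""
    return [(ch, pick_font(ch, fonts)) for ch in text]


# Latin A, Greek capital omega, and a CJK ideograph that no listed font covers.
for ch, font in plan_runs("A\u03a9\u4e2d"):
    print(f"U+{ord(ch):04X} -> {font or 'substitute glyph'}")
```

The last character falls through the whole chain, which corresponds to the case where software shows a box or question mark instead of the intended glyph.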
No single "Unicode font" includes all the characters defined in the present revision of the ISO 10646 (Unicode) standard, as more and more languages and characters are continually added to it, and common font formats cannot contain more than 65,535 glyphs (fewer than half the number of characters encoded in Unicode). As a result, font developers and foundries incorporate new characters in newer versions or revisions of a font, or in separate auxiliary fonts intended specifically for particular languages.
UCS has over 1.1 million code points, but only the first 65,536 (Plane 0, the Basic Multilingual Plane, or BMP) had entered into common use before 2000.
:''See the Unicode planes article for more information on other planes, including Plane 1: Supplementary Multilingual Plane (SMP), Plane 2: Supplementary Ideographic Plane (SIP), Plane 14: Supplementary Special-purpose Plane (SSP), and Planes 15 and 16: reserved for Private Use Areas (PUA).''
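The plane structure follows directly from the code point value: each plane holds 65,536 code points, so the plane number is the code point divided by 65,536, rounded down. A minimal Python sketch, in which the function names and the shortened plane labels are chosen only for this illustration:

```python
PLANE_SIZE = 0x10000        # 65,536 code points per plane
MAX_CODE_POINT = 0x10FFFF   # last code point of Plane 16

# Only the planes named in the text above; other planes get a generic label.
PLANE_NAMES = {
    0: "Basic Multilingual Plane (BMP)",
    1: "Supplementary Multilingual Plane (SMP)",
    2: "Supplementary Ideographic Plane (SIP)",
    14: "Supplementary Special-purpose Plane (SSP)",
    15: "Private Use Area",
    16: "Private Use Area",
}


def plane_of(code_point):
    """Return the plane number (0 to 16) that contains code_point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point // PLANE_SIZE


def plane_name(code_point):
    """Return a human-readable label for the plane of code_point."""
    return PLANE_NAMES.get(plane_of(code_point), "other plane")


TOTAL_CODE_POINTS = MAX_CODE_POINT + 1   # 17 planes of 65,536 code points

for cp in (0x0041, 0x1F600, 0x20000, 0xE0001, 0x10FFFF):
    print(f"U+{cp:04X}: plane {plane_of(cp)}, {plane_name(cp)}")
```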
The first Unicode fonts (with very large character sets and supporting many Unicode blocks) were Lucida Sans Unicode (released March 1993), Unihan font (1993), and Everson Mono (1995).
Issues
There are typographical ambiguities in Unicode, so that some of the unified Han characters (seen in Chinese, Japanese, and Korean) will be typographically different in different regions. For example, the same code point can be typographically different between simplified Chinese and traditional Chinese. This has implications for the idea that a single typeface can satisfy the needs of all locales. [Ken Lunde, ''CJKV Information Processing'', O'Reilly Inc, 1999. Page 128, "CJKV character form differences"]
The design of Unicode ensures that such differences do not create semantic ambiguity, but the use of incorrect forms is often considered visually awkward or aesthetically inappropriate to native readers of East Asian languages.
Application of Unicode fonts
Unicode is now the standard encoding for many new standards and protocols, and is built into the architecture of operating systems (Microsoft Windows, Apple Mac OS, and many versions of Unix and Linux), programming languages (Ada, Perl, Python, Java, Common LISP, APL), libraries (IBM International Components for Unicode (ICU), along with the Pango, Graphite, Scribe, Uniscribe, and ATSUI rendering engines), font formats (TrueType and OpenType) and so on. Many other standards are also being upgraded to be Unicode-compliant.
Utility software
Here is a selection of utility software that can identify the characters present in a font file:
* Character Map, applet included with Microsoft Windows
* Font Book, application included with Mac OS
* GNOME Character Map, application included with the GNOME desktop environment
* BabelMap, third-party software for Windows
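The same kind of inspection can also be scripted. The following Python sketch is not one of the tools listed above; it assumes the third-party fontTools package is installed and reads the font's character map through `TTFont.getBestCmap()`. The file name in the usage comment is hypothetical.

```python
import unicodedata


def font_code_points(path):
    """Return the sorted code points that the font at `path` maps to glyphs.

    Assumes the third-party fontTools package is installed. It is imported
    inside the function so that the rest of this sketch runs without it.
    """
    from fontTools.ttLib import TTFont

    font = TTFont(path)
    try:
        cmap = font.getBestCmap() or {}   # {code point: glyph name}
    finally:
        font.close()
    return sorted(cmap)


def describe(code_point):
    """Format one code point as 'U+XXXX NAME'."""
    name = unicodedata.name(chr(code_point), "<unnamed>")
    return f"U+{code_point:04X} {name}"


# Usage, with a hypothetical file name:
#   for cp in font_code_points("SomeFont.ttf")[:10]:
#       print(describe(cp))
```

Character names come from the Unicode data bundled with the Python interpreter, so code points added in a later Unicode version than the interpreter knows are reported as unnamed.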
List of Unicode fonts
Of the many Unicode fonts available, those listed below are the most commonly used worldwide on mainstream computing platforms.
; Note
:OTF+TTO: OpenType font with TrueType outlines.
:OpenType fonts sometimes don't contain a one-by-one kernpair table but a kern-by-classes table where groups of similar characters are seen as one kern group. For instance, ''V'' and ''W'' have nearly the same left and right geometry. So “0” doesn't mean that no kerning is supported.
:Register after "reasonable" period (author's words).
:Includes more than 27,000 Hanzi glyphs from WenQuanYi Bitmap Song font.
:Han Nom A covers mainly CJK U Ideographs Ext A, and Han Nom B covers mostly Ext B.
:Sun-Ext A covers 102 blocks of different languages. Sun-ExtB covers mostly CJK Supplement, CJK U Ideographs Ext B, C, TaiXuan Jing.
:Zen Hei, Zen Hei Mono and Zen Hei Sharp co-exist in a single TTC file; also with embedded bitmaps. Latin/Hangul derived from UnDotum, Bopomofo derived from cwTeX, mono-spaced Latin from M+ M2 Light. Full CJK coverage. Included with Fedora Linux, Ubuntu Linux.
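The note on kerning classes can be made concrete with a small sketch. All class names and kerning values below are invented for illustration; the point is only that a font may store a few class-to-class adjustments rather than one entry per letter pair, so a pair count of 0 does not imply that kerning is absent.

```python
# Hypothetical class-based kerning data: letters with similar geometry
# share a class, and adjustments are stored per pair of classes.
LEFT_CLASS = {"V": "diagonal", "W": "diagonal", "T": "bar"}
RIGHT_CLASS = {"A": "apex", "a": "round", "o": "round"}
CLASS_KERN = {
    ("diagonal", "apex"): -80,
    ("diagonal", "round"): -40,
    ("bar", "round"): -60,
}


def kern(left, right):
    """Return the kerning adjustment for a letter pair, or 0 if none applies."""
    key = (LEFT_CLASS.get(left), RIGHT_CLASS.get(right))
    return CLASS_KERN.get(key, 0)


def expanded_pairs():
    """Expand the class table into the equivalent one-by-one pair table."""
    return {
        (left, right): CLASS_KERN[(lc, rc)]
        for left, lc in LEFT_CLASS.items()
        for right, rc in RIGHT_CLASS.items()
        if (lc, rc) in CLASS_KERN
    }


print(kern("V", "A"), kern("W", "A"), kern("A", "V"))
print(len(CLASS_KERN), "class entries stand in for", len(expanded_pairs()), "letter pairs")
```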
Comparison of fonts
The number of characters included by the above versions of the fonts is listed below for different Unicode blocks. ''Basic Latin (128: 0000–007F)'' means that in the range called 'Basic Latin', there are 128 assigned codes, numbered 0 to 7F. The cells then show the number of those codes which are covered by each font. Unicode blocks listed are valid for Unicode version 8.0.
:Cells shaded green indicate complete coverage.
:Cells shaded blue are not complete, but are the most complete of the fonts listed.
:Empty cells indicate that no character exists in that block.
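The per-block counts in such a comparison can be computed from a font's set of supported code points. A minimal Python sketch: the two-entry block table and the sample font are hypothetical, and the function counts code points inside each range without checking whether Unicode has actually assigned them.

```python
# Hypothetical excerpt of a block table: name -> (first, last) code point.
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Greek and Coptic": (0x0370, 0x03FF),
}


def block_coverage(font_code_points, blocks=BLOCKS):
    """Count, per block, how many of the font's code points fall inside it."""
    counts = {}
    for name, (first, last) in blocks.items():
        counts[name] = sum(1 for cp in font_code_points if first <= cp <= last)
    return counts


# Hypothetical font covering printable ASCII and the Greek capitals
# alpha to rho (U+0391 to U+03A1).
sample_font = set(range(0x0020, 0x007F)) | set(range(0x0391, 0x03A2))
coverage = block_coverage(sample_font)

for name, count in coverage.items():
    first, last = BLOCKS[name]
    print(f"{name} ({first:04X}-{last:04X}): {count} code points covered")
```

Deciding whether a cell counts as complete would additionally require the number of assigned codes per block for the Unicode version in question.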
0000–077F
0780–139F
13A0–1DBF
1DC0–257F
2580–2DFF
2E00–4DBF
4DC0–FAFF
FB00–FFFF
List of SMP Unicode fonts
10000–1F9FF
Unicode blocks listed are valid for Unicode version 8.0.
List of SIP Unicode fonts
20000–2FFFF
Unicode blocks listed are valid for Unicode version 8.0.
List of SSP Unicode fonts
E0000–EFFFF
Unicode blocks listed are valid for Unicode version 8.0.
External links
ISO/IEC JTC1/SC2/WG2 the working group in charge of ISO 10646
at Unicode.org
Unicode Font Guide For Free/Libre Open Source Operating Systems: a huge index of high quality free fonts.
Index of free and commercial Unicode fonts.
Enable Unicode for applications.
Microsoft Typography – Fonts and Products: reference for determining which fonts are supplied with Microsoft products.