Ruby characters or rubi characters are small, annotative glosses that are usually placed above or to the right of logographic characters of languages in the East Asian cultural sphere, such as Chinese ''hanzi'', Japanese ''kanji'', and Korean ''hanja'', to show the logographs' pronunciation; these were formerly also used for Vietnamese ''chữ Hán'' and ''chữ Nôm'', and may still occasionally be seen in that context when reading archaic texts. Typically called just ruby or rubi, such annotations are most commonly used as pronunciation guides for characters that are likely to be unfamiliar to the reader.
Examples
Here is an example of Japanese ruby characters (called ''furigana'') for Tokyo:
Most furigana are written with the ''hiragana'' syllabary, but ''katakana'' and ''romaji'' are also occasionally used. Alternatively, foreign words (usually English) are sometimes printed with furigana to provide the meaning, and vice versa. Textbooks sometimes render on-readings with katakana and kun-readings with hiragana.
Here is an example of ruby characters for Beijing in Zhuyin (a.k.a. Bopomofo), Xiao'erjing, and Pinyin.
In Taiwan, the main syllabary used for Chinese ruby characters is ''Zhuyin fuhao'' (also known as ''Bopomofo''); in mainland China ''pinyin'' is mainly used. Typically, unlike the example shown above, zhuyin is used with traditional vertical writing and is written on the right side of the characters. In mainland China, horizontal script is used and ruby characters (pinyin) are written above the Chinese characters.
Xiao'erjing is a Perso-Arabic alphabet adopted by Hui Muslims and at times used as ruby characters in various manuscripts. Its main shortcoming is that it has no way of indicating tones. With the spread of pinyin, use of this system has declined in recent decades. Most manuscripts that mark the characters with Xiao'erjing do so from right to left, which is unusual compared with other systems. This is because such manuscripts usually include Arabic texts such as the Quran, and the Chinese writing is the explanation or translation.
Books with phonetic guides (especially pinyin) are popular with children and foreigners learning Chinese.
Here is an example of the Korean ruby characters for Korea:
Romaja is normally used in foreign textbooks until Hangul is introduced. Ruby characters can be quite common on signs in certain parts of South Korea.
Here is an example of the Vietnamese ruby characters for Hanoi:

Chinese characters and their derivatives (''chữ Hán'' and ''chữ Nôm''), which were used by the Vietnamese, fell out of use in favour of the Latin-based script ''chữ Quốc ngữ'' during the French colonial period, when it was made a part of compulsory education (1920s onwards). They are currently still used by the Gin people.
Uses
Ruby may be used for different reasons:
* because the character is rare and the pronunciation unknown to many—personal name characters often fall into this category;
* because the character has more than one pronunciation, and the context is insufficient to determine which to use;
* because the intended readers of the text are still learning the language and are not expected to always know the pronunciation or meaning of a term;
* because the author is using a nonstandard pronunciation for a character or a term.
Also, ruby may be used to show the meaning, rather than pronunciation, of a possibly-unfamiliar (usually foreign) or slang word. This is generally used with spoken dialogue and applies only to Japanese publications. The most common form of ruby is called ''furigana'' or ''yomigana'' and is found in Japanese instructional books, newspapers, comics and books for children.
In Japanese, certain characters, such as the sokuon (little ''tsu'') that indicates a pause before the consonant it precedes, are normally written at about half the size of normal characters. When written as ruby, such characters are usually the same size as other ruby characters. Advancements in technology now allow certain characters to render accurately.
In Chinese, the practice of providing phonetic cues via ruby is rare, but does occur systematically in grade-school level textbooks and dictionaries. The Chinese have no special name for this practice, as it is not as widespread as in Japan. In Taiwan, it is known as "zhuyin", from the name of the phonetic system employed for this purpose there. It is virtually always used vertically, because publications are normally in a vertical format, and zhuyin is not as easy to read when presented horizontally. Where zhuyin is not used, other Chinese phonetic systems like pinyin are employed.
In academic settings, Vietnamese text written in ''chữ Hán'' or ''chữ Nôm'' may be glossed with ruby for modern readers.
Sometimes interlinear glosses are visually similar to ruby, appearing above or below the main text in smaller type. However, this is a distinct practice used for helping students of a foreign language by giving glosses for the words in a text, as opposed to the pronunciation of lesser-known characters.
Ruby annotation can also be used in handwriting.
History

In British typography, ''ruby'' was originally the name for type with a height of 5.5 points, which printers used for interlinear annotations in printed documents. In Japanese, rather than referring to a font size, the word became the name for typeset ''furigana''. When transliterated back into English, some texts rendered the word as ''rubi'' (a typical romanisation of the Japanese word ルビ, instead of ルビー (''rubī''), the expected transliteration of ''ruby''). However, the spelling "ruby" has become more common since the W3C published a recommendation for ''ruby markup''. In the US, the font size had been called "agate", a term in use since 1831 according to the ''Oxford English Dictionary''.
HTML markup
In 2001, the W3C published the Ruby Annotation specification for supplementing XHTML with ruby markup. Ruby markup is incorporated into the XHTML 1.1 specification and in HTML5.
For browsers that do not support ruby natively, ruby support is most easily added by using CSS rules that are available on the web.
Ruby markup is structured such that a fallback rendering, consisting of the ruby characters in parentheses immediately after the main text, appears if the browser does not support ruby.
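The fallback behaviour described above can be illustrated with a minimal Python sketch. It assumes the simple form of ruby markup in which `rp` elements carry the parentheses; the helper names `ruby_markup` and `fallback_text` are hypothetical, and the tag-stripping step is only a rough approximation of what a browser without ruby support would display.

```python
import re
from html import escape, unescape


def ruby_markup(base: str, reading: str) -> str:
    """Build simple ruby markup with <rp> fallback parentheses."""
    return (
        "<ruby>"
        f"{escape(base)}"
        "<rp>(</rp>"
        f"<rt>{escape(reading)}</rt>"
        "<rp>)</rp>"
        "</ruby>"
    )


def fallback_text(markup: str) -> str:
    """Approximate a non-ruby rendering: drop every tag, keep the text."""
    return unescape(re.sub(r"<[^>]+>", "", markup))


markup = ruby_markup("東京", "とうきょう")
print(markup)
print(fallback_text(markup))
```

A ruby-aware renderer hides the `rp` content and places the `rt` text beside the base, while the stripped version leaves the reading in parentheses immediately after the base text.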
The W3C is also working on a specific ruby module for CSS level 2, which additionally allows the grouping of ruby and automatic omission of furigana matching their annotated part.
Markup examples
Below are a few examples of ruby markup. The markup is shown first, and the rendered markup is shown next, followed by the unmarked version. Web browsers either render it with the correct size and positioning as shown in the table-based examples above, or use the fallback rendering with the ruby characters in parentheses:
Note that Chinese ruby text would normally be displayed in vertical columns to the right of each character. This approach is not typically supported in browsers at present.
This is a table-based example of vertical columns:
Complex ruby markup
Complex ruby markup makes it possible to associate more than one ruby text with a base text, or parts of ruby text with parts of base text.
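As a rough illustration, the following Python sketch generates complex ruby markup with two layers of ruby text: one annotation per base, and one annotation spanning all bases. The container elements `rbc` and `rtc`, the `rb` element, and the `rbspan` attribute are taken from the W3C Ruby Annotation recommendation as commonly described, not from this article; the helper name `complex_ruby` and the sample strings are hypothetical, and browser support for this form varies.

```python
from html import escape


def complex_ruby(bases, per_base, spanning):
    """Build complex ruby markup: one <rt> per base plus one spanning <rt>."""
    if len(bases) != len(per_base):
        raise ValueError("need exactly one ruby text per base text")
    # Base container: each part of the base text gets its own <rb>.
    rbc = "".join(f"<rb>{escape(b)}</rb>" for b in bases)
    # First ruby text container: parts of ruby text paired with parts of base.
    first = "".join(f"<rt>{escape(t)}</rt>" for t in per_base)
    # Second container: a single ruby text covering every base.
    second = f'<rt rbspan="{len(bases)}">{escape(spanning)}</rt>'
    return f"<ruby><rbc>{rbc}</rbc><rtc>{first}</rtc><rtc>{second}</rtc></ruby>"


print(complex_ruby(["10", "31", "2002"], ["Month", "Day", "Year"], "Expiration Date"))
```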
Unicode
Unicode and its companion standard, the Universal Character Set, support ruby via these ''interlinear annotation'' characters:
* Code point FFF9 (hex): interlinear annotation anchor, marks start of annotated text
* Code point FFFA (hex): interlinear annotation separator, marks start of annotating character(s)
* Code point FFFB (hex): interlinear annotation terminator, marks end of annotated text
Few applications implement these characters. Unicode Technical Report #20 clarifies that these characters are not intended to be exposed to users of markup languages and software applications, and are instead for internal use either in systems or the applications themselves. It suggests that ruby markup be used instead, where appropriate.
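A small Python sketch may make the role of the three code points concrete. It wraps a base text and its annotation in the anchor, separator, and terminator characters, then converts such a span into ruby markup, in line with the report's suggestion that markup be preferred. The function names `annotate` and `to_ruby_markup` are hypothetical, and the conversion assumes unnested spans with a single annotation each.

```python
import re
from html import escape

ANCHOR = "\uFFF9"      # marks start of annotated text
SEPARATOR = "\uFFFA"   # marks start of annotating character(s)
TERMINATOR = "\uFFFB"  # marks end of annotated text

_SPECIALS = ANCHOR + SEPARATOR + TERMINATOR
_SPAN = re.compile(
    f"{ANCHOR}([^{_SPECIALS}]*){SEPARATOR}([^{_SPECIALS}]*){TERMINATOR}"
)


def annotate(base: str, annotation: str) -> str:
    """Wrap base text and annotation in the interlinear annotation characters."""
    return f"{ANCHOR}{base}{SEPARATOR}{annotation}{TERMINATOR}"


def to_ruby_markup(text: str) -> str:
    """Replace each unnested annotated span with simple ruby markup."""
    def repl(m):
        base, ruby = escape(m.group(1)), escape(m.group(2))
        return f"<ruby>{base}<rp>(</rp><rt>{ruby}</rt><rp>)</rp></ruby>"
    return _SPAN.sub(repl, text)


internal = annotate("東京", "とうきょう")
print([f"U+{ord(c):04X}" for c in internal if c in _SPECIALS])
print(to_ruby_markup(internal))
```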
The interlinear annotation characters are part of the "Specials" Unicode block:
ANSI
ISO/IEC 6429 (also known as ECMA-48), which defines the ANSI escape codes, also provided a mechanism for ruby text for use by text terminals, although few terminals and terminal emulators implement it. The PARALLEL TEXTS (PTX) escape code accepts six parameter values, giving the following escape sequences for marking ruby text:
* CSI 0 \ (or simply CSI \ since 0 is used as the default value for this control) – end of parallel texts
* CSI 1 \ – beginning of a string of principal parallel text
* CSI 2 \ – beginning of a string of supplementary parallel text
* CSI 3 \ – beginning of a string of supplementary Japanese phonetic annotation
* CSI 4 \ – beginning of a string of supplementary Chinese phonetic annotation
* CSI 5 \ – end of a string of supplementary phonetic annotations
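The sequences listed above can be generated with a short Python sketch. It assumes the 7-bit form of CSI (ESC followed by `[`) and one plausible ordering for a single annotated word: principal text opened with parameter 1, the phonetic annotation opened with 3 or 4, and the annotation closed with 5. The names `Ptx`, `ptx`, and `ruby_sequence` are hypothetical, and since few terminals implement PTX the result is printed with `repr` rather than sent to the terminal.

```python
from enum import IntEnum

CSI = "\x1b["  # assumed 7-bit Control Sequence Introducer: ESC [


class Ptx(IntEnum):
    """Parameter values of the PARALLEL TEXTS control, as listed above."""
    END = 0
    PRINCIPAL = 1
    SUPPLEMENTARY = 2
    JAPANESE_PHONETIC = 3
    CHINESE_PHONETIC = 4
    END_PHONETIC = 5


def ptx(param: Ptx) -> str:
    """Return the escape sequence CSI <param> \\ for one PTX parameter."""
    return f"{CSI}{int(param)}\\"


def ruby_sequence(base: str, reading: str, kind: Ptx = Ptx.JAPANESE_PHONETIC) -> str:
    """Mark base text and its phonetic annotation (assumed ordering: 1, kind, 5)."""
    return f"{ptx(Ptx.PRINCIPAL)}{base}{ptx(kind)}{reading}{ptx(Ptx.END_PHONETIC)}"


print(repr(ruby_sequence("東京", "とうきょう")))
```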
See also
* Furigana (Japanese)
* Emphasis points, marks used for emphasis, which can be implemented similarly to ruby
* Harakat – vocalised Arabic script diacritical marks that provide phonetic assistance for reading texts in Arabic.
* Niqqud – vocalised Hebrew script vowel pointings that provide phonetic assistance for reading Hebrew. (The Hebrew abjad represents only the consonants.)
Further reading
* Lunde, Ken (2009). ''CJKV Information Processing''. Sebastopol, California: O'Reilly Media. ISBN 978-0-596-51447-1. https://books.google.com/books?id=SA92uQqTB-AC&pg=PA529