''Mojikyō'' (Japanese: 文字鏡), also known by its full name ''Konjaku Mojikyō'', is a character encoding scheme. The Mojikyō Institute, which published the character set, also published computer software and TrueType fonts to accompany it. The Mojikyō Institute, chaired by Tadahisa Ishikawa, originally had its character set and related software and data redistributed on CD-ROMs sold in Kinokuniya stores. Conceptualized in 1996, the first version of the CD-ROM was released in July 1997. For a time, the Mojikyō Institute also offered a web subscription, termed "Mojikyō WEB", which had more up-to-date characters.

''Mojikyō'' encoded 174,975 characters. Among those, 150,366 characters (≈ 86%) then belonged to the extended Chinese–Japanese–Korean–Vietnamese (CJKV) family (for Korean, this refers to Hanja; for Vietnamese, to Chữ Nôm). Many of ''Mojikyō'''s characters are considered obsolete or obscure, and are not encoded by any other character set, including the most widely used international text encoding standard, Unicode.

Originally a paid proprietary software product, ''Mojikyō'' changed course in 2015, when the Mojikyō Institute began to upload its latest releases to the Internet Archive as freeware, as a memorial to honor one of its developers, Tokio Furuya, who died that year. On December 15, 2018, version 4.0 was released. The next day, Ishikawa announced that without Furuya this would be the final release of ''Mojikyō''.

Premise

The encoding was created to provide a complete index of Chinese, Korean, and Japanese characters. It also encodes a large number of characters in ancient scripts, such as the oracle bone script, the seal script, and Sanskrit (Siddhaṃ). For many characters, it is the only character encoding to encode them, and its data is often used as a starting point for Unicode proposals. However, ''Mojikyō'' has much looser standards than Unicode for encoding, which leads ''Mojikyō'' to have many encoded glyphs of dubious, or even unintentionally fictional, origin. As such, while many non-Unicode ''Mojikyō'' characters are suitable for addition to Unicode, not all can become Unicode characters, due to the differing standards of evidence required by each.

Composition

The ''Mojikyō'' fonts are TrueType fonts that come in a ZIP file and are each around 25 megabytes; the different fonts contain different numbers of characters (the file can be downloaded from the official website). Also included is a Windows executable that implements a graphical character map, the "Mojikyō Character Map", also called the "Mojikyō Cmap". (The English name comes from the title of the window produced by running the executable; the Japanese name comes from the icon of the executable.) The Mojikyō Character Map allows users to browse through the ''Mojikyō'' fonts, and copy and paste characters in lieu of typing them on the keyboard. As opposed to the regular Windows character map, or for that matter KCharSelect, which both support TrueType fonts, the Mojikyō Character Map displays the numbered ''Mojikyō'' encoding slot of the requested character (see the screenshots on the official website). In order for the Mojikyō Character Map to work, all ''Mojikyō'' fonts must be installed into the system fonts directory.

Encoding

When referring to a character encoded in ''Mojikyō'', the format MJXXXXXX is often used, similar to the U+XXXX format used for Unicode. For example, one ''hentaigana'' character has ''Mojikyō'' encoding MJ090007 and Unicode encoding U+1B008. A difference, however, is that ''Mojikyō'' encodings displayed this way are decimal, while Unicode's U+ encoding is hexadecimal.

From the earliest days of Unicode, ''Mojikyō'' has both influenced, and been influenced by, the standard. Glyphs originating from ''Mojikyō'' first appear in a proposal to the Ideographic Rapporteur Group (IRG; as of 2019, rebranded as the Ideographic Research Group), which is responsible for maintaining all CJK blocks in Unicode, on 18 April 2002. In May 2007, ''Mojikyō'' played a minor role in an eventually successful series of proposals to encode the Tangut script in Unicode; ''Mojikyō'' already had within its encoding 6,000 Tangut characters by October 2002. (The history of the encoding of the Tangut script is quite complicated and spans many related proposals.)

The Unicode Standard's Unihan Database refers to ''Mojikyō'' as the "Japanese KOKUJI Collection", abbreviated "JK". For example, one ideograph has a J-Source equal to JK-66038. (J-Source is a column name in the Unihan database, where "J" is short for "Japanese glyph source"; the full name of the column is kIRG_JSource. Under Han unification, there are nine such sources. See §3.1 of UAX #38 for a complete list and more information.) All Unicode characters with a JK-prefixed J-Source originate from ''Mojikyō''. Other J-Source prefixes exist, such as J4, meaning the character originates from JIS X 0213:2004. According to Ken Lunde, a subject matter expert in character encodings and East Asian languages, as of Unicode 13.0, 782 ideographs in Unicode originate from ''Mojikyō'', split somewhat evenly between two blocks: CJK Unified Ideographs Extension C, with 367, and CJK Unified Ideographs Extension E, with 415. Not all Unicode characters with ''Mojikyō'' origins (JK-prefixed J-Sources) have the same representative glyph (that is to say, a glyph made up of the same radicals in the same positions) in the code chart as in the ''Mojikyō'' font; some characters had their shapes changed before final encoding, as investigation showed the shapes assigned by the Mojikyō Institute were wrong. Errors in large collections of ideographs are, of course, not uncommon. Such errors occur accidentally even in well-funded, government-produced collections, such as the famous kanji from unknown sources in the Japanese Industrial Standards Committee's JIS X 0208 double-byte character encoding standard. All of these JIS X 0208 error kanji have made their way into Unicode despite not being "real" kanji.
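The difference between the two notations described in this section can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the six-digit MJ pattern is inferred from the MJXXXXXX form mentioned above, and the J-Source handling does nothing more than split a value such as JK-66038 at the hyphen.

```python
import re

# Assumed label shapes, inferred from the examples in the text:
# "MJ" + six decimal digits, and "U+" + four to six hexadecimal digits.
MJ_PATTERN = re.compile(r"^MJ(\d{6})$")
UNICODE_PATTERN = re.compile(r"^U\+([0-9A-Fa-f]{4,6})$")


def parse_mj(label: str) -> int:
    """Return the Mojikyo number of an MJXXXXXX label, read as decimal."""
    match = MJ_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not an MJ label: {label!r}")
    return int(match.group(1), 10)


def parse_unicode(label: str) -> int:
    """Return the code point of a U+XXXX label, read as hexadecimal."""
    match = UNICODE_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a U+ label: {label!r}")
    return int(match.group(1), 16)


def split_j_source(value: str) -> tuple[str, str]:
    """Split a J-Source value such as 'JK-66038' into (prefix, index)."""
    prefix, separator, index = value.partition("-")
    if not (prefix and separator and index):
        raise ValueError(f"not a J-Source value: {value!r}")
    return prefix, index


def originates_from_mojikyo(value: str) -> bool:
    """True when the J-Source prefix is JK, the prefix used for Mojikyo."""
    return split_j_source(value)[0] == "JK"


if __name__ == "__main__":
    print(parse_mj("MJ090007"))        # decimal reading
    print(parse_unicode("U+1B008"))    # hexadecimal reading
    print(split_j_source("JK-66038"))
```

Keeping the index part of a J-Source value as a string avoids guessing at its numeric base, which the text above does not state.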

Blocks

''Mojikyō'' encoded 174,975 characters. Among those, 150,366 characters then belonged to the extended CJKV family. Many of the encoded characters are considered obsolete or otherwise obscure, and are not encoded by any other character set, including the international standard, Unicode. Each character has a unique ''Mojikyō'' number, and the characters are organized into blocks. ''Mojikyō'' puts CJKV characters in different blocks according to their traditional ''Kangxi'' radical. Common radicals containing an especially high number of characters, such as Radicals 9 (人) and 162 (辵), are split further by stroke order. (For proof, see the list in the Mojikyō Character Map.)

No unification

Unlike Unicode, ''Mojikyō'' purposely avoids Han unification; no attempt at compactness of the encoding is made, nor is there an attempt to keep all common characters below U+FFFF as there is in Unicode. Unicode, on the other hand, sorts its CJK characters into blocks based on how common they are: the most common are generally put into the Basic Multilingual Plane, while those that are rare or obscure are put into the astral planes. For example, under Radical 9, ''Mojikyō'' has two characters where Unicode has one: MJ054435 and MJ059031 are both represented in Unicode by a single character.
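The U+FFFF boundary mentioned above can be checked mechanically. The following Python sketch assumes only the standard Unicode layout of 17 planes of 65,536 code points each; the function names are hypothetical and the sample code points are illustrative.

```python
MAX_CODE_POINT = 0x10FFFF  # last code point of plane 16


def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0 to 16) that contains the code point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"outside the Unicode code space: {code_point:#x}")
    # Each plane spans 0x10000 code points, so the plane is the value
    # of the bits above the low 16.
    return code_point >> 16


def is_in_bmp(code_point: int) -> bool:
    """True for the Basic Multilingual Plane, i.e. U+0000 through U+FFFF."""
    return plane_of(code_point) == 0


if __name__ == "__main__":
    # U+4E00 opens the main CJK Unified Ideographs block in the BMP.
    print(plane_of(0x4E00), is_in_bmp(0x4E00))
    # U+1B008 is the hentaigana example from the Encoding section.
    print(plane_of(0x1B008), is_in_bmp(0x1B008))
```

A ''Mojikyō'' number carries no comparable information, since the text above notes that the encoding makes no attempt to place common characters in any particular range.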

License

''Mojikyō'' is proprietary software under a restrictive license. Originally, the Mojikyō Institute tried to prevent its character data from being used, and threatened those who published conversion tables to and from its character set. In July 2010, the Mojikyō Institute abandoned its legal efforts to stop at least one Japanese user from publishing conversion tables or converting characters encoded in ''Mojikyō'' to Unicode or other character sets. Mere data, sometimes including the shapes of letters, are considered in many jurisdictions to be common property as they do not meet the threshold of originality (see also: fictitious entry; trap street). Due to this legacy, however, GlyphWiki disallowed ''Mojikyō'' data as of 2020.

Collected writing systems

Living

* Chinese: Hanzi
* Japanese: Kanji, Kana (including Hentaigana)
* Korean
* Latin alphabet with diacritics
* Cyrillic script with diacritics

Dead or obsolete

* Ancient Chinese
** Oracle bone script
** Seal script
* Taiwanese kana
* Vietnamese: Chữ Nôm
* Sanskrit: Siddhaṃ
* Sui script
