The Lotus Multi-Byte Character Set (LMBCS) is a proprietary multi-byte character encoding originally conceived in 1988 at Lotus Development Corporation with input from Bob Balaban and others. Created around the same time as Unicode and addressing some of the same problems, LMBCS can be viewed as a parallel development and possible alternative to it. For maximum compatibility, later versions of LMBCS incorporate UTF-16 as a subset. Commercially, LMBCS was first introduced as the default character set of Lotus 1-2-3 Release 3 for DOS in March 1989 and of Lotus 1-2-3/G Release 1 for OS/2 in 1990, replacing the 8-bit Lotus International Character Set (LICS) and ASCII used in earlier DOS-only versions of Lotus 1-2-3 and Symphony. LMBCS is also used in IBM/Lotus SmartSuite, Notes and Domino, as well as in a number of third-party products. LMBCS encodes the characters required for languages using the Latin, Arabic, Hebrew, Greek and Cyrillic scripts, the Thai, Chinese, Japanese and Korean writing systems, and technical symbols.

Encodings

Technically, LMBCS is a lead-byte encoding in which code point 00_hex as well as code points 20_hex (32) to 7F_hex (127) are identical to ASCII (as well as to LICS). Code point 00_hex is always treated as the NUL character, to ensure maximum compatibility with existing software libraries that handle null-terminated strings in programming languages such as C. This rule applies even to the UTF-16BE codes, where code words of the form xx00_hex are mapped to private-use codes of the form F6xx_hex during encoding in order to avoid NUL bytes, and to escaped control characters: after the lead byte 0F_hex, 20_hex is added to C0 (but not C1) control characters.

Code points 01_hex to 1F_hex, which serve as control codes in ASCII, are used as lead bytes that switch the definition of code points above 7F_hex between several ''code groups'' (similar to code pages) and at the same time determine whether the corresponding code group is single- or multi-byte. For example, code group 1 (with group byte 01_hex) is almost identical to the SBCS code page 850, whereas code group 16 (with group byte 10_hex) is similar to the Japanese MBCS code page 932. Including the group byte, a character can thus occupy two or three bytes. In canonical LMBCS, each character starts with its group byte. To reduce the length, optimized or compressed LMBCS allows a ''default code group'' or ''optimization group code'' to be defined on a per-application or per-process basis (ideally the group most likely to occur); it must be communicated to the interpreting code in some way (e.g. by specifying the corresponding "LMBCS-''n''" name). The group byte can then be omitted for characters in that group. Lotus 1-2-3 retrieves the optimization group code from the file header of the corresponding source file, whereas Lotus Notes always uses 01_hex as the optimization group code.
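The NUL-avoidance rules and the group-byte mechanism described above can be illustrated with a minimal Python encoder sketch. It is not Lotus's or ICU's implementation, and the names `encode_char`, `encode` and the constants are hypothetical. Assumptions: the UTF-16 code group uses group byte 14_hex (the value used by ICU's LMBCS converter; it is not given above); each UTF-16 code unit is encoded separately; national multi-byte groups such as group 16 are not modeled, so every character outside code page 850 falls back to the UTF-16 group; every C0 control is escaped, which real implementations may not do; and only group 1 is supported as the optimization group, as in Lotus Notes.

```python
# Minimal sketch of a canonical/optimized LMBCS encoder (hypothetical, not Lotus or ICU code).
GROUP_LATIN1 = 0x01    # group 1: upper half identical to code page 850
GROUP_CTRL = 0x0F      # lead byte for escaped control characters
GROUP_UTF16 = 0x14     # ASSUMPTION: UTF-16 group byte (value used by ICU, not given in the text)
NUL_SUBSTITUTE = 0xF6  # code units of the form xx00 are rewritten as F6xx


def encode_char(ch: str) -> bytes:
    """Encode one character in canonical LMBCS (group byte always written)."""
    cp = ord(ch)
    if cp == 0x00 or 0x20 <= cp <= 0x7F:
        return bytes([cp])                     # NUL and 20-7F: identical to ASCII
    if cp <= 0x1F:
        return bytes([GROUP_CTRL, cp + 0x20])  # C0 control: add 20_hex so no byte is 00
    if cp <= 0x9F:
        return bytes([GROUP_CTRL, cp])         # C1 control (80-9F): no offset
    try:
        return bytes([GROUP_LATIN1]) + ch.encode("cp850")
    except UnicodeEncodeError:
        pass                                   # not in code page 850: fall back to UTF-16
    out = bytearray()
    units = ch.encode("utf-16-be")             # one or two 16-bit code units, big-endian
    for i in range(0, len(units), 2):
        hi, lo = units[i], units[i + 1]
        if lo == 0x00:
            hi, lo = NUL_SUBSTITUTE, hi        # xx00 -> F6xx; genuine U+F6xx becomes ambiguous
        out += bytes([GROUP_UTF16, hi, lo])
    return bytes(out)


def encode(text: str, optimize: bool = False) -> bytes:
    """Encode text; with optimize=True, group 1 is the optimization group (as in Lotus Notes)."""
    out = bytearray()
    for ch in text:
        unit = encode_char(ch)
        # Omitting the group byte is only safe when the next byte is >= 80_hex;
        # a lower byte would be read as ASCII or as another lead byte.
        if optimize and len(unit) == 2 and unit[0] == GROUP_LATIN1 and unit[1] >= 0x80:
            unit = unit[1:]
        out += unit
    return bytes(out)
```

For example, `encode("é")` yields `b'\x01\x82'` in canonical form and `b'\x82'` with `optimize=True`, while `encode("\u4e00")` yields `b'\x14\xf6\x4e'`, because the code unit 4E00 would otherwise contain a NUL byte.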

Character set

Without a prefix byte, code points 32 (20_hex) to 127 (7F_hex) are interpreted as in ASCII (corresponding to LMBCS codes 32 to 127).

Group 1

LMBCS group 1 code points 128 (80_hex) to 255 (FF_hex) are identical to the corresponding code points in code page 850 (DOS Latin-1), whereas code points 1 (01_hex) to 127 (7F_hex) are defined by an exception list (corresponding to LMBCS codes 256 to 383).
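The LMBCS code numbers quoted in this section (32 to 127 for unprefixed bytes, 256 to 383 for the low half of group 1) are consistent with numbering a character as group × 256 + byte, with group 0 meaning "no prefix". The minimal Python sketch below assumes that rule, which is an inference from those ranges rather than a stated definition, and decodes the byte after group byte 01_hex. Only the code page 850 half is handled; the exception list for the lower half is not modeled. Both function names are hypothetical.

```python
# Hypothetical helpers for LMBCS group 1 (illustrative names, not from any library).

def lmbcs_code(group: int, byte: int) -> int:
    """Number a character as group * 256 + byte (ASSUMED rule, inferred from the quoted ranges)."""
    if not (0 <= group <= 0x1F and 0 <= byte <= 0xFF):
        raise ValueError("group must be 00-1F (00 = no prefix) and byte must be 00-FF")
    return group * 256 + byte


def decode_group1(byte: int) -> str:
    """Decode the byte that follows the group byte 01_hex."""
    if 0x80 <= byte <= 0xFF:
        # Upper half: identical to code page 850 (DOS Latin-1).
        return bytes([byte]).decode("cp850")
    # Lower half: governed by the group 1 exception list, which this sketch does not include.
    raise NotImplementedError(f"group 1 exception-list byte {byte:#04x} is not modeled")
```

Under this rule, `lmbcs_code(1, 0x00)` is 256 and `lmbcs_code(1, 0x7F)` is 383, matching the range given above, and `decode_group1(0x82)` returns "é".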

Group 2

LMBCS group 2 code points 128 (80_hex) to 255 (FF_hex) are identical to the corresponding code points in code page 851 (DOS Greek), whereas code points 1 (01_hex) to 127 (7F_hex) are defined by an exception list.

Group 6

LMBCS group 6 code points 128 (80_hex) to 255 (FF_hex) are identical to the corresponding code points in code page 852 (DOS Latin-2), whereas code points 1 (01_hex) to 127 (7F_hex) are defined by an exception list.
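Because the group byte selects the code page, the same byte can stand for different characters: 85_hex is "à" in group 1 (code page 850) but "ů" in group 6 (code page 852). The minimal canonical-LMBCS decoder sketch below shows this using Python's built-in `cp850` and `cp852` codecs. The function name `decode` and the group table are hypothetical; group 2 is omitted because Python's standard library has no `cp851` codec, and exception-list bytes below 80_hex, other groups and optimized input are rejected rather than guessed.

```python
# Minimal sketch of a canonical-LMBCS decoder for two single-byte groups (hypothetical).
GROUP_CODECS = {
    0x01: "cp850",  # group 1: DOS Latin-1 (upper half)
    0x06: "cp852",  # group 6: DOS Latin-2 (upper half)
    # group 2 (code page 851, DOS Greek) is omitted: Python has no built-in cp851 codec.
}


def decode(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x00 or 0x20 <= b <= 0x7F:
            out.append(chr(b))  # NUL and 20-7F: identical to ASCII
            i += 1
        elif b in GROUP_CODECS and i + 1 < len(data) and data[i + 1] >= 0x80:
            # The group byte picks the code page used for the following byte.
            out.append(bytes([data[i + 1]]).decode(GROUP_CODECS[b]))
            i += 2
        else:
            # Unmodeled groups, exception-list bytes and truncated input end up here.
            raise ValueError(f"unsupported LMBCS sequence at offset {i}")
    return "".join(out)
```

For instance, `decode(b"Caf\x01\x82")` returns "Café", and `decode(b"\x06\x88")` returns "ł".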
