The Lotus Multi-Byte Character Set (LMBCS) is a proprietary multi-byte character encoding originally conceived in 1988 at Lotus Development Corporation with input from Bob Balaban and others. Created around the same time and addressing some of the same problems, LMBCS could be viewed as a parallel development and a possible alternative to Unicode. For maximum compatibility, later issues of LMBCS incorporate UTF-16 as a subset.

Commercially, LMBCS was first introduced as the default character set of Lotus 1-2-3 Release 3 for DOS in March 1989 and Lotus 1-2-3/G Release 1 for OS/2 in 1990, replacing the 8-bit Lotus International Character Set (LICS) and ASCII used in earlier DOS-only versions of Lotus 1-2-3 and Symphony. LMBCS is also used in IBM/Lotus SmartSuite, Notes and Domino, as well as in a number of third-party products.

LMBCS encodes the characters required for languages using the Latin, Arabic, Hebrew, Greek and Cyrillic scripts, the Thai, Chinese, Japanese and Korean writing systems, and technical symbols.

Encodings

Technically, LMBCS is a lead-byte encoding where code point 00_hex as well as code points 20_hex (32) to 7F_hex (127) are identical to ASCII (as well as to LICS). Code point 00_hex is always treated as the NUL character to ensure maximum code compatibility with existing software libraries dealing with null-terminated strings in many programming languages such as C. This applies even to the UTF-16BE codes, where code words of the form xx00_hex are mapped to private-use codes of the form F6xx_hex during encoding in order to avoid NUL bytes, and to escaped control characters, where 20_hex is added to the C0 (but not C1) control characters following the 0F_hex lead byte.

Code points 01_hex to 1F_hex, which serve as control codes in ASCII, are used as lead bytes to switch the definition of code points above 7F_hex between several ''code groups'' (similar to code pages) and at the same time determine whether the corresponding code group is single-byte or multi-byte. For example, code group 1 (with group byte 01_hex) is almost identical to the SBCS code page 850, whereas code group 16 (with group byte 10_hex) is similar to the Japanese MBCS code page 932. Multi-byte characters can thus occupy two or three bytes.

In canonical LMBCS, each character starts with its group byte. To reduce the length, in optimized or compressed LMBCS a ''default code group'' or ''optimization group code'' can be defined on a per-application or per-process basis (ideally chosen according to the highest likelihood of occurrence) and must be communicated to the interpreting code in some way (for example, by specifying the corresponding "LMBCS-''n''" name). The group byte can then be omitted for characters of that group. Lotus 1-2-3 retrieves the optimization group code from the file header of the corresponding source file, whereas for Lotus Notes the optimization group code is always fixed to 01_hex.
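The byte-level rules described in this section can be illustrated with a short Python sketch. It is not taken from any Lotus or IBM implementation, and all names in it (`avoid_nul`, `escape_c0`, `split_lmbcs`, `MULTI_BYTE_GROUPS`) are hypothetical. It assumes that every byte from 01_hex to 1F_hex acts as a group byte, that a single-byte group contributes one byte after its group byte and a multi-byte group two, and that 10_hex is the only multi-byte group (the only one named above). Optimized LMBCS is handled only for a single-byte optimization group, and no bytes are mapped to actual characters.

```python
from typing import Iterator, Optional, Tuple

# Assumption: only group 0x10 is named as multi-byte in the text above;
# a complete table would list every multi-byte group.
MULTI_BYTE_GROUPS = frozenset({0x10})


def avoid_nul(unit: int) -> int:
    """Map a UTF-16 code unit of the form xx00 to F6xx so no byte is NUL."""
    if unit & 0xFF == 0 and unit >> 8 != 0:
        return 0xF600 | (unit >> 8)
    return unit


def escape_c0(control: int) -> bytes:
    """Encode a C0 control as lead byte 0F followed by control + 20 hex."""
    if not 0x00 <= control <= 0x1F:
        raise ValueError("not a C0 control character")
    return bytes([0x0F, control + 0x20])


def split_lmbcs(
    data: bytes, default_group: Optional[int] = None
) -> Iterator[Tuple[Optional[int], bytes]]:
    """Yield (group, payload) per character; group is None for ASCII and NUL.

    default_group=None models canonical LMBCS. A single-byte group number
    models optimized LMBCS with that optimization group code.
    """
    if default_group in MULTI_BYTE_GROUPS:
        raise ValueError("multi-byte default groups are outside this sketch")
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x00 or 0x20 <= b <= 0x7F:
            # NUL and the ASCII range never carry a group byte.
            yield None, data[i:i + 1]
            i += 1
        elif b <= 0x1F:
            # Explicit group byte: it selects the group and the payload width.
            width = 2 if b in MULTI_BYTE_GROUPS else 1
            payload = data[i + 1:i + 1 + width]
            if len(payload) != width:
                raise ValueError("truncated character at offset %d" % i)
            yield b, payload
            i += 1 + width
        elif default_group is not None:
            # Optimized form: a bare high byte belongs to the default group.
            yield default_group, data[i:i + 1]
            i += 1
        else:
            raise ValueError("byte %#04x at offset %d has no group byte" % (b, i))


# 0x82 is "é" in code page 850, which group 1 closely follows.
canonical = list(split_lmbcs(b"caf\x01\x82"))
optimized = list(split_lmbcs(b"caf\x82", default_group=0x01))
```

Under these assumptions both calls yield the same four characters, with the optimized form one byte shorter because the group byte 01_hex is implied.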

Character set

Without a prefix byte, the code points 32 (20_hex) to 127 (7F_hex) are interpreted as follows (corresponding to LMBCS codes 32 to 127):

Group 1

LMBCS group 1 code points 128 (80_hex) to 255 (FF_hex) are identical to the corresponding code points in code page 850 (DOS Latin-1), whereas code points 1 (01_hex) to 127 (7F_hex) are defined according to the following exception list (corresponding to LMBCS codes 256 to 383):

Group 2

LMBCS group 2 code points 128 (80_hex) to 255 (FF_hex) are identical to the corresponding code points in code page 851 (DOS Greek), whereas code points 1 (01_hex) to 127 (7F_hex) are defined according to the following exception list:

Group 6

LMBCS group 6 code points 128 (80_hex) to 255 (FF_hex) are identical to the corresponding code points in code page 852 (DOS Latin-2), whereas code points 1 (01_hex) to 127 (7F_hex) are defined according to the following exception list:

