LMBCS-2

The Lotus Multi-Byte Character Set (LMBCS) is a proprietary multi-byte character encoding originally conceived in 1988 at Lotus Development Corporation with input from Bob Balaban and others. Created around the same time and addressing some of the same problems, LMBCS could be viewed as a parallel development and a possible alternative to Unicode. For maximum compatibility, later issues of LMBCS incorporate UTF-16 as a subset.

Commercially, LMBCS was first introduced as the default character set of Lotus 1-2-3 Release 3 for DOS in March 1989 and Lotus 1-2-3/G Release 1 for OS/2 in 1990, replacing the 8-bit Lotus International Character Set (LICS) and ASCII used in earlier DOS-only versions of Lotus 1-2-3 and Symphony. LMBCS is also used in IBM/Lotus SmartSuite, Notes and Domino, as well as in a number of third-party products.

LMBCS encodes the characters required for languages using the Latin, Arabic, Hebrew, Greek and Cyrillic scripts, the Thai, Chinese, Japanese and Korean writing systems, and technical symbols.


Encodings

Technically, LMBCS is a lead-byte encoding in which code point 00hex and code points 20hex (32) to 7Fhex (127) are identical to ASCII (and to LICS). Code point 00hex is always treated as the NUL character to ensure maximum compatibility with existing software libraries that handle null-terminated strings in programming languages such as C. This applies even to the UTF-16BE codes, where code words of the form xx00hex are mapped to private-use codes of the form F6xxhex during encoding to avoid NUL bytes, and to escaped control characters, where 20hex is added to the C0 (but not C1) control characters following the 0Fhex lead byte.

Code points 01hex to 1Fhex, which serve as control codes in ASCII, are used as lead bytes. A lead byte switches the definition of the code points above 7Fhex between several ''code groups'' (similar to code pages) and at the same time determines whether the corresponding code group is single-byte or multi-byte. For example, code group 1 (with group byte 01hex) is almost identical to the SBCS code page 850, whereas code group 16 (with group byte 10hex) is similar to the Japanese MBCS code page 932. Multi-byte characters can thus occupy two or three bytes.

In canonical LMBCS, each character starts with its group byte. To reduce the length, optimized (or compressed) LMBCS allows a ''default code group'' or ''optimization group code'' to be defined per application or process, ideally chosen according to the highest likelihood of occurrence. The group byte can then be omitted for characters of that group. The chosen group must be communicated to the interpreting code in some way, for example by specifying the corresponding "LMBCS-''n''" name. Lotus 1-2-3 retrieves the optimization group code from the file header of the corresponding source file, whereas for Lotus Notes the optimization group code is always 01hex.
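
The rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Lotus implementation: the names `split_units`, `escape_control`, `avoid_nul` and `PAYLOAD_LEN` are hypothetical, the table lists only the three groups mentioned in the text, and an unprefixed byte above 7Fhex is assumed to start a character of the optimization group. Real LMBCS has further groups and special cases that are ignored here.

```python
from typing import Optional

# Hypothetical table: number of bytes that follow each group byte.
# Only groups named in the text are listed (1 = SBCS, 15 = escaped
# controls, 16 = MBCS); a real table would cover more groups.
PAYLOAD_LEN = {0x01: 1, 0x0F: 1, 0x10: 2}


def split_units(data: bytes, opt_group: int,
                payload_len: dict = PAYLOAD_LEN) -> list[tuple[Optional[int], bytes]]:
    """Split LMBCS bytes into (group, payload) units.

    group is None for NUL and for unprefixed ASCII bytes (20hex-7Fhex).
    """
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x00 or 0x20 <= b <= 0x7F:
            group, start, n = None, i, 1      # NUL or plain ASCII
        else:
            if b < 0x20:
                group, start = b, i + 1       # explicit group byte
            else:
                group, start = opt_group, i   # group byte omitted
            if group not in payload_len:
                raise ValueError(f"unknown group {group:#04x}")
            n = payload_len[group]
        payload = data[start:start + n]
        if len(payload) < n:
            raise ValueError("truncated character")
        units.append((group, payload))
        i = start + n
    return units


def escape_control(c: int) -> bytes:
    """Encode a control character after the 0Fhex lead byte."""
    if 0x01 <= c <= 0x1F:                     # C0: add 20hex
        return bytes([0x0F, c + 0x20])
    if 0x80 <= c <= 0x9F:                     # C1: stored unchanged
        return bytes([0x0F, c])
    raise ValueError("not a C0/C1 control character")


def avoid_nul(unit: int) -> int:
    """Map a UTF-16 code word xx00hex to the private-use code F6xxhex."""
    if unit > 0xFF and unit & 0xFF == 0:
        return 0xF600 | (unit >> 8)
    return unit
```

With optimization group 1, the byte string `b"A\x82\x10\x88\x9f\x01\x82"` splits into four units: an ASCII byte, an unprefixed group 1 byte, a two-byte group 16 character, and a group 1 byte carrying its explicit prefix. The last form is what canonical LMBCS would use throughout.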


Character set

Without a prefix byte, code points 32 (20hex) to 127 (7Fhex) are interpreted as in ASCII (corresponding to LMBCS codes 32 to 127).


Group 1

LMBCS group 1 code points 128 (80hex) to 255 (FFhex) are identical to the corresponding code points in code page 850 (DOS Latin-1), whereas code points 1 (01hex) to 127 (7Fhex) are defined according to an exception list (corresponding to LMBCS codes 256 to 383).


Group 2

LMBCS group 2 code points 128 (80hex) to 255 (FFhex) are identical to the corresponding code points in code page 851 (DOS Greek), whereas code points 1 (01hex) to 127 (7Fhex) are defined according to an exception list.


Group 6

LMBCS group 6 code points 128 (80hex) to 255 (FFhex) are identical to the corresponding code points in code page 852 (DOS Latin-2), whereas code points 1 (01hex) to 127 (7Fhex) are defined according to an exception list.
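
Because the upper halves of groups 1 and 6 coincide with code pages 850 and 852, a rough decoder for these single-byte groups can lean on existing code page tables. The sketch below assumes that Python's built-in `cp850` and `cp852` codecs match the code page versions Lotus used; the names `GROUP_CODEC` and `decode_sbcs` are hypothetical. Group 2 is omitted, and the exception lists for code points 01hex to 7Fhex are not implemented.

```python
# Assumption: Python's codecs stand in for the DOS code pages.
GROUP_CODEC = {0x01: "cp850", 0x06: "cp852"}


def decode_sbcs(data: bytes, opt_group: int = 0x01) -> str:
    """Decode LMBCS text that uses only ASCII and groups 1 and 6."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x00 or 0x20 <= b <= 0x7F:
            out.append(chr(b))                # NUL or ASCII, no prefix
            i += 1
        elif b >= 0x80:
            # Group byte omitted: the optimization group applies.
            out.append(bytes([b]).decode(GROUP_CODEC[opt_group]))
            i += 1
        elif b in GROUP_CODEC and i + 1 < len(data):
            nxt = data[i + 1]                 # byte after the group byte
            if nxt < 0x80:
                raise NotImplementedError("exception list not covered")
            out.append(bytes([nxt]).decode(GROUP_CODEC[b]))
            i += 2
        else:
            raise ValueError(f"unsupported byte {b:#04x} at offset {i}")
    return "".join(out)


print(decode_sbcs(b"caf\x82"))       # optimized form, group 1 implied
print(decode_sbcs(b"caf\x01\x82"))   # canonical-style explicit prefix
```

Both calls print `café`: the second spells out the group byte that the first leaves to the optimization group.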


See also

* Compose key sequence
* GB 18030
* Standard Compression Scheme for Unicode (SCSU)
* Symbol (typeface)
* Xerox Character Code Standard (XCCS)


Further reading

* (Includes some information about LMBCS and Lotus system ranges.)
* Character Translation Files (.CTF) used by Notes 2.x and Country Language Service (.CLS) files used by Notes 3.0 and higher contain information about LMBCS translation into other code pages.

