
The Lotus Multi-Byte Character Set (LMBCS) is a proprietary multi-byte character encoding originally conceived in 1988 at Lotus Development Corporation with input from Bob Balaban and others. Created around the same time and addressing some of the same problems, LMBCS could be viewed as a parallel development and possible alternative to Unicode. For maximum compatibility, later issues of LMBCS incorporate UTF-16 as a subset.

Commercially, LMBCS was first introduced as the default character set of Lotus 1-2-3 Release 3 for DOS in March 1989 and Lotus 1-2-3/G Release 1 for OS/2 in 1990, replacing the 8-bit Lotus International Character Set (LICS) and ASCII used in earlier DOS-only versions of Lotus 1-2-3 and Symphony. LMBCS is also used in IBM/Lotus SmartSuite, Notes and Domino, as well as in a number of third-party products.

LMBCS encodes the characters required for languages using the Latin, Arabic, Hebrew, Greek and Cyrillic scripts, the Thai, Chinese, Japanese and Korean writing systems, and technical symbols.


Encodings

Technically, LMBCS is a lead-byte encoding where code point 00hex as well as code points 20hex (32) to 7Fhex (127) are identical to ASCII (as well as to LICS). Code point 00hex is always treated as the NUL character to ensure maximum code compatibility with existing software libraries dealing with null-terminated strings in many programming languages such as C. This applies even to the UTF-16BE codes, where code words of the form xx00hex are mapped to private-use codes of the form F6xxhex during encoding in order to avoid the use of NUL bytes, and to escaped control characters, where 20hex is added to the C0 (but not C1) control characters following the 0Fhex lead byte.

Code points 01hex to 1Fhex, which serve as control codes in ASCII, are used as lead bytes to switch the definition of code points above 7Fhex between several ''code groups'' (similar to code pages) and at the same time determine either a single- or multi-byte nature for the corresponding code group. For example, code group 1 (with group byte 01hex) is almost identical to the SBCS code page 850, whereas code group 16 (with group byte 10hex) is similar to the Japanese MBCS code page 932. Multi-byte characters can thus occupy two or three bytes.

In canonical LMBCS, each character starts with its group byte. To reduce the length, optimized (or compressed) LMBCS allows a ''default code group'' or ''optimization group code'' to be defined on a per-application or per-process basis, ideally chosen according to the highest likelihood of occurrence. The group byte can then be omitted for characters of that group. The chosen group must be communicated to the interpreting code in some way, for example by specifying the corresponding "LMBCS-''n''" name. Lotus 1-2-3 retrieves the optimization group code from the file header of the corresponding source file, whereas for Lotus Notes the optimization group code is always fixed to 01hex.
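The lead-byte scheme and the two NUL-avoidance rules described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference decoder: all function names are hypothetical, group 0 is used only as a label for unprefixed ASCII bytes, and the rule that group bytes below 10hex are followed by one byte while group bytes from 10hex upward are followed by two is inferred from the code group 1 and code group 16 examples rather than taken from a specification.

```python
from typing import List, Optional, Tuple

# Assumption: groups below 0x10 are single-byte, groups 0x10..0x1F are double-byte.
FIRST_MULTIBYTE_GROUP = 0x10


def payload_width(group: int) -> int:
    """Number of bytes that follow the group byte (assumed rule, see lead-in)."""
    return 1 if group < FIRST_MULTIBYTE_GROUP else 2


def split_lmbcs(data: bytes, default_group: Optional[int] = None) -> List[Tuple[int, bytes]]:
    """Split LMBCS bytes into (group, payload) pairs; no character mapping is done.

    default_group=None treats the input as canonical LMBCS. Otherwise bytes
    >= 0x80 that lack a group byte are assigned to default_group (optimized form).
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x00 or 0x20 <= b <= 0x7F:
            group, start, width = 0, i, 1          # NUL or ASCII, never prefixed
        elif b <= 0x1F:
            group, start = b, i + 1                # explicit group (lead) byte
            width = payload_width(b)
        elif default_group is not None:
            group, start = default_group, i        # group byte omitted
            width = payload_width(default_group)
        else:
            raise ValueError(f"byte {b:#04x} at offset {i} has no group byte")
        payload = data[start:start + width]
        if len(payload) < width:
            raise ValueError(f"truncated character at offset {i}")
        out.append((group, payload))
        i = start + width
    return out


def avoid_nul(unit: int) -> int:
    """Map a UTF-16 code unit of the form xx00 to F6xx so no NUL byte is emitted."""
    if unit & 0x00FF == 0:
        return 0xF600 | (unit >> 8)
    return unit


def escape_c0(control: int) -> bytes:
    """Escape a C0 control character: lead byte 0x0F, then the control code + 0x20."""
    if not 0x01 <= control <= 0x1F:
        raise ValueError("expected a C0 control code in the range 0x01..0x1F")
    return bytes([0x0F, control + 0x20])


print(split_lmbcs(b"A\x01\x82B"))                  # canonical: group byte present
print(split_lmbcs(b"A\x82B", default_group=0x01))  # optimized: group byte omitted
```

Both calls yield the same sequence of pairs, which illustrates that the optimized form differs from the canonical form only in the omitted group byte for the default group.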


Character set

Without a prefix byte, the code points 32 (20hex) to 127 (7Fhex) are interpreted as in ASCII (corresponding to LMBCS codes 32 to 127).


Group 1

LMBCS group 1 code points 128 (80hex) to 255 (FFhex) are identical to the corresponding code points in code page 850 (DOS Latin-1), whereas code points 1 (01hex) to 127 (7Fhex) are defined according to a separate exception list (corresponding to LMBCS codes 256 to 383).
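The code ranges quoted in this section (32 to 127 for unprefixed bytes, 256 to 383 for the lower half of group 1) are consistent with numbering each character as the group number times 256 plus the byte value. The short Python sketch below assumes that numbering; it is inferred from the ranges given here, not from a specification, and `lmbcs_code` is a hypothetical helper name.

```python
def lmbcs_code(group: int, byte: int) -> int:
    """Return the assumed LMBCS code number: group * 256 + byte.

    Group 0 stands for bytes written without a group byte (an assumption).
    """
    if not 0x00 <= group <= 0x1F:
        raise ValueError("group must be in the range 0x00..0x1F")
    if not 0x00 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0x00..0xFF")
    return (group << 8) | byte


# Unprefixed ASCII keeps its value; group 1 starts at 256.
print(lmbcs_code(0, 0x41), lmbcs_code(1, 0x7F))
```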


Group 2

LMBCS group 2 code points 128 (80hex) to 255 (FFhex) are identical to the corresponding code points in code page 851 (DOS Greek), whereas code points 1 (01hex) to 127 (7Fhex) are defined according to a separate exception list.


Group 6

LMBCS group 6 code points 128 (80hex) to 255 (FFhex) are identical to the corresponding code points in code page 852 (DOS Latin-2), whereas code points 1 (01hex) to 127 (7Fhex) are defined according to a separate exception list.
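Because the upper halves of groups 1 and 6 match code pages 850 and 852, a byte in the range 80hex to FFhex belonging to one of these groups can be mapped to Unicode with an existing code page table. The Python sketch below uses the standard library codecs `cp850` and `cp852`. The `GROUP_CODECS` table and `decode_high_byte` function are hypothetical names, the exception lists for bytes below 80hex are deliberately not handled, and group 2 is left out because the sketch relies only on codecs known to ship with Python.

```python
# Hypothetical mapping from LMBCS group byte to a Python standard library codec.
GROUP_CODECS = {
    0x01: "cp850",  # group 1: DOS Latin-1
    0x06: "cp852",  # group 6: DOS Latin-2
}


def decode_high_byte(group: int, byte: int) -> str:
    """Decode one byte in 0x80..0xFF of a single-byte group to a Unicode string."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("only the upper half (0x80..0xFF) follows the code page")
    try:
        codec = GROUP_CODECS[group]
    except KeyError:
        raise ValueError(f"no codec known for group {group:#04x}") from None
    return bytes([byte]).decode(codec)


# The same byte value can denote different characters in different groups.
print(decode_high_byte(0x01, 0x88), decode_high_byte(0x06, 0x88))
```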


See also

* Compose key sequence
* GB 18030
* Standard Compression Scheme for Unicode (SCSU)
* Symbol (typeface)
* Xerox Character Code Standard (XCCS)


Further reading

* (Includes some information about LMBCS and Lotus system ranges.)
* Character Translation Files (.CTF) used by Notes 2.x and Country Language Service (.CLS) files used by Notes 3.0 and higher contain information about LMBCS translation into other code pages.

