
The Lotus Multi-Byte Character Set (LMBCS) is a proprietary multi-byte character encoding originally conceived in 1988 at Lotus Development Corporation with input from Bob Balaban and others. Created around the same time and addressing some of the same problems, LMBCS can be viewed as a parallel development and a possible alternative to Unicode. For maximum compatibility, later issues of LMBCS incorporate UTF-16 as a subset. Commercially, LMBCS was first introduced as the default character set of Lotus 1-2-3 Release 3 for DOS in March 1989 and Lotus 1-2-3/G Release 1 for OS/2 in 1990, replacing the 8-bit Lotus International Character Set (LICS) and ASCII used in earlier DOS-only versions of Lotus 1-2-3 and Symphony. LMBCS is also used in IBM/Lotus SmartSuite, Notes and Domino, as well as in a number of third-party products. LMBCS encodes the characters required for languages using the Latin, Arabic, Hebrew, Greek and Cyrillic scripts, the Thai, Chinese, Japanese and Korean writing systems, and technical symbols.


Encodings

Technically, LMBCS is a lead-byte encoding in which code point 00hex as well as code points 20hex (32) to 7Fhex (127) are identical to ASCII (and to LICS). Code point 00hex is always treated as the NUL character to ensure maximum compatibility with existing software libraries that handle null-terminated strings in many programming languages such as C. This applies even to the UTF-16BE codes, where code words of the form xx00hex are mapped to private-use codes of the form F6xxhex during encoding in order to avoid NUL bytes, and to escaped control characters, where 20hex is added to the C0 (but not C1) control characters following the 0Fhex lead byte.

Code points 01hex to 1Fhex, which serve as control codes in ASCII, are used as lead bytes that switch the definition of code points above 7Fhex between several ''code groups'' (similar to code pages) and at the same time determine whether the corresponding code group is single-byte or multi-byte. For example, code group 1 (with group byte 01hex) is almost identical to the SBCS code page 850, whereas code group 16 (with group byte 10hex) is similar to the Japanese MBCS code page 932. Multi-byte characters can thus occupy two or three bytes.

In canonical LMBCS, each character starts with its group byte. To reduce the length, optimized (or compressed) LMBCS allows a ''default code group'' or ''optimization group code'' to be defined per application or process (ideally the group most likely to occur); it must be communicated to the interpreting code in some way, for example by specifying the corresponding "LMBCS-''n''" name. The group byte can then be omitted for characters in that group. Lotus 1-2-3 retrieves the optimization group code from the file header of the corresponding source file, whereas for Lotus Notes the optimization group code is always 01hex.
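The two NUL-avoidance rules described in this section can be illustrated with a minimal Python sketch. It is not taken from any Lotus or IBM implementation: the function names are hypothetical, and the sketch assumes that the rules apply exactly as stated here (code units of the form xx00hex become F6xxhex, and a C0 control code in the range 01hex to 1Fhex is written as the lead byte 0Fhex followed by the code plus 20hex). How real implementations treat genuine characters in the F6xxhex range is not covered.

```python
CONTROL_LEAD_BYTE = 0x0F  # lead byte for escaped control characters (from the text)


def escape_utf16_unit(unit: int) -> int:
    """Map a 16-bit code unit of the form 0xXX00 to the private-use form 0xF6XX.

    Hypothetical helper: units whose low byte is not zero pass through unchanged.
    """
    if not 0 <= unit <= 0xFFFF:
        raise ValueError("expected a 16-bit code unit")
    if unit & 0x00FF == 0:
        return 0xF600 | (unit >> 8)
    return unit


def unescape_utf16_unit(unit: int) -> int:
    """Reverse escape_utf16_unit: a unit of the form 0xF6XX becomes 0xXX00.

    Assumption: every 0xF6XX unit in the stream is an escaped unit.
    """
    if unit >> 8 == 0xF6:
        return (unit & 0x00FF) << 8
    return unit


def escape_c0_control(code: int) -> bytes:
    """Encode a C0 control code (0x01-0x1F) as lead byte 0x0F plus (code + 0x20)."""
    if not 0x01 <= code <= 0x1F:
        raise ValueError("expected a C0 control code in the range 0x01-0x1F")
    return bytes([CONTROL_LEAD_BYTE, code + 0x20])
```

With these helpers, `escape_utf16_unit(0x4E00)` gives `0xF64E`, and `escape_c0_control(0x09)` gives the two bytes 0Fhex 29hex; neither result contains a 00hex byte, so the output stays safe for null-terminated string handling.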


Character set

Without a prefix byte, the code points 32 (20hex) to 127 (7Fhex) are interpreted as follows (corresponding to LMBCS codes 32 to 127):


Group 1

LMBCS group 1 code points 128 (80hex) to 255 (FFhex) are identical to the corresponding code points in code page 850 (DOS Latin-1), whereas code points 1 (01hex) to 127 (7Fhex) are defined according to the following exception list (corresponding to LMBCS codes 256 to 383):


Group 2

LMBCS group 2 code points 128 (80hex) to 255 (FFhex) are identical to the corresponding code points in code page 851 (DOS Greek), whereas code points 1 (01hex) to 127 (7Fhex) are defined according to the following exception list:


Group 6

LMBCS group 6 code points 128 (80hex) to 255 (FFhex) are identical to the corresponding code points in code page 852 (DOS Latin-2), whereas code points 1 (01hex) to 127 (7Fhex) are defined according to the following exception list:
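As a rough illustration of how the upper halves of the single-byte groups relate to their DOS code pages, the following Python sketch decodes canonical and optimized byte strings for groups 1 and 6 only. It is a simplification under stated assumptions: Python's built-in `cp850` and `cp852` codecs stand in for the LMBCS tables, the function and table names are hypothetical, the per-group exception lists for code points below 80hex are not modelled, and multi-byte groups are rejected.

```python
from typing import Optional

# Hypothetical table: group byte -> Python codec used for code points 0x80-0xFF.
GROUP_CODECS = {0x01: "cp850", 0x06: "cp852"}


def _codec_for(group: int) -> str:
    """Look up the stand-in codec for a group byte, or fail for unsupported groups."""
    try:
        return GROUP_CODECS[group]
    except KeyError:
        raise ValueError(f"group 0x{group:02X} is not covered by this sketch")


def decode_single_byte_groups(data: bytes,
                              optimization_group: Optional[int] = None) -> str:
    """Decode LMBCS bytes restricted to ASCII plus the upper halves of groups 1 and 6.

    With optimization_group=None the input must be canonical (every non-ASCII
    character carries its group byte). Otherwise a bare byte >= 0x80 is read
    in the given optimization group.
    """
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == 0x00 or 0x20 <= byte <= 0x7F:
            out.append(chr(byte))  # identical to ASCII
            i += 1
        elif byte >= 0x80:
            if optimization_group is None:
                raise ValueError("bare high byte without an optimization group")
            out.append(bytes([byte]).decode(_codec_for(optimization_group)))
            i += 1
        else:
            # 0x01-0x1F: group byte followed by one payload byte
            codec = _codec_for(byte)
            if i + 1 >= len(data):
                raise ValueError("truncated input after group byte")
            payload = data[i + 1]
            if payload < 0x80:
                raise ValueError("exception-list code points are not modelled")
            out.append(bytes([payload]).decode(codec))
            i += 2
    return "".join(out)
```

For example, the canonical bytes `b"caf\x01\x82"` and the optimized bytes `b"caf\x82"` (with `optimization_group=0x01`) both decode to "café" under these assumptions, since 82hex is "é" in code page 850.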


See also

* Compose key
* GB 18030
* Standard Compression Scheme for Unicode (SCSU)
* Symbol (typeface)
* Xerox Character Code Standard (XCCS)


Further reading

* (Includes some information about LMBCS and Lotus system ranges.)
* Character Translation Files (.CTF) used by Notes 2.x and Country Language Service (.CLS) files used by Notes 3.0 and higher contain information about LMBCS translation into other code pages.

