



A bidirectional text contains two text directionalities, right-to-left (RTL or dextrosinistral) and left-to-right (LTR or sinistrodextral). It generally involves text containing different types of alphabets, but may also refer to boustrophedon, which changes text direction in each row.

Some writing systems, including the Arabic and Hebrew scripts and derived systems such as the Persian, Urdu, and Yiddish scripts, are written right-to-left (RTL): writing begins at the right-hand side of a page and concludes at the left-hand side. This differs from the left-to-right (LTR) direction used by the dominant Latin script. When LTR text is mixed with RTL text in the same paragraph, each type of text is written in its own direction, which is known as bidirectional text. This can become rather complex when multiple levels of quotation are used. Many computer programs fail to display bidirectional text correctly. For example, the Hebrew name Sarah (שרה) is spelled sin (ש), which appears rightmost, then resh (ר), and finally heh (ה), which should appear leftmost.

Note: Some web browsers may display the Hebrew text in this article in the opposite direction.


Bidirectional script support

Bidirectional script support is the capability of a computer system to correctly display bidirectional text. The term is often shortened to "BiDi" or "bidi". Early computer installations were designed to support only a single writing system, typically a left-to-right script based on the Latin alphabet. Adding new character sets and character encodings enabled a number of other left-to-right scripts to be supported, but did not easily support right-to-left scripts such as Arabic or Hebrew, and mixing the two was not practical. Right-to-left scripts were introduced through encodings like ISO/IEC 8859-6 and ISO/IEC 8859-8, which store the letters (usually) in writing and reading order. It is possible to simply flip the left-to-right display order to a right-to-left display order, but doing so sacrifices the ability to correctly display left-to-right scripts. With bidirectional script support, characters from different scripts can be mixed on the same page regardless of writing direction. In particular, the Unicode standard provides foundations for complete BiDi support, with detailed rules for how mixtures of left-to-right and right-to-left scripts are to be encoded and displayed.
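As a small illustration of reading-order storage and of why flipping the whole display is not enough, the sketch below uses Python's built-in `iso8859_8` codec for the name Sarah. The helper names `encode_hebrew_8859_8` and `naive_flip` are hypothetical, introduced only for this example; `naive_flip` stands in for a display that simply mirrors LTR order.

```python
# Sketch: ISO/IEC 8859-8 stores Hebrew letters in reading (logical) order,
# and naively mirroring a whole line breaks any embedded left-to-right text.


def encode_hebrew_8859_8(text: str) -> bytes:
    """Encode text with Python's built-in ISO/IEC 8859-8 codec."""
    return text.encode("iso8859_8")


def naive_flip(line: str) -> str:
    """Reverse an entire line, as a display that just mirrors LTR order would.

    This is NOT bidi support: it puts the RTL letters the right way round
    but also reverses any Latin letters or digits on the same line.
    """
    return line[::-1]


if __name__ == "__main__":
    sarah = "\u05e9\u05e8\u05d4"  # shin, resh, he: the name Sarah
    # The first byte is the first letter read (shin), not the leftmost one shown.
    print(encode_hebrew_8859_8(sarah).hex(" "))  # f9 f8 e4
    # The Hebrew ends up displayed correctly, but "Sarah" comes out as "haraS".
    print(naive_flip("Sarah = " + sarah))
```

The reversal is correct for a purely RTL line and wrong as soon as LTR text shares the line, which is the gap a bidirectional algorithm fills.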


Unicode bidi support

The Unicode standard calls for characters to be ordered 'logically', that is, in the sequence in which they are intended to be interpreted, as opposed to 'visually', the sequence in which they appear. This distinction matters for bidi support because at every bidi transition the visual order stops matching the logical one. To offer bidi support, Unicode therefore prescribes an algorithm, the Unicode Bidirectional Algorithm (Unicode Standard Annex #9), for converting the logical sequence of characters into the correct visual presentation. For this purpose, the standard divides all its characters into four types: 'strong', 'weak', 'neutral', and 'explicit formatting'.
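To make "logical order" concrete, the following minimal sketch lists the characters of the name Sarah (שרה) from the introduction in the order they are stored, with each character's Unicode name and bidi class from Python's standard `unicodedata` module. The helper `logical_order` is a hypothetical name used only here.

```python
import unicodedata


def logical_order(text: str) -> list:
    """(character name, bidi class) pairs in storage (logical) order."""
    return [(unicodedata.name(ch), unicodedata.bidirectional(ch)) for ch in text]


# Shin is stored first even though it is displayed rightmost.
for name, bidi in logical_order("\u05e9\u05e8\u05d4"):
    print(f"{name:<20} {bidi}")
```

Every letter reports bidi class `R`; converting this stored sequence into the right-to-left screen order is the job of the bidirectional algorithm.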


Strong characters

Strong characters are those with a definite direction. Examples of this type of character include most alphabetic characters, syllabic characters, Han ideographs, non-European or non-Arabic digits, and punctuation characters ''that are specific to only those scripts''.


Weak characters

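Weak characters have only a weak directional tendency, which the algorithm resolves partly from the surrounding text. Examples of this type of character include European digits, Eastern Arabic-Indic digits, some arithmetic symbols such as the plus and minus signs, and currency symbols.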


Numbers

Unless a directional override is present, numbers are always encoded (and entered) big-endian, that is, with the most significant digit first, and the numerals are rendered LTR. The weak directionality applies only to the placement of the number as a whole.
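The big-endian rule can be seen directly in Python, whose `int()` accepts any Unicode decimal digits. The sketch below compares Arabic-Indic digits (U+0660 to U+0669, bidi class `AN`) with European digits (bidi class `EN`). Both are weak and both store the hundreds digit first. `digit_classes` is a hypothetical helper for this example.

```python
import unicodedata


def digit_classes(number: str) -> list:
    """Bidi class of each digit, in storage order."""
    return [unicodedata.bidirectional(d) for d in number]


arabic_indic = "\u0663\u0664\u0665"  # three, four, five; most significant first
european = "345"

# Both strings store the hundreds digit first, so they parse to the same value.
print(int(arabic_indic), int(european))  # 345 345
print(digit_classes(arabic_indic))  # ['AN', 'AN', 'AN']
print(digit_classes(european))  # ['EN', 'EN', 'EN']
```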


Neutral characters

Neutral characters have a direction that cannot be determined without context. Examples include paragraph separators, tabs, and most other whitespace characters. Punctuation symbols common to many scripts, such as the colon, comma, full stop, and no-break space, also behave as neutrals in running text, although Unicode formally classifies them as weak "common number separators" (CS) because they can also occur within numbers.
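The four character types can be recovered from the bidi class that Python's `unicodedata.bidirectional` reports. The sketch below maps those classes onto the strong/weak/neutral/explicit groupings used in UAX #9. The `CATEGORY` table and `bidi_category` are hypothetical names, and unassigned code points (for which `unicodedata` returns an empty string) are simply reported as "unknown". As UAX #9 does, it files the colon under weak `CS`.

```python
import unicodedata

# Grouping of Unicode bidi classes into the four types described above.
CATEGORY = {
    **dict.fromkeys(["L", "R", "AL"], "strong"),
    **dict.fromkeys(["EN", "ES", "ET", "AN", "CS", "NSM", "BN"], "weak"),
    **dict.fromkeys(["B", "S", "WS", "ON"], "neutral"),
    **dict.fromkeys(
        ["LRE", "LRO", "RLE", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"],
        "explicit formatting",
    ),
}


def bidi_category(ch: str) -> str:
    """Strong, weak, neutral, or explicit formatting, for a single character."""
    return CATEGORY.get(unicodedata.bidirectional(ch), "unknown")


# Latin A, Hebrew shin, Arabic beh, a digit, '$', ':', space, tab, '!', RLI.
for ch in ["A", "\u05e9", "\u0628", "1", "$", ":", " ", "\t", "!", "\u2067"]:
    print(ascii(ch), unicodedata.bidirectional(ch), bidi_category(ch))
```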


Explicit formatting

Explicit formatting characters, also referred to as "directional formatting characters", are special invisible Unicode characters that direct the algorithm to modify its default behavior. They are subdivided into "marks", "embeddings", "isolates", and "overrides". The effects of embeddings, isolates, and overrides continue until either a paragraph separator or the matching "pop" character.
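For reference, the sketch below lists the directional formatting characters by group, with code points and official names looked up through `unicodedata.name`. The `EXPLICIT` grouping is a hypothetical table mirroring the subsections of this article. It also includes the Arabic letter mark (ALM, U+061C) alongside LRM and RLM. Note that the marks carry ordinary strong bidi classes (`L`, `R`, `AL`), while the other characters have dedicated explicit-formatting classes.

```python
import unicodedata

# Directional formatting characters, grouped as in the subsections of this article.
EXPLICIT = {
    "marks": ["\u200e", "\u200f", "\u061c"],  # LRM, RLM, ALM
    "embeddings": ["\u202a", "\u202b"],  # LRE, RLE
    "overrides": ["\u202d", "\u202e"],  # LRO, RLO
    "isolates": ["\u2066", "\u2067", "\u2068"],  # LRI, RLI, FSI
    "pops": ["\u202c", "\u2069"],  # PDF, PDI
}

for group, chars in EXPLICIT.items():
    for ch in chars:
        bidi = unicodedata.bidirectional(ch)
        print(f"{group:<10} U+{ord(ch):04X} {bidi:<4} {unicodedata.name(ch)}")
```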


Marks

When a neutral character (or a run of neutral characters) sits between strong characters, the algorithm looks at the nearest strong character on each side. Sometimes this leads to unintended display errors. These errors are corrected or prevented with "pseudo-strong" characters. Such invisible Unicode formatting characters are called marks. The mark (the left-to-right mark LRM, U+200E, or the right-to-left mark RLM, U+200F) is inserted at a location to make an enclosed neutral character inherit its writing direction. For example, to correctly display the trademark sign ™ (U+2122) after an English brand name (LTR) in an Arabic (RTL) passage, an LRM mark is inserted after the trademark sign if the sign is not followed by LTR text. If the LRM mark is not added, the neutral character ™ will be neighbored by a strong LTR character and a strong RTL character. Hence, in an RTL context, it will be considered to be RTL and displayed in an incorrect order.
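The trademark example above can be sketched as a small helper that appends an LRM when an LTR brand name ends in ™. `protect_trailing_tm` is a hypothetical name. The sketch assumes the brand is about to be placed into RTL text with no LTR text following it, which is the situation described above. Because the LRM has bidi class `L`, the ™ then sits between two strong LTR characters and inherits LTR.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, bidi class L
TM = "\u2122"  # TRADE MARK SIGN, bidi class ON (neutral)


def protect_trailing_tm(brand: str) -> str:
    """Append an LRM when an LTR brand name ends in the trademark sign.

    Hypothetical helper: assumes `brand` will be followed by RTL text, so a
    bare trailing TM would otherwise sit between a strong L character and a
    strong R character and take the RTL paragraph direction.
    """
    return brand + LRM if brand.endswith(TM) else brand


print(unicodedata.bidirectional(TM), unicodedata.bidirectional(LRM))  # ON L
print(ascii(protect_trailing_tm("Wiki" + TM)))  # 'Wiki\u2122\u200e'
```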


Embeddings

The "embedding" directional formatting characters (LRE, U+202A, and RLE, U+202B) are the classical Unicode method of explicit formatting and, as of Unicode 6.3, are discouraged in favor of "isolates". An embedding signals that a piece of text is to be treated as directionally distinct. The text within its scope is not independent of the surrounding text, and characters within an embedding can affect the ordering of characters outside it. Unicode 6.3 recognized that directional embeddings usually have too strong an effect on their surroundings and are thus unnecessarily difficult to use.
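A minimal sketch of applying a legacy embedding, shown mainly so it can be compared with isolates. The helper `embed` is a hypothetical name. It wraps the text in LRE or RLE and closes it with PDF (U+202C).

```python
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"


def embed(text: str, rtl: bool) -> str:
    """Wrap `text` in a legacy LRE/RLE embedding closed by PDF.

    Unlike an isolate, the embedded text still interacts with the
    surrounding characters when directions are resolved.
    """
    return (RLE if rtl else LRE) + text + PDF


# Hypothetical use: an English brand name inside an RTL paragraph.
print(ascii(embed("Wiki", rtl=False)))  # '\u202aWiki\u202c'
```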


Isolates

The "isolate" directional formatting characters (LRI, U+2066; RLI, U+2067; and FSI, U+2068, which takes its direction from the first strong character inside it) signal that a piece of text is to be treated as directionally isolated from its surroundings. As of Unicode 6.3, these are the formatting characters encouraged in new documents, once target platforms are known to support them. They were introduced to address the overly strong effect that embeddings have on surrounding text. Unlike the legacy embedding characters, isolates have no effect on the ordering of the text outside their scope. Isolates can be nested, and may be placed within embeddings and overrides.
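A common use of isolates is interpolating text of unknown direction, such as a user-supplied name, into a message. The sketch below wraps text in LRI, RLI, or FSI and closes it with PDI (U+2069). The function `isolate` and its `direction` parameter are hypothetical, not part of any library.

```python
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"


def isolate(text: str, direction: str = "auto") -> str:
    """Wrap `text` in a directional isolate closed by PDI.

    direction: "ltr" uses LRI, "rtl" uses RLI, and "auto" uses FSI, whose
    direction comes from the first strong character inside the isolate.
    """
    opener = {"ltr": LRI, "rtl": RLI, "auto": FSI}[direction]
    return opener + text + PDI


# Hypothetical use: a Hebrew user name inside an English message.
name = "\u05e9\u05e8\u05d4"
message = f"{isolate(name)} commented 3 times"
print(ascii(message))  # '\u2068\u05e9\u05e8\u05d4\u2069 commented 3 times'
```

Because the name is isolated, the following " 3" cannot be drawn into the Hebrew run, which is the kind of spill-over an embedding would allow.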


Overrides

The "override" directional formatting characters (LRO, U+202D, and RLO, U+202E) allow for special cases, such as part numbers (e.g. to force a part number made of mixed English letters, digits, and Hebrew letters to be written from right to left), and should be avoided wherever possible. Like the other directional formatting characters, overrides can be nested one inside another, and within embeddings and isolates.
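Inside an override, every character is treated as strong in the forced direction, so the display order of a single RLO-wrapped span is just the stored order reversed. The sketch below shows both the wrapping and that consequence. The helpers `override` and `visual_order_under_override` are hypothetical, and the part number is made up. The sketch ignores mirrored glyphs for brackets and any text outside the override.

```python
LRO, RLO, PDF = "\u202d", "\u202e", "\u202c"


def override(text: str, rtl: bool = True) -> str:
    """Force every character of `text` to RTL (RLO) or LTR (LRO) until PDF."""
    return (RLO if rtl else LRO) + text + PDF


def visual_order_under_override(text: str, rtl: bool = True) -> str:
    """Left-to-right screen order of `text` inside a single override.

    Every character is treated as strong R (or L), so an RTL override simply
    reverses the stored order; bracket mirroring is not modeled here.
    """
    return text[::-1] if rtl else text


part = "AB-12"  # hypothetical part number
print(ascii(override(part)))  # '\u202eAB-12\u202c'
print(visual_order_under_override(part))  # 21-BA
```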


Pops

The "pop" directional formatting characters terminate the scope of the most recent "embedding", "override", or "isolate": PDF (U+202C) closes an embedding or override, and PDI (U+2069) closes an isolate.
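How pops match their initiators can be sketched with a stack. The function below, a hypothetical `open_scopes`, is a simplified reading of the UAX #9 matching behavior. A PDI closes the most recent open isolate together with any embeddings or overrides opened inside it. A PDF closes the most recent embedding or override but cannot close across an open isolate. Unmatched pops are ignored. Depth limits and paragraph separators are deliberately not modeled, and a single paragraph is assumed.

```python
EMBED_OR_OVERRIDE = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
ISOLATE = {"\u2066", "\u2067", "\u2068"}  # LRI, RLI, FSI
PDF, PDI = "\u202c", "\u2069"


def open_scopes(text: str) -> list:
    """Return the directional initiators still open at the end of `text`."""
    stack = []
    for ch in text:
        if ch in EMBED_OR_OVERRIDE or ch in ISOLATE:
            stack.append(ch)
        elif ch == PDI:
            # Only act if some isolate is open; pop down to and including it.
            if any(c in ISOLATE for c in stack):
                while stack.pop() not in ISOLATE:
                    pass
        elif ch == PDF:
            # A PDF may only close an embedding/override on top of the stack.
            if stack and stack[-1] in EMBED_OR_OVERRIDE:
                stack.pop()
    return stack


print(open_scopes("\u2067x\u202c"))  # the PDF cannot close the RLI
```

A non-empty result flags a scope that would run on to the end of the paragraph, which is usually a sign of a missing pop.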


Runs

In the algorithm, each sequence of consecutive characters that share the same direction is called a "run". A neutral character located between two strong characters with the same orientation inherits their orientation. A neutral character located between two strong characters with different writing directions inherits the main context's writing direction: in an LTR document the character becomes LTR, and in an RTL document it becomes RTL. (Weak characters such as digits are handled by separate rules.)
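The neighbor rule and the idea of runs can be combined into a toy reordering function. This is a deliberately simplified sketch, not the full Unicode Bidirectional Algorithm. It handles a single line with no explicit formatting characters, treats digits as neutral instead of applying the weak-type rules, and does no bracket mirroring. All function names are hypothetical. A missing neighbor at either end of the text counts as the paragraph direction.

```python
import unicodedata
from itertools import groupby


def strong_direction(ch: str):
    """'L' or 'R' for strong characters, None for everything else."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "L"
    if bidi in ("R", "AL"):
        return "R"
    return None


def resolve_directions(text: str, paragraph: str = "L") -> list:
    """Resolve each non-strong character from its nearest strong neighbours.

    Same direction on both sides: inherit it. Different directions: use the
    paragraph direction. A missing neighbour counts as the paragraph direction.
    """
    dirs = [strong_direction(ch) for ch in text]
    resolved = []
    for i, d in enumerate(dirs):
        if d is None:
            before = next((x for x in reversed(dirs[:i]) if x), paragraph)
            after = next((x for x in dirs[i + 1:] if x), paragraph)
            d = before if before == after else paragraph
        resolved.append(d)
    return resolved


def toy_visual_order(text: str, paragraph: str = "L") -> str:
    """Group characters into runs, reverse RTL runs, then order the runs."""
    runs = []
    pairs = zip(text, resolve_directions(text, paragraph))
    for direction, group in groupby(pairs, key=lambda pair: pair[1]):
        chars = "".join(ch for ch, _ in group)
        runs.append(chars[::-1] if direction == "R" else chars)
    if paragraph == "R":
        runs.reverse()  # in an RTL paragraph the runs themselves go right to left
    return "".join(runs)


print(toy_visual_order("abc \u05e9\u05e8\u05d4 def"))
```

In the LTR example, both spaces sit between an L and an R character and therefore take the paragraph direction, so only the Hebrew run is reversed on screen.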




Scripts using bidirectional text


Egyptian hieroglyphs

Egyptian hieroglyphs could be written bidirectionally: the signs had a distinct "head" that faced the beginning of a line and a "tail" that faced the end.


Chinese characters and other CJK scripts

Chinese characters can be written in either horizontal direction as well as vertically (top to bottom, with columns running from right to left), especially in signs (such as plaques), but the orientation of the individual characters is never changed. This can often be seen on tour buses in China, where the company name customarily runs from the front of the vehicle to its rear: from right to left on the right side of the bus, and from left to right on the left side. English text on the right side of the vehicle is also quite commonly written in reverse order. Likewise, other CJK scripts made up of the same square characters, such as the Japanese and Korean writing systems, can also be written in any direction, although horizontal left-to-right and vertical top-to-bottom, right-to-left are the most common.

[Images: the right side of a Yangzhou tour bus, where the text runs from right to left, and its left side, where the text runs from left to right; the right side of a Hainan Airlines Boeing 737-86N, where 海南航空 runs from right to left, and the left side of a Hainan Airlines aircraft, where it runs from left to right; a China Post vehicle at Zhengzhou Train Station showing text on both sides.]


Boustrophedon

Boustrophedon is a writing style found in ancient Greek inscriptions and in Hungarian runes. This method of writing alternates direction, and usually reverses the individual characters, on each successive line.


Moon type

Moon type is an embossed adaptation of the Latin alphabet invented as a tactile alphabet for the blind. Initially the text changed direction (but not character orientation) at the end of each line, and special embossed lines connected the end of one line to the beginning of the next ("Moon Type for the Blind", Ramseyer Bible Collection, Kathryn A. Martin Library, University of Minnesota Duluth). Around 1990, it changed to a left-to-right orientation.


See also

* Internationalization and localization
* Horizontal and vertical writing in East Asian scripts
* Combining Cyrillic Millions
* Right-to-left mark
* Transformation of text
* Boustrophedon




External links


* Unicode Standard Annex #9: The Bidirectional Algorithm
* W3C guidelines on authoring techniques for bi-directional text, which include examples and good explanations
* ICU (International Components for Unicode), which contains an implementation of the bidirectional algorithm along with other internationalization services