Complex text layout (CTL) or complex text rendering is the typesetting of writing systems in which the shape or positioning of a grapheme depends on its relation to other graphemes. The term is used in the field of software internationalization, where each grapheme is a character.
Scripts which require CTL for proper display may be known as complex scripts. Examples include the Arabic alphabet and scripts of the Brahmic family, such as Devanagari, Khmer script or the Thai alphabet. Many scripts do not require CTL. For instance, the Latin alphabet or Chinese characters can be typeset by simply displaying each character one after another in straight rows or columns. However, even these scripts have alternate forms or optional features (such as cursive writing) which require CTL to produce on computers.
Characteristics requiring CTL
The main characteristics of CTL complexity are:
* Bi-directional text, where characters may be written either from right to left or from left to right.
* Context-sensitive shaping and ligatures, where a character may change its shape depending on its location and/or the surrounding characters. For example, a character in Arabic script can have as many as four different shape-forms, depending on context.
* Ordering, where the displayed order of the characters is not the same as the logical order. For example, in Devanagari, which is written from left to right, the grapheme for "short i" appears to the left of ("before") the consonant that it follows: in ''ki'', the ''-i'' is rendered on the left, with its bow extending over the ''k-'' to its right.
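The three characteristics above can be observed directly in Unicode character data. The following is a minimal sketch using only Python's standard `unicodedata` module; the helper names (`bidi_classes`, `positional_form`, `logical_order`) are illustrative assumptions, and the code inspects character properties rather than performing any layout.

```python
import unicodedata
from typing import List, Tuple

def bidi_classes(text: str) -> List[str]:
    """Bi-directional text: return the bidi class of each character
    ('L' = left-to-right, 'R' / 'AL' = right-to-left)."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Context-sensitive shaping: Unicode's compatibility area encodes the four
# positional forms of the Arabic letter BEH (U+0628) as separate code points.
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

def positional_form(ch: str) -> Tuple[str, str]:
    """Return (form name, base letter) for an Arabic presentation form,
    read from its compatibility decomposition, e.g. '<final> 0628'."""
    tag, base = unicodedata.decomposition(ch).split()
    return tag.strip("<>"), chr(int(base, 16))

# Ordering: Devanagari 'ki' is stored consonant first, vowel sign second,
# even though the vowel sign is displayed to the left of the consonant.
KI = "\u0915\u093F"

def logical_order(text: str) -> List[str]:
    """Return character names in stored (logical) order."""
    return [unicodedata.name(ch) for ch in text]

if __name__ == "__main__":
    print(bidi_classes("a\u05D0\u0627"))
    print([positional_form(ch)[0] for ch in BEH_FORMS])
    print(logical_order(KI))
```

A CTL-capable renderer is what turns this stored data into the right visual result: it resolves directions, selects the positional form from context, and moves the vowel sign in front of its consonant.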
Not all occurrences of these characteristics require CTL. For example, the Greek alphabet has context-sensitive shaping of the letter sigma, which appears as ς at the end of a word and σ elsewhere. However, these two forms are normally stored as different characters; for instance, Unicode has both U+03C2 GREEK SMALL LETTER FINAL SIGMA (ς) and U+03C3 GREEK SMALL LETTER SIGMA (σ), and does not treat them as equivalent. For collation and comparison purposes, software should consider the string "δῖος Ἀχιλλεύς" equivalent to "δῖοσ Ἀχιλλεύσ", but for typesetting purposes they are distinct and CTL is not required to choose the correct form.
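The distinction between the two sigmas can be checked with a short Python sketch. It assumes that Unicode case folding (`str.casefold`) is an acceptable stand-in for "comparison purposes"; production collation would more likely use a dedicated collation library. The helper name `same_for_comparison` is hypothetical.

```python
import unicodedata

FINAL_SIGMA = "\u03C2"  # ς, GREEK SMALL LETTER FINAL SIGMA
SIGMA = "\u03C3"        # σ, GREEK SMALL LETTER SIGMA

def same_for_comparison(a: str, b: str) -> bool:
    """Assumed comparison rule: strings match if their case-folded
    forms are identical (case folding maps final sigma to sigma)."""
    return a.casefold() == b.casefold()

# The phrase from the text, and a variant with every final sigma replaced.
phrase = "δῖο" + FINAL_SIGMA + " Ἀχιλλεύ" + FINAL_SIGMA
variant = phrase.replace(FINAL_SIGMA, SIGMA)

if __name__ == "__main__":
    # Stored as different characters: the raw strings differ.
    print(phrase == variant)
    # Normalization does not unify them either.
    print(unicodedata.normalize("NFKC", FINAL_SIGMA) == SIGMA)
    # Under the assumed comparison rule they are treated as the same text.
    print(same_for_comparison(phrase, variant))
```

Because the correct form is already fixed in the stored text, a renderer can draw each code point as-is without any contextual analysis.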
Implementations
Most text-rendering software that is capable of CTL will include information about specific scripts, and so will be able to render them correctly without font files needing to supply instructions on how to lay out characters. Such software is usually provided in a library; examples include:
* Core Text for macOS
* Uniscribe (with Universal Shaping Engine) and DirectWrite for Microsoft Windows
* HarfBuzz, a cross-platform library
* Pango, a cross-platform library which nowadays incorporates HarfBuzz
However, such software is unable to properly render any script for which it lacks instructions, which can include many minority scripts. The alternative approach is to include the rendering instructions in the font file itself. Rendering software still needs to be capable of reading and following the instructions, but this is relatively simple.
Examples of this latter approach include Apple Advanced Typography (AAT) and Graphite. Both of these names encompass both the instruction format and the software supporting it; AAT is included on Apple operating systems, while Graphite is available for Microsoft Windows and Linux-based systems.
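The idea of a font that carries its own rendering instructions can be illustrated with a toy model. This is only a sketch under loud assumptions: the `ToyFont` rule table and the `apply_rules` engine are hypothetical and do not reflect the actual AAT or Graphite formats, which are far richer. The point is that the engine contains no script-specific knowledge; it only follows whatever rules the "font" supplies.

```python
from typing import Dict, List, Tuple

# Hypothetical "font": substitution rules carried as plain data,
# mapping a sequence of characters to a single replacement glyph.
ToyFont = Dict[Tuple[str, ...], str]

LATIN_LIGATURES: ToyFont = {
    ("f", "f", "i"): "\uFB03",  # LATIN SMALL LIGATURE FFI
    ("f", "i"): "\uFB01",       # LATIN SMALL LIGATURE FI
    ("f", "l"): "\uFB02",       # LATIN SMALL LIGATURE FL
}

def apply_rules(text: str, font: ToyFont) -> List[str]:
    """Generic engine: at each position, apply the longest rule the
    font provides; otherwise emit the character unchanged."""
    longest = max((len(key) for key in font), default=1)
    glyphs: List[str] = []
    i = 0
    while i < len(text):
        # Try candidate sequences from longest to shortest (minimum 2).
        for n in range(min(longest, len(text) - i), 1, -1):
            key = tuple(text[i:i + n])
            if key in font:
                glyphs.append(font[key])
                i += n
                break
        else:
            # No rule matched here: pass the character through.
            glyphs.append(text[i])
            i += 1
    return glyphs

if __name__ == "__main__":
    print(apply_rules("office", LATIN_LIGATURES))
    print(apply_rules("office", {}))
```

Supporting a new script in this model means shipping a new rule table, not changing `apply_rules`, which mirrors why the font-driven approach suits scripts that the rendering software has no built-in knowledge of.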
The OpenType format is primarily intended for systems using the first approach (layout knowledge in the renderer, not the font), but it has a few features that assist with CTL, such as contextual ligatures. AAT and Graphite instructions can be embedded in OpenType font files.
See also
* Typography
* Unicode
* Writing systems which require complex text layout:
** Arabic alphabet
** Most of the Brahmic family of scripts
** N'Ko script
** Tengwar (diacritics and numbers)
External links
* Examples of complex rendering: SIL International's examples of complex writing systems around the world
* Complex Text Layout: The Open Group's Desktop Technologies
* Supporting Indic Scripts in Mozilla: also other CTL scripts
* Project SILA: Graphite and Mozilla integration project
* CTL Architecture in Solaris: Solaris Globalization Whitepapers
* Complex Scripts: Microsoft Global Development and Computing Portal
* Theppitak's Homepage: information about Thai language processing
* HarfBuzz's page at Freedesktop.org
* D-Type Unicode Text Module: portable software library for complex text
* BidiRenderer: an application that illustrates the shaping and layout of complex text in bidirectional paragraphs using FriBidi, FreeType, and HarfBuzz
* Tehreer-Android: a library that gives full control over text-related technologies such as the bidirectional algorithm, OpenType shaping, text typesetting and text rendering
* Tehreer-Cocoa: standalone font/text engine for iOS