A lightweight markup language (LML), also termed a simple or humane markup language, is a markup language with simple, unobtrusive syntax. It is designed to be easy to write using any generic text editor and easy to read in its raw form. Lightweight markup languages are used in applications where it may be necessary to read the raw document as well as the final rendered output. For instance, a person downloading a software library might prefer to read the documentation in a text editor rather than a web browser. Another application for such languages is data entry in web-based publishing, such as weblogs and wikis, where the input interface is a simple text box. The server software then converts the input into a common document markup language like HTML.


History

Lightweight markup languages were originally used on text-only displays which could not display characters in italics or bold, so informal methods to convey this information had to be developed. This formatting choice was naturally carried forth to plain-text email communications. Console browsers may also resort to similar display conventions. In 1986, the international standard SGML provided facilities to define and parse lightweight markup languages using grammars and tag implication. XML, published by the W3C in 1998, is a profile of SGML that omits these facilities. However, no SGML document type definition (DTD) for any of the languages listed below is known.


Types

Lightweight markup languages can be categorized by their tag types. Like HTML (<b>bold</b>), some languages use named elements that share a common format for start and end tags (e.g. BBCode [b]bold[/b]), whereas proper lightweight markup languages are restricted to ASCII-only punctuation marks and other non-letter symbols for tags. Some mix both styles (e.g. Textile bq.) or allow embedded HTML (e.g. Markdown), possibly extended with custom elements (e.g. MediaWiki). Most languages distinguish between markup for lines or blocks and for shorter spans of text, but some only support inline markup.

Some markup languages are tailored for a specific purpose, such as documenting computer code (e.g. POD, reST, RD) or being converted to a certain output format (usually HTML or LaTeX) and nothing else; others are more general in application. Languages also differ in whether they are oriented toward textual presentation or data serialization. Presentation-oriented languages include AsciiDoc, atx, BBCode, Creole, Crossmark, Epytext, Haml, JsonML, MakeDoc, Markdown, Org-mode, POD (Perl), reST (Python), RD (Ruby), SECST, Setext, SiSU, SPIP, Xupl, Texy!, Textile, txt2tags, UDO and Wikitext. Data-serialization-oriented languages include Curl (homoiconic, but also reads JSON; every object serializes), JSON, and YAML.


Comparison of language features

Markdown's own syntax does not support class attributes or id attributes; however, since Markdown supports the inclusion of native HTML code, these features can be implemented using direct HTML. (Some extensions may support these features.) txt2tags' own syntax does not support class attributes or id attributes; however, since txt2tags supports inclusion of native HTML code in tagged areas, these features can be implemented using direct HTML when saving to an HTML target.


Comparison of implementation features


Comparison of lightweight markup language syntax


Inline span syntax

Although usually documented as yielding italic and bold text, most lightweight markup processors output the semantic HTML elements em and strong instead. Monospaced text may result in either the semantic code element or the presentational tt element. Few languages make a distinction (e.g. Textile) or allow the user to configure the output easily (e.g. Texy!). LMLs sometimes differ for multi-word markup, where some require the markup characters to replace the inter-word spaces (infix). Some languages require a single character as prefix and suffix; others need doubled or even tripled ones, or support both with slightly different meanings, e.g. different levels of emphasis. Gemtext does not have any inline formatting; monospaced text (called preformatted text in the context of Gemtext) must have the opening and closing ``` on their own lines.


Emphasis syntax

In HTML, text is emphasized with the <em> and <strong> element types, whereas <i> and <b> traditionally mark up text to be italicized or bold-faced, respectively. Microsoft Word and Outlook, and accordingly other word processors and mail clients that strive for a similar user experience, support the basic convention of using asterisks for boldface and underscores for italic style. While Word removes the characters, Outlook retains them.
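As a rough illustration of the asterisk and underscore convention described above, the following Python sketch maps *text* to a strong element and _text_ to an em element. The function name render_emphasis and the exact matching rules (markers must hug the text, underscores inside words are ignored) are assumptions made for this example and do not reproduce any particular processor.

```python
import html
import re

# Assumed convention, following the Word/Outlook description above:
# *text* -> <strong>, _text_ -> <em>. Real LMLs differ in the details.
STRONG = re.compile(r"\*(\S(?:.*?\S)?)\*")
EM = re.compile(r"(?<!\w)_(\S(?:.*?\S)?)_(?!\w)")


def render_emphasis(text: str) -> str:
    """Escape HTML special characters, then convert emphasis markers."""
    escaped = html.escape(text, quote=False)
    escaped = STRONG.sub(r"<strong>\1</strong>", escaped)
    return EM.sub(r"<em>\1</em>", escaped)


if __name__ == "__main__":
    print(render_emphasis("a *bold* and _italic_ word"))
```

Escaping before substitution keeps raw angle brackets in the input from being interpreted as markup; identifiers such as snake_case_name are left alone because the underscores sit between word characters.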


Editorial syntax

In HTML, removed or deleted and inserted text is marked up with the <del> and <ins> element types, respectively. However, legacy element types <s> or <strike> and <u> are still also available for stricken and underlined spans of text. AsciiDoc, ATX, Creole, MediaWiki, PmWiki, reST, Slack, Textile, Texy! and WhatsApp do not support dedicated markup for underlining text. Textile does, however, support insertion via the +inserted+ syntax. AsciiDoc, ATX, Creole, MediaWiki, PmWiki, reST, Setext and Texy! do not support dedicated markup for striking through text.


Programming syntax

Quoted computer code is traditionally presented in typewriter-like fonts where each character occupies the same fixed width. HTML offers the semantic <code> and the deprecated, presentational <tt> element types for this task. MediaWiki and Gemtext do not provide lightweight markup for inline code spans.


Heading syntax

Headings are usually available in up to six levels, but the top one is often reserved to contain the same text as the document title, which may be set externally. Some documentation may associate levels with divisional types, e.g. part, chapter, section, article or paragraph. Most LMLs follow one of two styles for headings, either Setext-like underlines or atx-like line markers ("atx, the true structured text format" by Aaron Swartz, 2002), or they support both.


Underlined headings

Level 1 Heading
===============

Level 2 Heading
---------------

Level 3 Heading
~~~~~~~~~~~~~~~

The first style uses underlines, i.e. repeated characters (e.g. equals =, hyphen - or tilde ~, usually at least two or four times) in the line below the heading text. reST determines heading levels dynamically, which gives authors more freedom but complicates merging content from external sources.
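The dynamic assignment of heading levels can be sketched in Python as follows. This is a simplified illustration, not reST's actual parser: the function name parse_underlined_headings is hypothetical, and the rules that an underline must consist of one repeated non-alphanumeric character and be at least as long as the title are assumptions chosen for the example.

```python
def parse_underlined_headings(lines):
    """Return (level, title) pairs for underline-style headings.

    Levels are assigned in order of first appearance of each underline
    character, so the first character seen becomes level 1, the next
    new character level 2, and so on.
    """
    level_of = {}  # underline character -> heading level
    headings = []
    for title, underline in zip(lines, lines[1:]):
        rule = underline.strip()
        is_rule = (
            len(rule) >= 2
            and not rule[0].isalnum()
            and rule == rule[0] * len(rule)
        )
        text = title.strip()
        # Assumption: the underline must be at least as long as the title.
        if is_rule and text and len(rule) >= len(text):
            level = level_of.setdefault(rule[0], len(level_of) + 1)
            headings.append((level, text))
    return headings
```

Because levels depend on the order in which underline characters first appear, pasting a fragment that uses the same characters in a different order changes the resulting hierarchy, which is the merge difficulty mentioned above.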


Prefixed headings

# Level 1 Heading
## Level 2 Heading ##
### Level 3 Heading ###

The second style is based on repeated markers (e.g. hash #, equals = or asterisk *) at the start of the heading itself, where the number of repetitions indicates the (sometimes inverse) heading level. Most languages also support the reduplication of the markers at the end of the line, but whereas some make them mandatory, others do not even expect their numbers to match. Org-mode supports indentation as a means of indicating the level. BBCode does not support section headings at all. POD and Textile choose the HTML convention of numbered heading levels instead.

Microsoft Word supports auto-formatting paragraphs as headings if they contain no more than a handful of words, do not end with a period, and the user presses the Enter key twice. For lower levels, the user may press the tabulator key the corresponding number of times before entering the text, i.e. one through eight tabs for heading levels two through nine.
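A minimal Python sketch of parsing the prefixed style is shown here. It assumes the hash # as marker, a mandatory space after the opening markers, at most six levels, and optional closing markers whose count need not match; the function name parse_prefixed_heading is hypothetical, and individual languages differ on each of these points.

```python
import re

# Assumptions: '#' is the marker, whitespace must follow the opening run,
# and an optional closing run of '#' is discarded regardless of its length.
ATX = re.compile(r"^(#{1,6})[ \t]+(.*?)(?:[ \t]+#+)?[ \t]*$")


def parse_prefixed_heading(line):
    """Return (level, title) for a prefixed heading line, else None."""
    match = ATX.match(line)
    if match is None:
        return None
    return len(match.group(1)), match.group(2)


if __name__ == "__main__":
    for sample in ("# Level 1 Heading", "## Level 2 Heading ##", "#hashtag"):
        print(sample, "->", parse_prefixed_heading(sample))
```

The level is simply the length of the opening marker run; a language with inverse levels would subtract that count from its maximum depth instead.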


Link syntax

Hyperlinks can either be added inline, which may clutter the code because of long URLs, or with named aliases or numbered id references to lines that contain nothing but the address and related attributes and that often may be located anywhere in the document. Most languages allow the author to specify text to be displayed instead of the plain address (e.g. http://example.com), and some also provide methods to set a separate link title, which may contain more information about the destination. LMLs that are tailored for special setups, e.g. wikis or code documentation, may automatically generate named anchors (for headings, functions etc.) inside the document, link to related pages (possibly in a different namespace) or provide a textual search for linked keywords. Most languages employ (double) square or angular brackets to surround links, but hardly any two languages are completely compatible. Many can automatically recognize and parse absolute URLs inside the text without further markup. Gemtext and setext links must be on a line by themselves; they cannot be used inline. Org-mode's normal link syntax does a text search of the file. Dedicated targets can also be defined with double angle brackets, e.g. <<target>>.
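The reference approach, in which the address sits on its own line away from the running text, can be sketched in Python as follows. The syntax used here ([text][id] in the text and [id]: url "title" on a separate line) follows the Markdown convention and is only one of many variants; the function name resolve_reference_links is hypothetical, and the sketch escapes only the link parts, not the surrounding text.

```python
import html
import re

# Assumed Markdown-like syntax: '[id]: url "optional title"' definition lines
# and '[text][id]' references inside the running text.
DEFINITION = re.compile(r'^\[([^\]]+)\]:\s*(\S+)(?:\s+"([^"]*)")?\s*$')
REFERENCE = re.compile(r"\[([^\]]+)\]\[([^\]]+)\]")


def resolve_reference_links(source: str) -> str:
    """Replace [text][id] with <a> elements and drop the definition lines."""
    targets = {}
    body = []
    for line in source.splitlines():
        match = DEFINITION.match(line)
        if match:
            label, url, title = match.groups()
            targets[label.lower()] = (url, title)
        else:
            body.append(line)

    def to_anchor(match):
        text, label = match.groups()
        if label.lower() not in targets:
            return match.group(0)  # leave unresolved references untouched
        url, title = targets[label.lower()]
        attrs = f'href="{html.escape(url)}"'
        if title:
            attrs += f' title="{html.escape(title)}"'
        return f"<a {attrs}>{html.escape(text, quote=False)}</a>"

    return "\n".join(REFERENCE.sub(to_anchor, line) for line in body)
```

Collecting all definitions before substituting is what allows them to be located anywhere in the document, before or after the references that use them.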


List syntax

HTML requires an explicit element for the list, specifying its type, and one for each list item, but most lightweight markup languages need only different line prefixes for the bullet points or enumerated items. Some languages rely on indentation for nested lists, others use repeated parent list markers. Microsoft Word automatically converts paragraphs that start with an asterisk *, hyphen-minus - or greater-than bracket > followed by a space or horizontal tabulator into bullet list items. It will also start an enumerated list for the digit 1 and the case-insensitive letters a (for alphabetic lists) or i (for roman numerals), if they are followed by a period ., a closing round parenthesis ), a greater-than sign > or a hyphen-minus - and a space or tab; in the case of the round parenthesis, an optional opening one ( before the list marker is also supported. Languages differ on whether they support optional or mandatory digits in numbered list items, which kinds of enumerators they understand (e.g. decimal digit 1, roman numerals i or I, alphabetic letters a or A) and whether they preserve explicit values in the output format. Some Markdown dialects, for instance, will respect a start value other than 1, but ignore any other explicit value.

Language               Indent   Skip   Nest
Markdown               0–3      1–3    indent
MediaWiki, TiddlyWiki  0        1+     repeat
Org-mode               0+              indent
Jira, Textile          0        1+     repeat

Slack assists the user in entering enumerated and bullet lists, but does not actually format them as such, i.e. it just includes a leading digit followed by a period and a space or a bullet character in front of a line.
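Nesting by repeated parent list markers, as opposed to indentation, can be illustrated with a short Python sketch that turns asterisk-prefixed lines into nested HTML lists. The function name render_repeat_list, the choice of * as the only marker and the required space after the markers are assumptions for this example; lines that do not match are simply ignored.

```python
import html
import re

# Assumption: one or more '*' at the start of the line, then whitespace,
# then the item text. The number of asterisks is the nesting depth.
ITEM = re.compile(r"^(\*+)[ \t]+(.*)$")


def render_repeat_list(lines):
    """Render '*', '**', ... prefixed lines as nested <ul> markup."""
    out = []
    depth = 0
    for line in lines:
        match = ITEM.match(line)
        if match is None:
            continue  # non-list lines are ignored in this sketch
        level = len(match.group(1))
        if level > depth:
            # Open one list per new level; skipped levels get an empty item.
            out.append("<ul>" + "<li><ul>" * (level - depth - 1))
        else:
            # Close the previous item and any deeper lists still open.
            out.append("</li>" + "</ul></li>" * (depth - level))
        out.append("<li>" + html.escape(match.group(2).strip(), quote=False))
        depth = level
    out.append("</li></ul>" * depth)
    return "".join(out)


if __name__ == "__main__":
    print(render_repeat_list(["* a", "** b", "** c", "* d"]))
```

Each nested list is placed inside the list item of its parent, which is the explicit element structure that HTML requires and that the line prefixes leave implicit.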


Historical formats

The following lightweight markup languages, while similar to some of those already mentioned, have not yet been added to the comparison tables in this article:
* EtText: circa 2000
* Grutatext: circa 2002


See also

* Comparison of document-markup languages
* Comparison of documentation generators
* Lightweight programming language
* Markdown
* Wikitext

