HOME

TheInfoList



OR:

DocBook is a
semantic Semantics (from grc, σημαντικός ''sēmantikós'', "significant") is the study of reference, meaning, or truth. The term can be used to refer to subfields of several distinct disciplines, including philosophy, linguistics and comput ...
markup language Markup language refers to a text-encoding system consisting of a set of symbols inserted in a text document to control its structure, formatting, or the relationship between its parts. Markup is often used to control the display of the document ...
for technical
documentation Documentation is any communicable material that is used to describe, explain or instruct regarding some attributes of an object, system or procedure, such as its parts, assembly, installation, maintenance and use. As a form of knowledge manageme ...
. It was originally intended for writing technical documents related to computer hardware and software, but it can be used for any other sort of documentation. As a semantic language, DocBook enables its users to create document content in a presentation-neutral form that captures the logical structure of the content; that content can then be published in a variety of formats, including
HTML The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaS ...
,
XHTML Extensible HyperText Markup Language (XHTML) is part of the family of XML markup languages. It mirrors or extends versions of the widely used HyperText Markup Language (HTML), the language in which Web pages are formulated. While HTML, prior ...
,
EPUB EPUB is an e-book file format that uses the ".epub" file extension. The term is short for ''electronic publication'' and is sometimes styled ''ePub''. EPUB is supported by many e-readers, and compatible software is available for most smartpho ...
,
PDF Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. ...
, man pages,
Web help Web most often refers to: * Spider web, a silken structure created by the animal * World Wide Web or the Web, an Internet-based hypertext system Web, WEB, or the Web may also refer to: Computing * WEB, a literate programming system created by ...
DocBook WebHelp Project
/ref> and
HTML Help Microsoft Compiled HTML Help is a Microsoft proprietary online help format, consisting of a collection of HTML pages, an index and other navigation tools. The files are compressed and deployed in a binary format with the extension .CHM, for C ...
, without requiring users to make any changes to the source. In other words, when a document is written in DocBook format it becomes easily portable into other formats, rather than needing to be rewritten.


Design

DocBook is an
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. T ...
language. In its current version (5.x), DocBook's language is formally defined by a
RELAX NG In computing, RELAX NG (REgular LAnguage for XML Next Generation) is a schema language for XML—a RELAX NG schema specifies a pattern for the structure and content of an XML document. A RELAX NG schema is itself an XML document but RELAX NG also ...
schema The word schema comes from the Greek word ('), which means ''shape'', or more generally, ''plan''. The plural is ('). In English, both ''schemas'' and ''schemata'' are used as plural forms. Schema may refer to: Science and technology * SCHEMA ...
with integrated
Schematron Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath. In many implementations ...
rules. (There are also
W3C XML Schema XSD (XML Schema Definition), a recommendation of the World Wide Web Consortium (W3C), specifies how to formally describe the elements in an Extensible Markup Language ( XML) document. It can be used by programmers to verify each piece of item con ...
+Schematron and
Document Type Definition A document type definition (DTD) is a set of ''markup declarations'' that define a ''document type'' for an SGML-family markup language ( GML, SGML, XML, HTML). A DTD defines the valid building blocks of an XML document. It defines the document s ...
(DTD) versions of the schema available, but these are considered non-standard.) As a semantic language, DocBook documents do not describe what their contents "look like", but rather the ''meaning'' of those contents. For example, rather than explaining how the abstract for an article might be visually formatted, DocBook simply says that a particular section ''is'' an abstract. It is up to an external processing tool or application to decide where on a page the abstract should go and what it should look like or whether or not it should be included in the final output at all. DocBook provides a vast number of semantic element tags. They are divided into three broad categories, namely structural, block-level, and inline. ''Structural'' tags specify broad characteristics of their contents. The book element, for example, specifies that its child elements represent the parts of a book. This includes a title, chapters, glossaries, appendices, and so on. DocBook's structural tags include, but are not limited to: * ''set:'' Titled collection of one or more books or articles, can be nested with other sets * ''book:'' Titled collection of chapters, articles, and/or parts, with optional glossaries, appendices, etc. * ''part:'' Titled collection of one or more chapters—can be nested with other parts, and may have special introductory text * ''article:'' Titled, unnumbered collection of block-level elements * ''chapter:'' Titled, numbered collection of block-level elements—chapters don't require explicit numbers, a chapter number is the number of previous chapter elements in the XML document plus 1 * ''appendix:'' Contains text that represents an
appendix Appendix, or its plural form appendices, may refer to: __NOTOC__ In documents * Addendum, an addition made to a document by its author after its initial printing or publication * Bibliography, a systematic list of books and other works * Index (pu ...
* ''dedication:'' Text represents the dedication of the contained structural element Structural elements can contain other structural elements. Structural elements are the only permitted top-level elements in a DocBook document. ''Block-level'' tags are elements like paragraph, lists, etc. Not all these elements can directly contain text. Sequential block-level elements render one "after" another. After, in this case, can differ depending on the language. In most Western languages, "after" means below: text paragraphs are printed down the page. Other languages' writing systems can have different directionality; for example, in Japanese, paragraphs are often printed in downward columns, with the columns running from right to left, so "after" in that case would be to the left. DocBook semantics are entirely neutral to these kinds of language-based concepts. ''Inline-level'' tags are elements like emphasis, hyperlinks, etc. They wrap text within a block-level element. These elements do not cause the text to break when rendered in a paragraph format, but typically they cause the document processor to apply some kind of distinct typographical treatment to the enclosed text, by changing the font, size, or similar attributes. (The DocBook specification ''does'' say that it expects different typographical treatment, but it does not offer specific requirements as to what this treatment may be.) That is, a DocBook processor doesn't have to transform an emphasis tag into ''italics''. A reader-based DocBook processor could increase the size of the words, or, a text-based processor could use bold instead of italics.


Sample document

Very simple book Chapter 1 Hello world! I hope that your day is proceeding splendidly! Chapter 2 Hello again, world! Semantically, this document is a "book", with a "title", that contains two "chapters" each with their own "titles". Those "chapters" contain "paragraphs" that have text in them. The markup is fairly readable in English. In more detail, the root element of the document is book. All DocBook elements are in an
XML Namespace XML namespaces are used for providing uniquely named elements and attributes in an XML document. They are defined in a W3C recommendation. An XML instance may contain element or attribute names from more than one XML vocabulary. If each vocabular ...
, so the root element has an ''xmlns'' attribute to set the current namespace. Also, the root element of a DocBook document must have a ''version'' that specifies the version of the format that the document is built on. (XML documents can include elements from multiple namespaces at once, like the id attributes in the example.) A book element must contain a title, or an info element containing a title. This must be before any child structural elements. Following the title are the structural children, in this case, two chapter elements. Each of these must have a title. They contain para block elements, which can contain free text and other inline elements like the emphasis in the second paragraph of the first chapter.


Schemas and validation

Rules are formally defined in the DocBook
XML schema An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These constra ...
. Appropriate programming tools can validate an XML document (DocBook or otherwise), against its corresponding schema, to determine if (and where) the document fails to conform to that schema. XML editing tools can also use schema information to avoid creating non-conforming documents in the first place.


Authoring and processing

Because DocBook is XML, documents can be created and edited with any text editor. A dedicated
XML editor An XML editor is a markup language editor with added functionality to facilitate the editing of XML. This can be done using a plain text editor, with all the code visible, but XML editors have added facilities like tag completion and menus and but ...
is likewise a functional DocBook editor. DocBook provides schema files for popular XML schema languages, so any XML editor that can provide content completion based on a schema can do so for DocBook. Many graphical or
WYSIWYG In computing, WYSIWYG ( ), an acronym for What You See Is What You Get, is a system in which editing software allows content to be edited in a form that resembles its appearance when printed or displayed as a finished product, such as a printed d ...
XML editors Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. T ...
come with the ability to edit DocBook like a
word processor A word processor (WP) is a device or computer program that provides for input, editing, formatting, and output of text, often with some additional features. Early word processors were stand-alone devices dedicated to the function, but current ...
. Tables, list items, and other stylized content can be copied and pasted into the DocBook editor and will be preserved in the DocBook XML output. Because DocBook conforms to a well-defined XML schema, documents can be validated and processed using any tool or programming language that includes XML support.


History

DocBook began in 1991 in discussion groups on
Usenet Usenet () is a worldwide distributed discussion system available on computers. It was developed from the general-purpose Unix-to-Unix Copy (UUCP) dial-up network architecture. Tom Truscott and Jim Ellis conceived the idea in 1979, and it wa ...
and eventually became a joint project of HAL Computer Systems and O'Reilly & Associates and eventually spawned its own maintenance organization (the Davenport Group) before moving in 1998 to the ''SGML Open'' consortium, which subsequently became
OASIS In ecology, an oasis (; ) is a fertile area of a desert or semi-desert environment'ksar''with its surrounding feeding source, the palm grove, within a relational and circulatory nomadic system.” The location of oases has been of critical imp ...
. DocBook is currently maintained by the ''DocBook Technical Committee'' at OASIS. DocBook is available in both
SGML The Standard Generalized Markup Language (SGML; ISO 8879:1986) is a standard for defining generalized markup languages for documents. ISO 8879 Annex A.1 states that generalized markup is "based on two postulates": * Declarative: Markup should ...
and
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. T ...
forms, as a DTD.
RELAX NG In computing, RELAX NG (REgular LAnguage for XML Next Generation) is a schema language for XML—a RELAX NG schema specifies a pattern for the structure and content of an XML document. A RELAX NG schema is itself an XML document but RELAX NG also ...
and
W3C XML Schema XSD (XML Schema Definition), a recommendation of the World Wide Web Consortium (W3C), specifies how to formally describe the elements in an Extensible Markup Language ( XML) document. It can be used by programmers to verify each piece of item con ...
forms of the XML version are available. Starting with DocBook 5, the RELAX NG version is the "normative" form from which the other formats are generated. DocBook originally started out as an SGML application, but an equivalent XML application was developed and has now replaced the
SGML The Standard Generalized Markup Language (SGML; ISO 8879:1986) is a standard for defining generalized markup languages for documents. ISO 8879 Annex A.1 states that generalized markup is "based on two postulates": * Declarative: Markup should ...
one for most uses. (Starting with version 4 of the SGML DTD, the XML DTD continued with this version numbering scheme.) Initially, a key group of software companies used DocBook since their representatives were involved in its initial design. Eventually, however, DocBook was adopted by the open source community where it has become a standard for creating documentation for many projects, including
FreeBSD FreeBSD is a free and open-source Unix-like operating system descended from the Berkeley Software Distribution (BSD), which was based on Research Unix. The first version of FreeBSD was released in 1993. In 2005, FreeBSD was the most popular ...
, KDE,
GNOME A gnome is a mythological creature and diminutive spirit in Renaissance magic and alchemy, first introduced by Paracelsus in the 16th century and later adopted by more recent authors including those of modern fantasy literature. Its characte ...
desktop documentation, the
GTK+ GTK (formerly GIMP ToolKit and GTK+) is a free and open-source cross-platform widget toolkit for creating graphical user interfaces (GUIs). It is licensed under the terms of the GNU Lesser General Public License, allowing both free and prop ...
API references, the
Linux kernel The Linux kernel is a free and open-source, monolithic, modular, multitasking, Unix-like operating system kernel. It was originally authored in 1991 by Linus Torvalds for his i386-based PC, and it was soon adopted as the kernel for the GNU ...
documentation (which, as of July 2016, is transitioning to
Sphinx A sphinx ( , grc, σφίγξ , Boeotian: , plural sphinxes or sphinges) is a mythical creature with the head of a human, the body of a lion, and the wings of a falcon. In Greek tradition, the sphinx has the head of a woman, the haunches of ...
/
reStructuredText reStructuredText (RST, ReST, or reST) is a file format for textual data used primarily in the Python programming language community for technical documentation. It is part of the Docutils project of the Python Doc-SIG (Documentation Special Inte ...
), and the work of the
Linux Documentation Project The Linux Documentation Project (LDP) is a dormant an all-volunteer project that maintains a large collection of GNU and Linux-related documentation and publishes the collection online. It began as a way for hackers to share their documentation ...
.


Pre DocBook v5.0

Until DocBook 5, DocBook was defined normatively by a Document Type Definition (DTD). Because DocBook was built originally as an application of
SGML The Standard Generalized Markup Language (SGML; ISO 8879:1986) is a standard for defining generalized markup languages for documents. ISO 8879 Annex A.1 states that generalized markup is "based on two postulates": * Declarative: Markup should ...
, the DTD was the only available schema language. DocBook 4.x formats can be SGML or XML, but the XML version does not have its own namespace. DocBook 4.x formats had to live within the restrictions of being defined by a DTD. The most significant restriction was that an element name uniquely defines its possible contents. That is, an element named info must contain the same information no matter where it is in the DocBook file. As such, there are many kinds of info elements in DocBook 4.x: bookinfo, chapterinfo, etc. Each has a slightly different content model, but they do share some of their content model. Additionally, they repeat context information. The book's info element is that, because it is a direct child of the book; it does not need to be named specially for a human reader. However, because the format was defined by a DTD, it did have to be named as such. The root element does not have or need a ''version'', as the version is built into the DTD declaration at the top of a pre-DocBook 5 document. DocBook 4.x documents are not compatible with DocBook 5, but can be converted into DocBook 5 documents via an XSLT stylesheet. One (db4-upgrade.xsl) is provided as part of the distribution of the DocBook 5 schema and specification package.


Output formats

DocBook files are used to prepare output files in a wide variety of formats. Nearly always, this is accomplished using
DocBook XSL The DocBook XSL stylesheets are a set of XSLT stylesheets for the XML-based DocBook language. Purpose DocBook is a semantic markup language. That is, it specifies the meaning of the elements in a document, not how they are intended to be presen ...
stylesheets. These are
XSLT XSLT (Extensible Stylesheet Language Transformations) is a language originally designed for transforming XML documents into other XML documents, or other formats such as HTML for web pages, plain text or XSL Formatting Objects, which may subseq ...
stylesheets that transform DocBook documents into a number of formats (
HTML The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaS ...
,
XSL-FO XSL-FO (XSL Formatting Objects) is a markup language for XML document formatting that is most often used to generate PDF files. XSL-FO is part of XSL (Extensible Stylesheet Language), a set of W3C technologies designed for the transformation and ...
for later conversion into
PDF Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. ...
, etc.). These stylesheets can be sophisticated enough to generate tables of contents, glossaries, and indexes. They can oversee the selection of particular designated portions of a master document to produce different versions of the same document (such as a "tutorial" or a "quick-reference guide", where each of these consist of a subset of the material). Users can write their own customized stylesheets or even a full-fledged program to process the DocBook into an appropriate output format as their needs dictate. Norman Walsh and the DocBook Project development team maintain the key application for producing output from DocBook source documents: A set of
XSLT XSLT (Extensible Stylesheet Language Transformations) is a language originally designed for transforming XML documents into other XML documents, or other formats such as HTML for web pages, plain text or XSL Formatting Objects, which may subseq ...
stylesheets (as well as a legacy set of DSSSL stylesheets) that can generate high-quality
HTML The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaS ...
and print ( FO/
PDF Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. ...
) output, as well as output in other formats, including RTF,
man page A man page (short for manual page) is a form of software documentation usually found on a Unix or Unix-like operating system. Topics covered include computer programs (including library and system calls), formal standards and conventions, and e ...
s and HTML Help.
Web help Web most often refers to: * Spider web, a silken structure created by the animal * World Wide Web or the Web, an Internet-based hypertext system Web, WEB, or the Web may also refer to: Computing * WEB, a literate programming system created by ...
is a chunked HTML output format in the
DocBook XSL The DocBook XSL stylesheets are a set of XSLT stylesheets for the XML-based DocBook language. Purpose DocBook is a semantic markup language. That is, it specifies the meaning of the elements in a document, not how they are intended to be presen ...
stylesheets that was introduced in version 1.76.1. The documentation for web help also provides an example of web help and is part of the DocBook XSL distribution. The major features are its fully CSS-based page layout, search of the help content, and a table of contents in collapsible-tree form. Search has
stemming In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form—generally a written word form. The stem need not be identical to the morpholog ...
, match highlighting, explicit page-scoring, and the standard multilingual tokenizer. The search and TOC are in a pane that appears as a frameset, but is actually implemented with div tags and cookies (so that it is progressive).


Simplified DocBook

DocBook offers a large number of features that may be overwhelming to a new user. For those who want the convenience of DocBook without a steep learning curve, ''Simplified DocBook'' was designed. It is a small subset of DocBook designed for single documents such as articles or white papers (i.e., "books" are not supported). The Simplified DocBook DTD is currently at version 1.1.


Criticism

Ingo Schwarze, the author of
OpenBSD OpenBSD is a security-focused, free and open-source, Unix-like operating system based on the Berkeley Software Distribution (BSD). Theo de Raadt created OpenBSD in 1995 by forking NetBSD 1.0. According to the website, the OpenBSD project e ...
's
mandoc mandoc (historically called mdocml) is a utility used for formatting man pages in BSD Operating Systems (e.g. NetBSD), specifically those written in the ''mdoc'' and ''man'' macro languages. Unlike the groff and older troff and nroff tools that ...
, considers DocBook inferior to the semantic ''mdoc'' macro for man pages. In an attempt to write a DocBook-to-mdoc converter (previous converters like docbook-to-man does not cover semantic elements), he finds the semantic parts "bloated, redundant, and incomplete at the same time" compared to elements covered in mdoc. Moreover, Schwarze finds the DocBook specification not specific enough about the use of tags, the language non-portable across versions, rough in details and overall inconsistent.


See also

*
List of document markup languages The following is a list of document markup languages. You may also find the List of markup languages of interest. Well-known document markup languages * HyperText Markup Language (HTML) – the original markup language that was defined as a part ...
*
Comparison of document markup languages The following tables compare general and technical information for a number of document markup languages. Please see the individual markup languages' articles for further information. General information Basic general information about the marku ...
*
DocBook XSL The DocBook XSL stylesheets are a set of XSLT stylesheets for the XML-based DocBook language. Purpose DocBook is a semantic markup language. That is, it specifies the meaning of the elements in a document, not how they are intended to be presen ...
*
Darwin Information Typing Architecture The Darwin Information Typing Architecture (DITA) specification defines a set of document types for authoring and organizing topic-oriented information, as well as a set of mechanisms for combining, extending, and constraining document types. It i ...
*
LinuxDoc LinuxDoc is an SGML DTD which is similar to DocBook. It was created by Matt Welsh and version 1.1 was announced in 1994. It is primarily used by the Linux Documentation Project. The DocBook SGML tags are often longer than the equivalent LinuxDoc t ...
*
LaTeX Latex is an emulsion (stable dispersion) of polymer microparticles in water. Latexes are found in nature, but synthetic latexes are common as well. In nature, latex is found as a milky fluid found in 10% of all flowering plants (angiosperms ...
* Mumasy


References


Further reading

Norman Walsh is the principal author of the boo
''DocBook: The Definitive Guide''
the official documentation of DocBook. This book is available online under the
GFDL The GNU Free Documentation License (GNU FDL or simply GFDL) is a copyleft license for free documentation, designed by the Free Software Foundation (FSF) for the GNU Project. It is similar to the GNU General Public License, giving readers th ...
, and also as a print publication. * * *


External links


DocBook.org
- Collection of DocBook information, including a 4.x and 5.0 version of ''DocBook: The Definitive Guide'' and all versions of the DocBook schemas/DTDs.
DocBook Repository at OASIS
- Normative home of DocBook schema/DTD.
DocBook XSL Project page
at SourceForge.net
XSLT 1.0 Stylesheets for DocBook
at
GitHub GitHub, Inc. () is an Internet hosting service for software development and version control using Git. It provides the distributed version control of Git plus access control, bug tracking, software feature requests, task management, cont ...

DocBook Demystification HOWTO

DocBook: The Definitive Guide, 1st edition, v. 2.0.6
- Fully bookmarked PDF of the Guide for DocBook 3.x and 4.x.

{{Authority control Markup languages Document-centric XML-based standards Technical communication Technical communication tools Software documentation Open formats