A well-formed document in XML is a document that "adheres to the syntax rules specified by the XML 1.0 specification in that it must satisfy both physical and logical structures".
Requirements
At its base level, a well-formed document requires that:
* Content be defined.
* Content be delimited with a beginning and an end tag.
* Content be properly nested (parents within roots, children within parents).
A well-formed document must also follow the specification's rules for the declaration and treatment of entities.
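The entity requirement can be illustrated with a minimal sketch using Python's standard `xml.etree.ElementTree` module. The sample documents and the `parse_text` helper are hypothetical, invented for this illustration: the first document declares the entity `company` in an internal DTD subset before using it, while the second uses the same entity without declaring it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents, invented for illustration.
# DECLARED defines the entity "company" before referencing it;
# &amp; is one of the entities that XML predefines.
DECLARED = """<!DOCTYPE memo [
  <!ENTITY company "Example Corp">
]>
<memo>From &company; &amp; partners</memo>"""

# UNDECLARED references the same entity without any declaration.
UNDECLARED = "<memo>From &company;</memo>"


def parse_text(document: str) -> str:
    """Return the root element's text, or a short error description."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError as err:
        return f"not well-formed: {err}"


print(parse_text(DECLARED))
print(parse_text(UNDECLARED))
```

Under these assumptions, the first call returns the text with both entity references expanded, and the second reports a well-formedness error because the entity was never declared.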
Tags are case sensitive, and attribute values are delimited with quotation marks. Empty elements follow their own rules: an empty element is written either as a start tag immediately followed by an end tag or as a single empty-element tag. Overlapping tags make a document not well formed. Ideally, a well-formed document conforms to the design goals of XML. Other key syntax rules provided in the specification include:
* It contains only properly encoded legal Unicode characters.
* None of the special syntax characters such as < and & appear except when performing their markup-delineation roles.
* The begin, end, and empty-element tags that delimit the elements are correctly nested, with none missing and none overlapping.
* The element tags are case-sensitive; the beginning and end tags must match exactly. Tag names cannot contain any of the characters !"#$%&'()*+,/;<=>?@[\]^`{|}~, nor a space character, and cannot start with -, ., or a numeric digit.
* There is a single "root" element that contains all the other elements.
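Several of these rules can be exercised with a short sketch built on Python's standard `xml.etree.ElementTree` parser. The snippets in `SAMPLES` and the helper `is_well_formed` are hypothetical, chosen only to trigger one rule each; this is an illustration, not an exhaustive conformance test.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippets; each label names the rule being exercised.
SAMPLES = {
    "well formed": '<note id="1"><to>Ann</to><empty/></note>',
    "overlapping tags": "<b><i>text</b></i>",
    "case mismatch": "<Note>text</note>",
    "unquoted attribute": "<note id=1>text</note>",
    "bare ampersand": "<note>Q & A</note>",
    "two root elements": "<a/><b/>",
    "name starts with digit": "<1note>text</1note>",
}


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


for label, doc in SAMPLES.items():
    print(f"{label}: {is_well_formed(doc)}")
```

Only the first sample is accepted; every other sample breaks exactly one of the rules listed above and is rejected by the parser.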
A valid XML document is defined in the XML specification as a well-formed XML document which also conforms to the rules of a Document Type Definition (DTD). According to the JavaCommerce.com XML tutorial, "Well formed XML documents simply markup pages with descriptive tags. You don't need to describe or explain what these tags mean. In other words a well formed XML document does not need a DTD, but it must conform to the XML syntax rules. If all tags in a document are correctly formed and follow XML guidelines, then a document is considered as well formed."
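The difference between well-formed and valid can be sketched in Python. The document below is hypothetical: its internal DTD says that `note` must contain a single `to` element, yet the body contains `from` instead, so it is well formed but not valid. Python's standard `xml.etree.ElementTree` parser is non-validating, which means it checks well-formedness only and accepts the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the DTD requires <note> to contain one <to>,
# but the body contains <from>, so the document is NOT valid.
# It still obeys every syntax rule, so it IS well formed.
DOC = """<!DOCTYPE note [
  <!ELEMENT note (to)>
  <!ELEMENT to (#PCDATA)>
]>
<note><from>Ann</from></note>"""

# A non-validating parser reads the DTD but does not enforce it.
root = ET.fromstring(DOC)
print(root.tag, [child.tag for child in root])
```

Detecting the DTD violation would require a validating parser, which the Python standard library does not provide.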
An XML processor that encounters a violation of the well-formedness rules is required to report such errors and to cease normal processing. This policy, occasionally referred to as draconian (Tim Bray, "Dracon and Postel", 2003/08/19), stands in notable contrast to the behavior of programs that process HTML, which are designed to produce a reasonable result even in the presence of severe markup errors in the spirit of Postel's law ("Be conservative in what you send; be liberal in what you accept") (Aaron Swartz, "Postel’s Law Has No Exceptions", August 18, 2003).
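The contrast between the two error-handling policies can be sketched by feeding the same malformed markup to Python's standard XML and HTML parsers. The markup and the `TagCollector` class are hypothetical, and `html.parser.HTMLParser` is only a tokenizer, so it is a rough stand-in for the error recovery a web browser performs.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical markup: <b> and <i> overlap and <p> is never closed.
MALFORMED = "<p><b>bold<i>both</b>italic</i>"


class TagCollector(HTMLParser):
    """Record start and end tags in the order the tokenizer sees them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def xml_result(markup: str) -> str:
    """Report whether the XML parser accepted the markup or stopped."""
    try:
        ET.fromstring(markup)
        return "parsed"
    except ET.ParseError as err:
        return f"stopped: {err}"


collector = TagCollector()
collector.feed(MALFORMED)
collector.close()

print(xml_result(MALFORMED))  # the XML parser stops at the first violation
print(collector.events)       # the HTML tokenizer carries on regardless
```

The XML parser halts with an error at the overlapping end tag, while the HTML tokenizer reports every tag it finds and leaves recovery to the caller.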
Importance
The concept of a well-formed document allows for a better understanding of the fundamental construction of XML. It helps to clarify XML beyond its everyday sense. For example, while most XML Document Type Definitions use left and right angle brackets as content delimiters, strictly speaking this is not a necessity (though a delimiter should be terse). The left and right angle brackets are a convention, albeit a clear and distinctive one, not an absolute requirement.
The concept of a well-formed document also allows for the comprehension of the abstract nature of XML. In reality, there is no such thing as XML. Rather, XML is a principle that represents a set of behaviors and practices. It is possible to discuss types of XML, as expressed within a Document Type Definition (DTD).
Well-formed documents also bring into focus the issue of valid versus correct XML. According to the W3C, valid documents are those that validate against a DTD. The rules of validity mean that a document complies with the constraints stated within a DTD. Thus, tags or entities must conform to the rules and relations established within a DTD. However, validation does not check whether a tag or entity is used correctly. For example, a first-level heading tag could be applied to a second-level heading and be valid, yet incorrect.
The emphasis on well-formed documents has developed within the publishing industry, where the use of information delimited by left and right angle brackets has become problematic. Emphasis on the well-formed document allows for the definition, delimiting, and nesting of content to be managed within programs that are not XML, per se, but exhibit the characteristics or potential for being well formed.
Validation tools
There are several tools available to determine if a given XML document is well formed.
* Richard Tobin’s XML validator
* Truugo’s XML Validator
* W3Schools XML Validator
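A well-formedness check can also be run locally. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser is an acceptable checker; the function name `check_well_formed` is hypothetical. It reports the line and column at which the parser stopped, and it checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def check_well_formed(document: str):
    """Return (True, None), or (False, (line, column, message)) on error."""
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        # ParseError.position holds the (line, column) where parsing stopped.
        line, column = err.position
        return False, (line, column, str(err))
    return True, None


print(check_well_formed("<a><b></b></a>"))
print(check_well_formed("<a>\n<b>\n</a>"))
```

The second call fails because the element `b` is never closed, and the reported position points at the mismatched end tag.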
See also
* XML schemas and validation
* Well-formed element