Canonical XML is a normal form of XML. It is intended to allow relatively simple comparison of pairs of XML documents for equivalence, so the Canonical XML transformation removes non-meaningful differences between documents. Any XML document can be converted to Canonical XML.

For example, XML permits whitespace at various points within start-tags, and it allows attributes to be specified in any order. Such differences are seldom if ever used to convey meaning, so these two forms are generally considered equivalent:

    <p class="a" secure="1">
    <p secure = "1" class='a' >

When an arbitrary XML document is converted to Canonical XML, attributes are written in a normative order (sorted lexicographically by name) with normative spacing and quoting. All namespace declarations are placed ahead of regular attributes. Namespaced attributes are sorted by namespace URI rather than by prefix or qualified name. Thus, the second form above would be converted to the first.

Canonical XML specifies a number of other details, including the following:

* the UTF-8 encoding is used
* line ends are represented using the newline character (#xA)
* whitespace in attribute values is normalized
* entity references and non-special character references are expanded
* CDATA sections are replaced with their character content
* empty elements are encoded as start/end pairs, not using the special empty-element syntax
* default attributes are made explicit
* superfluous namespace declarations are deleted

According to the W3C, if two XML documents have the same canonical form, then the two documents are logically equivalent within the given application context (except for limitations regarding a few unusual cases).

However, a particular application may care about semantics beyond the generic logical equivalence that Canonical XML captures. For example, a steganography system could conceal information in an XML document by varying whitespace, attribute quoting and order, the use of hexadecimal versus decimal numeric character references, and so on. Converting such a file to Canonical XML would lose that hidden information. Conversely, some purposes might treat as equivalent XML files that differ in their use of upper- versus lower-case, or in archaic versus modern spelling, even though their canonical forms differ. Such contexts are beyond the scope of Canonical XML.
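The following sketch compares documents through their canonical forms. It uses Python's standard-library function `xml.etree.ElementTree.canonicalize` (Python 3.8 and later) and assumes that function's behavior is representative for these simple inputs. That function implements the W3C Canonical XML 2.0 specification rather than Version 1.0, so some details, such as namespace handling, can differ from the rules listed above. The helper `c14n_equal` is a hypothetical name introduced only for this illustration.

```python
# Sketch: test XML documents for equivalence by comparing canonical forms.
# Assumption: xml.etree.ElementTree.canonicalize (Python 3.8+) is used as the
# canonicalizer; it implements Canonical XML 2.0, not Version 1.0.
from xml.etree.ElementTree import canonicalize


def c14n_equal(doc_a: str, doc_b: str) -> bool:
    """Hypothetical helper: True if both documents canonicalize identically."""
    return canonicalize(doc_a) == canonicalize(doc_b)


# The two start-tags from the example, closed so each is well-formed XML.
form_1 = '<p class="a" secure="1"></p>'
form_2 = "<p secure = \"1\" class='a' ></p>"

print(canonicalize(form_2))        # <p class="a" secure="1"></p>
print(c14n_equal(form_1, form_2))  # True

# Several of the other rules: the empty-element tag becomes a start/end pair,
# the CDATA section becomes (escaped) character content, and the character
# reference and the internal entity reference are expanded.
doc = (
    '<!DOCTYPE doc [<!ENTITY who "world">]>'
    '<doc><e1/><e2><![CDATA[x < y]]></e2><e3>&#65; &who;</e3></doc>'
)
print(canonicalize(doc))
# <doc><e1></e1><e2>x &lt; y</e2><e3>A world</e3></doc>

# Hexadecimal and decimal character references canonicalize identically,
# so any information carried by that choice is lost.
print(c14n_equal('<a>&#x41;</a>', '<a>&#65;</a>'))  # True
```

This byte-for-byte comparison of canonical strings is the simple equivalence test that Canonical XML is designed for. It cannot detect application-level equivalences such as differences in case or spelling.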


See also

* XML Signature


External links


* W3C Recommendation, Canonical XML Version 1.0, 15 March 2001
* W3C Recommendation, Exclusive XML Canonicalization Version 1.0, 18 July 2002