
SAX (Simple API for XML) is an event-driven online algorithm for parsing XML documents, with an API developed by the XML-DEV mailing list. SAX provides a mechanism for reading data from an XML document that is an alternative to that provided by the Document Object Model (DOM). Where the DOM operates on the document as a whole, building the full abstract syntax tree of an XML document for the convenience of the user, SAX parsers operate on each piece of the XML document sequentially, issuing parsing events while making a single pass through the input stream.


Definition

Unlike DOM, there is no formal specification for SAX. The Java implementation of SAX is considered to be normative. SAX processes documents state-independently, in contrast to DOM, which is used for state-dependent processing of XML documents.


Benefits

A SAX parser only needs to report each parsing event as it happens, and normally discards almost all of that information once reported. It does, however, keep some things, for example a list of all elements that have not been closed yet, in order to catch later errors such as end-tags in the wrong order. Thus, the minimum memory required for a SAX parser is proportional to the maximum depth of the XML file (i.e., of the XML tree) and the maximum data involved in a single XML event (such as the name and attributes of a single start-tag, or the content of a processing instruction). This much memory is usually considered negligible.

A DOM parser, in contrast, has to build a tree representation of the entire document in memory to begin with, thus using memory that increases with the entire document length. This takes considerable time and space for large documents, since memory allocation and data-structure construction take time. The compensating advantage is that, once the document is loaded, ''any'' part of it can be accessed in any order.

Because of the event-driven nature of SAX, processing documents is generally far faster than with DOM-style parsers, ''so long as'' the processing can be done in a single start-to-end pass. Many tasks, such as indexing, conversion to other formats and very simple formatting, can be done that way. Other tasks, such as sorting, rearranging sections, getting from a link to its target, or looking up information on one element to help process a later one, require accessing the document structure in complex orders and will be much faster with DOM than with multiple SAX passes.

Some implementations do not neatly fit either category: a DOM approach can keep its persistent data on disk, cleverly organized for speed (editors such as SoftQuad Author/Editor and large-document browser/indexers such as DynaText do this), while a SAX approach can cleverly cache information for later use (any validating SAX parser keeps more information than described above). Such implementations blur the DOM/SAX tradeoffs, but are often very effective in practice.

Due to the nature of DOM, streamed reading from disk requires techniques such as lazy evaluation, caches, virtual memory, persistent data structures, or other techniques (one such technique is disclosed in US patent 5557722). Processing XML documents larger than main memory is sometimes thought impossible because some DOM parsers do not allow it. However, it is no less possible than sorting a dataset larger than main memory: disk space can be used as memory to sidestep the limitation.
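The memory argument above can be illustrated with a minimal sketch using Python's standard-library `xml.sax` module (a SAX-style API, not the normative Java one). The handler name `DepthTracker`, the helper `scan` and the sample input are assumptions made for illustration. The only state the handler keeps is the stack of currently open elements, which grows with nesting depth rather than with document length.

```python
import io
import xml.sax


class DepthTracker(xml.sax.ContentHandler):
    """Counts elements while holding only the stack of currently open ones."""

    def __init__(self):
        super().__init__()
        self.open_elements = []   # grows with nesting depth, not document size
        self.max_depth = 0
        self.element_count = 0

    def startElement(self, name, attrs):
        self.open_elements.append(name)
        self.element_count += 1
        self.max_depth = max(self.max_depth, len(self.open_elements))

    def endElement(self, name):
        self.open_elements.pop()


def scan(xml_bytes):
    """Parse the bytes as a stream and return (element count, maximum depth)."""
    handler = DepthTracker()
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler.element_count, handler.max_depth


# Hypothetical input: 10,000 <item> elements, each with one child, under one root.
document = b"<root>" + b"<item><name>x</name></item>" * 10000 + b"</root>"
count, depth = scan(document)
print(count, depth)
```

In this sketch the sample document is built in memory only to keep the example self-contained; in practice a file object or other stream would be passed to `xml.sax.parse` so that the input is never held in full.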


Drawbacks

The event-driven model of SAX is useful for XML parsing, but it does have certain drawbacks.

Virtually any kind of XML validation requires access to the document in full. The most trivial example is that an attribute declared in the DTD to be of type IDREF requires that there be exactly one element in the document that uses the same value for an ID attribute. To validate this in a SAX parser, one must keep track of all ID attributes (any one of them ''might'' end up being referenced by an IDREF attribute at the very end), as well as every IDREF attribute until it is resolved. Similarly, to validate that each element has an acceptable sequence of child elements, information about what child elements have been seen for each parent must be kept until the parent closes.

Additionally, some kinds of XML processing simply require having access to the entire document. XSLT and XPath, for example, need to be able to access any node at any time in the parsed XML tree. Editors and browsers likewise need to be able to display, modify, and perhaps re-validate at any time. While a SAX parser may well be used to construct such a tree initially, SAX provides no help for such processing as a whole.
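The bookkeeping needed for the ID/IDREF check can be sketched as follows, again with Python's `xml.sax`. This is an illustration under stated assumptions, not a validator: no DTD is read, so the attribute names `id` and `ref` are hypothetical stand-ins for attributes declared as ID and IDREF, and `IdRefChecker` and `check` are names invented here.

```python
import xml.sax


class IdRefChecker(xml.sax.ContentHandler):
    """Collects every ID seen and every reference not yet matched to an ID."""

    def __init__(self):
        super().__init__()
        self.seen_ids = set()
        self.duplicate_ids = set()
        self.pending_refs = set()   # references with no matching ID so far

    def startElement(self, name, attrs):
        # Assumption: "id" plays the role of an ID attribute, "ref" of an IDREF.
        if "id" in attrs:
            value = attrs["id"]
            if value in self.seen_ids:
                self.duplicate_ids.add(value)
            self.seen_ids.add(value)
            self.pending_refs.discard(value)   # a forward reference is now resolved
        if "ref" in attrs and attrs["ref"] not in self.seen_ids:
            self.pending_refs.add(attrs["ref"])


def check(xml_text):
    """Return (unresolved references, duplicated IDs), each as a sorted list."""
    handler = IdRefChecker()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return sorted(handler.pending_refs), sorted(handler.duplicate_ids)


sample = '<doc><link ref="b"/><sec id="a"/><sec id="b"/><link ref="c"/><sec id="a"/></doc>'
unresolved, duplicates = check(sample)
print(unresolved, duplicates)
```

Note that `seen_ids` cannot be discarded until the end of the document, which is exactly the cost described above: the handler's memory grows with the number of IDs rather than with the nesting depth.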


XML processing with SAX

A parser that implements SAX (i.e., ''a SAX Parser'') functions as a stream parser, with an event-driven API. The user defines a number of callback methods that will be called when events occur during parsing. The SAX events include (among others):

* XML Text nodes
* XML Element Starts and Ends
* XML Processing Instructions
* XML Comments

Some events correspond to XML objects that are easily returned all at once, such as comments. However, XML ''elements'' can contain many other XML objects, and so SAX represents them as does XML itself: by one event at the beginning, and another at the end. Properly speaking, the SAX interface does not deal in ''elements'', but in ''events'' that largely correspond to ''tags''.

SAX parsing is unidirectional; previously parsed data cannot be re-read without starting the parsing operation again.

There are many SAX-like implementations in existence. In practice, details vary, but the overall model is the same. For example, XML attributes are typically provided as name and value arguments passed to element events, but can also be provided as separate events, or via a hash table or similar collection of all the attributes. For another, some implementations provide "Init" and "Fin" callbacks for the very start and end of parsing; others don't. The exact names for given event types also vary slightly between implementations.
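The callback model described above can be sketched with Python's `xml.sax`, one of the SAX-like implementations whose names differ slightly from other bindings. The class name `EventLogger` and the sample input are assumptions for illustration. In this binding the attributes arrive as a mapping-like object passed to the element-start callback, `startDocument` and `endDocument` play the role of the "Init" and "Fin" callbacks, and comments are not delivered to this kind of handler.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each callback as a tuple, in the order the parser issues them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append(("start-document",))

    def endDocument(self):
        self.events.append(("end-document",))

    def startElement(self, name, attrs):
        # Attributes are copied into a plain dict so they outlive the callback.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # May be called more than once for a single run of text.
        self.events.append(("text", content))

    def processingInstruction(self, target, data):
        self.events.append(("pi", target, data))


logger = EventLogger()
xml.sax.parseString(b'<note lang="en"><?render fast?>Hi</note>', logger)
for event in logger.events:
    print(event)
```

Because the parser drives the handler, application code never asks for "the next node"; it only reacts to whichever callback fires, and anything it wants to remember must be stored by the handler itself.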


Example

Given the following XML document:

    <?xml version="1.0" encoding="UTF-8"?>
    <DocumentElement param="value">
        <FirstElement>
            &#xb6; Some Text
        </FirstElement>
        <?some_pi some_attr="some_value"?>
        <SecondElement param2="something">
            Pre-Text <Inline>Inlined text</Inline> Post-text.
        </SecondElement>
    </DocumentElement>

This XML document, when passed through a SAX parser, will generate a sequence of events like the following:

* XML Element start, named ''DocumentElement'', with an attribute ''param'' equal to "value"
* XML Element start, named ''FirstElement''
* XML Text node, with data equal to "¶ Some Text" (note: certain white spaces can be changed)
* XML Element end, named ''FirstElement''
* Processing Instruction event, with the target ''some_pi'' and data ''some_attr="some_value"'' (the content after the target is just text; however, it is very common to imitate the syntax of XML attributes, as in this example)
* XML Element start, named ''SecondElement'', with an attribute ''param2'' equal to "something"
* XML Text node, with data equal to "Pre-Text"
* XML Element start, named ''Inline''
* XML Text node, with data equal to "Inlined text"
* XML Element end, named ''Inline''
* XML Text node, with data equal to "Post-text."
* XML Element end, named ''SecondElement''
* XML Element end, named ''DocumentElement''

Note that the first line of the sample above is the XML Declaration and not a processing instruction; as such it will not be reported as a processing instruction event (although some SAX implementations provide a separate event just for the XML declaration).

The result above may vary: the SAX specification deliberately states that a given section of text may be reported as multiple sequential text events. Many parsers, for example, return separate text events for numeric character references. Thus in the example above, a SAX parser may generate a different series of events, part of which might include:

* XML Element start, named ''FirstElement''
* XML Text node, with data equal to "¶" (the Unicode character U+00b6)
* XML Text node, with data equal to " Some Text"
* XML Element end, named ''FirstElement''
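Because text may arrive in several pieces, handlers commonly buffer character data and emit it only when the next non-text event occurs. The following is a minimal sketch of that idea using Python's `xml.sax`. The markup in `SAMPLE` is an assumed document matching the element names, attributes and text of the event list above, and `CoalescingHandler` is a name invented for this example. The handler also collapses runs of whitespace and drops whitespace-only text, which is a simplification chosen here and not something SAX itself does.

```python
import xml.sax

# Assumed markup: element names, attributes and text follow the event list above;
# the exact whitespace and layout are illustrative.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<DocumentElement param="value">
    <FirstElement>
        &#xb6; Some Text
    </FirstElement>
    <?some_pi some_attr="some_value"?>
    <SecondElement param2="something">
        Pre-Text <Inline>Inlined text</Inline> Post-text.
    </SecondElement>
</DocumentElement>
"""


class CoalescingHandler(xml.sax.ContentHandler):
    """Merges consecutive text events so the result does not depend on chunking."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._text = []

    def _flush_text(self):
        # Join buffered chunks, collapse whitespace, skip whitespace-only text.
        text = " ".join("".join(self._text).split())
        self._text = []
        if text:
            self.events.append(("text", text))

    def startElement(self, name, attrs):
        self._flush_text()
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self._flush_text()
        self.events.append(("end", name))

    def characters(self, content):
        self._text.append(content)

    def processingInstruction(self, target, data):
        self._flush_text()
        self.events.append(("pi", target, data))


handler = CoalescingHandler()
xml.sax.parseString(SAMPLE, handler)
for event in handler.events:
    print(event)
```

With buffering in place, the recorded sequence is the same whether the underlying parser reports "¶ Some Text" as one text event or as several.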


See also

* Expat (XML)
* Java API for XML Processing
* LibXML
* List of XML markup languages
* List of XML schemas
* MSXML
* RapidJSON, a SAX-like API for JSON
* StAX
* Streaming XML
* VTD-XML
* Xerces
* XQuery API for Java



External links


SAX home page