XMLC DOM Performance
Problem
- Building the DOM is an expensive operation, creating a large number of
objects.
- Outputting the DOM is an expensive operation, often costing more than
building it, because of the character-level operations required to
create character entity references.
- Large portions of the DOM tend to be unmodified (static) when creating
dynamic content using XMLC.
- Sharing static portions of the DOM could avoid the cost of building it.
- Storing static portions of the DOM as pre-formatted strings with
entity references could save a lot of the overhead of formatting the
DOM.
- The general-purpose nature of the DOM makes it difficult to share static
portions. XMLC, while encouraging the use of access methods, provides
general access to the DOM. This has proven useful for things such as
rewriting session ids into URLs and automatically adding headers and
footers to the DOM. Methodologies for sharing would require restricting
DOM access and/or non-standard DOM behavior. A priori knowledge of the
dynamic portions may be required.
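The saving from storing static text as pre-formatted strings can be
illustrated with a minimal sketch. The class PreformattedText and its
methods are hypothetical names chosen for this example; they are not part
of XMLC or Xerces. The character-level escaping pass runs once, when the
static text is recorded, instead of on every output of the document.

```java
/** Hypothetical holder for static text; not an XMLC or Xerces class. */
final class PreformattedText {
    private final String raw;        // text as a DOM Text node would hold it
    private final String formatted;  // escaped once, reused on every output

    PreformattedText(String raw) {
        this.raw = raw;
        this.formatted = escape(raw);
    }

    /** Character-level pass replacing markup characters with entity references. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    String raw() { return raw; }

    /** Returns the cached escaped form; no per-request escaping is done. */
    String formatted() { return formatted; }
}
```

Only text that the program actually changes would need to go through
escape() again at output time.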
Lazy DOM
A lazy DOM representation is used to avoid creating DOM nodes that are not
changed by the program. This approach maintains the behavior of the
DOM API and requires no programmer specification of dynamic vs static
areas. It also efficiently handles portions of the DOM that are only
occasionally modified. Access to preformatted tags and text is also
provided for unmodified portions of the lazy DOM.
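The idea of expanding nodes only when the program touches them can be
sketched as below. This is a simplified illustration under stated
assumptions, not the XMLC implementation: TemplateNode and LazyNode are
hypothetical classes, and a real lazy DOM would implement the
org.w3c.dom interfaces rather than this reduced API.

```java
import java.util.ArrayList;
import java.util.List;

/** Read-only template node shared by every document instance (hypothetical). */
final class TemplateNode {
    final String name;
    final List<TemplateNode> children = new ArrayList<>();

    TemplateNode(String name) { this.name = name; }

    TemplateNode add(TemplateNode child) { children.add(child); return this; }
}

/** Per-instance node that copies its children from the template on first access. */
final class LazyNode {
    static int created = 0;            // counts instance nodes, for demonstration only

    private final TemplateNode template;
    private List<LazyNode> children;   // null until the children are expanded
    String name;                       // instance copy, safe to modify

    LazyNode(TemplateNode template) {
        this.template = template;
        this.name = template.name;
        created++;
    }

    boolean isExpanded() { return children != null; }

    List<LazyNode> getChildren() {
        if (children == null) {        // first access triggers expansion
            children = new ArrayList<>();
            for (TemplateNode t : template.children) {
                children.add(new LazyNode(t));
            }
        }
        return children;
    }
}
```

In this sketch, modifying one paragraph under the body creates instance
nodes only along the path that was accessed; the head subtree stays
unexpanded and continues to be represented by the shared template.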
An overview of the implementation is outlined here:
- Define LazyDOM nodes that inherit from the Xerces DOM nodes.
- While much of the functionality is similar to the Xerces deferred DOM,
up-front knowledge in the XMLC compile phase makes a simpler representation
possible.
- Gives us more flexibility for optimizations (e.g. access to internal nodes
without expanding parents).
- Avoid maintaining changes to Xerces DOM.
- Derived DOMs must extend our DOM (but other DOMs will continue to
work).
- Each node in the template DOM is assigned a 32-bit numeric node id
(nodeId). The nodeId is used to index tables
containing information about the document.
- A null nodeId is represented by -1.
- The document nodeId is always 0.
- Class-specific (static) representation:
- A read-only copy of the document DOM, with each node containing
its nodeId. Element and Text nodes may also contain a
preformatted text string.
- A table, indexed by nodeId, containing a reference to
the template node.
- Instance representation:
- The instance representation is built in a lazy manner. Initially,
only the Document node exists.
- Array of pointers to DOM nodes that have been instantiated,
indexed by nodeId. A null entry indicates that
the node has not been expanded. A non-null entry points
to the expanded node. Handling of deleted or moved nodes has not yet
been determined; it might require a pointer to a constant node.
- Instance DOM structure and behavior:
- The Document node is always created and contains a table of pointers
to the nodes that have been expanded, indexed by nodeId.
- Each node contains:
- Its nodeId, if it was created from a template node.
- A pointer to the template node.
- Document, Element, Attr, and
EntityReference nodes contain the following attributes:
- childrenExpanded - Boolean indicating whether the
children of this node have been expanded.
- parentExpanded - Attribute indicating that the
parent node has been expanded and this node is linked to
it.
- Element nodes also contain:
- attrExpanded - Boolean indicating whether the attributes of
the Element have been expanded.
- Node behavior:
- Access to a node's children causes expansion. This is implemented
by overriding the DOM methods that access the parent and
children of a node.
- Lazy attributes are implemented by overriding attribute
access methods. Access to attributes causes the attribute
nodes to be created (which themselves are lazy).
- The DOM formatting facility provides an interface for traversing
a DOM in an efficient manner.
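The nodeId-indexed tables and the formatting traversal described in the
outline above can be sketched together as follows. This is a minimal
illustration under several assumptions: the class LazyDocumentSketch, its
fields, and its methods are hypothetical; only changes to text nodes are
modelled; and an expanded node is represented by its replacement text
rather than by a real DOM node. The formatter emits the preformatted
template strings for every node that has not been expanded and escapes
only the text that was changed.

```java
/** Hypothetical sketch of nodeId-indexed tables; not the XMLC classes. */
final class LazyDocumentSketch {
    static final int NULL_NODE_ID = -1;     // null nodeId
    static final int DOCUMENT_NODE_ID = 0;  // the document is always node 0

    // Class-specific (static) tables, indexed by nodeId, shared by all instances.
    private final int[] parentId;           // NULL_NODE_ID for the document
    private final int[][] childIds;         // child nodeIds of each node
    private final String[] openText;        // preformatted open tag or text
    private final String[] closeText;       // preformatted close tag, "" for text

    // Instance table: a null entry means the node has not been expanded.
    private final String[] expandedText;

    LazyDocumentSketch(int[] parentId, int[][] childIds,
                       String[] openText, String[] closeText) {
        this.parentId = parentId;
        this.childIds = childIds;
        this.openText = openText;
        this.closeText = closeText;
        this.expandedText = new String[openText.length];
    }

    int parentOf(int nodeId) { return parentId[nodeId]; }

    boolean isExpanded(int nodeId) { return expandedText[nodeId] != null; }

    /** Replaces the content of a text node; only this node gets expanded. */
    void setText(int nodeId, String text) { expandedText[nodeId] = text; }

    String format() {
        StringBuilder out = new StringBuilder();
        format(DOCUMENT_NODE_ID, out);
        return out.toString();
    }

    private void format(int nodeId, StringBuilder out) {
        if (expandedText[nodeId] != null) {
            out.append(escape(expandedText[nodeId]));  // modified: escape now
            return;
        }
        out.append(openText[nodeId]);                  // unmodified: reuse string
        for (int child : childIds[nodeId]) {
            format(child, out);
        }
        out.append(closeText[nodeId]);
    }

    private static String escape(String s) {
        // '&' must be replaced first so later entity references stay intact.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

A real implementation would also have to cover the case the outline
leaves open, namely nodes that are deleted or moved after expansion.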