Also see the XMLC 2.2 Release Note, XMLC 2.2.1 Release Note, XMLC 2.2.2 Release Note, XMLC 2.2.3 Release Note, XMLC 2.2.4 Release Note, XMLC 2.2.5 Release Note, XMLC 2.2.6 Release Note, XMLC 2.2.7.1 Release Note, XMLC 2.2.8.1 Release Note, XMLC 2.2.9 Release Note, and XMLC 2.2.10 Release Note.
XML Documents with a DocumentType could never be cloned under Xerces1. Cloning always resulted in an error, forcing the manual creation of a new Document and DocumentType and importation of the document element. Some DOM information was always lost when performing this workaround. Xerces2 allows full clones by relaxing the rules around normally read-only DocumentTypes while the document is being cloned. I applied the Xerces2 code to Xerces1 and it works great! Additionally, I modified the code to copy the internal subset to the clone (this was also a bug in Xerces-2.8.0, for which I supplied a patch that was applied). Full document clones are now supported in XMLC's version of Xerces1 just as in Xerces2.
Avoid new Boolean() creation. Use Boolean.TRUE/Boolean.FALSE instead.
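As a small illustration of the Boolean change, the sketch below returns the shared constants rather than allocating a new wrapper. The class and method names (BooleanFlags, wrap) are made up for this example and are not part of XMLC.

```java
class BooleanFlags {
    // Returns one of the two shared instances instead of allocating a new Boolean.
    static Boolean wrap(boolean value) {
        return value ? Boolean.TRUE : Boolean.FALSE;
    }

    public static void main(String[] args) {
        // Boolean.valueOf(boolean) hands back the same shared constants,
        // so an identity comparison succeeds.
        System.out.println(wrap(true) == Boolean.valueOf(true));
    }
}
```

Boolean.valueOf(boolean) is an equivalent alternative when the value is only known at run time.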
Made sure MetaDataDocument is able to serialize to some file location and doesn't ever try creating a file without checking whether a location taken from the metadata is null first. Added a final fallback of the current directory, or directory from which the JVM started. Now the following output locations are used to serialize metadata files, in the order...
Modified all Document implementations to properly implement cloneNode(boolean) so that the resulting document will be an instance of the subclass rather than the parent class.
Made LazyDocument and LazyHTMLDocument cloneNode(boolean) template-aware and created a new method in LazyDocument called cloneAsTemplateDocument(TemplateDOM), which is meant to be overridden in subclasses, just as is cloneNode(boolean).
Also, with the CoreDocumentImpl cloning modification above and the Document implementation cloning mods, DocumentLoaderImpl now simply calls document.cloneNode(true) instead of creating the new document from scratch. Creating from scratch had been done only for XML files, never for HTML files, and it ended up losing DOM information.
Package prefixes were being turned into paths using the File.pathSeparatorChar to replace periods in the package name, which is completely wrong. Changed this to replacing the period with "/", which is what it should have been all along. So, until now, only single name package prefixes (without periods) actually worked. Now this is fully functional.
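To make the difference concrete, here is a minimal sketch of the two conversions. The class and method names (PackagePrefixPath, buggyPath, fixedPath) are hypothetical and do not correspond to the actual XMLC code.

```java
import java.io.File;

class PackagePrefixPath {
    // Old behavior: File.pathSeparatorChar is the character that separates
    // entries in a path list (':' on Unix, ';' on Windows), not directories.
    static String buggyPath(String packagePrefix) {
        return packagePrefix.replace('.', File.pathSeparatorChar);
    }

    // Corrected behavior: each period becomes a "/" directory separator.
    static String fixedPath(String packagePrefix) {
        return packagePrefix.replace('.', '/');
    }

    public static void main(String[] args) {
        System.out.println(buggyPath("org.enhydra.xml"));
        System.out.println(fixedPath("org.enhydra.xml"));
    }
}
```

A single-name prefix contains no periods, so both versions return it unchanged, which is why only that case worked before.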
If the title element didn't already exist, HTMLDocumentImpl.setTitle(String) created a new HTMLTitleElement, but neglected to append it as a child of the HTMLHeadElement. Fixing this corrects a diff failure in org.enhydra.xml.io.BasicTests.test2().
Added the ability to look up entities on both the classpath and the file system. Only the classpath was supported previously, even though all entities must be prefixed with "file:" (which is then stripped to look the entity up in the classpath). Took advantage of this odd implementation to extend it to try the file system if the entity is not found in the classpath. This allows one to specify a file system URL for -xcatalog flags.
Also did some general cleanup and made entity lookup fail fast when an entity location prefixed by "file:" cannot be found on either the classpath or the file system. Previously, it would fail further along inside the XML parser, masking the real source of the problem.
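The lookup order described above can be sketched roughly as follows. This is only an illustration under assumptions: the class name EntityLocator, the method locate, and the choice of IllegalStateException are invented here and are not XMLC's actual implementation.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

class EntityLocator {
    // Resolves a "file:"-prefixed entity location: classpath first, then file system.
    static URL locate(String location, ClassLoader loader) {
        String path = location.startsWith("file:")
                ? location.substring("file:".length())
                : location;

        // 1. Classpath lookup (resource names carry no leading slash).
        URL url = loader.getResource(path.replaceFirst("^/+", ""));
        if (url != null) {
            return url;
        }

        // 2. File system lookup.
        File file = new File(path);
        if (file.isFile()) {
            try {
                return file.toURI().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalStateException("Bad entity location: " + location, e);
            }
        }

        // 3. Fail fast here rather than deep inside the XML parser.
        throw new IllegalStateException(
                "Entity not found on classpath or file system: " + location);
    }
}
```

Throwing at the point of lookup keeps the error message tied to the entity location that could not be resolved.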
Modified parsers to register elements of type "ID" with CoreDocumentImpl.identifiers map. Thus, where getElementById(String) never used to work for XML documents, it now does based on the document's DTD defining attributes of type "ID". For HTML documents, there is no DTD to find this information, so the "id" attribute is treated as always being of type "ID".
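The dependence of getElementById(String) on DTD-declared "ID" attributes can be seen with any DOM parser. The sketch below uses the JAXP parser bundled with the JDK rather than XMLC's Xerces1, so it illustrates the general DOM behavior and not XMLC's own code path. The class name and sample documents are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class IdLookupDemo {
    // The internal subset declares the "id" attribute of <item> to be of type ID.
    static final String WITH_DTD =
            "<!DOCTYPE list [\n"
          + "  <!ELEMENT list (item*)>\n"
          + "  <!ELEMENT item (#PCDATA)>\n"
          + "  <!ATTLIST item id ID #REQUIRED>\n"
          + "]>\n"
          + "<list><item id=\"first\">one</item></list>";

    // Same content, but nothing tells the parser that "id" is of type ID.
    static final String WITHOUT_DTD =
            "<list><item id=\"first\">one</item></list>";

    static Element findById(String xml, String id) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementById(id);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findById(WITH_DTD, "first") != null);
        System.out.println(findById(WITHOUT_DTD, "first") != null);
    }
}
```

With the DTD present the element is found. Without it the lookup returns null, which is the situation HTML documents would be in if "id" were not always treated as type "ID".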
For HTML (and XHTML, VoiceXML, CHTML, and WML) documents, where lookup used to involve recursing the document looking for an element with an "id" attribute with a given value, lookup is now optimized by simply retrieving the element from the "identifiers" map. Fallback to the old recursion is provided in HTMLDocumentImpl (and, by extension, CHTMLDocumentImpl), so if the Id value is not found from the "identifiers" map initially, recursion will happen. This is to support parsers that don't manually populate the "identifiers" map, which is not a given since HTMLDocument validity is not enforced by a DTD.
Both deferred parsing and copy construction of standard load XMLC generated files use syncWithDocument(Node) to synchronize element fields with the DOM. Previously, this was a recursive method and every element in the DOM was interrogated for an attribute of type "ID" with a given Id value, all put in an if-else-if statement. There were a few problems with this...
All problems were resolved by removing recursion altogether and using the new getElementById(String) enhancements. Id's are now looked up by reflecting over the generated element fields and calling getElementById(String) for the given field. While it's arguable that reflection might itself be inefficient, the syncWithDocument(Node) code is now always the same size and won't ever suffer from problem #2. And since the getElementById(String) lookup is optimized, it should at worst be a wash, performance-wise, with the old method.
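A rough sketch of the reflective approach follows. It is not the generated XMLC code: the field prefix "element", the Page class, and the method names are assumptions chosen for illustration, and the JDK's DOM implementation stands in for XMLC's.

```java
import java.lang.reflect.Field;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ReflectiveSyncSketch {
    // Hypothetical naming convention: field "elementTitle" maps to Id "Title".
    static final String FIELD_PREFIX = "element";

    // Hypothetical stand-in for a generated document class.
    static class Page {
        Element elementTitle;
        Element elementFooter;
        Element elementMissing; // no matching Id in the sample document
    }

    // One getElementById call per field; no recursion over the DOM and
    // no per-Id if-else-if chain, so the method size never grows.
    static void sync(Object target, Document doc) {
        for (Field field : target.getClass().getDeclaredFields()) {
            if (field.getType() != Element.class
                    || !field.getName().startsWith(FIELD_PREFIX)) {
                continue;
            }
            String id = field.getName().substring(FIELD_PREFIX.length());
            try {
                field.setAccessible(true);
                field.set(target, doc.getElementById(id));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot set " + field.getName(), e);
            }
        }
    }

    // Builds a small document whose "id" attributes are registered as IDs.
    static Document buildDocument() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("page");
        doc.appendChild(root);
        for (String id : new String[] {"Title", "Footer"}) {
            Element child = doc.createElement("div");
            child.setAttribute("id", id);
            child.setIdAttribute("id", true); // mark "id" as an ID-typed attribute
            root.appendChild(child);
        }
        return doc;
    }
}
```

Calling sync(new Page(), buildDocument()) fills elementTitle and elementFooter and leaves elementMissing null.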
Note that existing generated XMLC documents will continue to work. syncWithDocument(Node) had been implemented using recursion unnecessarily, because the code calling syncWithDocument(Node) was already calling it recursively. This actually worked out well in this case: I was able to remove the recursive call inside syncWithDocument(Node) without breaking existing code, because existing generated code already recurses. So, all should be well with existing code, but please report any problems to the mailing list.
Also note that, at least for HTML documents, it is recommended that Id's be named using an upper-case first letter. This is because of the way that the reflective code initially looks up the Id based on the way the element field is named, and because of the way HTMLDocumentImpl (and, by extension, CHTMLDocumentImpl) falls back to the old recursion behavior if the Id is not found in the "identifiers" map. Other Document implementations only look up from the "identifiers" map and never recurse, so if, say, "MyId" is not found initially, then "myId" is tried. This is the only real caveat to this scheme (and only with HTMLDocuments and extensions), but it may have the benefit of encouraging a consistent naming convention for Id's.
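The "MyId" then "myId" retry can be pictured with the sketch below. It is an assumption-laden illustration only: a plain Map stands in for the "identifiers" map, and the class and method names are invented.

```java
import java.util.HashMap;
import java.util.Map;

class IdCaseFallbackSketch {
    // Tries the Id exactly as derived from the field name (upper-case first
    // letter), then retries with the first letter lower-cased.
    static <T> T lookup(Map<String, T> identifiers, String idFromFieldName) {
        T found = identifiers.get(idFromFieldName);
        if (found == null && !idFromFieldName.isEmpty()) {
            String lowerFirst = Character.toLowerCase(idFromFieldName.charAt(0))
                    + idFromFieldName.substring(1);
            found = identifiers.get(lowerFirst);
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, String> identifiers = new HashMap<>();
        identifiers.put("myId", "element registered as myId");
        // "MyId" is not in the map, so the lower-case variant is tried.
        System.out.println(lookup(identifiers, "MyId"));
    }
}
```

Naming Id's with an upper-case first letter means the first lookup succeeds and the retry is never needed.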
Updated DynamicMLCreator implementations (ASM2 and BCEL) to conform to the new implementation of the XMLC generated syncWithDocument(Node) method. Along with this, the ability to use a preferred XMLCDomFactory was also added. The syntax for specifying this is, for example...
xmlcDeferredParsingFactory.createFromFile(path, "org.enhydra.xml.xhtml.XHTMLDomFactory");
The single argument createFromFile(path) is still available. In this case, the XMLCDomFactoryCache provides the default XMLCDomFactory, which is currently LazyHTMLDomFactory for HTML documents and LazyDomFactory for all other document types.
There are still some test failures, but many are false negatives due to insignificant whitespace changes, seemingly caused by line ending changes forced by CVS and erroneously seen as different by the XMLC differ.
Other legitimate failures include...
Updated to ASM-2.2.3, BCEL-5.2, Jaxen-1.1-beta-11, and Log4j-1.2.14.
Synched up some of the internal Xerces1 HTML implementation, the LazyDOM HTML implementation, and the XHTML implementation classes with some changes made to the Xerces2 HTML DOM implementation classes to make it easier to perform diffs and view truly significant differences.