Also see the XMLC 2.2 Release Note, XMLC 2.2.1 Release Note, XMLC 2.2.2 Release Note, XMLC 2.2.3 Release Note, XMLC 2.2.4 Release Note, XMLC 2.2.5 Release Note, XMLC 2.2.6 Release Note, and XMLC 2.2.7.1 Release Note.
XMLC-2.2.8 was initially released but then pulled because of a fundamental issue, missed before release, with the enhancement "Better character encoding detection, along with proper detection, and handling, of BOM markers". The 2.2.8.1 release is essentially identical but fixes the problem. See the description of that enhancement for more information.
Resolved Issue 300892. Copied and modified XmlReader from the Rome project for use in XMLC to provide enhanced encoding detection. BOM markers are now detected in both HTML and XML, and an encoding declared in the XML header (if provided) is now detected as well. If encoding detection fails, XMLC continues to fall back to the <html encoding="[some encoding]"/> metadata or, if no encoding is specified there, to the metadata default ("ISO-8859-1" in most cases). See the Javadoc in org.enhydra.xml.io.XmlReader for more details.
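The two detection steps described above (BOM first, then the encoding in the XML header) can be sketched as follows. This is a minimal illustration, assuming the caller has a byte-array prefix of the input; the EncodingSniffer class and its methods are hypothetical and are not the API of org.enhydra.xml.io.XmlReader (consult its Javadoc for the real behavior). For brevity it only recognizes UTF-8 and UTF-16 BOMs and does not handle UTF-16 or UTF-32 input that lacks a BOM.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of BOM and XML-declaration encoding sniffing.
 * NOT the API of org.enhydra.xml.io.XmlReader.
 */
final class EncodingSniffer {

    // Matches encoding="..." or encoding='...' inside a leading <?xml ... ?> declaration.
    private static final Pattern XML_DECL_ENCODING = Pattern.compile(
        "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Returns the charset implied by a byte order mark, if one is present. */
    static Optional<String> detectBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return Optional.of("UTF-8");
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return Optional.of("UTF-16BE");
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return Optional.of("UTF-16LE");
        }
        return Optional.empty();
    }

    /** Returns the encoding named in a leading XML declaration, if any. */
    static Optional<String> detectXmlDeclEncoding(byte[] b) {
        // Skip a UTF-8 BOM so the declaration can be matched at the start.
        int start = detectBom(b).filter("UTF-8"::equals).isPresent() ? 3 : 0;
        int end = Math.min(b.length, start + 1024);
        // ISO-8859-1 maps every byte to a char, so ASCII-compatible
        // declarations survive this decoding unchanged.
        String head = new String(b, start, end - start, StandardCharsets.ISO_8859_1);
        Matcher m = XML_DECL_ENCODING.matcher(head);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

For a file beginning with <?xml version="1.0" encoding="UTF-8"?>, detectXmlDeclEncoding returns Optional.of("UTF-8"). A typical HTML file without a BOM yields an empty result from both methods, which is why the metadata fallback still matters for HTML.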
For HTML files, any encoding other than "ISO-8859-1" will generally need to be defined in the metadata, because encoding detection succeeds for HTML files only when a BOM marker is present, which is not usually the case. For XML files, it is advisable to include the encoding in the XML header. The XML header (or BOM marker) overrides any encoding defined in the metadata.
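The precedence rules above (BOM or XML header first, then the metadata, then the metadata default) can be expressed as a small resolution function. This is a hypothetical sketch, not XMLC's implementation: the class name EncodingResolver, the hard-coded "ISO-8859-1" default, and the choice to let a BOM win over a conflicting XML header are all assumptions (a real reader might instead reject such a conflict).

```java
import java.util.Optional;

/** Hypothetical sketch of encoding precedence; not XMLC's actual code. */
final class EncodingResolver {

    // Assumed metadata default; XMLC's default comes from its metadata ("ISO-8859-1" in most cases).
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    static String resolve(Optional<String> bomEncoding,
                          Optional<String> xmlDeclEncoding,
                          String metadataEncoding) {
        // 1. A BOM is the strongest signal (assumed here to win over the XML header).
        if (bomEncoding.isPresent()) {
            return bomEncoding.get();
        }
        // 2. An encoding in the XML header overrides the metadata.
        if (xmlDeclEncoding.isPresent()) {
            return xmlDeclEncoding.get();
        }
        // 3. Otherwise use the <html encoding="..."/> metadata, if set.
        if (metadataEncoding != null && !metadataEncoding.isEmpty()) {
            return metadataEncoding;
        }
        // 4. Fall back to the metadata default.
        return DEFAULT_ENCODING;
    }
}
```

An HTML file with no BOM and encoding="UTF-8" in its metadata resolves to "UTF-8"; the same file with no metadata encoding falls back to "ISO-8859-1".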
Merged in the latest Xerces2 HTML DOM implementation source, retaining the original XMLC source only where it had been modified to work around bugs or oddities in the original implementation. This brings in all fixes made in the Xerces2 source and also allowed me to find a couple of bugs in that source that had not yet been caught (usually minor, with some exceptions). Diffs against the original Xerces2 source should now be quite simple. I regenerated the LazyDOM source from the new HTML source.
On top of this, I painstakingly merged the HTML Impl sources with the XHTML Impl sources where applicable, making them as similar as possible (even taking whitespace into account). It was a mind-numbingly long and painful process, so I hope it was worth the effort! This means that all the fixes brought in from the Xerces2 HTML sources also exist in the XHTML sources (allowing for differences in specification and implementation). Of course, that's not the best part. Until now, one had to choose between the XHTML DOM and the HTML DOM; the XHTML DOM could not simply be dropped in where the HTML DOM had been used extensively. The sources have now been fixed up so that the XHTML DOM can be used without changing existing code written against the HTML DOM. For instance, doc.createElement("DIV") would have failed before these changes. Not only did the XHTML DOM fail to properly implement the HTML interfaces (causing ClassCastExceptions), but the call would also have failed because XHTML is case-sensitive (XHTML defines element names in lower case, whereas HTML uses upper case). It now works because element names are forced to lower case when elements are created without a namespace!
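To illustrate the lower-casing fix, here is a minimal sketch written against the standard JAXP DOM (javax.xml.parsers and org.w3c.dom) rather than XMLC's XHTML DOM classes, which apply this normalization internally. The XhtmlCompat class and its methods are hypothetical names used only for illustration.

```java
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical helper illustrating lower-cased, non-namespaced element creation. */
final class XhtmlCompat {

    /** Creates an empty DOM document using the JDK's default JAXP implementation. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Creates an element without a namespace, forcing the name to lower case so
     * that HTML-style names such as "DIV" match XHTML's lower-case definitions.
     * Locale.ROOT keeps the result independent of the default locale.
     */
    static Element createElement(Document doc, String tagName) {
        return doc.createElement(tagName.toLowerCase(Locale.ROOT));
    }
}
```

With this helper, createElement(doc, "DIV") yields an element whose tag name is "div", which is what an XHTML-aware lookup expects.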
Besides updating options.xmlc to use the "xhtml" DOM and updating the markup to comply with one of the XHTML DTDs (meaning you can now enjoy compile-time validation of your markup and no longer have to worry about JTidy messiness), the only thing to be aware of is that the following (recommended) OutputOptions should be set, or some browsers may not handle the output properly. setEnableXHTMLCompatibility(true) is, in fact, nearly mandatory for the resulting markup to work properly in most browsers:
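oo.setOmitXMLHeader(true);            // omit the <?xml ...?> declaration
oo.setEnableXHTMLCompatibility(true); // browser-compatible XHTML output
oo.setUseAposEntity(false);           // avoid &apos;, which HTML 4 does not define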
The nearly empty abstract class BaseCmdOptions served no useful purpose, so all references to it were removed and the class was deleted.
Resolved Issue 300079. Removed the last remaining unnecessary imports from generated classes. This is not terribly important, but it does improve one's experience in tools like Eclipse, which by default generate a warning for every unnecessary import.
Eliminated a few more warnings reported by Eclipse 3.1.
Updated to ASM-2.2.1.