The xmlc command is used to run XMLC. It parses an HTML page and normally
creates a Java class that contains a DOM representation of that page.
All of the command options are explained in detail in the
XMLC command reference page.
The XMLC compilation process is straightforward; normally only a class name, an output directory, and an HTML file are required as parameters. For example:
xmlc -d ../../classes -class app.presentation.user.UserTable ../html/usertable.html
would generate an XMLC class named app.presentation.user.UserTable
from the HTML file ../html/usertable.html
and write the resulting class file under the directory ../../classes.
If one is developing applications using the standard Enhydra make
configuration, it is very easy to compile with XMLC. The rules
will compile an HTML file named in the form foo.html
into a class named fooHTML
in the package the makefile is associated with.
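The naming rule above can be sketched in a few lines of shell. This is only an illustration of the foo.html to fooHTML mapping; html_class_name is a hypothetical helper, not part of XMLC or the Enhydra make rules.

```shell
#!/bin/sh
# Hypothetical helper (not part of XMLC or stdrules.mk): derive the
# class name the make rules would expect for a given HTML file.
html_class_name() {
    # Strip any directory part and the .html suffix, then append HTML.
    printf '%sHTML\n' "$(basename "$1" .html)"
}

html_class_name foo.html
html_class_name ../../html/login/Login.html
```

A helper like this can be handy for checking that the entries in HTML_CLASSES match the files actually present in HTML_DIR.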
The following variables, defined in stdrules.mk, are used
when compiling HTML files with XMLC:
HTML_CLASSES
- List of classes to be generated from HTML using XMLC.
These classes *must* be named in the form xxxxHTML and be generated
from an HTML file named xxxx.html in HTML_DIR. The names in this
variable must not include the .class extension.
HTML_DIR
- Directory containing HTML files to compile with XMLC. It
should be relative to the current directory or $(ROOT).
XMLC_HTML_OPTS
- Options to pass to XMLC when compiling HTML files.
May be left unspecified or empty.
HTML_XMLC_OPTS_FILE
- Name of .xmlc options file to use when compiling
HTML files with XMLC. May be left unspecified or empty.
The following is an example Enhydra makefile for compiling four HTML objects. More examples may be found in the
ROOT = ../../../..
PACKAGEDIR = golfShop/presentation/xmlc/login
HTML_DIR = ../../html/login
HTML_XMLC_OPTS_FILE = login.xmlc
HTML_CLASSES = LoginHTML \
               LogoutHTML \
               CheckVersionHTML \
               NewAccountHTML
include $(ROOT)/config.mk
In certain cases, it may be necessary to define specific rules for HTML objects that need options that don't apply to all HTML files. The following variables may be used in constructing new rules:
XMLC
- The path of the script that runs the XMLC compiler.
XMLC_CMD
- Contains a shell command that sets up the class path
and runs the XMLC compiler.
The HTML parser defaults to HTML Tidy; earlier versions of XMLC used the
Swing HTML parser. Due to differences in the way non-conforming HTML
is handled, the resulting DOM trees may not be the same. If one does
not wish to fix these inconsistencies, it is suggested that existing
documents use the Swing parser. This is specified with the
-parser swing option.
When using the Enhydra make rules, the parser can be
specified by setting:
XMLC_HTML_OPTS = -parser swing
The default XMLC HTML parser is built on the Java port of the HTML Tidy program. This parser locates, and often corrects, many errors in HTML. Problems that can't be corrected must be fixed before the page will compile. The HTML Tidy program may be useful in producing corrected HTML files.
The HTML Tidy parser will reject proprietary tags it does not understand.
Several options are useful for understanding the results of XMLC.
The -verbose option provides a trace of the overall execution
of XMLC (but not the parser details obtained with -parseinfo).
The -info option produces a dump of information about the
page being compiled, currently consisting of the ids and URLs found in the
page. With the -methods option, a list of all of the
generated access methods for the class will be produced.
The -parseinfo option, which traces the execution of the HTML
parser, can be very useful in debugging page problems that are not obvious
from the parser error messages.
To see the DOM tree that is produced, use the -dump option.
These options can all be used with -nocompile to
get status only, without generating a class file.
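As a rough sketch, the diagnostic options above might be combined as follows. The class name and path are reused from the earlier compile example purely for illustration, and whether -class is still needed together with -nocompile is an assumption. inspect_cmd is a hypothetical helper that only prints the command line, so it can be reviewed before being run.

```shell
#!/bin/sh
# Dry run: prints an xmlc command line instead of executing it.
# inspect_cmd is a hypothetical helper; the option names come from the
# text above, and passing -class here is an assumption.
inspect_cmd() {
    # $1 = fully qualified class name, $2 = HTML file to inspect
    printf 'xmlc -nocompile -info -methods -dump -class %s %s\n' "$1" "$2"
}

inspect_cmd app.presentation.user.UserTable ../html/usertable.html
```

Once the printed command looks right, it can be pasted into a shell where xmlc is available.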
Occasionally the only difference between the mockup HTML page and the
page in the application is the URLs. This is often the case for
frame sets. XMLC can address this using the -urlmapping
option to update the URLs and then the -docout option
to write a new HTML file with updated URLs instead of producing a
class file.
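A sketch of how the two options might be combined is shown below. The argument shapes are assumptions to be checked against the XMLC command reference page: -urlmapping is assumed to take the original URL followed by its replacement, and -docout is assumed to take the output file name. The URLs and file names are hypothetical, and remap_cmd only prints the command line rather than running it.

```shell
#!/bin/sh
# Dry run: prints an xmlc command line instead of executing it.
# Assumed (not confirmed by the text above):
#   -urlmapping <original-url> <new-url>
#   -docout <output-html-file>
remap_cmd() {
    # $1 = original URL, $2 = replacement URL,
    # $3 = HTML file to write, $4 = mockup HTML file to read
    printf 'xmlc -urlmapping %s %s -docout %s %s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical frame set whose menu frame should point at the application URL.
remap_cmd menu.html Menu.po ../output/frames.html ../html/frames.html
```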