A New Breed of XML
Well, thanks for making it over to my little corner of the Journal.
Not surprisingly, I'm going to try and give you a good dose of technical
talk; what might be surprising is what I am choosing to talk about
this time around. Now, certainly, nobody is going to go into shock
over me spending some time on XML; I'm pretty much inexorably tied
to XML, probably for the rest of my life! I had the privilege of
writing Java and XML for O'Reilly & Associates, I've been a columnist
for the IBM DeveloperWorks XML Zone and JavaWorld's XML area, and
have been involved in Apache Cocoon, Apache Xerces, and other XML-related
topics for quite a while now. So you might expect me to be spending
time on XML itself, or perhaps XSLT, XPath, XLink, or another XML-related
vocabulary. Or maybe you're hoping to get some instruction on SAX,
the Simple API for XML; DOM, the Document Object Model; or even JAXP,
the Java API for XML Parsing. Or, you might be ready to get the
latest on JDOM, an API for XML that Jason Hunter and I created and
maintain. However, none of these is what interests me in this issue.
You see, there is a new kid on the block in the Java and XML
world. This new kid provides a completely different approach to
handling XML. If you're a newbie to Java and XML, you should know
that there are two basic approaches to handling XML data. The first,
embodied by SAX,
is to simply parse an XML document sequentially. At each step of
the parsing, a callback occurs, notifying the program code
that, for example, the start of an element has occurred, or a processing
instruction was encountered, or character data was read. This leaves
the task of what to do with the information completely up to the
developer. So while this is an extremely efficient approach, as
no in-memory structures are automatically created, it often leaves
even intermediate-level programmers scratching their heads as to
how they can interact with the XML. The second, and wildly more
popular approach, is represented traditionally by DOM, and more
recently by JDOM. In these APIs, XML is parsed and built into a
tree of data, mirroring the document structure. The developer can
then manipulate this tree, which is typically a very intuitive procedure.
However, and in particular with the DOM API, this is a very
memory-intensive, high-overhead approach. Large XML documents are
held in memory in their entirety, just as small ones are, and the
resultant tree structures can cripple entire applications. This is
even more of a problem when the structure is fed to other processors,
as in the case of XSLT (XML transformations), where even more
structures are created and held in memory. Tree-based APIs can also
result in some rather tedious code, particularly when XML is used for
configuration, which is a common task. And this is where the new
option for handling XML comes into play.
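To make the callback model concrete, here is a minimal sketch using
the SAX classes that ship with the JDK. The class name SaxEventLogger
and its parse() helper are my own inventions for illustration; all the
handler does is record each callback, which shows how much work SAX
leaves to you.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records each SAX callback as a string.
class SaxEventLogger extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver text in several chunks; this sketch
        // simply logs every non-blank chunk it receives.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("text:" + text);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    // Parses the XML string and returns the callbacks in firing order.
    static List<String> parse(String xml) {
        try {
            SaxEventLogger handler = new SaxEventLogger();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<server><portNumber>80</portNumber></server>";
        parse(xml).forEach(System.out::println);
    }
}
```

Nothing is stored unless the handler stores it, which is exactly why
SAX is efficient and also why it can be hard to work with.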
Consider that more often than not, handling XML is more about moving
data from one format, a document on disk or from a network resource,
to another, a Java instance of some object. Let's take a simple
XML document as an example:
<?xml version="1.0"?>
<server>
<portNumber>80</portNumber>
<hostname>galadriel.middleearth.com</hostname>
</server>
This document holds some configuration information crucial for,
let's say, starting up a Java component called Server.
An instance of the Server class requires this information
to function. The result might be code like this:
// Obtain the Java representation of the XML document (JDOM)
SAXBuilder builder = new SAXBuilder();
Document doc = builder.build(new File("config.xml"));
Element root = doc.getRootElement();
// Create and configure the server component
Server server = new Server();
// getChildText() returns the element's text; setPort() is assumed to take an int
server.setPort(Integer.parseInt(root.getChildText("portNumber")));
server.setHostname(root.getChildText("hostname"));
// Start the server
server.start();
So what's the big deal here? Is there something wrong with this
code? Well, actually there is. First, a tree structure is created
in memory, and holds a Java representation of the XML document.
But then, an instance of the "target" object is created in memory.
Then, data is "shuffled" from one in-memory structure to another.
The result is that often huge amounts of memory become nothing more
than staging grounds, converting between XML and the final Java
object. What would certainly be preferable is to take that same
code, and perform an action like this:
Server server = (Server)Converter.convert(new File("config.xml"));
// No configuration is required, as it has already been performed!
server.start();
In this latter case, obviously simpler and preferable, the XML
document is converted directly into a Java object. Now, there are
certainly going to be some in-memory structures created in this
process, but they are both:
- Hidden from the client, making client programming easier.
- Discarded when no longer needed, protecting the client program.
Clearly, this is a better, simpler means of handling XML documents
that directly map to Java objects. Aptly named, this methodology
is called data binding. Now if you are an XML guy, as I suppose
that I am, then perhaps data binding isn't such a new concept; however,
for most Java developers struggling with SAX and DOM, this is a
new means of dealing with XML. Put it this way: it wasn't in real
enough use for me to include it in my book, Java and XML,
written in the first quarter of this year!
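For the curious, here is a rough sketch of what a converter of this
kind might do under the hood. It is not the Enhydra code; the names
SimpleBinder, unmarshal(), and ServerConfig are hypothetical, and the
sketch assumes that each child element maps to a setter of the same
name and that only int and String properties exist. Note that a DOM
tree is still built, but it is hidden from the caller and becomes
garbage as soon as the method returns.

```java
import java.io.StringReader;
import java.lang.reflect.Method;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical target class mirroring the <server> document.
class ServerConfig {
    private int portNumber;
    private String hostname;
    public void setPortNumber(int portNumber) { this.portNumber = portNumber; }
    public void setHostname(String hostname) { this.hostname = hostname; }
    public int getPortNumber() { return portNumber; }
    public String getHostname() { return hostname; }
}

// Hypothetical binder: maps each child element <foo> to a setFoo(...) call.
class SimpleBinder {
    static <T> T unmarshal(String xml, Class<T> type) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            T target = type.getDeclaredConstructor().newInstance();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace and other non-element nodes
                }
                String name = child.getNodeName();
                String setter = "set" + Character.toUpperCase(name.charAt(0))
                    + name.substring(1);
                String value = child.getTextContent().trim();
                for (Method m : type.getMethods()) {
                    if (m.getName().equals(setter) && m.getParameterCount() == 1) {
                        // Assumption: only int and String setters are supported.
                        Class<?> param = m.getParameterTypes()[0];
                        Object arg = (param == int.class)
                            ? Integer.valueOf(value) : value;
                        m.invoke(target, arg);
                    }
                }
            }
            return target;
        } catch (Exception e) {
            throw new IllegalArgumentException(
                "Could not bind XML to " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        String xml = "<server><portNumber>80</portNumber>"
            + "<hostname>galadriel.middleearth.com</hostname></server>";
        ServerConfig config = unmarshal(xml, ServerConfig.class);
        System.out.println(config.getHostname() + ":" + config.getPortNumber());
    }
}
```

A real data binding framework would drive this mapping from a set of
constraints rather than from naming conventions, but the client-side
experience is the same: one call, one configured object.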
So let's quickly breeze through an overview, and talk some more
specifics. JSR-031, a Java Specification Request from Sun, deals
with the issue of data binding. Unfortunately, JSR-031 has been
on the table for a very long time, with very little action. It details
a process of creating an XML document, which represents a Java object
instance. The object itself is defined in a set of XML constraints,
and Java classes can even be generated from this set of constraints.
Documents that conform to those constraints are then unmarshalled,
or converted, into instances of the objects they represent. The
process can also occur in reverse, when a Java object instance is
marshalled, or converted, into an XML document.
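The reverse direction, marshalling, can be sketched just as briefly.
Again, this is only an illustration under my own assumptions, not the
JSR-031 design: SimpleMarshaller, marshal(), and HostSettings are
hypothetical names, and the sketch simply writes one child element for
each public getter instead of consulting a set of constraints.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical bean used only for this illustration.
class HostSettings {
    private final String hostname;
    private final int portNumber;

    HostSettings(String hostname, int portNumber) {
        this.hostname = hostname;
        this.portNumber = portNumber;
    }

    public String getHostname() { return hostname; }
    public int getPortNumber() { return portNumber; }
}

// Hypothetical marshaller: one child element per public getter.
class SimpleMarshaller {
    static String marshal(Object bean, String rootName) {
        Method[] methods = bean.getClass().getMethods();
        // Sort by name so the element order is stable between runs.
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        StringBuilder xml = new StringBuilder("<" + rootName + ">");
        try {
            for (Method m : methods) {
                String name = m.getName();
                boolean isGetter = name.startsWith("get") && name.length() > 3
                    && !name.equals("getClass") && m.getParameterCount() == 0;
                if (!isGetter) {
                    continue;
                }
                // getPortNumber -> portNumber
                String element = Character.toLowerCase(name.charAt(3))
                    + name.substring(4);
                xml.append('<').append(element).append('>')
                   .append(escape(String.valueOf(m.invoke(bean))))
                   .append("</").append(element).append('>');
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not read bean property", e);
        }
        return xml.append("</").append(rootName).append('>').toString();
    }

    // Escapes the characters that are not allowed in element text.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        HostSettings settings =
            new HostSettings("galadriel.middleearth.com", 80);
        System.out.println(marshal(settings, "server"));
    }
}
```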
The biggest problem today is that Sun has not taken the lead on
this concept. The hallmark of any JSR is a reference implementation,
which in essence dictates how adherents to a specification or technology
should behave. The reference implementation for JSR-031, code named
"Project Adelard", has yet to materialize. In addition, the original
specification indicated that XML Schemas, an XML vocabulary used
to represent document constraints, would play a vital part in the
process; XML Schemas, you see, are infinitely more expressive in
detailing constraints than their older counterparts, DTDs. In any
case, Adelard will come out in late 2000 or early 2001, but it
has been announced that it will not include XML Schema support,
instead allowing only DTDs to be used for constraints. The result
is a half-hearted first attempt at the technology, weakening the
example for others to follow.
So what does this mean? Well, it means that Enhydra must step
up and lead yet again, while others follow. I've recently written
a complete series of articles on XML Data Binding for IBM's DeveloperWorks
online magazines, and released with those articles a set of data
binding classes (which, of course, do support XML Schema).
These are fully-functional, and although there are still some features
to add, are very simple and very effective. And the best thing about
these classes is that they are now officially a part of the Enhydra
project, open sourced and ready for use. The classes are currently
available through the Enhydra FTP server, where many new contributions
like this will be staged. Discussion is occurring even today on the
Enhydra mailing lists, particularly on the EnhydraEnterprise and the
architecture group lists. Once again, I'm happy to report that
Enhydra is ahead of the curve.
So before closing shop, let's talk a little bit about how data
binding will make its way into the core Enhydra platform, and how
it is going to affect your development efforts; if it doesn't help
make your life easier, what good is it, right? So where you will
see data binding show up is in the Enhydra Naming Service (ENS),
which is the Enhydra facility built on top of JNDI (the Java Naming
and Directory Interface). Currently, objects are bound into the JNDI
namespace through programmatic means and through JNDI properties
files. The problem, particularly using properties files, is that
there is no notion of type-safety. Keep in mind that when using
JNDI lookups directly, or even when narrowing objects through RMI-IIOP
(through the PortableRemoteObject's narrow() method), an explicit
cast must occur on the client end. In other words, some degree of
typing must always occur. However, this typing is not matched on the
server side. A flat file, such as a JNDI properties file, has no
notion of type. Persisting objects from a JNDI namespace provides no
guarantee of type-safety. And while the client is left to pay the
price when typing is lost in the namespace, the server simply doesn't
care!
However, data binding offers a new way of handling this. Instead of
taking an object bound into the namespace, and having the server
"guess" at writing the object out, that object can be marshalled into
an XML document. Suddenly, type-safety "magically" appears; not only
can we convert this object from our JNDI namespace into an XML
document in a predictable way, we can ensure that it meets a set of
constraints,
represented in an XML Schema. And this schema does even more; it
provides the client with a view of the objects in the namespace,
and therefore a guarantee of their type. In other words, the cast
on the client side becomes not a "hope this works" but simply a
formality; the client knows it will work because the object is bound
by a set of viewable constraints.
And there's still more (what would you expect to pay for this
in a retail store? $49.95? $39.95? No! Order now and receive this
amazing offer for only ... well, you get the idea!). The final beauty
of this approach is that it enhances the ability to define your
own services in the Enhydra framework. Enhydra provides a means
of building services, such as a web service. You define certain
items, such as a port, a hostname, the document root, and so forth.
Consider, though, that previously this was done fairly ad hoc, often
using some arbitrary file format. Sort of like Perl - put semi-colons
here, and then a double period there... sure, that makes a lot
of sense ;-). With data binding, Enhydra needs only provide an XML
Schema defining the information that should be provided. You, then,
need only supply an XML document or documents that conform to the
provided schema, and you can rest assured that your service is ready
to go. And, surprise, surprise, data binding performs the task of
converting your XML document into a configuration object used directly
by a service manager. As you can see, this relatively small package
(the current set of classes numbers only five!) plays a vital part in
the Enhydra platform's future, again making the application server
you get here, for free, a clear leader over all of its commercial
and non-commercial cousins.
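To give a feel for the "conforms to the provided schema" step, here is
a small sketch that checks a configuration document against an XML
Schema. It uses the javax.xml.validation API found in later JDKs
rather than the Enhydra classes, and both the ConfigValidator class
and the schema text are hypothetical, written only to match the
server example from earlier in this column.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ConfigValidator {
    // Hypothetical schema for the <server> document; not supplied by Enhydra.
    static final String SCHEMA =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='server'><xs:complexType><xs:sequence>"
        + "<xs:element name='portNumber' type='xs:unsignedShort'/>"
        + "<xs:element name='hostname' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true when the document satisfies the schema above.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA)));
            schema.newValidator()
                  .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Thrown when the document is malformed or violates the schema.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<server><portNumber>80</portNumber>"
            + "<hostname>galadriel.middleearth.com</hostname></server>";
        String bad = "<server><portNumber>eighty</portNumber>"
            + "<hostname>galadriel.middleearth.com</hostname></server>";
        System.out.println("good document valid? " + isValid(good));
        System.out.println("bad document valid?  " + isValid(bad));
    }
}
```

Because the port is declared as a number in the schema, a document
that supplies text there is rejected before any service ever sees it,
which is the kind of guarantee a flat properties file cannot give.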
So I hope you've gotten a bit of a taste of XML data binding,
and are perhaps ready to find out more. You can start by reading
the original series of articles at IBM that I spoke of, seeing some
more in-depth technical explanations and examples, and seeing how
this approach stacks up against other APIs, by checking out Articles
One, Two, and Three.
(The fourth IBM piece will focus on the
merits of JSP and how it compares with Enhydra XMLC.) And finally,
you can get the code for yourself, right now, at the Enhydra
FTP server. So check it out, speak out on the mailing lists,
and I'll see you online!