www.enhydra.org
 

 

 

A New Breed of XML

Brett McLaughlin, Lutris Enhydra Strategist

Well, thanks for making it over to my little corner of the Journal. Not surprisingly, I'm going to try to give you a good dose of technical talk; what might be surprising is what I am choosing to talk about this time around. Now, certainly, nobody is going to go into shock over me spending some time on XML; I'm pretty much inexorably tied to XML, probably for the rest of my life! I had the privilege of writing Java and XML for O'Reilly & Associates, I've been a columnist for the IBM DeveloperWorks XML Zone and JavaWorld's XML area, and I have been involved in Apache Cocoon, Apache Xerces, and other XML-related projects for quite a while now. So you might expect me to be spending time on XML itself, or perhaps XSLT, XPath, XLink, or another XML-related vocabulary. Or maybe you're hoping to get some instruction on SAX, the Simple API for XML, DOM, the Document Object Model, or even JAXP, the Java API for XML Parsing. Or, you might be ready to get the latest on JDOM, an API for XML that Jason Hunter and I created and maintain. However, none of these is what interests me in this issue.

You see, there is a new kid on the block in the Java and XML world. This new kid provides a completely different approach to handling XML. If you're a newbie to Java and XML, know that there are two basic approaches to handling XML data. The first, embodied by SAX, is to simply parse an XML document sequentially. At each step of the parsing, a callback occurs, notifying the program code that, for example, the start of an element has occurred, or a processing instruction was encountered, or character data was read. This leaves the task of what to do with the information completely up to the developer. So while this is an extremely efficient approach, as no in-memory structures are automatically created, it often leaves even intermediate-level programmers scratching their heads as to how they can interact with the XML. The second, and wildly more popular, approach is represented traditionally by DOM, and more recently by JDOM. In these APIs, XML is parsed and built into a tree of data, mirroring the document structure. The developer can then manipulate this tree, which is typically a very intuitive procedure. However, and in particular with the DOM API, this is a very memory-consumptive, high-overhead approach. Large XML documents are stored completely, just as small ones are, and the resultant tree structures can literally cripple entire applications. This is even more applicable when the structure is being fed to other processors, as in the case of XSLT (XML transformations), where even more structures are created and held in memory. On top of that, both approaches can result in some rather tedious code, particularly in configuration using XML, which is a common task. And this is where the new option for handling XML comes into play.
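To make the two styles concrete, here is a minimal sketch that reads the same small document both ways, using only the SAX and DOM support that ships with the JDK (via JAXP). The class name ParsingStyles and its two helper methods are my own inventions for illustration; they are not part of any of the APIs discussed here.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the names here are assumptions for illustration.
public class ParsingStyles {

    public static final String XML =
        "<server><portNumber>80</portNumber>"
      + "<hostname>galadriel.middleearth.com</hostname></server>";

    private static InputStream stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    // SAX: the parser calls back as it reads, and nothing is kept
    // unless the handler chooses to keep it.
    public static List<String> elementNamesViaSax(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(stream(xml), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return names;
    }

    // DOM: the whole document is built into a tree first, then navigated.
    public static String hostnameViaDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(stream(xml));
            return doc.getElementsByTagName("hostname").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementNamesViaSax(XML)); // [server, portNumber, hostname]
        System.out.println(hostnameViaDom(XML));     // galadriel.middleearth.com
    }
}
```

Notice that the SAX version never holds the document in memory, but all of the bookkeeping is yours; the DOM version is easier to query, at the price of building the full tree up front.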

Consider that more often than not, handling XML is more about moving data from one format, a document on disk or from a network resource, to another, a Java instance of some object. Let's take a simple XML document as an example:

<?xml version="1.0"?>

<server>
  <portNumber>80</portNumber>
  <hostname>galadriel.middleearth.com</hostname>
</server>

This document holds some configuration information crucial for, let's say, starting up a Java component called Server. An instance of the Server class requires this information to function. The result might be code like this:

    // Obtain the Java representation of the XML document
    Document doc = builder.build(new File("config.xml"));
    Element root = doc.getRootElement();
    
    // Create and configure the server component
    Server server = new Server();
    // Copy the text of each child element into the component
    server.setPort(root.getChildText("portNumber"));
    server.setHostname(root.getChildText("hostname"));

    // Start the server
    server.start();

So what's the big deal here? Is there something wrong with this code? Well, actually there is. First, a tree structure is created in memory, and holds a Java representation of the XML document. But then, an instance of the "target" object is created in memory. Then, data is "shuffled" from one in-memory structure to another. The result is that often huge amounts of memory become nothing more than staging grounds, converting between XML and the final Java object. What would certainly be preferable is to take that same code, and perform an action like this:

    Server server = (Server)Converter.convert(new File("config.xml"));

    // No configuration is required, as it has already been performed!
    server.start();

In this latter case, obviously simpler and preferable, the XML document is converted directly into a Java object. Now, there are certainly going to be some in-memory structures created in this process, but they are both:

  • Hidden from the client, making client programming easier.

  • Discarded when no longer needed, protecting the client program.

Clearly, this is a better, simpler means of handling XML documents that directly map to Java objects. Aptly named, this methodology is called data binding. Now if you are an XML guy, as I suppose that I am, then perhaps data binding isn't such a new concept; however, for most Java developers struggling with SAX and DOM, this is a new means of dealing with XML. Put it this way: it wasn't in practice in a real enough way to include it in my book, Java and XML, written in the first quarter of this year!
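How might such a conversion work under the hood? What follows is only a rough sketch of the idea, not the Enhydra data binding classes themselves: it assumes every child element of the root maps to a JavaBeans-style setter on the target class (so portNumber maps to setPortNumber), and it handles only int and String properties. The names BindingSketch and unmarshal, and the shape of the Server class, are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch of XML-to-object binding via reflection.
public class BindingSketch {

    // Stand-in for the Server component; its shape is an assumption.
    public static class Server {
        private int portNumber;
        private String hostname;
        public void setPortNumber(int portNumber) { this.portNumber = portNumber; }
        public void setHostname(String hostname) { this.hostname = hostname; }
        public int getPortNumber() { return portNumber; }
        public String getHostname() { return hostname; }
    }

    // Each child element <foo> of the root is pushed through setFoo(...).
    public static <T> T unmarshal(String xml, Class<T> type) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            T target = type.getDeclaredConstructor().newInstance();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace and comments
                }
                String name = child.getNodeName();
                String setter = "set" + Character.toUpperCase(name.charAt(0))
                    + name.substring(1);
                String text = child.getTextContent().trim();
                for (Method m : type.getMethods()) {
                    if (!m.getName().equals(setter) || m.getParameterCount() != 1) {
                        continue;
                    }
                    Class<?> param = m.getParameterTypes()[0];
                    if (param == int.class) {
                        m.invoke(target, Integer.parseInt(text));
                    } else if (param == String.class) {
                        m.invoke(target, text);
                    }
                }
            }
            // The DOM tree built above is now unreachable and can be collected.
            return target;
        } catch (Exception e) {
            throw new IllegalStateException("Could not bind XML to " + type.getName(), e);
        }
    }
}
```

A client would then write something like `Server server = BindingSketch.unmarshal(xml, BindingSketch.Server.class);` and never see the intermediate tree, which is exactly the point of the two bullets above.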

So let's quickly breeze through an overview, and talk some more specifics. JSR-031, a Java Specification Request from Sun, deals with the issue of data binding. Unfortunately, JSR-031 has been on the table for a very long time, with very little action. It details a process of creating an XML document, which represents a Java object instance. The object itself is defined in a set of XML constraints, and Java classes can even be generated from this set of constraints. A document that conforms to those constraints is then unmarshalled, or converted, into an instance of the object it represents. The process can also occur in reverse, when a Java object instance is marshalled, or converted, into an XML document.
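The reverse direction, marshalling, can be sketched in the same spirit. This is an illustration of the concept only, not the JSR-031 design or the Enhydra classes: it assumes a plain object with JavaBeans-style getters, writes one child element per property, and sorts the properties by name so the output is predictable. The names MarshalSketch, marshal, and the Server class are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of object-to-XML marshalling via reflection.
public class MarshalSketch {

    // Stand-in for the Server component; its shape is an assumption.
    public static class Server {
        private final int portNumber;
        private final String hostname;
        public Server(int portNumber, String hostname) {
            this.portNumber = portNumber;
            this.hostname = hostname;
        }
        public int getPortNumber() { return portNumber; }
        public String getHostname() { return hostname; }
    }

    // Escape the characters that are not allowed in element text.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Each getFoo() becomes a <foo> child element of the root.
    public static String marshal(Object bean, String rootName) {
        Map<String, String> props = new TreeMap<>(); // sorted by property name
        try {
            for (Method m : bean.getClass().getMethods()) {
                String n = m.getName();
                if (!n.startsWith("get") || n.length() <= 3
                        || n.equals("getClass") || m.getParameterCount() != 0) {
                    continue;
                }
                String prop = Character.toLowerCase(n.charAt(3)) + n.substring(4);
                props.put(prop, String.valueOf(m.invoke(bean)));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not read properties", e);
        }
        StringBuilder xml = new StringBuilder();
        xml.append('<').append(rootName).append('>');
        for (Map.Entry<String, String> e : props.entrySet()) {
            xml.append('<').append(e.getKey()).append('>')
               .append(escape(e.getValue()))
               .append("</").append(e.getKey()).append('>');
        }
        return xml.append("</").append(rootName).append('>').toString();
    }

    public static void main(String[] args) {
        System.out.println(marshal(new Server(80, "galadriel.middleearth.com"), "server"));
    }
}
```

A real data binding framework would drive the element names and order from the XML constraints rather than from reflection, but the round trip between object and document is the same idea.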

The biggest problem today is that Sun has not taken the lead on this concept. The hallmark of any JSR is a reference implementation, which in essence dictates how adherents to a specification or technology should behave. The reference implementation for JSR-031, code named "Project Adelard", has yet to materialize. In addition, the original specification indicated that XML Schemas, an XML vocabulary used to represent document constraints, would play a vital part in the process; XML Schemas, you see, are infinitely more expressive in detailing constraints than their older counterparts, DTDs. In any case, Adelard will come out in late 2000 or early 2001, but it has been announced that it will not include XML Schema support, instead allowing only DTDs to be used for constraints. The result is a half-hearted first attempt at the technology, weakening the example for others to follow.

So what does this mean? Well, it means that Enhydra must step up and lead yet again, while others follow. I've recently written a complete series of articles on XML Data Binding for IBM's DeveloperWorks online magazines, and released with those articles a set of data binding classes (which, of course, do support XML Schema). These are fully functional, and although there are still some features to add, they are very simple and very effective. And the best thing about these classes is that they are now officially a part of the Enhydra project, open sourced and ready for use. The classes are currently available through the Enhydra FTP server, where many new contributions like this will be staged. Discussion is occurring even today on the Enhydra mailing lists, particularly on the EnhydraEnterprise and the architecture group lists. Once again, I'm happy to report that Enhydra is ahead of the curve.

So before closing shop, let's talk a little bit about how data binding will make its way into the core Enhydra platform, and how it is going to affect your development efforts; if it doesn't help make your life easier, what good is it, right? Where you will see data binding show up is in the Enhydra Naming Service (ENS), which is the Enhydra facility built on top of JNDI (the Java Naming and Directory Interface). Currently, objects are bound into the JNDI namespace through programmatic means and through JNDI properties files. The problem, particularly using properties files, is that there is no notion of type-safety. Keep in mind that when using JNDI lookups directly, or even when narrowing objects through RMI-IIOP (through the PortableRemoteObject's narrow() method), an explicit cast must occur on the client end. In other words, some degree of typing must always occur. However, this typing is not matched on the server side. A flat file, such as a JNDI properties file, has no notion of type. Persisting objects from a JNDI namespace provides no type-safety either. And while the client is left to pay the price when typing is lost in the namespace, the server simply doesn't care!

However, data binding offers a new means of handling this. Instead of taking an object bound into the namespace, and having the server "guess" at writing the object out, that object can be marshalled into an XML document. Suddenly, type-safety "magically" appears; not only can we convert this object from our JNDI namespace into an XML document in a predictable way, we can ensure that it meets a set of constraints, represented in an XML Schema. And this schema does even more; it provides the client with a view of the objects in the namespace, and therefore a guarantee of their type. In other words, the cast on the client side becomes not a "hope this works" but simply a formality; the client knows it will work because the object is bound by a set of viewable constraints.

And there's still more (what would you expect to pay for this in a retail store? $49.95? $39.95? No! Order now and receive this amazing offer for only ... well, you get the idea!). The final beauty of this approach is that it enhances the ability to define your own services in the Enhydra framework. Enhydra provides a means of building services, such as a web service. You define certain items, such as a port, a hostname, the document root, and so forth. Consider, though, that previously this was done fairly ad hoc, often using some arbitrary file format. Sort of like Perl - put semi-colons here, and then a double period there... sure, that makes a lot of sense ;-). With data binding, Enhydra need only provide an XML Schema defining the information that should be provided. You, then, need only supply an XML document or documents that conform to the provided schema, and you can rest assured that your service is ready to go. And, surprise, surprise, data binding performs the task of converting your XML document into a configuration object used directly by a service manager. As you can see, this relatively small package (the current set of classes numbers only five!) plays a vital part in the Enhydra platform's future, again making the application server you get here, for free, a clear leader over all of its commercial and non-commercial cousins.
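To give a feel for what "conforming to the provided schema" buys you, here is a small sketch that checks a service configuration document against a schema before anything is bound. A few loud caveats: it uses the javax.xml.validation API from the standard JDK, not the Enhydra data binding classes; the schema is written against the W3C XML Schema namespace in its final Recommendation form; and the schema itself, along with the names SchemaCheckSketch and conforms, is made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch: validate a config document against an XML Schema.
public class SchemaCheckSketch {

    // Illustrative schema: a server has a numeric port and a hostname.
    public static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='server'><xs:complexType><xs:sequence>"
      + "<xs:element name='portNumber' type='xs:unsignedShort'/>"
      + "<xs:element name='hostname' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    private static Schema loadSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA)));
        } catch (SAXException e) {
            throw new IllegalStateException("The schema itself is invalid", e);
        }
    }

    // Returns true only if the document satisfies every constraint.
    public static boolean conforms(String xml) {
        try {
            loadSchema().newValidator()
                .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a constraint was violated, or the XML is malformed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<server><portNumber>80</portNumber>"
            + "<hostname>galadriel.middleearth.com</hostname></server>";
        String bad = "<server><portNumber>eighty</portNumber>"
            + "<hostname>galadriel.middleearth.com</hostname></server>";
        System.out.println(conforms(good)); // true
        System.out.println(conforms(bad));  // false
    }
}
```

The type declared on portNumber is what a DTD could never express: a port of "eighty" is rejected before it ever reaches your service.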

So I hope you've gotten a bit of a taste of XML data binding, and are perhaps ready to find out more. You can start by reading the original series of articles at IBM that I spoke of, seeing some more in-depth technical explanations and examples, and seeing how this approach stacks up against other APIs, by checking out Article One, Two, and Three. (The fourth IBM piece will focus on the merits of JSP and how it compares with Enhydra XMLC.) And finally, you can get the code for yourself, right now, at the Enhydra FTP server. So check it out, speak out on the mailing lists, and I'll see you online!
