org.enhydra.xml.xmlc.html.parsers.tidy
Class TidyHTMLParser

java.lang.Object
  |
  +--org.enhydra.xml.xmlc.html.parsers.HTMLParserBase
        |
        +--org.enhydra.xml.xmlc.html.parsers.tidy.TidyHTMLParser
All Implemented Interfaces:
XMLCParser

public class TidyHTMLParser
extends HTMLParserBase
implements XMLCParser

An XMLCParser for HTML documents and HTML framesets that uses the Java version of the W3C HTML Tidy program. Tidy first converts the HTML to XHTML, and the result is then parsed with an XML parser.
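The class description says the parser works in two stages: Tidy rewrites HTML as XHTML, and an XML parser then builds the DOM. The following is a minimal sketch of the second stage only, using the JDK's standard `javax.xml.parsers` API rather than XMLC's own classes. The class name `XhtmlParseSketch` is hypothetical, and the Tidy conversion step is not shown. The sketch also shows why the conversion is needed: raw HTML such as an unclosed `<br>` is not well-formed XML, so an XML parser rejects it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/**
 * Hypothetical sketch of the "parse XHTML with an XML parser" stage.
 * It is not part of XMLC, and it does not perform the Tidy conversion.
 */
public class XhtmlParseSketch {

    /** Parses a well-formed XHTML string into a W3C DOM Document. */
    public static Document parseXhtml(String xhtml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Keep the XHTML namespace on elements so getLocalName() works.
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // InputSource is the same SAX input type that parse(...) receives.
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        // Well-formed markup of the kind a Tidy-style conversion aims for:
        // every element is closed and <br> is written as <br/>.
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>Demo</title></head>"
                + "<body><p>Line one<br/>Line two</p></body></html>";
        Document doc = parseXhtml(xhtml);
        System.out.println(doc.getDocumentElement().getLocalName());
        System.out.println(doc.getElementsByTagNameNS(
                "http://www.w3.org/1999/xhtml", "p").getLength());

        // Unconverted HTML: the unclosed <br> makes the markup malformed XML.
        try {
            parseXhtml("<html><body><p>a<br>b</p></body></html>");
        } catch (SAXException e) {
            System.out.println("Raw HTML rejected: " + e.getMessage());
        }
    }
}
```

Running `main` should print the root element's local name (`html`) and the paragraph count (`1`), followed by the parser's error for the unconverted input.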


Constructor Summary
TidyHTMLParser()
          Creates a new TidyHTMLParser.
 
Method Summary
 XMLCDocument parse(InputSource input, LineNumberMap lineNumberMap, XMLCDomFactory domFactory, MetaData metaData, ErrorReporter errorReporter, ParseTracer tracer)
          Parse an XML file (or any file, such as HTML, that can be converted into XML).
 
Methods inherited from class org.enhydra.xml.xmlc.html.parsers.HTMLParserBase
addPCDataContentElements, handleParseErrors, validateConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TidyHTMLParser

public TidyHTMLParser()
               throws XMLCException
Creates a new TidyHTMLParser.

Method Detail

parse

public XMLCDocument parse(InputSource input,
                          LineNumberMap lineNumberMap,
                          XMLCDomFactory domFactory,
                          MetaData metaData,
                          ErrorReporter errorReporter,
                          ParseTracer tracer)
                   throws IOException,
                          XMLCException
Description copied from interface: XMLCParser
Parse an XML file (or any file, such as HTML, that can be converted into XML).

Specified by:
parse in interface XMLCParser
Parameters:
input - The input source to parse.
lineNumberMap - If not null, a dynamic map of input stream line numbers and offsets to source files and line numbers. This object is updated as input is read, so it may not have valid mappings for characters that have not yet been read.
domFactory - The DOM factory object.
metaData - MetaData for the document.
errorReporter - Object for reporting errors during the parse.
tracer - Object for parser info tracing.
Returns:
An XMLC document object that contains the actual DOM Document.
Throws:
XMLCException - Thrown for fatal errors found parsing the document.
IOException - Thrown if an I/O error occurs while reading the input.
See Also:
XMLCParser.parse(org.xml.sax.InputSource, org.enhydra.xml.xmlc.misc.LineNumberMap, org.enhydra.xml.xmlc.dom.XMLCDomFactory, org.enhydra.xml.xmlc.metadata.MetaData, org.enhydra.xml.io.ErrorReporter, org.enhydra.xml.xmlc.parsers.ParseTracer)
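The lineNumberMap parameter is described as mapping line numbers in the input stream back to source files and line numbers. The following sketch illustrates that idea with a sorted map of segments. It is purely hypothetical: `SimpleLineMap`, `addSegment`, and `lookup` are invented names, not XMLC's `LineNumberMap` API, and the sketch tracks whole lines only, not character offsets.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical illustration only; this is NOT XMLC's LineNumberMap API.
 * Maps a line of the combined input stream back to the source file and
 * line it came from, e.g. when an included file is spliced into the input.
 */
public class SimpleLineMap {

    /** Origin of one run of consecutive input lines. */
    private static final class Segment {
        final String file;
        final int firstSourceLine;

        Segment(String file, int firstSourceLine) {
            this.file = file;
            this.firstSourceLine = firstSourceLine;
        }
    }

    // Key: first input line of a segment; value: where that segment came from.
    private final TreeMap<Integer, Segment> segments = new TreeMap<>();

    /** Records that input lines from inputLine onward come from file, starting at sourceLine. */
    public void addSegment(int inputLine, String file, int sourceLine) {
        segments.put(inputLine, new Segment(file, sourceLine));
    }

    /** Returns "file:line" for an input line, or null if no segment covers it yet. */
    public String lookup(int inputLine) {
        Map.Entry<Integer, Segment> entry = segments.floorEntry(inputLine);
        if (entry == null) {
            return null;
        }
        Segment s = entry.getValue();
        return s.file + ":" + (s.firstSourceLine + (inputLine - entry.getKey()));
    }

    public static void main(String[] args) {
        SimpleLineMap map = new SimpleLineMap();
        map.addSegment(1, "page.html", 1);    // input lines 1..10 come from page.html
        map.addSegment(11, "header.inc", 1);  // included file spliced in at input line 11
        map.addSegment(16, "page.html", 11);  // back to page.html after the include
        System.out.println(map.lookup(5));    // page.html:5
        System.out.println(map.lookup(12));   // header.inc:2
        System.out.println(map.lookup(20));   // page.html:15
    }
}
```

A sorted map with `floorEntry` keeps each lookup logarithmic and allows segments to be appended as input is read, which loosely mirrors the "dynamically updated" behavior the parameter description mentions.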


Copyright © 1999-2002 enhydra.org (Mark Diekhans, David Li, Richard Kunze). All Rights reserved.