org.enhydra.apache.xerces.readers
Class UTF8Reader

java.lang.Object
  |
  +--org.enhydra.apache.xerces.readers.XMLEntityReader
        |
        +--org.enhydra.apache.xerces.readers.UTF8Reader
All Implemented Interfaces:
XMLEntityHandler.EntityReader

final class UTF8Reader
extends XMLEntityReader

This is the primary reader used for UTF-8 encoded byte streams.

This reader processes requests from the scanners against the underlying UTF-8 byte stream, avoiding when possible any up-front transcoding. When the StringPool handle interfaces are used, the information in the data stream will be added to the string pool and lazy-evaluated until asked for.
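
The deferred transcoding described above can be pictured with a small stand-alone sketch. It is not the Xerces implementation: the class name LazyUtf8Slice and its methods are hypothetical. The idea shown is only that a scanned value can be kept as an offset and length into the raw UTF-8 bytes and decoded to a String the first time someone asks for it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: remember where the bytes are, decode only on demand.
final class LazyUtf8Slice {
    private final byte[] buffer;
    private final int offset;
    private final int length;
    private String decoded; // stays null until the value is first requested

    LazyUtf8Slice(byte[] buffer, int offset, int length) {
        this.buffer = buffer;
        this.offset = offset;
        this.length = length;
    }

    // True once the bytes have been transcoded to a String.
    boolean isDecoded() {
        return decoded != null;
    }

    @Override
    public String toString() {
        if (decoded == null) {
            decoded = new String(buffer, offset, length, StandardCharsets.UTF_8);
        }
        return decoded;
    }
}
```

A caller that never needs the text (for example, ignorable content) never pays for the decode.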

We use the SymbolCache to match expected names (element types in end tags) and walk the data structures of that class directly.

There is a significant amount of hand-inlining and some blatant violation of good object-oriented programming rules, such as ignoring modularity boundaries, in the name of good performance.

There are also some places where the code frequently crashes the Sun Java just-in-time (JIT) compiler, so the code here has been carefully "crafted" to avoid those problems.

Version:
$Id: UTF8Reader.java,v 1.2 2005/01/26 08:28:44 jkjome Exp $

Field Summary
static byte[] fgAsciiAttValueChar
           
static byte[] fgAsciiEntityValueChar
           
 
Fields inherited from class org.enhydra.apache.xerces.readers.XMLEntityReader
fCarriageReturnCounter, fCharacterCounter, fCharDataHandler, fCurrentOffset, fEntityHandler, fErrorReporter, fInCDSect, fLinefeedCounter, fSendCharDataAsCharArray
 
Constructor Summary
UTF8Reader(XMLEntityHandler entityHandler, XMLErrorReporter errorReporter, boolean sendCharDataAsCharArray, InputStream dataStream, StringPool stringPool)
           
 
Method Summary
 int addString(int offset, int length)
          Add a string to the StringPool from the characters scanned using this reader as described by offset and length.
 int addSymbol(int offset, int length)
          Add a symbol to the StringPool from the characters scanned using this reader as described by offset and length.
 void append(XMLEntityHandler.CharBuffer charBuffer, int offset, int length)
          Append the characters processed by this reader associated with offset and length to the CharBuffer.
 XMLEntityHandler.EntityReader changeReaders()
          This method is called by the reader subclasses at the end of input.
 boolean lookingAtChar(char ch, boolean skipPastChar)
          Test that the current character is a ch character.
 boolean lookingAtSpace(boolean skipPastChar)
          Test that the current character is a whitespace character.
 boolean lookingAtValidChar(boolean skipPastChar)
          Test that the current character is valid.
 int scanAttValue(char qchar, boolean asSymbol)
          Scan an attribute value.
 int scanCharRef(boolean hex)
          Scan a character reference.
 int scanContent(QName element)
          Skip through the input while we are looking at character data.
 int scanEntityValue(int qchar, boolean createString)
          Scan an entity value.
 boolean scanExpectedName(char fastcheck, StringPool.CharArrayRange expectedName)
          Scan the name that is expected at the current position in the document.
 int scanInvalidChar()
          Scan an invalid character.
 int scanName(char fastcheck)
          Add a sequence of characters that match the XML definition of a Name to the StringPool.
 void scanQName(char fastcheck, QName qname)
          Add a sequence of characters that match the XML Namespaces definition of a QName to the StringPool.
 int scanStringLiteral()
          Scan a string literal.
 void skipPastName(char fastcheck)
          Skip past a sequence of characters that match the XML definition of a Name.
 void skipPastNmtoken(char fastcheck)
          Skip past a sequence of characters that match the XML definition of an Nmtoken.
 void skipPastSpaces()
          Skip past whitespace characters starting at the current position.
protected  boolean skippedMultiByteCharWithFlag(int b0, int flag)
           
 boolean skippedString(char[] s)
          Skip past a sequence of characters that matches the specified character array.
 void skipToChar(char ch)
          Advance through the input data up to the next ch character.
 
Methods inherited from class org.enhydra.apache.xerces.readers.XMLEntityReader
currentOffset, getColumnNumber, getInCDSect, getLineNumber, init, setInCDSect
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

fgAsciiAttValueChar

public static final byte[] fgAsciiAttValueChar

fgAsciiEntityValueChar

public static final byte[] fgAsciiEntityValueChar
Constructor Detail

UTF8Reader

public UTF8Reader(XMLEntityHandler entityHandler,
                  XMLErrorReporter errorReporter,
                  boolean sendCharDataAsCharArray,
                  InputStream dataStream,
                  StringPool stringPool)
           throws Exception
Method Detail

addString

public int addString(int offset,
                     int length)
Description copied from interface: XMLEntityHandler.EntityReader
Add a string to the StringPool from the characters scanned using this reader as described by offset and length.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Parameters:
offset - The offset within this reader where the characters start.
length - The length of the character range within this reader.
Returns:
The StringPool handle for the string.

addSymbol

public int addSymbol(int offset,
                     int length)
Description copied from interface: XMLEntityHandler.EntityReader
Add a symbol to the StringPool from the characters scanned using this reader as described by offset and length.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Parameters:
offset - The offset within this reader where the characters start.
length - The length of the character range within this reader.
Returns:
The StringPool handle for the symbol.

append

public void append(XMLEntityHandler.CharBuffer charBuffer,
                   int offset,
                   int length)
Description copied from interface: XMLEntityHandler.EntityReader
Append the characters processed by this reader associated with offset and length to the CharBuffer.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Parameters:
charBuffer - The CharBuffer to append the characters to.
offset - The offset within this reader where the copy should start.
length - The length of the range to copy within this reader.

changeReaders

public XMLEntityHandler.EntityReader changeReaders()
                                            throws Exception
Description copied from class: XMLEntityReader
This method is called by the reader subclasses at the end of input.
Overrides:
changeReaders in class XMLEntityReader

lookingAtChar

public boolean lookingAtChar(char ch,
                             boolean skipPastChar)
                      throws Exception
Description copied from interface: XMLEntityHandler.EntityReader
Test that the current character is a ch character.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Parameters:
ch - The character to match against.
skipPastChar - If true, we advance past the matched character.
Returns:
true if the current character is a ch character; false otherwise.
Throws:
Exception -  
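
To make the effect of skipPastChar concrete, here is a minimal sketch over a plain byte array. The class ByteCursorSketch is hypothetical and handles ASCII characters only; the real reader works against its own internal buffers.

```java
// Hypothetical cursor over UTF-8 bytes; ASCII-only for brevity.
final class ByteCursorSketch {
    private final byte[] data;
    private int pos;

    ByteCursorSketch(byte[] data) {
        this.data = data;
    }

    int position() {
        return pos;
    }

    // Tests the current byte against ch; advances only on a match
    // and only when skipPastChar is true.
    boolean lookingAtChar(char ch, boolean skipPastChar) {
        if (ch >= 0x80) {
            throw new IllegalArgumentException("ASCII only in this sketch");
        }
        if (pos >= data.length || data[pos] != (byte) ch) {
            return false;
        }
        if (skipPastChar) {
            pos++;
        }
        return true;
    }
}
```

With skipPastChar set to false the call is a pure peek; a failed match never moves the position.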

lookingAtValidChar

public boolean lookingAtValidChar(boolean skipPastChar)
                           throws Exception
Description copied from interface: XMLEntityHandler.EntityReader
Test that the current character is valid.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Parameters:
skipPastChar - If true, we advance past the valid character.
Returns:
true if the current character is valid; false otherwise.
Throws:
Exception -  
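
As an illustration of what "valid" means here, the following sketch checks a code point against the Char production of XML 1.0. The class and method names are hypothetical, and the actual reader may use lookup tables rather than range comparisons.

```java
// Hypothetical helper; mirrors the XML 1.0 Char production.
final class XmlCharSketch {
    static boolean isValidXmlChar(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }
}
```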

lookingAtSpace

public boolean lookingAtSpace(boolean skipPastChar)
                       throws Exception
Description copied from interface: XMLEntityHandler.EntityReader
Test that the current character is a whitespace character.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Parameters:
skipPastChar - If true, we advance past the whitespace character.
Returns:
true if the current character is whitespace; false otherwise.
Throws:
Exception -  

skipToChar

public void skipToChar(char ch)
                throws Exception
Description copied from interface: XMLEntityHandler.EntityReader
Advance through the input data up to the next ch character.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Parameters:
ch - The character to search for.
Throws:
Exception -  

skipPastSpaces

public void skipPastSpaces()
                    throws Exception
Description copied from interface: XMLEntityHandler.EntityReader
Skip past whitespace characters starting at the current position.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Throws:
Exception -  
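
Because the four XML whitespace characters are all single-byte in UTF-8, whitespace can be skipped without decoding. The sketch below shows that idea on a byte array; Utf8SpaceSkipper is a hypothetical stand-alone helper, not part of this class.

```java
// Hypothetical stand-alone helper; not part of the Xerces API.
final class Utf8SpaceSkipper {
    // XML 1.0 whitespace: space, tab, carriage return, line feed.
    static boolean isXmlSpace(byte b) {
        return b == 0x20 || b == 0x09 || b == 0x0D || b == 0x0A;
    }

    // Returns the index of the first non-whitespace byte at or after offset.
    static int skipPastSpaces(byte[] data, int offset) {
        int i = offset;
        while (i < data.length && isXmlSpace(data[i])) {
            i++;
        }
        return i;
    }
}
```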

skippedMultiByteCharWithFlag

protected boolean skippedMultiByteCharWithFlag(int b0,
                                               int flag)
                                        throws Exception

skipPastName

public void skipPastName(char fastcheck)
                  throws Exception
Description copied from interface: XMLEntityHandler.EntityReader
Skip past a sequence of characters that match the XML definition of a Name.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Throws:
Exception -  

skipPastNmtoken

public void skipPastNmtoken(char fastcheck)
                     throws Exception
Description copied from interface: XMLEntityHandler.EntityReader
Skip past a sequence of characters that match the XML definition of an Nmtoken.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Throws:
Exception -  

skippedString

public boolean skippedString(char[] s)
                      throws Exception
Description copied from interface: XMLEntityHandler.EntityReader
Skip past a sequence of characters that matches the specified character array.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Parameters:
s - The characters to match.
Returns:
true if the characters at the current position matched s and were skipped; false otherwise.
Throws:
Exception -  

scanInvalidChar

public int scanInvalidChar()
                    throws Exception
Description copied from interface: XMLEntityHandler.EntityReader
Scan an invalid character.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Returns:
The invalid character as an integer, or -1 if there was a bad encoding.
Throws:
Exception -  

scanCharRef

public int scanCharRef(boolean hex)
                throws Exception
Description copied from interface: XMLEntityHandler.EntityReader
Scan a character reference.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Returns:
The value of the character, or one of the following error codes: CHARREF_RESULT_SEMICOLON_REQUIRED CHARREF_RESULT_INVALID_CHAR CHARREF_RESULT_OUT_OF_RANGE
Throws:
Exception -  
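
One plausible reading of the three error codes is sketched below. This is an assumption-laden illustration, not the reader's code: the class CharRefSketch is hypothetical, the constant values are placeholders (the real CHARREF_RESULT_* values may differ), and the input is taken to be the text that follows "&#" or "&#x".

```java
// Hypothetical sketch of character-reference scanning.
final class CharRefSketch {
    static final int SEMICOLON_REQUIRED = -1; // placeholder value
    static final int INVALID_CHAR = -2;       // placeholder value
    static final int OUT_OF_RANGE = -3;       // placeholder value

    // text is what follows "&#" (decimal) or "&#x" (hex).
    static int scanCharRef(String text, boolean hex) {
        int radix = hex ? 16 : 10;
        int i = 0;
        int value = 0;
        boolean tooBig = false;
        while (i < text.length()) {
            char c = text.charAt(i);
            // Restrict to ASCII so non-Latin digits are not accepted.
            int digit = c < 0x80 ? Character.digit(c, radix) : -1;
            if (digit < 0) {
                break;
            }
            if (!tooBig) {
                value = value * radix + digit;
                if (value > 0x10FFFF) {
                    tooBig = true; // stop accumulating to avoid overflow
                }
            }
            i++;
        }
        if (i == 0) {
            return INVALID_CHAR; // no digit at the current position
        }
        if (i >= text.length() || text.charAt(i) != ';') {
            return SEMICOLON_REQUIRED;
        }
        return tooBig ? OUT_OF_RANGE : value;
    }
}
```

A caller would compare the result against the error codes first and treat any other value as a Unicode code point.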

scanStringLiteral

public int scanStringLiteral()
                      throws Exception
Description copied from interface: XMLEntityHandler.EntityReader
Scan a string literal.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Returns:
The StringPool handle for the string that was scanned, or one of the following error codes: STRINGLIT_RESULT_QUOTE_REQUIRED STRINGLIT_RESULT_INVALID_CHAR
Throws:
Exception -  

scanAttValue

public int scanAttValue(char qchar,
                        boolean asSymbol)
                 throws Exception
Description copied from interface: XMLEntityHandler.EntityReader
Scan an attribute value.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Parameters:
qchar - The initial quote character, either a single or double quote.
Returns:
The StringPool handle for the string that was scanned, or one of the following error codes: ATTVALUE_RESULT_COMPLEX ATTVALUE_RESULT_LESSTHAN ATTVALUE_RESULT_INVALID_CHAR
Throws:
Exception -  

scanEntityValue

public int scanEntityValue(int qchar,
                           boolean createString)
                    throws Exception
Description copied from interface: XMLEntityHandler.EntityReader
Scan an entity value.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Parameters:
qchar - The initial quote character, either a single or double quote.
Returns:
The StringPool handle for the string that was scanned, or one of the following error codes: ENTITYVALUE_RESULT_FINISHED ENTITYVALUE_RESULT_REFERENCE ENTITYVALUE_RESULT_PEREF ENTITYVALUE_RESULT_INVALID_CHAR ENTITYVALUE_RESULT_END_OF_INPUT
Throws:
Exception -  

scanExpectedName

public boolean scanExpectedName(char fastcheck,
                                StringPool.CharArrayRange expectedName)
                         throws Exception
Description copied from interface: XMLEntityHandler.EntityReader
Scan the name that is expected at the current position in the document. This method is invoked when we are scanning the element type in an end tag that must match the element type in the corresponding start tag.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Parameters:
fastcheck - A character that is not a legal name character that is provided as a hint to the reader of a character likely to terminate the Name.
expectedName - The characters of the name we expect.
Returns:
true if we scanned the name we expected to find; otherwise false if we did not.
Throws:
Exception -  
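
The end-tag check can be pictured as a direct comparison of the expected characters against the raw bytes, followed by a test of the terminating character. The sketch below is a simplification under stated assumptions: ExpectedNameSketch is a hypothetical class, names are ASCII only, and the name counts as matched only if the next byte is the fastcheck character or XML whitespace.

```java
// Hypothetical sketch; ASCII names only.
final class ExpectedNameSketch {
    static boolean scanExpectedName(byte[] data, int offset,
                                    char fastcheck, char[] expectedName) {
        int end = offset + expectedName.length;
        if (end >= data.length) {
            return false; // no room for the name plus a terminator
        }
        for (int i = 0; i < expectedName.length; i++) {
            char c = expectedName[i];
            if (c >= 0x80 || data[offset + i] != (byte) c) {
                return false;
            }
        }
        byte next = data[end];
        // Fast path: the hinted terminator follows the name directly.
        if (next == (byte) fastcheck) {
            return true;
        }
        // Otherwise accept only whitespace, so "ite" does not match "item".
        return next == 0x20 || next == 0x09 || next == 0x0D || next == 0x0A;
    }
}
```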

scanQName

public void scanQName(char fastcheck,
                      QName qname)
               throws Exception
Description copied from interface: XMLEntityHandler.EntityReader
Add a sequence of characters that match the XML Namespaces definition of a QName to the StringPool. If we find a QName at the current position we will add it to the StringPool and pass the string pool handles for that QName back to the caller through the qname parameter, since this method has no return value.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Parameters:
fastcheck - A character that is not a legal name character that is provided as a hint to the reader of a character likely to terminate the Name.
Throws:
Exception -  

scanName

public int scanName(char fastcheck)
             throws Exception
Description copied from interface: XMLEntityHandler.EntityReader
Add a sequence of characters that match the XML definition of a Name to the StringPool. If we find a name at the current position we will add it to the StringPool as a symbol and will return the string pool handle for that symbol to the caller.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Parameters:
fastcheck - A character that is not a legal name character that is provided as a hint to the reader of a character likely to terminate the Name.
Returns:
The StringPool handle for the name that was scanned, or -1 if a name was not found at the current position within the input data.
Throws:
Exception -  
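
The handle-or-minus-one contract can be illustrated with a toy symbol table. Everything here is hypothetical: NameScanSketch stands in for the reader plus its StringPool, only ASCII name characters are recognized, and the fastcheck hint is omitted.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: scan an ASCII name and intern it as a symbol.
final class NameScanSketch {
    private final Map<String, Integer> handles = new HashMap<>();
    private final List<String> symbols = new ArrayList<>();

    static boolean isAsciiNameStart(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') || b == '_' || b == ':';
    }

    static boolean isAsciiNameChar(int b) {
        return isAsciiNameStart(b) || (b >= '0' && b <= '9') || b == '.' || b == '-';
    }

    // Returns the handle of the name starting at offset, or -1 if no name starts there.
    int scanName(byte[] data, int offset) {
        if (offset >= data.length || !isAsciiNameStart(data[offset])) {
            return -1;
        }
        int end = offset + 1;
        while (end < data.length && isAsciiNameChar(data[end])) {
            end++;
        }
        String name = new String(data, offset, end - offset, StandardCharsets.US_ASCII);
        Integer handle = handles.get(name);
        if (handle == null) {
            handle = symbols.size(); // the next free index becomes the handle
            symbols.add(name);
            handles.put(name, handle);
        }
        return handle;
    }

    // Looks up the text for a previously returned handle.
    String symbol(int handle) {
        return symbols.get(handle);
    }
}
```

Scanning the same name twice yields the same handle, which is what lets callers compare names by integer rather than by string.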

scanContent

public int scanContent(QName element)
                throws Exception
Description copied from interface: XMLEntityHandler.EntityReader
Skip through the input while we are looking at character data.
Following copied from interface: org.enhydra.apache.xerces.readers.XMLEntityHandler.EntityReader
Parameters:
element - The QName of the current element type, whose handles are in the StringPool.
Returns:
One of the following result codes: CONTENT_RESULT_START_OF_PI CONTENT_RESULT_START_OF_COMMENT CONTENT_RESULT_START_OF_CDSECT CONTENT_RESULT_END_OF_CDSECT CONTENT_RESULT_START_OF_ETAG CONTENT_RESULT_MATCHING_ETAG CONTENT_RESULT_START_OF_ELEMENT CONTENT_RESULT_START_OF_CHARREF CONTENT_RESULT_START_OF_ENTITYREF CONTENT_RESULT_INVALID_CHAR CONTENT_RESULT_MARKUP_NOT_RECOGNIZED CONTENT_RESULT_MARKUP_END_OF_INPUT CONTENT_RESULT_REFERENCE_END_OF_INPUT
Throws:
Exception -  


Copyright © 1999 The Apache Software Foundation. All Rights reserved.