java.lang.Object
  |
  +--org.enhydra.apache.xerces.utils.regex.RegularExpression
A regular expression matching engine that uses a non-deterministic finite automaton (NFA). This engine does not conform to POSIX regular expressions.
// Basic matching.
RegularExpression re = new RegularExpression(regex);
if (re.matches(text)) { ... }

// Matching with captured groups.
RegularExpression re = new RegularExpression(regex);
Match match = new Match();
if (re.matches(text, match)) {
    ... // The captured text is available through the methods of the Match class.
}

// Case-insensitive matching. matches() returns a boolean, so no comparison is needed.
RegularExpression re = new RegularExpression(regex, "i");
if (re.matches(text)) { ... }
You can specify options to RegularExpression(regex, options) or setPattern(regex, options).
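The options parameter consists of the following characters.
"i" : IGNORE_CASE
"m" : MULTIPLE_LINES
"s" : SINGLE_LINE
"u" : USE_UNICODE_CATEGORY. This option redefines \d \D \w \W \s \S.
"w" : UNICODE_WORD_BOUNDARY
"," : SPECIAL_COMMA
"X" : XMLSCHEMA_MODE. The matches() method does not do substring matching but entire string matching.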
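The distinction drawn above between substring matching and entire string matching can be sketched with the JDK's java.util.regex package. It is used here only as a stand-in because it ships with Java; it is a different engine with a different API from RegularExpression. The class name MatchModeSketch and its method names are hypothetical.

```java
import java.util.regex.Pattern;

final class MatchModeSketch {
    // Substring matching: true if the pattern occurs anywhere in the text.
    static boolean containsMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // Entire string matching: true only if the pattern covers the whole text.
    static boolean entireMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("b+", "aabbcc")); // true
        System.out.println(entireMatch("b+", "aabbcc"));   // false
        System.out.println(entireMatch("b+", "bbb"));      // true
    }
}
```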
Differences from the Perl 5 regular expression
Meta characters are `. * + ? { [ ( ) | \ ^ $'.
A single character: this range matches that character.
C1-C2: this range matches a character whose code point is >= C1's code point and <= C2's code point.
...
These expressions specify the same ranges as the following expressions.
Enumerated ranges are merged (union operation). [a-ec-z] is equivalent to [a-z].
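The union rule can be checked empirically. The sketch below uses java.util.regex as a stand-in, since its character classes also merge overlapping ranges; it does not exercise RegularExpression itself. The class and method names are hypothetical, and the check is limited to ASCII for brevity.

```java
import java.util.regex.Pattern;

final class RangeUnionSketch {
    // Returns true if both character classes accept exactly the same
    // characters in the ASCII range 0..127.
    static boolean sameAsciiSet(String classA, String classB) {
        Pattern a = Pattern.compile(classA);
        Pattern b = Pattern.compile(classB);
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameAsciiSet("[a-ec-z]", "[a-z]")); // true
        System.out.println(sameAsciiSet("[a-e]", "[a-z]"));    // false
    }
}
```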
Captured groups can be read from a Match instance after matches(String,Match).
The 0th group means the whole of this regular expression.
The Nth group is the inside of the Nth left parenthesis.
For instance, if the regular expression is " *([^<:]*) +<([^>]*)> *" and the target text is "From: TAMURA Kent <kent@trl.ibm.co.jp>":
Match.getCapturedText(0): " TAMURA Kent <kent@trl.ibm.co.jp>"
Match.getCapturedText(1): "TAMURA Kent"
Match.getCapturedText(2): "kent@trl.ibm.co.jp"
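The group numbering in this example can be reproduced with java.util.regex, which numbers groups by left parenthesis in the same way. This is a sketch on a stand-in engine, not a call into RegularExpression or Match; the class name CaptureSketch and the helper capture are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CaptureSketch {
    // Returns group 0 (the whole match) followed by each capturing group,
    // or null if the pattern is not found in the text.
    static String[] capture(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return null;
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] g = capture(" *([^<:]*) +<([^>]*)> *",
                             "From: TAMURA Kent <kent@trl.ibm.co.jp>");
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + ": \"" + g[i] + "\"");
        }
    }
}
```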
regex ::= ('(?' options ')')? term ('|' term)*
term ::= factor+
factor ::= anchors | atom (('*' | '+' | '?' | minmax ) '?'? )?
           | '(?#' [^)]* ')'
minmax ::= '{' ([0-9]+ | [0-9]+ ',' | ',' [0-9]+ | [0-9]+ ',' [0-9]+) '}'
atom ::= char | '.' | char-class | '(' regex ')' | '(?:' regex ')' | '\' [0-9]
         | '\w' | '\W' | '\d' | '\D' | '\s' | '\S' | category-block | '\X'
         | '(?>' regex ')' | '(?' options ':' regex ')'
         | '(?' ('(' [0-9] ')' | '(' anchors ')' | looks) term ('|' term)? ')'
options ::= [imsw]* ('-' [imsw]+)?
anchors ::= '^' | '$' | '\A' | '\Z' | '\z' | '\b' | '\B' | '\<' | '\>'
looks ::= '(?=' regex ')' | '(?!' regex ')'
          | '(?<=' regex ')' | '(?<!' regex ')'
char ::= '\\' | '\' [efnrtv] | '\c' [@-_] | code-point | character-1
category-block ::= '\' [pP] category-symbol-1
                   | ('\p{' | '\P{') (category-symbol | block-name
                                      | other-properties) '}'
category-symbol-1 ::= 'L' | 'M' | 'N' | 'Z' | 'C' | 'P' | 'S'
category-symbol ::= category-symbol-1 | 'Lu' | 'Ll' | 'Lt' | 'Lm' | 'Lo'
                    | 'Mn' | 'Me' | 'Mc' | 'Nd' | 'Nl' | 'No'
                    | 'Zs' | 'Zl' | 'Zp' | 'Cc' | 'Cf' | 'Cn' | 'Co' | 'Cs'
                    | 'Pd' | 'Ps' | 'Pe' | 'Pc' | 'Po'
                    | 'Sm' | 'Sc' | 'Sk' | 'So'
block-name ::= (See above)
other-properties ::= 'ALL' | 'ASSIGNED' | 'UNASSIGNED'
character-1 ::= (any character except meta-characters)
char-class ::= '[' ranges ']'
               | '(?[' ranges ']' ([-+&] '[' ranges ']')? ')'
ranges ::= '^'? (range ','?)+
range ::= '\d' | '\w' | '\s' | '\D' | '\W' | '\S' | category-block
          | range-char | range-char '-' range-char
range-char ::= '\[' | '\]' | '\\' | '\' [,-efnrtv] | code-point | character-2
code-point ::= '\x' hex-char hex-char
               | '\x{' hex-char+ '}'
               | '\v' hex-char hex-char hex-char hex-char hex-char hex-char
hex-char ::= [0-9a-fA-F]
character-2 ::= (any character except \[]-,)
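As a reading aid for the grammar above, the minmax production can be transcribed into a small recognizer. This is only a sketch of how to read one rule, written with java.util.regex; it is not how the RegularExpression parser is implemented, and the class name MinMaxSketch is hypothetical.

```java
import java.util.regex.Pattern;

final class MinMaxSketch {
    // Direct transcription of:
    //   minmax ::= '{' ([0-9]+ | [0-9]+ ',' | ',' [0-9]+ | [0-9]+ ',' [0-9]+) '}'
    private static final Pattern MINMAX =
        Pattern.compile("\\{([0-9]+|[0-9]+,|,[0-9]+|[0-9]+,[0-9]+)\\}");

    // True if s is a complete minmax quantifier according to the rule above.
    static boolean isMinMax(String s) {
        return MINMAX.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"{3}", "{3,}", "{,5}", "{3,5}", "{}", "{,}"}) {
            System.out.println(s + " -> " + isMinMax(s));
        }
    }
}
```

Note that the rule accepts the form {,5}, with the lower bound omitted, and rejects {} and {,}.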
Inner Class Summary
(package private) static class | RegularExpression.Context
Field Summary
(package private) static int | CARRIAGE_RETURN
(package private) RegularExpression.Context | context
(package private) static boolean | DEBUG
(package private) static int | EXTENDED_COMMENT | "x"
(package private) RangeToken | firstChar
(package private) String | fixedString
(package private) boolean | fixedStringOnly
(package private) int | fixedStringOptions
(package private) BMPattern | fixedStringTable
(package private) boolean | hasBackReferences
(package private) static int | IGNORE_CASE | "i"
(package private) static int | LINE_FEED
(package private) static int | LINE_SEPARATOR
(package private) int | minlength
(package private) static int | MULTIPLE_LINES | "m"
(package private) int | nofparen | The number of parentheses in the regular expression.
(package private) int | numberOfClosures
(package private) Op | operations
(package private) int | options
(package private) static int | PARAGRAPH_SEPARATOR
(package private) static int | PROHIBIT_FIXED_STRING_OPTIMIZATION | "F"
(package private) static int | PROHIBIT_HEAD_CHARACTER_OPTIMIZATION | "H"
(package private) String | regex | A regular expression.
(package private) static int | SINGLE_LINE | "s"
(package private) static int | SPECIAL_COMMA | ","
(package private) Token | tokentree | Internal representation of the regular expression.
(package private) static int | UNICODE_WORD_BOUNDARY | An option.
(package private) static int | USE_UNICODE_CATEGORY | This option redefines \d \D \w \W \s \S.
(package private) static Token | wordchar
(package private) static int | XMLSCHEMA_MODE | "X"
Constructor Summary
RegularExpression(String regex) | Creates a new RegularExpression instance.
RegularExpression(String regex, String options) | Creates a new RegularExpression instance with options.
(package private) | RegularExpression(String regex, Token tok, int parens, boolean hasBackReferences, int options)
Method Summary
boolean | equals(Object obj) | Returns true if the patterns are the same and the options are equivalent.
(package private) boolean | equals(String pattern, int options)
int | getNumberOfGroups() | Returns the number of regular expression groups.
String | getOptions() | Returns an option string.
String | getPattern()
int | hashCode()
boolean | matches(char[] target) | Checks whether the target text contains this pattern or not.
boolean | matches(char[] target, int start, int end) | Checks whether the target text contains this pattern in the specified range or not.
boolean | matches(char[] target, int start, int end, Match match) | Checks whether the target text contains this pattern in the specified range or not.
boolean | matches(char[] target, Match match) | Checks whether the target text contains this pattern or not.
boolean | matches(CharacterIterator target) | Checks whether the target text contains this pattern or not.
boolean | matches(CharacterIterator target, Match match) | Checks whether the target text contains this pattern or not.
boolean | matches(String target) | Checks whether the target text contains this pattern or not.
boolean | matches(String target, int start, int end) | Checks whether the target text contains this pattern in the specified range or not.
boolean | matches(String target, int start, int end, Match match) | Checks whether the target text contains this pattern in the specified range or not.
boolean | matches(String target, Match match) | Checks whether the target text contains this pattern or not.
(package private) void | prepare() | Prepares for matching.
void | setPattern(String newPattern)
void | setPattern(String newPattern, String options)
String | toString() | Returns a String representation of this instance.
Field Detail
static final boolean DEBUG
String regex
int options
int nofparen
Token tokentree
boolean hasBackReferences
transient int minlength
transient Op operations
transient int numberOfClosures
transient RegularExpression.Context context
transient RangeToken firstChar
transient String fixedString
transient int fixedStringOptions
transient BMPattern fixedStringTable
transient boolean fixedStringOnly
static final int IGNORE_CASE
static final int SINGLE_LINE
static final int MULTIPLE_LINES
static final int EXTENDED_COMMENT
static final int USE_UNICODE_CATEGORY
See also: #RegularExpression(java.lang.String,int), #setPattern(java.lang.String,int), UNICODE_WORD_BOUNDARY
static final int UNICODE_WORD_BOUNDARY
By default, the engine considers a position between a word character (\w) and a non-word character to be a word boundary.
With this option, the engine checks word boundaries using the method of 'Unicode Regular Expression Guidelines' Revision 4.
See also: #RegularExpression(java.lang.String,int), #setPattern(java.lang.String,int)
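The default word boundary rule described for UNICODE_WORD_BOUNDARY can be sketched as follows. This is a minimal illustration, not the engine's code. It assumes that a word character is an ASCII letter, digit, or underscore, and that positions outside the text count as non-word characters; neither assumption is stated in this page. The class and method names are hypothetical.

```java
final class WordBoundarySketch {
    // Assumption: a word character is an ASCII letter, digit, or underscore.
    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || c == '_';
    }

    // True if position pos (0..text.length()) lies between a word character
    // and a non-word character. Assumption: positions outside the text are
    // treated as non-word characters.
    static boolean isWordBoundary(String text, int pos) {
        boolean before = pos > 0 && isWordChar(text.charAt(pos - 1));
        boolean after = pos < text.length() && isWordChar(text.charAt(pos));
        return before != after;
    }

    public static void main(String[] args) {
        String text = "ab cd";
        for (int i = 0; i <= text.length(); i++) {
            System.out.println(i + ": " + isWordBoundary(text, i));
        }
    }
}
```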
static final int PROHIBIT_HEAD_CHARACTER_OPTIMIZATION
static final int PROHIBIT_FIXED_STRING_OPTIMIZATION
static final int XMLSCHEMA_MODE
static final int SPECIAL_COMMA
static transient Token wordchar
static final int LINE_FEED
static final int CARRIAGE_RETURN
static final int LINE_SEPARATOR
static final int PARAGRAPH_SEPARATOR
Constructor Detail
public RegularExpression(String regex) throws ParseException
regex - A regular expression.
ParseException - regex does not conform to the syntax.
public RegularExpression(String regex, String options) throws ParseException
regex - A regular expression.
options - A String consisting of "i" "m" "s" "u" "w" "," "X".
ParseException - regex does not conform to the syntax.
RegularExpression(String regex, Token tok, int parens, boolean hasBackReferences, int options)
Method Detail
public boolean matches(char[] target)
public boolean matches(char[] target, int start, int end)
start - Start offset of the range.
end - End offset + 1 of the range.
public boolean matches(char[] target, Match match)
match - A Match instance for storing the matching result.
public boolean matches(char[] target, int start, int end, Match match)
start - Start offset of the range.
end - End offset + 1 of the range.
match - A Match instance for storing the matching result.
public boolean matches(String target)
public boolean matches(String target, int start, int end)
start - Start offset of the range.
end - End offset + 1 of the range.
public boolean matches(String target, Match match)
match - A Match instance for storing the matching result.
public boolean matches(String target, int start, int end, Match match)
start - Start offset of the range.
end - End offset + 1 of the range.
match - A Match instance for storing the matching result.
public boolean matches(CharacterIterator target)
public boolean matches(CharacterIterator target, Match match)
match - A Match instance for storing the matching result.
void prepare()
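The start and end parameters of the range-taking matches overloads describe a half-open range: start is the first offset searched and end is one past the last. The sketch below shows the same convention with java.util.regex, whose Matcher.region(start, end) is also half-open. It is a stand-in illustration, not a call into RegularExpression; the class name RangeMatchSketch and the helper containsInRange are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RangeMatchSketch {
    // Searches only text[start, end): start is inclusive, end is exclusive
    // (that is, the end offset + 1 of the range).
    static boolean containsInRange(String regex, String text, int start, int end) {
        Matcher m = Pattern.compile(regex).matcher(text);
        m.region(start, end);
        return m.find();
    }

    public static void main(String[] args) {
        String text = "abcdef";
        System.out.println(containsInRange("cd", text, 2, 4)); // true: the range is "cd"
        System.out.println(containsInRange("cd", text, 2, 3)); // false: the range is "c"
        System.out.println(containsInRange("cd", text, 0, text.length())); // true
    }
}
```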
public void setPattern(String newPattern) throws ParseException
public void setPattern(String newPattern, String options) throws ParseException
public String getPattern()
public String toString()
Overrides: toString in class Object
public String getOptions()
See also: RegularExpression(java.lang.String,java.lang.String), setPattern(java.lang.String,java.lang.String)
public boolean equals(Object obj)
Overrides: equals in class Object
boolean equals(String pattern, int options)
public int hashCode()
Overrides: hashCode in class Object
public int getNumberOfGroups()