Class HTMLTokenizer
public class HTMLTokenizer
{
// Fields
public Hashtable attrs;
public String tag;
public String text;
public static final int TT_BEGIN_TAG;
public static final int TT_COMMENT;
public static final int TT_END_TAG;
public static final int TT_TEXT;
public int type;
// Constructors
public HTMLTokenizer (InputStream isin);
// Methods
public boolean hasMoreTokens ();
public void mark (int readLimit) throws IOException;
public int nextToken () throws ParseException, IOException;
public void reset () throws IOException;
public String toString ();
}
This class parses an HTML version 3.2 document. The parser does not interpret any HTML tags, except for comments and the <PRE> tag.
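The token model described here can be shown with a small self-contained stand-in. Each token sets a `type` field, plus `tag`, `attrs`, or `text`, and the client reads tokens in a `hasMoreTokens`/`nextToken` loop. This is a hypothetical sketch, not the original class. The name `SimpleHtmlTokenizer`, the `dump` helper, the integer values of the `TT_` constants, and the use of `java.text.ParseException` are all assumptions. It also omits features of the real parser, such as the special handling of `<PRE>`, `mark`/`reset`, and attribute values that contain spaces.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in mirroring the documented HTMLTokenizer fields.
// NOT the original implementation: constant values are assumed, <PRE> is not
// special-cased, and attribute values containing spaces are not supported.
class SimpleHtmlTokenizer {
    static final int TT_BEGIN_TAG = 1, TT_END_TAG = 2, TT_TEXT = 3, TT_COMMENT = 4;
    int type;                                   // last token type read
    String tag;                                 // tag name, without a leading '/'
    String text;                                // plain text or comment body
    Hashtable<String, String> attrs = new Hashtable<>();
    private final String src;
    private int pos = 0;

    SimpleHtmlTokenizer(InputStream in) throws IOException {
        src = new String(in.readAllBytes(), StandardCharsets.ISO_8859_1);
        skipWhitespace();
    }

    boolean hasMoreTokens() { return pos < src.length(); }

    int nextToken() throws ParseException {
        if (!hasMoreTokens()) throw new NoSuchElementException("no more tokens");
        tag = null; text = null; attrs = new Hashtable<>();
        if (src.startsWith("<!--", pos)) {                 // comment
            int end = src.indexOf("-->", pos + 4);
            if (end < 0) throw new ParseException("unterminated comment", pos);
            text = src.substring(pos + 4, end);
            type = TT_COMMENT;
            pos = end + 3;
        } else if (src.charAt(pos) == '<') {               // begin or end tag
            int end = src.indexOf('>', pos);
            if (end < 0) throw new ParseException("no matching '>'", pos);
            String body = src.substring(pos + 1, end).trim();
            type = TT_BEGIN_TAG;
            if (body.startsWith("/")) {
                type = TT_END_TAG;
                body = body.substring(1).trim();          // drop the leading slash
            }
            String[] parts = body.split("\\s+");
            if (parts[0].isEmpty()) throw new ParseException("no tag after '<'", pos);
            tag = parts[0];
            for (int i = 1; i < parts.length; i++) {       // NAME=value or bare NAME
                int eq = parts[i].indexOf('=');
                String name = eq < 0 ? parts[i] : parts[i].substring(0, eq);
                String value = eq < 0 ? "" : parts[i].substring(eq + 1).replace("\"", "");
                attrs.put(name, value);
            }
            pos = end + 1;
        } else {                                           // text up to the next '<'
            int end = src.indexOf('<', pos);
            if (end < 0) end = src.length();
            text = src.substring(pos, end).trim();
            type = TT_TEXT;
            pos = end;
        }
        skipWhitespace();                                  // consume trailing white space
        return type;
    }

    private void skipWhitespace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    // Typical client loop: read tokens until the input is exhausted.
    static List<String> dump(String html) {
        List<String> out = new ArrayList<>();
        try {
            SimpleHtmlTokenizer t = new SimpleHtmlTokenizer(
                new ByteArrayInputStream(html.getBytes(StandardCharsets.ISO_8859_1)));
            while (t.hasMoreTokens()) {
                switch (t.nextToken()) {
                    case TT_BEGIN_TAG: out.add("BEGIN " + t.tag + " " + t.attrs); break;
                    case TT_END_TAG:   out.add("END " + t.tag); break;
                    case TT_TEXT:      out.add("TEXT " + t.text); break;
                    case TT_COMMENT:   out.add("COMMENT " + t.text); break;
                }
            }
        } catch (ParseException e) {
            out.add("ERROR " + e.getMessage() + " at " + e.getErrorOffset());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(dump("<H1 ALIGN=center>Title</H1><!-- note --> body"));
    }
}
```

For input such as `<H1 ALIGN=center>Title</H1>`, the loop sees a begin tag, a text token, and an end tag whose `tag` value is `H1` without the slash.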
public HTMLTokenizer (InputStream isin);
Creates an HTMLTokenizer object that reads from the specified input stream.
Parameter | Description
isin | The input stream to tokenize.
public boolean hasMoreTokens ();
Indicates whether the HTMLTokenizer object has more tokens to return.
Return Value:
Returns true if there are more tokens; otherwise, returns false.
public void mark (int readLimit) throws IOException;
Marks the parser's current position in the input stream.
Return Value:
No return value.
Parameter | Description
readLimit | The number of bytes that can be read before this mark is invalidated.
Exceptions:
IOException
if the tokenized input stream cannot set the requested mark.
See Also: java.io.InputStream.mark
public int nextToken () throws ParseException, IOException;
Parses the next token from the input stream. The white space that follows the token, and the first character of the next token, are consumed.
Return Value:
Returns one of the following token types: TT_BEGIN_TAG, TT_COMMENT, TT_END_TAG, or TT_TEXT.
Exceptions:
NoSuchElementException
if a null token is received.
ParseException
if no tag is found after a less-than (<) symbol, or if a tag does not have a matching greater-than (>) symbol.
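The two malformed-markup conditions listed for ParseException can be checked in isolation. The hypothetical `TagSyntaxCheck` class below is a sketch, not how HTMLTokenizer is implemented. It assumes the exception is `java.text.ParseException` and reports the offset of the offending `<`. It does not recognize comments or declarations such as `<!DOCTYPE>`.

```java
import java.text.ParseException;

// Hypothetical checker for the two conditions documented for nextToken():
// no tag name after '<', or no matching '>'. Assumes java.text.ParseException.
// Comments and declarations such as <!DOCTYPE> are not recognized.
class TagSyntaxCheck {
    static void check(String html) throws ParseException {
        int lt = html.indexOf('<');
        while (lt >= 0) {
            int nameStart = lt + 1;
            if (nameStart < html.length() && html.charAt(nameStart) == '/') {
                nameStart++;                               // end tag: skip the slash
            }
            if (nameStart >= html.length() || !Character.isLetter(html.charAt(nameStart))) {
                throw new ParseException("no tag after '<'", lt);
            }
            int gt = html.indexOf('>', nameStart);
            if (gt < 0) {
                throw new ParseException("tag has no matching '>'", lt);
            }
            lt = html.indexOf('<', gt + 1);                // continue with the next tag
        }
    }

    // Returns the offset of the first error, or -1 if both checks pass.
    static int errorOffset(String html) {
        try {
            check(html);
            return -1;
        } catch (ParseException e) {
            return e.getErrorOffset();
        }
    }

    public static void main(String[] args) {
        System.out.println(errorOffset("<H1>Title</H1>"));
        System.out.println(errorOffset("a < b"));
    }
}
```

A stray `<` in running text, as in `a < b`, is enough to trigger the first condition.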
public void reset () throws IOException;
Resets the input to the last marked position.
Return Value:
No return value.
Exceptions:
IOException
if the tokenized input stream cannot be reset to the marked position (for example, because no mark was set or the mark has been invalidated).
See Also: java.io.InputStream.reset
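Both mark and reset refer to the java.io.InputStream contract. The sketch below exercises that contract directly with `java.io.BufferedInputStream` to show a look-ahead: mark, read, then reset so the same bytes are read again. It does not use HTMLTokenizer. The assumption that the tokenizer forwards these calls to its input stream in the same way is unverified, and `MarkResetDemo` is a hypothetical name.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Demonstrates the java.io.InputStream mark/reset contract.
// Whether HTMLTokenizer delegates to its stream this way is an assumption.
class MarkResetDemo {
    // Reads up to n bytes and returns them as a string.
    static String read(InputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int got = in.readNBytes(buf, 0, n);
        return new String(buf, 0, got, StandardCharsets.US_ASCII);
    }

    // Looks ahead n bytes without consuming them: mark, read, reset.
    static String peek(InputStream in, int n) throws IOException {
        in.mark(n);                        // mark stays valid for at least n more bytes
        String ahead = read(in, n);
        in.reset();                        // rewind to the marked position
        return ahead;
    }

    static String demo() {
        byte[] html = "<P>Hello".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(html))) {
            String ahead = peek(in, 3);    // "<P>"
            String again = read(in, 3);    // "<P>" again, because reset() rewound the stream
            return ahead + "|" + again + "|" + read(in, 5);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The base InputStream does not support marks, so reset() throws IOException.
    static boolean resetFailsWithoutMarkSupport() {
        InputStream raw = new InputStream() {
            @Override public int read() { return -1; }
        };
        try {
            raw.mark(10);                  // no-op in the base class
            raw.reset();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
        System.out.println(resetFailsWithoutMarkSupport());
    }
}
```

The second method shows the failure case: a stream that cannot honor the mark raises IOException from reset.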
public String toString ();
Retrieves a string representation of the HTMLTokenizer object.
Return Value:
Returns a string containing the token type, tag, attributes, and text of the current token in the HTML file.
- attrs
- The attributes of a tag. They are valid for these token types: TT_BEGIN_TAG and TT_END_TAG.
- tag
- The name of the tag. For an end tag, the name does not include the leading slash (/) character. This field is valid for these token types: TT_BEGIN_TAG and TT_END_TAG.
- text
- Plain text, or the body of a comment. This field is valid for these token types: TT_TEXT and TT_COMMENT.
- TT_BEGIN_TAG
- A token type representing a beginning tag (for example, <H1>).
- TT_COMMENT
- A token type representing a comment.
- TT_END_TAG
- A token type representing an ending tag (for example, </H1>).
- TT_TEXT
- A token type representing plain text (text that is not part of a tag or comment).
- type
- The last token type read. It can be one of the following: TT_BEGIN_TAG, TT_COMMENT, TT_END_TAG, or TT_TEXT.
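The field list implies that `type` determines which fields a client may read. The hypothetical helper below applies those rules to rebuild a readable form of a token, re-adding the slash that `tag` omits for end tags. The `TT_` values and the `String`-to-`String` typing of `attrs` are assumptions, since the documentation gives neither.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical helper applying the documented validity rules:
// tag and attrs for TT_BEGIN_TAG/TT_END_TAG, text for TT_TEXT/TT_COMMENT.
// The constant values below are assumptions.
class TokenView {
    static final int TT_BEGIN_TAG = 1, TT_END_TAG = 2, TT_TEXT = 3, TT_COMMENT = 4;

    static String describe(int type, String tag, Hashtable<String, String> attrs, String text) {
        switch (type) {
            case TT_BEGIN_TAG: {
                StringBuilder sb = new StringBuilder("<").append(tag);
                for (Map.Entry<String, String> e : attrs.entrySet()) {
                    sb.append(' ').append(e.getKey()).append('=').append(e.getValue());
                }
                return sb.append('>').toString();
            }
            case TT_END_TAG:
                return "</" + tag + ">";   // tag is stored without the '/'; attrs ignored here
            case TT_TEXT:
                return text;
            case TT_COMMENT:
                return "<!--" + text + "-->";
            default:
                throw new IllegalArgumentException("unknown token type: " + type);
        }
    }

    public static void main(String[] args) {
        Hashtable<String, String> attrs = new Hashtable<>();
        attrs.put("ALIGN", "center");
        System.out.println(describe(TT_BEGIN_TAG, "H1", attrs, null));
        System.out.println(describe(TT_END_TAG, "H1", new Hashtable<>(), null));
    }
}
```

Because a Hashtable has no defined iteration order, tags with several attributes may be rebuilt with the attributes in a different order than in the source.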