A StreamTokenizer takes an input stream and parses it into "tokens," allowing
the tokens to be read one at a time. The parsing process is controlled by a table
and a number of flags that can be set to various states, allowing recognition of
identifiers, numbers, quoted strings, and comments in a standard style.
    public class StreamTokenizer {
        public static final int TT_EOF = -1;
        public static final int TT_EOL = '\n';
        public static final int TT_NUMBER = -2;
        public static final int TT_WORD = -3;
        public int ttype;
        public String sval;
        public double nval;
        public StreamTokenizer(InputStream in);
        public void resetSyntax();
        public void wordChars(int low, int hi);
        public void whitespaceChars(int low, int hi);
        public void ordinaryChars(int low, int hi);
        public void ordinaryChar(int ch);
        public void commentChar(int ch);
        public void quoteChar(int ch);
        public void parseNumbers();
        public void eolIsSignificant(boolean flag);
        public void slashStarComments(boolean flag);
        public void slashSlashComments(boolean flag);
        public void lowerCaseMode(boolean flag);
        public int nextToken() throws IOException;
        public void pushBack();
        public int lineno();
        public String toString();
    }
Each byte read from the input stream is regarded as a character in the range '\u0000' through '\u00FF'. The character value is used to look up five possible attributes of the character: whitespace, alphabetic, numeric, string quote, and comment character (a character may have more than one of these attributes, or none at all). In addition, there are three flags controlling whether line terminators are to be recognized as tokens, whether Java-style end-of-line comments that start with // should be recognized and skipped, and whether Java-style "traditional" comments delimited by /* and */ should be recognized and skipped. One more flag controls whether all the characters of identifiers are converted to lowercase.
Here is a simple example of the use of a StreamTokenizer. The following code merely reads all the tokens in the standard input stream and prints an identification of each one. Changes in the line number are also noted.
    import java.io.StreamTokenizer;
    import java.io.IOException;

    class Tok {
        public static void main(String[] args) {
            StreamTokenizer st = new StreamTokenizer(System.in);
            st.ordinaryChar('/');
            int lineNum = -1;
            try {
                for (int tokenType = st.nextToken();
                        tokenType != StreamTokenizer.TT_EOF;
                        tokenType = st.nextToken()) {
                    int newLineNum = st.lineno();
                    if (newLineNum != lineNum) {
                        System.out.println("[line " + newLineNum + "]");
                        lineNum = newLineNum;
                    }
                    switch(tokenType) {
                    case StreamTokenizer.TT_NUMBER:
                        System.out.println("the number " + st.nval);
                        break;
                    case StreamTokenizer.TT_WORD:
                        System.out.println("identifier " + st.sval);
                        break;
                    default:
                        System.out.println(" operator " + (char)tokenType);
                    }
                }
            } catch (IOException e) {
                System.out.println("I/O failure");
            }
        }
    }
If the input stream contains this data:
    10 LET A = 4.5
    20 LET B = A*A
    30 PRINT A, B
then the output is:
    [line 1]
    the number 10.0
    identifier LET
    identifier A
     operator =
    the number 4.5
    [line 2]
    the number 20.0
    identifier LET
    identifier B
     operator =
    identifier A
     operator *
    identifier A
    [line 3]
    the number 30.0
    identifier PRINT
    identifier A
     operator ,
    identifier B
22.14.1 public static final int TT_EOF = -1;
A constant that indicates end of file was reached.
22.14.2 public static final int TT_EOL = '\n';
A constant that indicates that a line terminator was recognized.
22.14.3 public static final int TT_NUMBER = -2;
A constant that indicates that a number was recognized.
22.14.4 public static final int TT_WORD = -3;
A constant that indicates that a word (identifier) was recognized.
22.14.5 public int ttype;
The type of the token that was last recognized by this StreamTokenizer. This
will be TT_EOF, TT_EOL, TT_NUMBER, TT_WORD, or a nonnegative byte value that
was the first byte of the token (for example, if the token is a string token, then
ttype has the quote character that started the string).
22.14.6 public String sval;
If the value of ttype is TT_WORD or a string quote character, then the value of
sval is a String that contains the characters of the identifier or of the string
(without the delimiting string quotes). For all other types of tokens recognized,
the value of sval is null.
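As an illustration of how ttype, sval, and nval relate, the following sketch tokenizes a short string and labels each token. The class TokenFieldsDemo and its describe helper are hypothetical names introduced here, and the input is fed through a ByteArrayInputStream purely for convenience.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class TokenFieldsDemo {
    // Hypothetical helper (not part of StreamTokenizer): labels each token by kind.
    @SuppressWarnings("deprecation") // later JDKs deprecate the InputStream constructor
    static List<String> describe(String text) {
        StreamTokenizer st = new StreamTokenizer(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add("word:" + st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("number:" + st.nval);
                } else if (st.sval != null) {
                    // String token: ttype is the quote character, sval the body
                    out.add("string(" + (char) st.ttype + "):" + st.sval);
                } else {
                    // Ordinary character: ttype is the character itself
                    out.add("char:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("name = 'Duke' 42"));
    }
}
```

The sketch distinguishes a string token from an ordinary character by checking whether sval is non-null, which relies on the rule that sval is null for all other token types.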
22.14.7 public double nval;
If the value of ttype is TT_NUMBER, then the value of nval is the numerical value
of the number.
22.14.8 public StreamTokenizer(InputStream in)
This constructor initializes a newly created StreamTokenizer by saving its argument, the input stream in, for later use. The StreamTokenizer is also initialized
to the following default state:
All byte values 'A' through 'Z', 'a' through 'z', and 0xA0 through 0xFF are considered to be alphabetic.
All byte values 0x00 through 0x20 are considered to be whitespace.
'/' is a comment character.
Single quote '\'' and double quote '"' are string quote characters.
Numbers are parsed.
End of line is not significant.
// comments and /* comments are not recognized.
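A minimal sketch of the default state in action is given below. DefaultSyntaxDemo and its tokens helper are hypothetical names; the sample input is chosen to exercise the default comment character '/', the single-quote string character, and an identifier that contains a digit.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class DefaultSyntaxDemo {
    // Hypothetical helper: returns the text of each token under the default syntax.
    @SuppressWarnings("deprecation") // later JDKs deprecate the InputStream constructor
    static List<String> tokens(String text) {
        StreamTokenizer st = new StreamTokenizer(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else if (st.sval != null) {
                    out.add(st.sval);                       // word or quoted string
                } else {
                    out.add(String.valueOf((char) st.ttype)); // ordinary character
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // '/' starts a comment by default, so "ignored" never appears as a token.
        System.out.println(tokens("x1 / ignored\n'a b' -3.5"));
    }
}
```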
22.14.9 public void resetSyntax()
The syntax table for this StreamTokenizer is reset so that every byte value is
"ordinary"; thus, no character is recognized as being a whitespace, alphabetic,
numeric, string quote, or comment character. Calling this method is therefore
equivalent to:
    ordinaryChars(0x00, 0xff)
The three flags controlling recognition of line terminators, // comments, and /*
comments are unaffected.
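The effect of resetSyntax can be sketched as follows: once every byte is ordinary, each character, including a space or a digit, comes back as its own token. ResetSyntaxDemo and tokenTypes are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ResetSyntaxDemo {
    // Hypothetical helper: returns the ttype of every token after resetSyntax().
    @SuppressWarnings("deprecation") // later JDKs deprecate the InputStream constructor
    static List<Integer> tokenTypes(String text) {
        StreamTokenizer st = new StreamTokenizer(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
        st.resetSyntax(); // every byte value now has no attributes
        List<Integer> out = new ArrayList<>();
        try {
            int type;
            while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
                out.add(type); // the byte value itself
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // 'a', ' ', and '1' are each returned as ordinary characters.
        System.out.println(tokenTypes("a 1"));
    }
}
```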
22.14.10 public void wordChars(int low, int hi)
The syntax table for this StreamTokenizer is modified so that every character in
the range low through hi has the "alphabetic" attribute.
22.14.11 public void whitespaceChars(int low, int hi)
The syntax table for this StreamTokenizer is modified so that every character in
the range low through hi has the "whitespace" attribute.
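As a sketch of how wordChars and whitespaceChars might be combined, the example below makes the underscore alphabetic and the comma whitespace, so that a comma-separated list of identifiers is split into words. WordAndSpaceDemo and words are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class WordAndSpaceDemo {
    // Hypothetical helper: collects word tokens after adjusting the syntax table.
    @SuppressWarnings("deprecation") // later JDKs deprecate the InputStream constructor
    static List<String> words(String text) {
        StreamTokenizer st = new StreamTokenizer(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
        st.wordChars('_', '_');        // '_' may now appear in identifiers
        st.whitespaceChars(',', ',');  // ',' now separates tokens silently
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("max_len,min_len"));
    }
}
```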
22.14.12 public void ordinaryChars(int low, int hi)
The syntax table for this StreamTokenizer is modified so that every character in
the range low through hi has no attributes.
22.14.13 public void ordinaryChar(int ch)
The syntax table for this StreamTokenizer is modified so that the character ch
has no attributes.
22.14.14 public void commentChar(int ch)
The syntax table for this StreamTokenizer is modified so that the character ch
has the "comment character" attribute.
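For example, a script-style '#' comment could be enabled as sketched below. CommentCharDemo and words are hypothetical names, and the choice of '#' is only an assumption for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class CommentCharDemo {
    // Hypothetical helper: collects word tokens, treating '#' as a comment character.
    @SuppressWarnings("deprecation") // later JDKs deprecate the InputStream constructor
    static List<String> words(String text) {
        StreamTokenizer st = new StreamTokenizer(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
        st.commentChar('#'); // everything from '#' to the end of the line is skipped
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("a # note\nb"));
    }
}
```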
22.14.15 public void quoteChar(int ch)
The syntax table for this StreamTokenizer is modified so that the character ch
has the "string quote" attribute.
22.14.16 public void parseNumbers()
The syntax table for this StreamTokenizer is modified so that each of the twelve
characters
    0 1 2 3 4 5 6 7 8 9 . -
has the "numeric" attribute.
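The following sketch starts from an empty syntax table and enables only whitespace and number parsing, which isolates the effect of parseNumbers. ParseNumbersDemo and scan are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ParseNumbersDemo {
    // Hypothetical helper: numbers are reported by value, anything else as a character.
    @SuppressWarnings("deprecation") // later JDKs deprecate the InputStream constructor
    static List<String> scan(String text) {
        StreamTokenizer st = new StreamTokenizer(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
        st.resetSyntax();               // start with no attributes at all
        st.whitespaceChars(0x00, 0x20); // restore whitespace
        st.parseNumbers();              // digits, '.', and '-' become numeric
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else {
                    out.add("char:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // The lone '-' is not followed by a numeric character, so it is a token by itself.
        System.out.println(scan("12 3.5 - -7"));
    }
}
```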
22.14.17 public void eolIsSignificant(boolean flag)
This StreamTokenizer henceforth recognizes line terminators as tokens if and
only if the flag argument is true.
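A short sketch of the difference the flag makes: the same two-line input is tokenized with and without significant line terminators. EolDemo and scan are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class EolDemo {
    // Hypothetical helper: lists words and marks each TT_EOL token as "EOL".
    @SuppressWarnings("deprecation") // later JDKs deprecate the InputStream constructor
    static List<String> scan(String text, boolean eolSignificant) {
        StreamTokenizer st = new StreamTokenizer(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
        st.eolIsSignificant(eolSignificant);
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_EOL) {
                    out.add("EOL");
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(scan("a\nb", true));  // line terminator reported as a token
        System.out.println(scan("a\nb", false)); // line terminator treated as whitespace
    }
}
```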
22.14.18 public void slashStarComments(boolean flag)
This StreamTokenizer henceforth recognizes and skips Java-style "traditional"
comments, which are delimited by /* and */ and do not nest, if and only if the
flag argument is true.
22.14.19 public void slashSlashComments(boolean flag)
This StreamTokenizer henceforth recognizes and skips Java-style end-of-line
comments that start with // if and only if the flag argument is true.
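The two comment flags might be used together as in the sketch below. Because '/' is a comment character in the default state, the example first makes it ordinary so that a lone slash still comes back as a token; that step is a choice made for this illustration. SlashCommentsDemo and scan are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SlashCommentsDemo {
    // Hypothetical helper: returns words and ordinary characters, skipping Java-style comments.
    @SuppressWarnings("deprecation") // later JDKs deprecate the InputStream constructor
    static List<String> scan(String text) {
        StreamTokenizer st = new StreamTokenizer(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
        st.ordinaryChar('/');        // a single '/' should not start a comment
        st.slashSlashComments(true); // skip // comments
        st.slashStarComments(true);  // skip /* ... */ comments
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(scan("a /* x */ b // c\n d / e"));
    }
}
```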
22.14.20 public void lowerCaseMode(boolean flag)
This StreamTokenizer henceforth converts all the characters in identifiers to
lowercase if and only if the flag argument is true.
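A minimal sketch of lowercase mode, using the hypothetical names LowerCaseDemo and words:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LowerCaseDemo {
    // Hypothetical helper: collects identifiers with lowercase mode switched on.
    @SuppressWarnings("deprecation") // later JDKs deprecate the InputStream constructor
    static List<String> words(String text) {
        StreamTokenizer st = new StreamTokenizer(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
        st.lowerCaseMode(true); // identifiers are converted before being stored in sval
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello WORLD"));
    }
}
```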
22.14.21 public int nextToken() throws IOException
If the previous token was pushed back (§22.14.22), then the value of ttype is
returned, effectively causing that same token to be reread.
Otherwise, this method parses the next token in the contained input stream. The type of the token is returned; this same value is also made available in the ttype field, and related data may be made available in the sval and nval fields.
First, whitespace characters are skipped, except that if a line terminator is encountered and this StreamTokenizer is currently recognizing line terminators, then the type of the token is TT_EOL.
If a numeric character is encountered, then an attempt is made to recognize a number. If the first character is '-' and the next character is not numeric, then the '-' is considered to be an ordinary character and is recognized as a token in its own right. Otherwise, a number is parsed, stopping before the next occurrence of '-', the second occurrence of '.', the first nonnumeric character encountered, or end of file, whichever comes first. The type of the token is TT_NUMBER and its value is made available in the field nval.
If an alphabetic character is encountered, then an identifier is recognized, consisting of that character and all following characters up to, but not including, the first character that is neither alphabetic nor numeric, or up to end of file, whichever comes first. The characters of the identifier may be converted to lowercase if this StreamTokenizer is in lowercase mode.
If a comment character is encountered, then all subsequent characters are skipped and ignored, up to but not including the next line terminator or end of file. Then another attempt is made to recognize a token. If this StreamTokenizer is currently recognizing line terminators, then a line terminator that ends a comment will be recognized as a token in the same manner as any other line terminator in the contained input stream.
If a string quote character is encountered, then a string is recognized, consisting of all characters after (but not including) the string quote character, up to (but not including) the next occurrence of that same string quote character, or a line terminator, or end of file. The usual escape sequences (§3.10.6) such as \n and \t are recognized and converted to single characters as the string is parsed.
If // is encountered and this StreamTokenizer is currently recognizing // comments, then all subsequent characters are skipped and ignored, up to but not including the next line terminator or end of file. Then another attempt is made to recognize a token. (If this StreamTokenizer is currently recognizing line terminators, then a line terminator that ends a comment will be recognized as a token in the same manner as any other line terminator in the contained input stream.)
If /* is encountered and this StreamTokenizer is currently recognizing /* comments, then all subsequent characters are skipped and ignored, up to and including the next occurrence of */ or end of file. Then another attempt is made to recognize a token.
If none of the cases listed above applies, then the only other possibility is that the first non-whitespace character encountered is an ordinary character. That character is considered to be a token and is stored in the ttype field and returned.
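Two of the rules above, where a number stops and how escape sequences in strings are handled, can be observed with the sketch below. NextTokenRulesDemo and scan are hypothetical names, and the default syntax is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class NextTokenRulesDemo {
    // Hypothetical helper: numbers are returned by value, words and strings by sval.
    @SuppressWarnings("deprecation") // later JDKs deprecate the InputStream constructor
    static List<String> scan(String text) {
        StreamTokenizer st = new StreamTokenizer(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else if (st.sval != null) {
                    out.add(st.sval);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // "1.2.3-4" splits at the second '.' and again before '-'.
        // The input string literal contains a backslash followed by 't',
        // which the tokenizer converts to a single tab character.
        System.out.println(scan("1.2.3-4 \"a\\tb\""));
    }
}
```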
22.14.22 public void pushBack()
Calling this method "pushes back" the current token; that is, it causes the next call
to nextToken to return the same token that it just provided. Note that this method
does not restore the line number to its previous value, so if the method lineno is
called after a call to pushBack but before the next call to nextToken, an incorrect
line number may be returned.
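A sketch of the typical use of pushBack, looking at a token and then letting the next call to nextToken deliver it again, is shown below. PushBackDemo and scan are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class PushBackDemo {
    // Hypothetical helper: reads a token, pushes it back, and reads on.
    @SuppressWarnings("deprecation") // later JDKs deprecate the InputStream constructor
    static List<String> scan(String text) {
        StreamTokenizer st = new StreamTokenizer(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
        List<String> out = new ArrayList<>();
        try {
            int first = st.nextToken();   // the word "x"
            out.add(first + " " + st.sval);
            st.pushBack();                // the same token will be delivered again
            int again = st.nextToken();
            out.add(again + " " + st.sval);
            int third = st.nextToken();   // now the tokenizer moves on to the number
            out.add(third + " " + st.nval);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // TT_WORD is -3 and TT_NUMBER is -2.
        System.out.println(scan("x 1"));
    }
}
```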
22.14.23 public int lineno()
The number of the line on which the current token appeared is returned. The first
token in the input stream, if not a line terminator, is considered to appear on line 1.
A line terminator token is considered to appear on the line that it precedes, not on
the line it terminates; thus, the first line terminator in the input stream is considered to be on line 2.
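The line-numbering rule for line terminator tokens can be checked with a sketch like the following, in which each token is tagged with the value of lineno at the time it is read. LinenoDemo and scan are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LinenoDemo {
    // Hypothetical helper: tags each token with the line number reported by lineno().
    @SuppressWarnings("deprecation") // later JDKs deprecate the InputStream constructor
    static List<String> scan(String text) {
        StreamTokenizer st = new StreamTokenizer(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
        st.eolIsSignificant(true); // make line terminators visible as tokens
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                String name = (st.ttype == StreamTokenizer.TT_EOL) ? "EOL" : st.sval;
                out.add(name + "@" + st.lineno());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // The line terminator after "a" is reported on line 2, the line it precedes.
        System.out.println(scan("a\nb"));
    }
}
```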
22.14.24 public String toString()
The current token and the current line number are converted to a string of the form:
    "Token[x], line m"
where m is the current line number in decimal form and x depends on the type of the current token:
If the token type is TT_EOF, then x is "EOF".
If the token type is TT_EOL, then x is "EOL".
If the token type is TT_WORD, then x is the current value of sval (§22.14.6).
If the token type is TT_NUMBER, then x is "n=" followed by the result of converting the current value of nval (§22.14.7) to a string (§20.10.15).
Overrides the toString method of Object (§20.1.2).
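The string forms listed above can be observed with a sketch such as this one, which records the result of toString after each call to nextToken, including the final end-of-file token. ToStringDemo and scan are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ToStringDemo {
    // Hypothetical helper: collects toString() after every token, up to and including EOF.
    @SuppressWarnings("deprecation") // later JDKs deprecate the InputStream constructor
    static List<String> scan(String text) {
        StreamTokenizer st = new StreamTokenizer(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
        List<String> out = new ArrayList<>();
        try {
            int type;
            do {
                type = st.nextToken();
                out.add(st.toString());
            } while (type != StreamTokenizer.TT_EOF);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : scan("abc 4.5")) {
            System.out.println(s);
        }
    }
}
```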