3.1 Unicode

Java programs are written using the Unicode character encoding, version 2.0. Information about this encoding may be found at:

http://www.unicode.org/ and ftp://unicode.org/

Versions of Java prior to 1.1 used Unicode version 1.1.5 (see The Unicode Standard: Worldwide Character Encoding (§1.2) and updates). See §20.5 for a discussion of the differences between Unicode version 1.1.5 and Unicode version 2.0.

Except for comments (§3.7), identifiers, and the contents of character and string literals (§3.10.4, §3.10.5), all input elements (§3.5) in a Java program are formed only from ASCII characters, or from Unicode escapes (§3.3) that result in ASCII characters. ASCII (ANSI X3.4) is the American Standard Code for Information Interchange. The first 128 characters of the Unicode character encoding are the ASCII characters.
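
As an informal illustration of the paragraph above, the following sketch shows a Unicode escape standing in for an ASCII character inside a keyword, and a check that a string lies entirely within the first 128 Unicode characters. The class name AsciiCheck and the helper isAscii are hypothetical names chosen for this example; they are not part of the Java language or its libraries.

```java
public class AsciiCheck {
    // Hypothetical helper: returns true if every character of s
    // falls in the ASCII range, i.e. the first 128 Unicode characters.
    public static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The escape for U+0069 is translated to the ASCII letter i
        // before the program is tokenized, so this declares an int.
        \u0069nt n = 65;

        // The escape for U+0041 yields the ASCII letter A, whose
        // Unicode value is the same as its ASCII value.
        char a = '\u0041';
        System.out.println(a == 'A');   // prints true
        System.out.println(n == a);     // prints true

        // A string literal may contain non-ASCII characters;
        // the escape for U+00E9 is outside the ASCII range.
        System.out.println(isAscii("int x = 1;"));  // prints true
        System.out.println(isAscii("caf\u00e9"));   // prints false
    }
}
```

The keyword written with an escape is legal but is shown only to make the rule concrete; ordinary programs would spell the keyword directly in ASCII.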