Resolution of ISO 9660 Ambiguities for Wide Characters

This specification resolves ISO 9660 ambiguities with respect to wide (16-bit) character sets, such as the UCS-2 character set.

Wide Character Byte Ordering

All UCS-2 characters shall be recorded according to ISO 9660:1988 section 7.2.2, 16-bit numerical value, most significant byte first ("Big Endian").
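The byte ordering rule can be illustrated with a minimal sketch. The function name `encode_ucs2_be` is hypothetical and not part of this specification; the sketch assumes the input contains only characters that fit in a single 16-bit value.

```python
import struct

def encode_ucs2_be(text: str) -> bytes:
    """Encode text as UCS-2 with the most significant byte first."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        # UCS-2 holds one 16-bit value per character; surrogates are not characters.
        if cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"U+{cp:04X} cannot be recorded as one UCS-2 value")
        out += struct.pack(">H", cp)  # ">H" = big-endian unsigned 16-bit
    return bytes(out)

print(encode_ucs2_be("A.TXT").hex(" "))  # 00 41 00 2e 00 54 00 58 00 54
```

For characters in this range the result should match Python's built-in "utf-16-be" codec, which the later sketches use for brevity.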

Allowed Character Set

All UCS-2 code points shall be allowed except for the following UCS-2 code points:

All code points between (00)(00) and (00)(1F), inclusive. (Control Characters)

(00)(2A) '*' (Asterisk)

(00)(2F) '/' (Forward Slash)

(00)(3A) ':' (Colon)

(00)(3B) ';' (Semicolon)

(00)(3F) '?' (Question Mark)

(00)(5C) '\' (Backslash)
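The exclusion list above could be checked along the following lines. This is a sketch only: `FORBIDDEN` and `disallowed_code_points` are hypothetical names, and the check is assumed to apply to the name components of an identifier rather than to the separators that a recording system adds itself.

```python
# Code points excluded by the list above: control characters plus * / : ; ? \
FORBIDDEN = frozenset(range(0x0000, 0x0020)) | {0x2A, 0x2F, 0x3A, 0x3B, 0x3F, 0x5C}

def disallowed_code_points(name: str) -> list:
    """Return the excluded code points found in name, in order of appearance."""
    return [ord(ch) for ch in name if ord(ch) in FORBIDDEN]

print(disallowed_code_points("report?.txt"))  # [63]
print(disallowed_code_points("Summary 2024"))  # []
```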

Special Directory Identifiers

Section 7.6 of ISO 9660 describes the recording of reserved directory identifiers for the root, current, and parent directories as single (00) or single (01) bytes.

In a wide character set, it is not possible to represent a character in a single byte. The following portions of the ISO 9660:1988 specification referring to reserved directory identifiers are ambiguous.

The ISO 9660:1988 sections in question are as follows:

These special case directory identifiers are not intended to represent characters in a graphic character set. These characters are placeholders, not characters. Therefore, these definitions remain unchanged on a volume recorded in Unicode.

Simply put, Special Directory Identifiers shall remain as 8-bit values, even on a UCS-2 volume, where other characters have been expanded to 16 bits.

Root Directory

The Directory Identifier of a Directory Record describing the Root Directory shall consist of a single (00) byte.

Current Directory

The Directory Identifier of the first Directory Record of each directory shall consist of a single (00) byte.

Parent Directory

The Directory Identifier of the second Directory Record of each directory shall consist of a single (01) byte.
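As an illustration, a recording system might select the identifier bytes as sketched below. The function `record_identifier` and its `kind` values are hypothetical; the sketch assumes ordinary names contain only 16-bit characters, for which the "utf-16-be" codec yields Big Endian UCS-2.

```python
def record_identifier(kind: str, name: str = "") -> bytes:
    """Return the identifier bytes for a directory record on a UCS-2 volume."""
    if kind in ("root", "current"):
        return b"\x00"  # single byte, not widened to 16 bits
    if kind == "parent":
        return b"\x01"  # single byte, not widened to 16 bits
    # Ordinary identifiers are recorded as 16-bit characters.
    return name.encode("utf-16-be")

print(len(record_identifier("current")))        # 1
print(len(record_identifier("named", "DOCS")))  # 8
```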

Separator Characters

The separator characters SEPARATOR 1 and SEPARATOR 2 are specified as 8-bit characters, which cannot be represented in a wide character set. The ISO 9660:1988 sections referring to SEPARATOR 1 and SEPARATOR 2 are therefore ambiguous. The ISO 9660:1988 sections in question are as follows:

The values SEPARATOR 1 and SEPARATOR 2 shall be represented differently depending on the d1 character set.

In the case of an SVD identifying a UCS-2 character set, SEPARATOR 1 and SEPARATOR 2 shall each be recorded as the UCS-2 character with the equivalent code point value.

Otherwise, SEPARATOR 1 and SEPARATOR 2 shall be recorded according to section 7.4.3 of ISO 9660:1988.

Simply put, SEPARATOR 1 and SEPARATOR 2 shall be expanded to 16 bits.

Separator Representations

Separator      ISO 9660:1988 Volume Bit Combination    Unicode Volume UCS-2 Code Point
SEPARATOR 1    (2E)                                    (00)(2E)
SEPARATOR 2    (3B)                                    (00)(3B)
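The widened separators can be seen in a short sketch that assembles a file identifier for a UCS-2 volume. The layout used here (file name, SEPARATOR 1, file name extension, SEPARATOR 2, file version number) is assumed from ISO 9660 section 7.5.1 and is not restated in this section; `joliet_file_identifier` is a hypothetical helper.

```python
SEPARATOR_1 = b"\x00\x2e"  # '.' widened to 16 bits
SEPARATOR_2 = b"\x00\x3b"  # ';' widened to 16 bits

def joliet_file_identifier(name: str, extension: str, version: int = 1) -> bytes:
    """Assemble name, SEPARATOR 1, extension, SEPARATOR 2, version as UCS-2."""
    def enc(s: str) -> bytes:
        return s.encode("utf-16-be")
    return enc(name) + SEPARATOR_1 + enc(extension) + SEPARATOR_2 + enc(str(version))

ident = joliet_file_identifier("README", "TXT")
print(len(ident))  # 24
```

Every character of "README.TXT;1", separators included, occupies two bytes, so the twelve characters are recorded in 24 bytes.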


Sort Ordering

ISO 9660 specifies the order of path table records within a path table, and specifies the order of directory records within a directory. These sorting algorithms assume an 8-bit character set is used. These sorting algorithms are ambiguous when used with wide characters.

The ISO 9660:1988 sections in question are as follows:

The only change required is to redefine the value of the sort justification pad byte to zero (00).

Simply put, comparing the byte contents in all positions remains a suitable sorting algorithm for the descriptor fields recorded in a UCS-2 SVD Directory Hierarchy. This is one of the primary reasons for selecting the Big Endian format to represent all UCS-2 characters.
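The relationship between byte ordering and sort order can be checked with a small experiment. The sample names are arbitrary; the sketch compares a plain bytewise sort of Big Endian and Little Endian encodings against a sort by 16-bit code point.

```python
names = ["b", "\u00e9", "A", "\u3042", "a"]

# Reference order: compare the 16-bit code points themselves.
by_code_point = sorted(names, key=lambda s: [ord(c) for c in s])
# Bytewise comparison of the recorded bytes, for each byte order.
by_big_endian = sorted(names, key=lambda s: s.encode("utf-16-be"))
by_little_endian = sorted(names, key=lambda s: s.encode("utf-16-le"))

print(by_big_endian == by_code_point)     # True
print(by_little_endian == by_code_point)  # False
```

With the most significant byte first, a bytewise comparison reaches the high-order byte of each character before the low-order byte, so it agrees with the code point order. With the least significant byte first, U+3042 (bytes 42 30) sorts ahead of 'a' (bytes 61 00).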

Natural Language Sorting

On a Unicode volume, the 16-bit UCS-2 code points are used to determine the Order of Path Table Records and the Order of Directory Records.

No attempt will be made to provide natural language sorting on the media. Natural language sorting may optionally be provided by a display application as desired.

Justification Pad Bytes

The sort ordering algorithms as specified in ISO 9660:1988 sections 6.9.1 and 9.3 are acceptable except for the value of the justification "pad byte".

The value of the justification "pad byte" as specified in ISO 9660:1988 section 6.9.1 shall be (00). This is changed from a value of (20) as specified in that same section.

The value of the justification "pad byte" as specified in ISO 9660:1988 section 9.3 subsections (a) and (b) shall be (00). This is changed from a value of (20) as specified in those same sections.

The value of the justification "pad byte" as specified in ISO 9660:1988 section 9.3 subsection (c) shall be (00). This is changed from a value of (30) as specified in that same subsection.

Simply put, set all the justification "pad bytes" to zero to simplify sorting.
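A comparison using the (00) pad byte might look like the sketch below. It is a simplification: it pads and compares one field at a time, whereas ISO 9660 section 9.3 applies the comparison to the file name, file name extension, and version fields in turn. The names `PAD` and `compare_padded` are hypothetical.

```python
from functools import cmp_to_key

PAD = b"\x00"  # justification pad byte on a UCS-2 volume

def compare_padded(a: bytes, b: bytes) -> int:
    """Pad the shorter field with (00) bytes, then compare bytewise."""
    width = max(len(a), len(b))
    pa, pb = a.ljust(width, PAD), b.ljust(width, PAD)
    return (pa > pb) - (pa < pb)

ids = [s.encode("utf-16-be") for s in ("AB", "A", "ABC", "AA")]
ordered = sorted(ids, key=cmp_to_key(compare_padded))
print([b.decode("utf-16-be") for b in ordered])  # ['A', 'AA', 'AB', 'ABC']
```

Because (00)(00) is lower than any allowed character, a shorter field always sorts ahead of a longer field that begins with the same characters.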

Mandatory Sort Ordering

Correct sort ordering is mandatory on UCS-2 volumes.

Descriptor Fields affected by the UCS-2 Escape Sequence

If a UCS-2 escape sequence is detected in a supplementary volume descriptor, the following descriptor fields referenced from that supplementary volume descriptor shall contain UCS-2 characters.

ISO 9660:1988 Section 8.5.4 System Identifier

ISO 9660:1988 Section 8.5.5 Volume Identifier

ISO 9660:1988 Section 8.5.13 Volume Set Identifier

ISO 9660:1988 Section 8.5.14 Publisher Identifier

ISO 9660:1988 Section 8.5.15 Data Preparer Identifier

ISO 9660:1988 Section 8.5.16 Application Identifier

ISO 9660:1988 Section 8.5.17 Copyright File Identifier

ISO 9660:1988 Section 8.5.18 Abstract File Identifier (missing section)

ISO 9660:1988 Section 8.5.19 Bibliographic File Identifier

ISO 9660:1988 Section 9.1.11 File Identifier

ISO 9660:1988 Section 9.4.5 Directory Identifier

ISO 9660:1988 Section 9.5.11 System Identifier (of Extended Attribute Record)
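A receiving system could detect the escape sequence and decode one of the affected fields roughly as follows. Several details are assumptions not stated in this section: the three UCS-2 escape sequences, the byte offsets of the escape sequence and Volume Identifier fields (taken from the ISO 9660 volume descriptor layout), and the trailing padding that is stripped. The function names are hypothetical.

```python
# Assumed UCS-2 escape sequences (levels 1, 2, 3): 25 2F 40, 25 2F 43, 25 2F 45.
UCS2_ESCAPES = (b"%/@", b"%/C", b"%/E")

def is_ucs2_svd(sector: bytes) -> bool:
    """True if the sector looks like an SVD carrying a UCS-2 escape sequence."""
    if len(sector) < 2048 or sector[0] != 2 or sector[1:6] != b"CD001":
        return False
    escape_field = sector[88:120]  # assumed offset of the escape sequence field
    return any(escape_field.startswith(esc) for esc in UCS2_ESCAPES)

def volume_identifier(sector: bytes) -> str:
    """Decode the Volume Identifier (section 8.5.5) of a UCS-2 SVD."""
    if not is_ucs2_svd(sector):
        raise ValueError("not a UCS-2 supplementary volume descriptor")
    # Assumed field position: bytes 40..71, 16-bit characters, padded at the end.
    return sector[40:72].decode("utf-16-be").rstrip(" \x00")
```

The same decoding step would apply to the other fields in the list above once their positions in the descriptor or record are known.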

Relaxation of ISO 9660 Restrictions on UCS-2 Volumes

Several ISO 9660 restrictions are relaxed to achieve a more useful recording specification. Joliet receiving systems shall be capable of receiving media recorded with restrictions which have been relaxed relative to ISO 9660.

Maximum File Identifier Length Increased

Joliet receiving systems shall receive directory hierarchies recorded with file identifiers longer than those allowed by ISO 9660 receiving systems.

ISO 9660 (Section 7.5.1) states that the sum of the following shall not exceed 30:

On Joliet compliant media, however, the sum as calculated above shall not exceed 128, to allow for longer file identifiers.

The above lengths shall be expressed as a number of bytes.
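The limit can be expressed as a small check. The summands are not listed in this text; the sketch assumes they are the length of the file name and the length of the file name extension, as in ISO 9660 section 7.5.1. The helper names are hypothetical.

```python
MAX_FILE_IDENTIFIER_SUM = 128  # bytes, on Joliet compliant media

def file_identifier_sum(name: str, extension: str) -> int:
    """Length in bytes of name plus extension (assumed summands)."""
    return len(name.encode("utf-16-be")) + len(extension.encode("utf-16-be"))

def fits_joliet_limit(name: str, extension: str) -> bool:
    return file_identifier_sum(name, extension) <= MAX_FILE_IDENTIFIER_SUM

print(file_identifier_sum("a" * 60, "txt"))  # 126
print(fits_joliet_limit("a" * 62, "txt"))    # False
```

Since the lengths are counted in bytes and each character occupies two bytes, a limit of 128 corresponds to 64 characters.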

Maximum Directory Identifier Length Increased

Joliet receiving systems shall receive directory hierarchies recorded with directory identifiers longer than those allowed by ISO 9660 receiving systems.

ISO 9660 (Section 7.6.3) states that the length of a directory identifier shall not exceed 31.

On Joliet compliant media, however, the length of a directory identifier shall not exceed 128, to allow for longer directory identifiers.

The above lengths shall be expressed as a number of bytes.

Directory Names May Have File Name Extensions

ISO 9660 does not allow directory identifiers to contain file name extensions.

On Joliet compliant media, however, directory identifiers may contain file name extensions.

The Joliet directory identifier format shall be calculated according to ISO 9660 section 7.5.1 "File Identifier format", with the exception that the length of a directory identifier may exceed 31, but shall not exceed 128.

In addition, the Joliet directory identifier format shall comply with ISO 9660 section 7.6.2 "Reserved Directory Identifiers".

The directory identifier length shall be calculated according to ISO 9660 section 7.5.2 "File Identifier length".

The above lengths shall be expressed as a number of bytes.
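A sketch of the corresponding check for directory identifiers follows. The function name is hypothetical, and the reserved single-byte identifiers (00) and (01) are assumed to be handled separately from named directories.

```python
MAX_DIRECTORY_IDENTIFIER_BYTES = 128

def encode_directory_identifier(name: str) -> bytes:
    """Encode a named directory identifier, enforcing the Joliet length limit."""
    encoded = name.encode("utf-16-be")
    if not encoded:
        raise ValueError("a named directory identifier must not be empty")
    if len(encoded) > MAX_DIRECTORY_IDENTIFIER_BYTES:
        raise ValueError(f"{len(encoded)} bytes exceeds the 128-byte limit")
    return encoded

# A directory name with an extension is acceptable on Joliet media.
print(len(encode_directory_identifier("backup.old")))  # 20
```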

Maximum Directory Hierarchy Depth May Exceed 8 Levels

ISO 9660 (Section 6.8.2.1) specifies restrictions regarding the Depth of Directory Hierarchy. This section of ISO 9660 specifies that the number of levels in the hierarchy shall not exceed eight.

On Joliet compliant media, however, the number of levels in the hierarchy may exceed eight.

Joliet compliant media shall comply with the remainder of ISO 9660 section 6.8.2.1, so that for each file recorded, the sum of the following shall not exceed 240:

The above lengths shall be expressed as a number of bytes.
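The 240 limit could be checked as sketched below. The summands are not listed in this text; the sketch assumes the three terms of ISO 9660 section 6.8.2.1 (the length of the file identifier, the lengths of the directory identifiers of all relevant directories, and the number of relevant directories). The function name and sample path are hypothetical.

```python
MAX_HIERARCHY_SUM = 240

def hierarchy_sum(directories: list, file_identifier: str) -> int:
    """Sum of identifier lengths in bytes plus the number of directories."""
    dir_bytes = sum(len(d.encode("utf-16-be")) for d in directories)
    file_bytes = len(file_identifier.encode("utf-16-be"))
    return file_bytes + dir_bytes + len(directories)

total = hierarchy_sum(["docs", "2024"], "a.txt;1")
print(total, total <= MAX_HIERARCHY_SUM)  # 32 True
```

Under these assumptions a path may be deeper than eight levels, provided the identifiers along it are short enough to keep the sum within 240.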