Korean Documents Detected as Japanese

ID: Q248306


The information in this article applies to:
  • Microsoft Site Server version 3.0
  • Microsoft Index Server version 2.0


SYMPTOMS

Microsoft Site Server 3.0 Search incorrectly detects Korean documents as Japanese.


CAUSE

Microsoft has confirmed this to be a problem in the Microsoft products listed at the beginning of this article.


WORKAROUND

If the documents can be pre-processed, convert them to HTML and add the language and charset tags. Otherwise, the Site Server Search crawl server (also known as the Gatherer) must be dedicated to crawling Korean documents so that Korean text documents are handled properly. Plain text documents cannot be tagged, so document tagging cannot be used to identify their language.
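
The pre-processing option can be sketched as a small script that wraps each Korean text file in an HTML page declaring its character set and language. This is only an illustration, not a Microsoft-supplied tool. The function names, the euc-kr encoding of the source files, and the use of the MS.LOCALE meta tag alongside the Content-Type meta tag are assumptions; confirm which tags your Search or Index Server HTML filter honors before relying on it.

```python
import html
from pathlib import Path


def wrap_text_as_html(text, title, charset="euc-kr", locale="ko"):
    """Return an HTML page that declares its charset and language.

    The MS.LOCALE meta tag and the default values are assumptions made
    for this sketch; adjust them to what your HTML filter recognizes.
    """
    return (
        f'<html lang="{locale}">\n<head>\n'
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">\n'
        f'<meta name="MS.LOCALE" content="{locale}">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        # <pre> keeps the original line breaks of the text document.
        f"<pre>{html.escape(text)}</pre>\n"
        "</body>\n</html>\n"
    )


def convert_file(src, dst, encoding="euc-kr"):
    """Read a text file in the given encoding and write a tagged HTML copy.

    Assumes the source file really is encoded as `encoding`; the same
    encoding is declared in the page and used to write the output bytes.
    """
    src_path = Path(src)
    text = src_path.read_text(encoding=encoding)
    page = wrap_text_as_html(text, src_path.stem, charset=encoding)
    Path(dst).write_bytes(page.encode(encoding))
```

For example, convert_file("notice.txt", "notice.htm") would produce an HTML copy of a hypothetical notice.txt that the crawler can identify from its tags rather than by guessing from the raw bytes.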

The following configuration is required on Site Server Service Pack 2 or later:

Regional Settings

Set the region to Korean and select the Set as system default locale option. This installs the Korean character set and makes iso-8959-5 the default character set. Restart the computer to activate the system locale change.

Input Locales

Both Korean and Japanese must be listed, with Korean as the default input locale. The Japanese character set is needed to recognize some of the characters.

Internet Explorer Language Settings

In Internet Explorer, click Internet Options, click the General tab, and then click Languages. Make sure Korean is listed, because Site Server Search uses a part of Internet Explorer (WinInet) to crawl the documents.

With the above settings, all Korean and most Japanese text documents are recognized as Korean. English text documents, however, are correctly recognized as English.

Version : winnt:2.0,3.0
Platform : winnt
Issue type : kbprb


Last Reviewed: December 29, 1999
© 2000 Microsoft Corporation. All rights reserved.