Many corporations provide e-commerce offerings targeted towards countries around the world. Sometimes the set of products and services offered in individual countries is different and can be served by unrelated sites. However, often the products and services are the same, and the sites differ only in the language used.
The solution is simple, right? Just take the original site, translate the text (into German, for example), and then make it available on a separate server. The problem with this approach is that Web sites are rarely just developed and frozen; they evolve over time, often so fast that by the time the translation team finishes with a particular snapshot, the production team might already have a newer version.
A better approach is to integrate globalization into the development process. This means that as developers work on code and formatting, translators should be able to work on translating user interface text to a foreign language. What makes this difficult is the fact that ASP pages contain a mixture of code (in VBScript or JScript®), HTML formatting tags, and text strings that are seen by the user.
In this article, I'll describe how globalization can be integrated into the Web site development process, using Microsoft tools enhanced with some custom code. The main benefit is that foreign-language Web sites will always be up to date with the development version.
A Globalization Solution
I recently finished a project where I developed a set of custom software tools to address the problem of Web site globalization. Here is an overview of how it works.
The language-dependent strings are separated from the rest of the page content by using custom design-time controls (DTCs) for Visual InterDev®. The Unicode text strings are stored in a SQL Server database and are retrieved and displayed on the page (in English). At this time, the pages reside on the server alongside Microsoft® Internet Information Services (IIS).
Translators use a special tool to browse through English strings and enter translations of these strings in other languages. Another tool (the site compiler) reads pages from the authoring site and creates foreign-language sites by recognizing and replacing English strings with their equivalents in the particular language. At this point, the language-specific sites are almost ready.
This solution preserves the availability and performance characteristics of the original Web site. There are no additional queries to the string database once the site is compiled. Not all strings have to be localized. Some strings are used as constants in the code and are hidden from the user. The Web developers have to replace strings with DTCs, so they get to decide which ones should stay the same for any language. This means translators don't have to know anything about these internal strings; they can simply translate all strings presented by the string-editing tool. Because the translators work with the string-editing tool, they also don't have to know anything about ASP or HTML. The Web developers can continue using Visual InterDev since the DTCs integrate with it.
Design-Time Controls
If you have used Visual InterDev or Microsoft FrontPage®, you are probably already familiar with DTCs. They are special ActiveX® controls that can be inserted directly into a Web page in Visual InterDev or FrontPage during the design stage. They never actually get passed to the client. Instead, they insert some information on the page before it's sent from the server. For the rest of this article I will only mention Visual InterDev, but the information applies to FrontPage as well.
A DTC has properties that can be set by the user, and it generates a piece of text that is substituted for it at runtime. A DTC only exists while the Web page is being edited in Visual InterDev; the saved page contains only text generated by the DTC and some information that Visual InterDev uses to instantiate the control and set its properties later, when the page is edited again. This information is encoded inside HTML comments, so it's invisible to the Web server and the browser. The text generated by the DTC is called runtime text. Visual InterDev comes with a standard set of DTCs, and there is an SDK that allows you to create your own controls with Visual Basic® or Visual C++®.
I used two custom DTCs to separate strings from the rest of the ASP page content. The first custom control, IntlString, is used in place of any string literal that would otherwise appear on the page. It has a string ID property that is used to retrieve the string body from the database and return it to Visual InterDev as runtime text.
The second custom control, PageOptions, is used only once per page. Its function is to establish a page-specific range of string IDs, making it possible to browse through English strings and enter their translations.
Database Tables
As I mentioned before, the software uses a SQL Server 7.0 database to store strings and some additional information. The first table, Languages, contains the list of languages for site translation (see Figure 1). Each language row has information about Windows® and HTML code pages, as well as the human-readable language name and the suffix used to create the output directory for the site in this language. Because the language list is stored in the database, it can easily be extended or modified using standard SQL Server tools, such as Enterprise Manager.
Two tables are used to store the strings themselves. The first one, StringIds, contains all string IDs (see Figure 2), and the second one, Strings, contains one or more language-specific versions of each string with the same string ID (see Figure 3). The Strings table is linked to the Languages and StringIds tables using foreign key relationships on the LangId and StringId columns, respectively.
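To make the relationship between these tables concrete, here is a minimal in-memory sketch of the Strings table keyed by string ID and language ID. It is not the actual schema (that is shown in Figures 2 and 3). The class name StringTable, its methods, and the integer ID types are assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical in-memory mirror of the Strings table: one row per
// (StringId, LangId) pair. Column types are assumptions.
class StringTable {
public:
    // Add or overwrite the text of one string in one language.
    void put(int stringId, int langId, const std::wstring& text) {
        assert(stringId >= 0 && langId >= 0);
        rows_[std::make_pair(stringId, langId)] = text;
    }

    // Return the requested language version, or nullptr if that
    // string has not been translated into the language yet.
    const std::wstring* find(int stringId, int langId) const {
        auto it = rows_.find(std::make_pair(stringId, langId));
        return it == rows_.end() ? nullptr : &it->second;
    }

private:
    // Wide strings stand in for the Unicode text stored in SQL Server.
    std::map<std::pair<int, int>, std::wstring> rows_;
};
```

Keying on the pair mirrors the two foreign keys: the same string ID can appear once per language, and a missing row simply means the translators have not reached that string yet.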
The final table, PageRanges, is used to break up the entire range of strings into page-specific ranges (see Figure 4). Each page is assigned a range of 1000 string IDs out of which the string editor assigns new values. The table stores the URL of each page and its starting string ID.
The PageOptions control, when placed on a page, obtains its URL from Visual InterDev and tries to look up the page's base string ID. If the page is not found, it inserts a new row into the table with the appropriate URL and base ID.
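The lookup-or-insert behavior described above can be sketched as follows. This is an illustration only: a std::map stands in for the PageRanges table, and the class name, the starting base ID of 0, and the policy of handing out ranges in order are all assumptions rather than details of the actual control.

```cpp
#include <cassert>
#include <map>
#include <string>

// Number of string IDs reserved for each page, as described in the article.
const int kRangeSize = 1000;

// Hypothetical stand-in for the PageRanges table: page URL -> base string ID.
class PageRanges {
public:
    // Return the page's base ID, registering the page first if it is unknown.
    int baseIdFor(const std::string& url) {
        assert(!url.empty());
        auto it = baseIds_.find(url);
        if (it != baseIds_.end()) return it->second;
        int base = nextBase_;      // assumption: ranges are allocated sequentially
        nextBase_ += kRangeSize;
        baseIds_[url] = base;
        return base;
    }

    // True if stringId lies inside the range owned by the given page.
    bool owns(const std::string& url, int stringId) const {
        auto it = baseIds_.find(url);
        if (it == baseIds_.end()) return false;
        return stringId >= it->second && stringId < it->second + kRangeSize;
    }

private:
    std::map<std::string, int> baseIds_;
    int nextBase_ = 0;             // assumption: the first page starts at ID 0
};
```

Calling baseIdFor twice with the same URL returns the same base ID, so placing the control on a page that is already registered does not consume a new range.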
String Editing and Site Compilation
Any new or existing string that will need localization has to be entered into the database and replaced with an IntlString DTC in Visual InterDev. For this purpose, I developed a separate string editor application that acts as a user-friendly interface to the string database. The string editor allows the user to browse strings hierarchically, first by page name, then by string ID. The user sees all language versions of the same string ID at the same time; this makes it a good environment for the translators. The tool also allows the user to create new strings and delete existing ones.
A small but significant usability feature is that, when creating a new string, the editor automatically inserts the contents of the clipboard as the English version of the new string. This makes transferring strings from existing Web pages go a little faster.
The site compiler is a tool that reads Web pages from the directory on the authoring server and generates language-specific versions of the site in separate directories. To see how the site compiler does that, let's look at how Visual InterDev marks design-time controls on a page.
The code in Figure 5 shows a Button DTC, which is part of the standard Visual InterDev installation. A DTC starts with the startspan comment, followed by the <OBJECT> tag that specifies the control's class ID (CLSID). This is followed by the parameter map, which lists parameter name/value pairs. The code highlighted in blue is generated by the DTC itself, and is automatically inserted into the page by Visual InterDev. Finally, the endspan comment marks the end of the DTC.
Using a simple state-machine parser, the site compiler scans the contents of each page looking for the startspan sequence, then checks to see if the CLSID of the control is that of IntlString. If it finds a match, the code then looks for the string ID parameter and retrieves the language-specific version of the string from the database. Finally, all content up to the endspan comment (the English version of the string) is replaced with the retrieved language-specific string.
Of course, this is only done for HTML and ASP pages; other files are copied as is.
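The replacement step might look roughly like the sketch below. For brevity it uses substring searches instead of a character-level state machine, so it is a simplification of the approach described above, not the site compiler's actual code. The marker text, the parameter name StringId, and the lookup callback (which stands in for the database query) are assumptions.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <string>

// Marker strings assumed from the startspan/endspan description above.
static const std::string kStart = "startspan";
static const std::string kEndComment =
    "<!--METADATA TYPE=\"DesignerControl\" endspan-->";

// Read the integer VALUE of the (hypothetical) StringId parameter from a
// control header. Returns -1 if the parameter is not present.
int extractStringId(const std::string& header) {
    const std::string key = "NAME=\"StringId\" VALUE=\"";
    std::size_t p = header.find(key);
    if (p == std::string::npos) return -1;
    return std::atoi(header.c_str() + p + key.size());
}

// Copy the page, replacing the runtime text of every IntlString control
// with the string returned by lookup(stringId). Other DTCs are left alone.
std::string localizePage(const std::string& page,
                         const std::string& intlClsid,
                         const std::function<std::string(int)>& lookup) {
    assert(!intlClsid.empty());
    std::string out;
    std::size_t pos = 0;
    for (;;) {
        std::size_t start = page.find(kStart, pos);
        if (start == std::string::npos) break;
        std::size_t headerEnd = page.find("-->", start);
        if (headerEnd == std::string::npos) break;   // malformed: copy the rest
        headerEnd += 3;                              // step past "-->"
        std::size_t end = page.find(kEndComment, headerEnd);
        if (end == std::string::npos) break;
        const std::string header = page.substr(start, headerEnd - start);
        int id = extractStringId(header);
        out.append(page, pos, headerEnd - pos);      // text up to and including header
        if (header.find(intlClsid) != std::string::npos && id >= 0) {
            out += lookup(id);                       // swap in the translated string
        } else {
            out.append(page, headerEnd, end - headerEnd);  // keep original runtime text
        }
        pos = end;                                   // endspan comment is copied next
    }
    out.append(page, pos, std::string::npos);        // remainder of the page
    return out;
}
```

Because the control header and the endspan comment are copied through unchanged, the output page keeps its DTC markup, and only the text between them differs from the English original.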
Creating a DTC in Visual C++
As mentioned earlier, a DTC is a type of ActiveX control. It is just a COM object that implements most ActiveX control interfaces, as well as some special ones that are used to communicate with Visual InterDev. You can use either Visual C++ or Visual Basic to write a DTC. Here, I will focus on using C++.
Creating a DTC requires using the Design-Time Control SDK, which can be obtained from MSDN Online at http://msdn.microsoft.com/vinterdev/downloads/download.asp?ID=016.
Create a new project, select the ATL COM AppWizard project type from the list, and then accept the defaults on the dialogs that follow. Next, using the ATL Object Wizard, insert an ActiveX control into the project by selecting Controls | Full Control.
Add the following lines to the main .cpp module of the project (after #include <initguid.h>), as well as to the .h file for the ActiveX control: