This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.


MIND

This article assumes you're familiar with Visual InterDev, Visual C++, and VBScript.
Multilingual Web Site Development
Kostya Vasilyev

Internationalizing your Web site is not as simple as translating English text into your target language. There are also design requirements to consider. Find out how to easily separate Web site translation and design.

Many corporations provide e-commerce offerings targeted towards countries around the world. Sometimes the set of products and services offered in individual countries is different and can be served by unrelated sites. However, often the products and services are the same, and the sites differ only in the language used.
      The solution is simple, right? Just take the original site, translate the text (into German, for example), and then make it available on a separate server. The problem with this approach is that Web sites are rarely just developed and frozen; they evolve over time, often so fast that by the time the translation team finishes with a particular snapshot, the production team might already have a newer version.
      A better approach is to integrate globalization into the development process. This means that as developers work on code and formatting, translators should be able to work on translating user interface text to a foreign language. What makes this difficult is the fact that ASP pages contain a mixture of code (in VBScript or JScript®), HTML formatting tags, and text strings that are seen by the user.
      In this article, I'll describe how globalization can be integrated into the Web site development process, using Microsoft tools enhanced with some custom code. The main benefit is that foreign-language Web sites will always be up to date with the development version.

A Globalization Solution
      I recently finished a project where I developed a set of custom software tools to address the problem of Web site globalization. Here is an overview of how it works.
      The language-dependent strings are separated from the rest of the page content by using custom design-time controls (DTCs) for Visual InterDev®. The Unicode text strings are stored in a SQL Server database and are retrieved and displayed on the page (in English). At this time, the pages reside on the server alongside Microsoft® Internet Information Services (IIS).
      Translators use a special tool to browse through English strings and enter translations of these strings in other languages. Another tool (the site compiler) reads pages from the authoring site and creates foreign-language sites by recognizing and replacing English strings with their equivalents in the particular language. At this point, the language-specific sites are almost ready.
      This solution preserves the availability and performance characteristics of the original Web site. There are no additional queries to the string database once the site is compiled. Not all strings have to be localized. Some strings are used as constants in the code and are hidden from the user. The Web developers have to replace strings with DTCs, so they get to decide which ones should stay the same for any language. This means translators don't have to know anything about these internal strings; they can simply translate all strings presented by the string-editing tool. Because the translators work with the string-editing tool, they also don't have to know anything about ASP or HTML. The Web developers can continue using Visual InterDev since the DTCs integrate with it.

Design-time Controls
      If you have used Visual InterDev or Microsoft FrontPage®, you are probably already familiar with DTCs. They are special ActiveX® controls that can be inserted directly into a Web page in Visual InterDev or FrontPage during the design stage. They never actually get passed to the client. Instead, they insert some information on the page before it's sent from the server. For the rest of this article I will only mention Visual InterDev, but the information applies to FrontPage as well.
      A DTC has properties that can be set by the user, and it generates a piece of text that is substituted for it at runtime. A DTC only exists while the Web page is being edited in Visual InterDev; the saved page contains only text generated by the DTC and some information that Visual InterDev uses to instantiate the control and set its properties later, when the page is edited again. This information is encoded inside HTML comments, so it's invisible to the Web server and the browser. The text generated by the DTC is called runtime text. Visual InterDev comes with a standard set of DTCs, and there is an SDK that allows you to create your own controls with Visual Basic® or Visual C++®.
      I used two custom DTCs to separate strings from the rest of the ASP page content. The first custom control, IntlString, is used in place of any string literal that would otherwise appear on the page. It has a string ID property that is used to retrieve the string body from the database and return it to Visual InterDev as runtime text.
      The second custom control, PageOptions, is used only once per page. Its function is to establish a page-specific range of string IDs, making it possible to browse through English strings and enter their translations.

Database Tables
      As I mentioned before, the software uses a SQL Server 7.0 database to store strings and some additional information. The first table, Languages, contains the list of languages for site translation (see Figure 1). Each language row has information about Windows® and HTML code pages, as well as the human-readable language name and the suffix used to create the output directory for the site in this language. Because the language list is stored in the database, it can easily be extended or modified using standard SQL Server tools, such as Enterprise Manager.
      Two tables are used to store the strings themselves. The first one, StringIds, contains all string IDs (see Figure 2), and the second one, Strings, contains one or more language-specific versions of each string with the same string ID (see Figure 3). The Strings table is linked to the Languages and StringIds tables using foreign key relationships on the LangId and StringId columns, respectively.
      The final table, PageRanges, is used to break up the entire range of strings into page-specific ranges (see Figure 4). Each page is assigned a range of 1000 string IDs out of which the string editor assigns new values. The table stores the URL of each page and its starting string ID.
      The PageOptions control, when placed on a page, obtains its URL from Visual InterDev and tries to look up the page's base string ID. If the page is not found, it inserts a new row into the table with the appropriate URL and base ID.
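      The lookup-or-insert behavior described above can be sketched in a few lines of standard C++. This is only an illustration: the PageRanges class and its baseIdFor method are hypothetical names, a std::map stands in for the SQL Server table, and the first base ID of 1000 is an assumption. Only the range size of 1000 IDs per page comes from the design described here.

```cpp
#include <cassert>
#include <map>
#include <string>

// In-memory stand-in for the PageRanges table: page URL -> base string ID.
// The real tool keeps these rows in SQL Server; this class is hypothetical.
class PageRanges {
public:
    static constexpr int kRangeSize = 1000;  // IDs reserved per page

    // Returns the page's base string ID, adding a new row if the URL is unknown.
    int baseIdFor(const std::string& url) {
        assert(!url.empty());                // a page must have a URL
        std::map<std::string, int>::const_iterator it = rows_.find(url);
        if (it != rows_.end())
            return it->second;               // page is already registered
        const int base = nextBase_;
        rows_[url] = base;                   // plays the role of the INSERT
        nextBase_ += kRangeSize;             // reserve the next block of IDs
        return base;
    }

private:
    std::map<std::string, int> rows_;
    int nextBase_ = 1000;                    // assumed starting point
};
```

      Calling baseIdFor twice with the same URL returns the same base ID, so the control can be placed on a page, removed, and placed again without disturbing the string IDs already assigned to that page.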

String Editing and Site Compilation
      Any new or existing string that will need localization has to be entered into the database and replaced with an IntlString DTC in Visual InterDev. For this purpose, I developed a separate string editor application that acts as a user-friendly interface to the string database. The string editor allows the user to browse strings hierarchically, first by page name, then by string ID. The user sees all language versions of the same string ID at the same time; this makes it a good environment for the translators. The tool also allows the user to create new strings and delete existing ones.
      A small but significant usability feature is that, when creating a new string, the editor automatically inserts the contents of the clipboard as the English version of the new string. This makes transferring strings from existing Web pages go a little faster.
      The site compiler is a tool that reads Web pages from the directory on the authoring server and generates language-specific versions of the site in separate directories. To see how the site compiler does that, let's look at how Visual InterDev marks design-time controls on a page.
      The code in Figure 5 shows a Button DTC, which is part of the standard Visual InterDev installation. A DTC starts with the startspan comment, followed by the <OBJECT> tag that specifies the control's class ID (CLSID). This is followed by the parameter map, which lists parameter name/value pairs. The code highlighted in blue is generated by the DTC itself, and is automatically inserted into the page by Visual InterDev. Finally, the endspan comment marks the end of the DTC.
      Using a simple state-machine parser, the site compiler scans the contents of each page looking for the startspan sequence, then checks to see if the CLSID of the control is that of IntlString. If it finds a match, the code then looks for the string ID parameter and retrieves the language-specific version of the string from the database. Finally, all content up to the endspan comment (the English version of the string) is replaced with the retrieved language-specific string.
      Of course, this is only done for HTML and ASP pages—other files are copied as is.
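      To make the scanning steps concrete, here is a minimal sketch of such a pass in standard C++. It is not the site compiler's actual source. The marker strings are a simplified form of the Visual InterDev markup, the CLSID is a placeholder, the function name compilePage is hypothetical, and a std::map stands in for the database query. The sketch assumes that the DTC header and the endspan comment are kept and only the runtime text between them is swapped.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified markers (assumption); real pages carry more attributes.
static const std::string kStartSpan = "<!--METADATA TYPE=\"DesignerControl\" startspan";
static const std::string kEndSpan = "<!--METADATA TYPE=\"DesignerControl\" endspan-->";
// Placeholder CLSID standing in for the IntlString control's real one.
static const std::string kIntlStringClsid = "clsid:00000000-0000-0000-0000-000000000001";
static const std::string kIdParam = "<PARAM NAME=\"StringID\" VALUE=\"";

// Returns a copy of page with the runtime text of every IntlString control
// replaced by the entry in translations that has the same string ID.
std::string compilePage(const std::string& page,
                        const std::map<int, std::string>& translations)
{
    const std::string::size_type npos = std::string::npos;
    std::string out;
    std::string::size_type pos = 0;

    for (;;) {
        // State 1: find the next startspan comment.
        std::string::size_type start = page.find(kStartSpan, pos);
        if (start == npos) break;
        // State 2: the comment holding <OBJECT> and <PARAM> ends at "-->".
        std::string::size_type headerEnd = page.find("-->", start);
        if (headerEnd == npos) break;
        headerEnd += 3;
        // State 3: the runtime text runs up to the endspan comment.
        std::string::size_type end = page.find(kEndSpan, headerEnd);
        if (end == npos) break;

        const std::string header = page.substr(start, headerEnd - start);
        std::string::size_type idPos = header.find(kIdParam);
        std::map<int, std::string>::const_iterator hit = translations.end();
        if (header.find(kIntlStringClsid) != npos && idPos != npos) {
            int id = 0;  // parse the decimal string ID
            for (std::string::size_type i = idPos + kIdParam.size();
                 i < header.size() && header[i] >= '0' && header[i] <= '9'; ++i)
                id = id * 10 + (header[i] - '0');
            hit = translations.find(id);
        }

        out.append(page, pos, headerEnd - pos);            // text before, plus header
        if (hit != translations.end())
            out += hit->second;                            // translated string
        else
            out.append(page, headerEnd, end - headerEnd);  // leave text as is
        out += kEndSpan;
        pos = end + kEndSpan.size();
    }
    assert(pos <= page.size());
    out.append(page, pos, npos);  // copy whatever follows the last control
    return out;
}
```

      Other DTCs, and IntlString controls whose ID has no translation yet, pass through unchanged, so a partially translated site still compiles and falls back to the English text.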

Creating a DTC in Visual C++
      As mentioned earlier, a DTC is a type of ActiveX control. It is just a COM object that implements most ActiveX control interfaces, as well as some special ones that are used to communicate with Visual InterDev. You can use either Visual C++ or Visual Basic to write a DTC. Here, I will focus on using C++.
      Creating a DTC requires using the Design-Time Control SDK, which can be obtained from MSDN Online at http://msdn.microsoft.com/vinterdev/downloads/download.asp?ID=016.
      Create a new project, select the ATL COM AppWizard project type from the list, and then accept the defaults on the dialogs that follow. Next, using the ATL Object Wizard, insert an ActiveX control into the project by selecting Controls | Full Control.
      Add the following lines to the main .cpp module of the project (after #include <initguid.h>), as well as to the .h file for the ActiveX control:


 #include <designer.h>
 #include <dtc60.h>
Also, add "C:\Program Files\dtcsdk60\include" to the project's include path.
      The IActiveDesigner interface is what lets an ActiveX control function as a DTC. Visual InterDev and FrontPage require that DTCs implement this interface before hosting them. To include this interface, add

 public IActiveDesigner
to the base class list. Insert the following line into the COM interface map (denoted by BEGIN_COM_MAP and END_COM_MAP):

 COM_INTERFACE_ENTRY_IID(IID_IActiveDesigner, IActiveDesigner)
Next you need to add the following few lines of code to the control's declaration in the .h file:

 STDMETHOD(GetRuntimeClassID)(CLSID *pclsid);
 STDMETHOD(GetRuntimeMiscStatusFlags)(DWORD *dwMiscFlags);
 STDMETHOD(QueryPersistenceInterface)(REFIID riid);
 STDMETHOD(SaveRuntimeState)(REFIID riidItf, REFIID riidObj, void *pObj);
 STDMETHOD(GetExtensibilityObject)(IDispatch **ppvObjOut);
Add this code to the control's implementation file:

 STDMETHODIMP CTestCtl1::GetExtensibilityObject(IDispatch ** ppDisp)
 {
   return QueryInterface(IID_IDispatch, (void **) ppDisp);
 }
 
 STDMETHODIMP CTestCtl1::GetRuntimeClassID(CLSID *pclsid)
 {
   return E_NOTIMPL;
 }
 
 STDMETHODIMP CTestCtl1::GetRuntimeMiscStatusFlags(DWORD * pdw)
 {
   if (pdw == NULL)
     return E_INVALIDARG;
   *pdw = 0;
   return S_OK;
 }
      The first method simply returns the IDispatch pointer that is used by Visual InterDev. The second method specifies that the control does not have any representation at runtime, which makes sense because a Web DTC's only runtime representation is the ASP code that it generates. The third method reports that the control has no runtime status flags.
      The code in Figure 6 allows the DTC to generate runtime text. A DTC has to override the two methods listed here in order to provide runtime text. The first method negotiates the particular interface (IPersistTextStream) that the host (Visual InterDev) has to provide to the second method. The second method actually saves the runtime text to the stream object provided by the host. The text has to be written to the stream object as a Unicode (OLECHAR) string, without any terminator. The host transfers the entire contents of the stream to the ASP page.
      A DTC saves its state as a set of named properties and their values. The control host (Visual InterDev) works with the DTC to save these values in the ASP page as special tags inside HTML comments (as shown in Figure 5).
      The default ActiveX control (as generated by the ATL Object Wizard) implements the IPersistStreamInit interface, which saves the control's state as a binary blob. This does not work for a DTC because Visual InterDev needs to know the names and values of individual properties to encode them as <PARAM> tags. This is accomplished by using the IPersistPropertyBag interface. ATL has a template that makes supporting this interface very easy.
      From the control's class declaration, delete this line

 public IPersistStreamInitImpl< CTestCtl1 >
and this one:

 COM_INTERFACE_ENTRY(IPersistStreamInit)
Replace them with this line

 public IPersistPropertyBagImpl< CTestCtl1 >
in the control's base class list, and this line

 COM_INTERFACE_ENTRY(IPersistPropertyBag)
in the control's COM interface map.
      The ATL template uses the ATL property map to obtain the names and values of properties. Add the desired properties to the map, using their DISPIDs from the IDL file:

 PROP_ENTRY("StringID", 1, CLSID_NULL)

Finishing the DTC
      The control's size can be set in the constructor, and is expressed in units of 0.01 millimeter (HIMETRIC), so the values below make the control 40 mm wide and 15 mm high:


 CTestCtl1 :: CTestCtl1 ()
 {
   m_sizeExtent.cx = 4000;
   m_sizeExtent.cy = 1500;
   m_sizeNatural = m_sizeExtent;
 }
      For Visual InterDev to find a DTC, it needs to be advertised in the registry as such. To do this, merge the following code into the .rgs file generated by ATL Object Wizard:

 ForceRemove 'Programmable'
 ForceRemove 'Implemented Categories'
 {
   '{73CEF3DD-AE85-11CF-A406-00AA00C00940}'
 }
 InprocServer32 = s '%MODULE%'
      The next step is to build the project and launch it. You will need to enter the path of the main Visual InterDev executable for debugging. On my system, that path is:

 C:\Program Files\Microsoft Visual Studio\Common\IDE\IDE98\Devenv.exe
      To be able to insert instances of your DTC into Web pages, it is necessary to make the control show up in the Visual InterDev toolbox. Right-click on one of the toolboxes (they are usually stacked up on the left side of Visual InterDev window) and select Customize Toolbox. You should see the dialog shown in Figure 7. Check the box next to your control's name. If your control does not show up in this list, check the .rgs file and make sure you merged the Implemented Categories text correctly.
Figure 7: Selecting a Control

      At this point, the control should show up in the toolbox. To insert an instance into a Web page, simply drag it and drop it where you want it. You can reveal the text generated by the control by right-clicking on it and selecting Show Runtime Text from the context menu. Figure 8 shows a DTC within a page, along with its text.
Figure 8: A DTC and its Text

Intercontrol Communication
      You might remember that my solution used two kinds of controls, which communicate with each other and with Visual InterDev.
      DTCs and Visual InterDev communicate by means of "choices." A choice is identified by its name (called Type) and contains two values, one user-friendly (called Description) and one for consumption by other pieces of code (called Text). A choice may also contain one or more additional data items (called Tags), each of which is identified by a string name and has a VARIANT value. A DTC can define a choice, change its value, or subscribe to a choice published by another control or Visual InterDev itself and be notified when its values change.
      DTCs don't communicate directly with one another. They communicate with a special piece of code in Visual InterDev called the Choices Engine. The Choices Engine manages information about available choices and their subscribers (if they have any), and takes care of notifying subscribers of choice value changes.
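      The publish and subscribe flow is easier to follow in a toy model. The sketch below is plain C++ and is not the DTC SDK: the Choice struct and the ChoicesEngine class are hypothetical stand-ins, and tag values are simplified to strings where the real ones are VARIANTs. It shows only the idea that controls talk to a central engine, which stores the current value of each choice and notifies whoever subscribed to it.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy model of a choice's values; the choice name (Type) is the map key below.
struct Choice {
    std::string description;                  // user-friendly value
    std::string text;                         // value consumed by other code
    std::map<std::string, std::string> tags;  // real tags hold VARIANT values
};

// Hypothetical stand-in for the Choices Engine; not the SDK's interfaces.
class ChoicesEngine {
public:
    typedef std::function<void(const Choice&)> Subscriber;

    // Registers interest in the choice with the given type name.
    void subscribe(const std::string& type, Subscriber subscriber) {
        assert(subscriber != nullptr);        // must be callable
        subscribers_[type].push_back(subscriber);
    }

    // Stores the new value, then notifies every subscriber of that choice.
    void publish(const std::string& type, const Choice& value) {
        choices_[type] = value;
        for (const Subscriber& subscriber : subscribers_[type])
            subscriber(value);
    }

private:
    std::map<std::string, Choice> choices_;
    std::map<std::string, std::vector<Subscriber> > subscribers_;
};
```

      Because neither side holds a reference to the other, a publisher can be removed from the page without the subscriber having to know. That is the property the real engine provides through the interfaces described next.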
      A control that wants to publish or subscribe to choices has to implement the IDesignTimeControl interface. To declare this interface, you need to add the following line to the control's base class list:

 public IDesignTimeControl
Also, add this line to the COM interface map:

 COM_INTERFACE_ENTRY(IDesignTimeControl)
Next, you add IDesignTimeControl method declarations to the class:

 STDMETHOD(get_DesignTimeControlSite)(IDesignTimeControlSite**
     DesignTimeControlSite);
 STDMETHOD(putref_DesignTimeControlSite)(IDesignTimeControlSite*
     DesignTimeControlSite);
 STDMETHOD(get_DesignTimeControlSet)(BSTR * DesignTimeControlSet);
 STDMETHOD(OnGetChoices)(Choices * Choices);
 STDMETHOD(OnRebind)(Choices * Choices);
 STDMETHOD(OnChoiceConflict)(Choice * Choice, VARIANT_BOOL Conflict);
 STDMETHOD(OnChoiceChange)(ChoiceSink * ChoiceSink, 
     dtcChoiceChange Change);
 STDMETHOD(OnHostingChange)(dtcHostingChange change, 
     VARIANT_BOOL * Cancel);
      The primary interface exposed by the Choices Engine is IDesignTimeControlSite. This is the callback interface that is used by the DTC. Visual InterDev assigns each DTC its own site object, which is stored by the DTC as the DesignTimeControlSite property.
      The DTC needs a member variable to store the value of its site. You can declare it like this in the control's declaration:

 private: 
 CComPtr<IDesignTimeControlSite>    m_spSite;
Next you need to implement the methods that you declared earlier to set and get this property:

 STDMETHODIMP CTestCtl1::get_DesignTimeControlSite(
     IDesignTimeControlSite** ppSite)
 {
   // CopyTo rejects a NULL ppSite and calls AddRef only if a site is set.
   return m_spSite.CopyTo(ppSite);
 }
 
 STDMETHODIMP CTestCtl1::putref_DesignTimeControlSite(
     IDesignTimeControlSite* pSite)
 {
   // CComPtr releases the old site and calls AddRef on the new one.
   m_spSite = pSite;
   return S_OK;
 }
      You need to implement the remaining methods of IDesignTimeControl to get the basic implementation (see Figure 9).
      In the globalization program, the PageOptions control obtains the URL of its page by subscribing to a standard choice published by Visual InterDev. I'll explain how you modify the boilerplate methods shown earlier to subscribe to a choice.
      First, add a ChoiceSink and a path variable declaration to the control's class definition:

 private: 
 CComPtr<IDesignTimeControlSite>  m_spSite;
 CComPtr<ChoiceSink>              m_spChoiceSink;
 CComBSTR                         m_strPath;
      Next, modify the putref_DesignTimeControlSite method as shown in Figure 10. This code registers the DTC's interest in the choice named TargetEnvironment, which is one of several choices published by Visual InterDev.
      The Visual InterDev Choices Engine notifies the control of any changes to a subscribed choice by calling the control's OnChoiceChange method (see Figure 11). This code first obtains the Tags collection associated with the subscribed choice, then uses the get_Item method to get the value of the tag named path, which contains the full path name of the host page.

Dealing with Code Pages
      To a program, a string is simply a sequence of bytes. A character set (also called a code page) is the particular way that these bytes are mapped into human-readable glyphs (or graphical representations). The default Windows character set for Western locales, code page 1252 (a superset of ISO-8859-1), has standard ASCII characters in the first 128 byte values, and characters suitable for Western European languages in the second 128 values. This is sufficient to represent text in many languages, such as German and French.
      Other languages (such as my native Russian) use characters not present in this character set, and require use of a different character set that contains those characters. Code page 1251 in Windows contains Cyrillic characters in the upper half of its value range.
      In short, when dealing with strings encoded with one byte per character, it's not enough to have the binary representation of the string to correctly render it—it's also necessary to know which character set to use.
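      A small example makes the point. The sketch below decodes a single byte under code page 1252 and under code page 1251. It is deliberately partial: the function name decodeByte is hypothetical, and it covers only ASCII plus the ranges that map arithmetically to Unicode (0xA0 through 0xFF for 1252, 0xC0 through 0xFF for 1251). It returns 0 for anything else, where real code would use a full mapping table.

```cpp
#include <cassert>
#include <cstdint>

// Decodes one byte to a Unicode code point under code page 1251 or 1252.
// Partial sketch: returns 0 for bytes outside the ranges handled here.
char32_t decodeByte(int codePage, std::uint8_t b)
{
    assert(codePage == 1251 || codePage == 1252);  // only two pages sketched
    if (b < 0x80)
        return b;                                  // ASCII is shared by both
    if (codePage == 1252)
        return b >= 0xA0 ? b : 0;                  // same values as Unicode
    // Code page 1251: 0xC0..0xFF hold the letters U+0410..U+044F.
    return b >= 0xC0 ? 0x0410 + (b - 0xC0) : 0;
}
```

      The byte 0xE4 comes out as U+00E4 (the German a with umlaut) under code page 1252, and as U+0434 (the Cyrillic letter de) under code page 1251. The bytes are identical; only the declared character set differs.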
      There is an encoding type, however, that avoids this problem. Unicode, as Windows and SQL Server store it, uses two bytes per character, which is sufficient to represent most characters in most languages. SQL Server supports Unicode as a native type (nvarchar and nchar data types), as does OLE DB. This means that most of your code could carry strings as Unicode (BSTR or WCHAR strings in C++; TCHAR strings are Unicode only when the project is built with UNICODE defined).
      On the browser side, Microsoft Internet Explorer 5.0 supports two options for dealing with international text. One is to use UTF-8, a variable-length encoding of Unicode that uses a single byte for ASCII characters and two or more bytes for everything else. The other is to use specific code pages, a more economical solution if the entire page can be expressed in one character set. I decided to go with the second option.
      In both cases, there is a way to include character set information in HTML code by using the <META> tag. Internet Explorer uses the tag's contents to automatically switch to the correct character set. The format is as follows:


 <meta http-equiv="Content-Type"
 content="text/html; charset=windows-1251">
      One final detail was to set the correct code page in ASP and IIS. The ASP VBScript engine uses Unicode strings internally, but converts them to a particular code page for output to a page (via Response.Write). In IIS, each ASP session has a code page associated with it, which can be set by assigning the Session.CodePage property.

VBScript Code Strings
      Another problem I needed to solve had to do with formatting messages with programmatic content. Consider what happens if you want to create a message containing information about the user's latest logon. Originally, it was done like this:


   strMsg = "Welcome " & strUserName & ", you last logged on on " & _
   strLastLogonDate & ", which was " & strElapsedDays & " days ago."
The problem is that in other languages, the word order may be different and this code will break, even if the pieces of text are translated properly.
      The solution was to write a special formatting routine that took a format string and several replacement values and used them to put together the message—sort of like printf, but instead of inserting values in the order in which they appear in the argument list, placeholders in the format string determine the order.

 strFormat = "Welcome %1, you last logged on on %2, which was %3 days ago"
 strMsg = FormatMessage3 (strFormat, strUserName, strLastLogonDate, _   
     strElapsedDays)
      The format string can be translated as a complete sentence. If a particular language's grammar rules require that, for example, the login date has to come before the user name, the %2 in the format string can be moved in front of %1. At the same time, the code (the call to FormatMessage3) would stay the same, once again showing how flexible the process is.
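      The routine itself is VBScript and is not listed here, but the substitution logic is short enough to sketch. The version below is written in C++ purely as an illustration: the name formatMessage, the vector of arguments, and the decision to copy unmatched placeholders through unchanged are all assumptions, not details of FormatMessage3.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Replaces %1..%9 in format with args[0]..args[8]. A placeholder that has no
// matching argument is copied through unchanged.
std::string formatMessage(const std::string& format,
                          const std::vector<std::string>& args)
{
    assert(args.size() <= 9);  // only single-digit placeholders are handled
    std::string out;
    for (std::string::size_type i = 0; i < format.size(); ++i) {
        const char c = format[i];
        if (c == '%' && i + 1 < format.size()) {
            const char next = format[i + 1];
            if (next >= '1' && next <= '9') {
                const std::string::size_type index = next - '1';
                if (index < args.size()) {
                    out += args[index];  // the position comes from the format
                    ++i;                 // skip the digit
                    continue;
                }
            }
        }
        out += c;
    }
    return out;
}
```

      With the arguments "Anna" and "1 May", the format "%2: %1" produces "1 May: Anna". Reordering the placeholders in the translated string changes the word order without any change to the calling code.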

Further Web Site Customization
      The site compiler handled most of the process of producing localized Web sites, and a few remaining details were handled by abstracting them into small separate files that were maintained in several language-specific versions.
      For example, I described earlier how each site needs to have ASP code to set the code page, language ID, and locale ID. The best place to do this is in Global.asa, but it already contained a good deal of other code. So I set up a separate file with just the constants for each language and included it in Global.asa:


 <!--#include file = "incLangDefs.asa" -->
 
 <SCRIPT LANGUAGE=VBScript RUNAT=Server> 
 Sub Session_OnStart
     Session.LCID = kLangDefs_LCID
     Session.CodePage = kLangDefs_CodePage
     Session("LangId") = kLangDefs_LangId
 End Sub
 </SCRIPT>
      Then I made one version of incLangDefs.asa for each language, leaving me free to modify common code in Global.asa:

 <SCRIPT LANGUAGE=VBScript RUNAT=Server> 
     Const kLangDefs_LCID = 1033 ' US English
     Const kLangDefs_CodePage = 1252 ' US / Western European
     Const kLangDefs_LangId = 1
 </SCRIPT>

Conclusion
      This article described an approach and toolset that make it possible to develop Web sites in multiple languages simultaneously. The solution used Visual InterDev and custom DTCs written in C++. With this example in hand, you can easily implement a similar solution.

MSDN
Going Global: Not for the Halfhearted at: http://msdn.microsoft.com/workshop/management/intl/internatln.asp

From the January 2000 issue of Microsoft Internet Developer.