Amy Burns
Producer
Microsoft Corporation
October 16, 1997
The following article was originally published in the Site Builder Network Magazine.
They call it a World Wide Web. Which is why site builders and producers increasingly concern themselves with localization -- designing a Web site to be appropriate for a worldwide audience.
The process, as we discovered in localizing Microsoft's Internet Explorer product site, involves a good deal more than simply translating text into a variety of foreign languages.
Let's say we have a site on the Web, in English, that sells plastic objects. We want to make it easier for people all over the world to purchase our wide variety of plastic stuff. First, we'll decide which languages we want to translate the pages into, and then we'll look at our design. Finally, we'll want to examine our database backend and system.
It's important up front to consider how many, and which, language versions you want to provide. Asian languages require special text encoding, so if you want to localize in those languages -- now, or even in the future -- it's good to prepare from the beginning. Okay, we're aiming high. We're going to localize our hypothetical plastics site into 33 languages, Asian languages included!
We need to consider the content. We know our audience wants to get in and out quickly, with all the information they need. From our side, we need to keep in mind that our localizers (the people who do the translations) are expensive, and are paid by the word. We need to get to the point. Look at all the content, figure out the most important information, and be sure it's stated on the top-level page. Whenever possible, try not to drill down deeper than two levels. Also, consider the mood of your content; not all humor translates from language to language, or culture to culture, and some language that's merely informal to us could be considered rude in other parts of the world. Colloquialisms can be misinterpreted or completely misunderstood by your audience or the localizers. It's probably best just to stick to the facts.
Another consideration is the lag time in updating localized content. It takes much longer to update content when it has to go through the translation process before it can be posted. Be sure the content you post will not be out of date quickly.
Let's review the layout, design, and code. The goal is for the site to function beautifully on a 14.4 Kbps modem. We need to select the graphic images we use carefully, limit how many we use, and ensure that they have small file sizes. We also want to avoid images that contain words. Images are more difficult to translate than text, and a comparable word in another language might be too long for the size of the original image, which can mess up the image, the page layout, or both. Try not to lock in table widths; otherwise, longer words will break the layout, and background images used inside the table will stretch.
Be sure to use the correct META tags for your character sets. In our case, we want to use UTF-8, so our META tag will be <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">. This will enable us to transmit double-byte character sets back and forth without corruption. I'll explain that further in a bit. Be sure your META tags are directly under the <HEAD> tag in your HTML; if not, many creative interpretations of the page could be generated.
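As a rough way to check the META placement described above, here is a minimal Python sketch. It is not part of our production tooling; the class and function names are invented for illustration. It reports whether the Content-Type META tag is the first tag inside HEAD and which character set it declares.

```python
from html.parser import HTMLParser


class HeadChecker(HTMLParser):
    """Records the tags inside <head> and where the charset META tag sits."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.head_tags = []
        self.charset = None
        self.charset_position = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "head":
            self.in_head = True
            return
        if not self.in_head:
            return
        self.head_tags.append(tag)
        attr = dict(attrs)
        if tag == "meta" and (attr.get("http-equiv") or "").lower() == "content-type":
            for part in (attr.get("content") or "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.lower() == "charset" and self.charset is None:
                    self.charset = value.strip()
                    self.charset_position = len(self.head_tags) - 1

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def check_meta_charset(html_text):
    """Return (is_first_tag_in_head, declared_charset_or_None)."""
    checker = HeadChecker()
    checker.feed(html_text)
    return checker.charset_position == 0, checker.charset
```

Running a check like this over each page that comes back from a localizer can catch a META tag that drifted below the TITLE or was dropped altogether.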
Comment, comment, comment your code. Localizers are generally hired for their translation skills, not their ability to write code. The easier you make it for them to see where they are in the HTML ocean, the better. When you get your pages back from a localizer, you won't need to redo all your code. If your pages contain scripting that needs to be localized, be sure to hire a localizer who understands enough about scripting not to inadvertently alter your scripting efforts.
Use Cascading Style Sheets (CSS) whenever possible. CSS allows you to change fonts for all your pages in one place, and there will be fewer tags within the text for the localizers to sift through.
(Here at Microsoft, we have a set of localizing software in internal development for Microsoft site builders. One of its purposes is to lock the code from the localizers, and to highlight the text that needs to be translated. Such details as the ALT TEXT aren't as easily overlooked when using this tool.)
For our plastics Web site, we need to include a form for people to fill in their addresses and the number of plastic things they want to buy. We can set up the calculation of the cost by checking the language directory that the form is displaying and the amount of stuff selected, then, on the same page, dynamically generate the cost of the order in the designated currency. (This article will not cover how to actually get the money.)
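The cost calculation could look something like the following Python sketch. Everything specific in it is an assumption made for illustration: the directory names other than intl_es, the currency codes, and especially the conversion rates, which are placeholders rather than real exchange rates.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical table: language directory -> (currency code, placeholder rate per U.S. dollar).
# The rates are made-up round numbers, not real exchange rates.
CURRENCY_BY_DIRECTORY = {
    "": ("USD", Decimal("1")),          # the U.S. site lives at the root
    "intl_es": ("ESP", Decimal("100")),
    "intl_ja": ("JPY", Decimal("200")),
}


def order_total(language_dir, quantity, unit_price_usd):
    """Return (currency_code, total) for an order placed from a language directory."""
    # Unknown directories fall back to the U.S. entry.
    currency, rate = CURRENCY_BY_DIRECTORY.get(language_dir, CURRENCY_BY_DIRECTORY[""])
    total = Decimal(unit_price_usd) * quantity * rate
    # Round to two decimal places for display.
    return currency, total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Decimal is used instead of floating point so that prices do not pick up rounding noise before they are shown to the customer.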
Be sure the form has at least four lines that can accept numbers and text. Those lines should be fairly long, with loose parameters around the text fields. Addresses in some countries can be much longer than those in others. You also want to allow people their own interpretations of how to fill in the address fields. Regardless of the information users put in, you want to be able to pull it out uniformly, stick on a label, and send it out, even if it's in an unfamiliar form.
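One way to pull the address out uniformly, whatever the user typed, is sketched below in Python. The function name and the field layout (a name, a list of free-form address lines, and a country) are assumptions for illustration, not a description of our actual form.

```python
def format_label(name, address_lines, country):
    """Build a mailing label from free-form fields without interpreting them."""
    lines = [name] + list(address_lines) + [country]
    # Collapse stray whitespace, but keep the user's own line order and wording.
    cleaned = [" ".join(line.split()) for line in lines]
    # Drop lines the user left empty so the label has no gaps.
    return "\n".join(line for line in cleaned if line)
```

The point of the design is that the code never tries to guess which line is the street or the postal code; it only tidies the text and prints it in the order given.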
We'll want to set up a separate directory for each language that mirrors the directory structure of our U.S. plastics site. For example, the U.S. image file
http://example.microsoft.com/ieou/images/whatever.gif might become
http://example.microsoft.com/ieou/intl_es/images/whatever.gif in a localized version.
This makes file changes from language to language easier. A search-and-replace tool can change all the links easily. But take special care in doing such a global search and replace; read each occurrence of the change before you confirm it, ensuring that it modifies only the files you want to change.
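A careful search and replace can also be scripted. The following Python sketch rewrites one URL at a time so that each change can be reviewed. The function name and its parameters are hypothetical; only the intl_es directory and the example URL come from the site layout described above.

```python
from urllib.parse import urlsplit, urlunsplit


def localized_url(url, site_root, language_dir):
    """Insert language_dir right after site_root in the URL path.

    URLs outside site_root, or already inside language_dir, are returned unchanged.
    """
    parts = urlsplit(url)
    root = site_root.rstrip("/") + "/"
    if not parts.path.startswith(root):
        return url
    remainder = parts.path[len(root):]
    if remainder.startswith(language_dir + "/"):
        return url
    return urlunsplit(parts._replace(path=root + language_dir + "/" + remainder))


print(localized_url(
    "http://example.microsoft.com/ieou/images/whatever.gif", "/ieou", "intl_es"))
```

Because links that are off-site or already localized pass through untouched, running the function twice over the same file does no harm.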
Now, let's check out our backend!
Our site uses Microsoft SQL Server and Microsoft Internet Information Server (IIS), running on Windows NT, along with Active Server Pages (ASP) technology.
SQL Server collects the data from your orders coming in via the order form on your site. IIS runs the entire operation on the Internet. ASP technology handles the information coming in from the form to the SQL Server database, retrieves the data from SQL Server, and displays the information in an HTML format you can see and manipulate.
To serve the correct set of localized pages, we can use a piece of Visual Basic Scripting Edition (VBScript) to sniff the browser for the language being used. From the results of the sniff, we load the correct language version of our plastics site. Because we are using Internet Explorer 4.0, Unicode is already enabled. Unicode assigns a unique code to each character, whether it comes from a single-byte or a double-byte character set, and Windows stores those codes as 16-bit (double-byte) values. That allows character sets coming in from all over the world to be transmitted without corruption.
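The sniffing step can be sketched in a few lines. This example is in Python rather than VBScript, and it reads the Accept-Language header that browsers send with each request, which is one common way to learn the user's language and not necessarily what our own script does. The directory table is hypothetical apart from intl_es.

```python
# Hypothetical mapping from primary language tag to site directory.
SITE_DIRECTORIES = {"en": "", "es": "intl_es", "ja": "intl_ja"}


def parse_accept_language(header):
    """Return language tags from an Accept-Language header, most preferred first."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag with no q-value counts as fully preferred
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((-quality, position, tag))
    # Sort by descending quality, keeping the header's order for ties.
    return [tag for _, _, tag in sorted(prefs)]


def pick_language_directory(header, directories=SITE_DIRECTORIES, default=""):
    """Choose the directory for the first preferred language the site offers."""
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]  # "es-mx" -> "es"
        if primary in directories:
            return directories[primary]
    return default
```

A request that lists Mexican Spanish first, for instance, would be sent to the intl_es pages, while a language the site does not offer falls back to the U.S. pages.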
However, we are also using SQL Server, which currently has some difficulties with Unicode. To work around those difficulties, we want to convert the 16-bit Unicode text to UTF-8, an encoding that stores each Unicode character as a sequence of one or more bytes. SQL Server can handle UTF-8 without corrupting the data. Fortunately, the conversion algorithm for Unicode to UTF-8 already exists in Internet Explorer 4.0. By using the META tag <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">, we tell Internet Explorer 4.0 to convert the Unicode to UTF-8, and we are able to send and receive data without corruption problems.
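To see what the conversion does to the data, here is a small Python sketch. Python's built-in codecs stand in for the conversion that Internet Explorer 4.0 performs, and the sample strings are arbitrary examples.

```python
def to_utf8(text):
    """Encode Unicode text as UTF-8 bytes, ready to store or transmit."""
    return text.encode("utf-8")


def from_utf8(data):
    """Decode UTF-8 bytes back into Unicode text."""
    return data.decode("utf-8")


ascii_text = "plastic"
cjk_text = "\u5851\u6599"  # two CJK characters, written as escapes

# ASCII characters stay one byte each; these CJK characters take three bytes each.
print(len(to_utf8(ascii_text)), len(to_utf8(cjk_text)))

# Decoding with the same encoding restores the original text exactly.
print(from_utf8(to_utf8(cjk_text)) == cjk_text)

# Decoding with the wrong character set produces garbage instead of an error.
print(to_utf8(cjk_text).decode("latin-1") == cjk_text)
```

The last line illustrates why the declared character set matters: the bytes are intact, but reading them under the wrong encoding makes the text look corrupted.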
Once all those orders start to pour in, we can use our in-house HTML page to view the SQL Server data, and check to be sure we have all the language packs installed. If we don't, we could have perfectly encoded information in SQL Server -- but its display will be corrupted.
Now you can set up your own localized Web site, decide how you want to take the information from your users, and how you want to send them your merchandise.
One option is to have some sort of program that will convert the fields from the HTML page into a product that will generate labels. Print up the labels and away you go! You'll be a world-renowned merchandiser, make an enormous number of dollars, or lira, or pounds, or yen, or drachma, or whatever -- and every day will be sunny, and you'll live happily ever after!
Web producer Amy Burns travelled the world to find her way to Microsoft, pausing along the way to teach English in Taiwan, and Web-page building to U.S. high-school students. She also served as a counselor for emotionally troubled teens, which is not that different from her present job.
Probably the most important aspect of localizing a Web site is understanding the target audience.
If you're designing your site in the United States, remember that much of the rest of the world has lower bandwidth. Connections are slower, and the hardware is usually older.
It is also more expensive to surf the Web in other parts of the world. Your audience will want to get their information quickly, with fewer bells and whistles than we tend to tolerate in the United States.