How to make a Multilingual Web server

Using Netscape Navigator and the Apache server, you can now make a multilingual Website.

Browser Configuration

The key to language selection is content negotiation.
In Netscape Navigator 3.0 (Windows and Mac versions), you may select your preferred language(s) using the Options-->General-->Language form. In Netscape for Unix (for now), set the X resource *httpAcceptLanguage, e.g. add the line
Netscape*httpAcceptLanguage: fr, en-US
to ~/.Xdefaults, then run xrdb ~/.Xdefaults before launching Netscape.
In Mosaic-L10N, use the Options-->Accept Languages form.

Currently, the format of this field is an ordered, comma-delimited list of languages, with optional dialects. For instance, you might enter
en-GB,en-US,fr,de
which would mean that your first preference is for UK English, then US English, then French, then German.
The language codes are taken from the ISO 639 2-character codes and the dialects are taken from the ISO 3166 2-character country codes (as used in Internet domains).
In future, you might be able to assign a weight (a quality value) to each language, depending on how well you can read it, for instance:
en-GB;q=0.95,en-US;q=0.9,fr;q=0.5,de;q=0.3
but for now only the ordering is significant.
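
The list format described above can be read with a few lines of code. The following Python sketch is illustrative only: the function name parse_accept_language is hypothetical, and it assumes the weight parameter is written q, as in the HTTP/1.1 drafts. Entries without a weight count as 1.0, so a plain ordered list keeps its order.

```python
def parse_accept_language(value):
    """Return the language tags from an Accept-Language value, best first."""
    entries = []
    for position, item in enumerate(value.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        weight = 1.0  # a tag with no q parameter is fully acceptable
        for param in params.split(";"):
            name, _, number = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    weight = float(number)
                except ValueError:
                    # Assumption: an unreadable weight means "least wanted".
                    weight = 0.0
        entries.append((-weight, position, tag.strip()))
    # Highest weight first; equal weights keep the order the user typed.
    entries.sort()
    return [tag for _, _, tag in entries]
```

With no weights present, the function simply returns the tags in the order given, which matches the current behaviour where only the ordering is significant.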

When your browser is configured, it sends an Accept-Language header with each request. CGI scripts see this as the HTTP_ACCEPT_LANGUAGE environment variable, alongside the usual HTTP_ACCEPT, HTTP_USER_AGENT, etc.

Server Configuration

In order to serve several languages, you must configure the server for content negotiation. One way to do this in the Apache server is by using a Type Map file. In conf/srm.conf you must enable type maps by uncommenting the line
AddType application/x-type-map var
You might also wish to change the DirectoryIndex default, e.g.
DirectoryIndex index.var
When you are done, restart httpd, e.g. killall -HUP httpd. Note that if you changed DirectoryIndex, you must create index.var files in every directory that previously relied on a default index such as index.html.

A typical Type Map file looks like this:

URI: start; vary="type,language"

URI: English.html
Content-type: text/html
Content-language: en-GB

URI: American.html
Content-type: text/html
Content-language: en-US

URI: Quebec.html
Content-type: text/html
Content-language: fr-CA

URI: French.html
Content-type: text/html
Content-language: fr

URI: German.shtml
Content-type: text/x-server-parsed-html
Content-language: de

URI: Japanese-iso.html
Content-type: text/html; charset=iso-2022-jp
Content-language: ja

URI: Japanese-euc.html
Content-type: text/html; charset=x-euc-jp
Content-language: ja

With this configuration, assuming that you have set DirectoryIndex to this file (e.g. index.var), a user coming into this directory will automatically see the index in their preferred language, without having to click on a Français Ici link or similar aid. Note that all the entries in the Type Map file must be regular HTML files, not CGI scripts or redirects.
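
To make the selection step concrete, here is a simplified Python sketch of how a variant might be chosen from a Type Map. It is not Apache's actual negotiation algorithm: it ignores content types and charsets, and the names parse_type_map, choose_variant and SAMPLE_MAP are hypothetical. It assumes the browser's languages arrive as an ordered list, tries an exact match first, then lets a bare language such as fr accept a dialect such as fr-CA, and falls back to the first variant listed.

```python
import re

# A shortened copy of the Type Map shown above.
SAMPLE_MAP = '''URI: start; vary="type,language"

URI: English.html
Content-type: text/html
Content-language: en-GB

URI: Quebec.html
Content-type: text/html
Content-language: fr-CA

URI: German.shtml
Content-type: text/x-server-parsed-html
Content-language: de
'''


def parse_type_map(text):
    """Split a type map into variants, one dict of lower-cased headers each."""
    variants = []
    for block in re.split(r"\n\s*\n", text.strip()):
        headers = {}
        for line in block.splitlines():
            name, sep, value = line.partition(":")
            if sep:
                headers[name.strip().lower()] = value.strip()
        # Keep only blocks that describe a language variant; this skips
        # the leading "URI: start" block.
        if "uri" in headers and "content-language" in headers:
            variants.append(headers)
    return variants


def choose_variant(variants, preferred):
    """Return the URI of the first variant matching the ordered preferences."""
    for wanted in preferred:
        wanted = wanted.lower()
        # Exact match, e.g. "en-GB" against "en-GB".
        for variant in variants:
            if variant["content-language"].lower() == wanted:
                return variant["uri"]
        # A bare language such as "fr" also accepts a dialect such as "fr-CA".
        for variant in variants:
            if variant["content-language"].lower().startswith(wanted + "-"):
                return variant["uri"]
    # Assumption: with no match, serve the first variant in the map.
    return variants[0]["uri"] if variants else None
```

For example, choose_variant(parse_type_map(SAMPLE_MAP), ["fr", "en-GB"]) picks Quebec.html, because no plain fr variant is listed in the shortened map and fr-CA is the closest match.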

There are many approaches to building a multilingual structure. One could use Type Map indices in all directories and keep all language versions of a document together in one directory. Alternatively, one could use a Type Map file to split the directory structure at the root and use separate hierarchies for different languages. If the first approach is used, it is possible to see the same document in different languages simply by changing the Accept Languages setting on the browser and reloading the document.

Note re. cache

When content negotiation is being used, there are several documents with the same URL, which causes problems for caches. Early versions of Apache did not handle this properly. In Apache 1.1.1 the documents are sent without a Last-Modified header, which makes them uncacheable.

In the proposed HTTP/1.1, the server should send the HTTP Vary header, as follows:

Vary: Accept-Language
which indicates to a proxy cache that content negotiation is taking place and that the cached document may be incorrect for a request with different language preferences.
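
One way a cache could act on the Vary header is to store a separate copy for each distinct value of the headers it names. The Python sketch below illustrates that idea only; the function name cache_key is hypothetical, and real proxy caches may behave differently.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named in Vary."""
    names = [name.strip().lower() for name in vary.split(",") if name.strip()]
    # Header names are case-insensitive, so compare them in lower case.
    lowered = {key.lower(): value for key, value in request_headers.items()}
    selected = tuple((name, lowered.get(name, "")) for name in sorted(names))
    return (url, selected)
```

Two requests for the same URL with different Accept-Language values then produce different keys, so a French reader is not handed the cached German page.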

Here is a script "select-lang" which performs content negotiation using HTTP redirects. This resolves the caching problem at the expense of requiring another (small) transfer. The script may be modified to guess languages based on IP address. Here is an example of the script in action. Here it is again using the Apache Action directive (cgi-bin is hidden).
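
The redirect approach can be sketched in Python as follows. This is not the select-lang script itself, only an illustration of the technique: the function name redirect_response, the example.com URLs and the shape of the available table are all hypothetical. A real CGI script would read the Accept-Language value from the HTTP_ACCEPT_LANGUAGE environment variable and write the returned text to standard output.

```python
def redirect_response(accept_language, available, default):
    """Return CGI response text redirecting to the best available page.

    available maps lower-case language tags to absolute URLs; default is
    the URL used when none of the requested languages is available.
    """
    target = default
    for item in accept_language.split(","):
        # Drop any parameters such as ";q=0.5"; only the ordering is used.
        tag = item.split(";")[0].strip().lower()
        if tag in available:
            target = available[tag]
            break
    # A CGI script signals a redirect with Status and Location headers.
    return "Status: 302 Found\r\nLocation: " + target + "\r\n\r\n"


# Hypothetical table of language versions.
PAGES = {
    "en-us": "http://www.example.com/American.html",
    "fr": "http://www.example.com/French.html",
}
```

Because each language version then has its own URL, a proxy can cache the pages themselves; only the small redirect response depends on the request.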

Note re. Robots

Where content negotiation is used to provide multiple languages, the default document (sent in reply to a request with no Accept-Language header) may include an Alternates header (HTTP/1.1) to indicate which languages are available. The syntax has not yet (July 96) been decided.

Charsets

The above method works very well with Western European languages, that is, those that use the ISO-8859-1 character set (the default HTML set). To view documents in languages such as Russian, Japanese, or Greek, it is necessary to select an alternative character set. This may be accomplished by specifying an extended MIME type, for instance
Content-type: text/html; charset=iso-8859-5
Unfortunately, many current browsers interpret this as a binary file, and launch a Save-to-File form rather than displaying the page. Browser authors are encouraged to parse this correctly in future, or at least to ignore the charset parameter. In the meantime, the required functionality may be obtained using an HTML META tag, for instance:
<META HTTP-EQUIV="Content-type" CONTENT="text/html; charset=iso-8859-5">
This will be correctly interpreted by later versions of Netscape Navigator and used to select the correct font. The current setting of the document charset may be seen in the View --> Document Info form.
When the charset is specified in this way, Netscape will automatically change fonts when moving between documents in different languages.
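
Reading the charset parameter from a Content-type value, whether it arrives in the HTTP header or in the CONTENT attribute of a META tag, takes only a few lines. The Python sketch below shows one way to do it; the function name charset_of is hypothetical, and falling back to iso-8859-1 reflects the default HTML set mentioned above.

```python
def charset_of(content_type, default="iso-8859-1"):
    """Return the charset named in a Content-type value, in lower case."""
    parts = content_type.split(";")
    # parts[0] is the media type itself, e.g. "text/html".
    for part in parts[1:]:
        name, _, value = part.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower()
    return default
```

A client that does not recognise the returned charset can still display the page as text/html rather than treating it as a binary file.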

It is recommended that the charset be specified in the parent document also, so that the browser will have the correct font selected before any of the document is parsed, for instance:
<A HREF="DocumentURL" CHARSET="iso-8859-5">Document Name</A>

Future browsers should implement the HTTP Accept-Charset header in order to negotiate properly in those cases (Cyrillic, Japanese) where multiple character sets, and hence fonts, exist for the language.

Resource Page

Multilingual Resources here.

Further Reading

Internet Drafts are available from InterNIC

The META generator at Vancouver Webpages will generate some charset and language tags.


A.Daviel