Using Multiple Languages in HTML

There are three considerations for presenting HTML in languages other than English. First, the document must be delivered in the desired natural language (such as English, French, etc.) and dialect (US, British, etc.). Second, the document must be presented in the correct character set. This is a requirement for most languages written in non-Latin scripts (Russian, Japanese, etc.). Third, the document must be presented with the correct directionality. This is a consideration for languages such as Hebrew and Arabic, which are written right-to-left, and Japanese, which is customarily written top-to-bottom.

The default charset in HTML is ISO-8859-1, an 8-bit Western European character set. A set of escape sequences (character entities such as &ccedil; for ç) lets 7-bit HTML source represent ISO-8859-1 characters for presentation. Western European languages are therefore already supported in both charset and directionality. To deliver the required language, newer browsers and servers perform Content Negotiation.
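As a rough illustration of the escape mechanism described above, the following Python sketch resolves an entity in a 7-bit source string and encodes the result as ISO-8859-1. The sample word and the variable names are assumptions chosen for illustration only.

```python
import html

# A 7-bit ASCII source string containing a character entity (sample text is an assumption).
source = "Fran&ccedil;ais"

# Resolve the entity to the character it names.
text = html.unescape(source)

# Encode for presentation in ISO-8859-1; the c-cedilla becomes one 8-bit byte.
latin1_bytes = text.encode("iso-8859-1")

print(text)
print(latin1_bytes)
```

The source string contains only ASCII characters, while the encoded result contains a single byte above 0x7F for the accented letter.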

For languages with a different character set from English, an extended MIME type may be used which includes a charset modifier. In many character sets, 7-bit US ASCII is a subset of the 8-bit charset, so the HTML structure still reads normally, since all HTML tags are 7-bit.
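The subset property can be checked with a short Python sketch. It assumes three sample charsets (ISO-8859-1, KOI8-R and EUC-JP) and a made-up fragment of markup; it is a demonstration for these charsets only, not a claim about every encoding.

```python
# Markup is pure ASCII; only the text content needs the larger charset.
markup = '<p lang="ja">'
body = "\u65e5\u672c\u8a9e"  # three Japanese characters (sample text, an assumption)

# In each of these charsets the tag encodes to exactly the same bytes as in ASCII.
for charset in ("iso-8859-1", "koi8_r", "euc_jp"):
    assert markup.encode(charset) == markup.encode("ascii")

# Encode a whole fragment as EUC-JP and separate 7-bit bytes from 8-bit bytes.
page = (markup + body + "</p>").encode("euc_jp")
ascii_bytes = bytes(b for b in page if b < 0x80)
high_bytes = bytes(b for b in page if b >= 0x80)

print(ascii_bytes)      # the HTML structure, untouched
print(len(high_bytes))  # number of bytes carrying the Japanese text
```

In EUC-JP every byte of a Japanese character has the high bit set, so a parser that only looks for ASCII tag characters still sees the structure intact.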

Content Negotiation

Content negotiation uses the features of a server such as Apache to serve a document based on natural language. The browser sends an Accept-Language request header (exposed to CGI programs as the HTTP_ACCEPT_LANGUAGE variable), and the server uses a type-map file to find the correct file. See how to make a Multilingual Web server for a discussion of type map files.
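The selection step can be sketched in Python as follows. This is a simplified illustration, not Apache's actual algorithm: the function names, the variant file names, and the fallback from a dialect to its base language are all assumptions, and wildcard ("*") entries are not handled.

```python
def parse_accept_language(header):
    """Return (language, q) pairs from an Accept-Language value, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(";")
        q = 1.0  # a language without an explicit q value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q value as "not acceptable"
        prefs.append((lang.strip().lower(), q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_variant(header, variants, default):
    """Pick a file for the best acceptable language, or fall back to default."""
    for lang, q in parse_accept_language(header):
        if q <= 0:
            continue
        if lang in variants:
            return variants[lang]
        # Simplification: fall back from a dialect (fr-ca) to its base language (fr).
        base = lang.split("-")[0]
        if base in variants:
            return variants[base]
    return default


# Hypothetical variant files, standing in for the entries of a type-map file.
variants = {"en": "index.html.en", "fr": "index.html.fr", "ja": "index.html.ja"}
print(choose_variant("fr-CA, ja;q=0.8, en;q=0.5", variants, "index.html.en"))
```

For the sample header, the sketch selects the French variant, because fr-CA carries the highest preference and falls back to the base language fr.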

Language Samples - Content Negotiated

Here is the full Var file.

You can try the interaction of your HTTP_ACCEPT_LANGUAGE variable with these different .var files:

Character sets

A charset modifier may be appended to the Content-type header, like this:
Content-type: text/plain;charset=x-euc-jp
Such a header may be generated using the Asis (send as-is) feature in Apache (set in srm.conf).
In Netscape, View-->Document Info will tell you the charset of the current document.
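To show how a client might separate the media type from the charset modifier in a Content-type value like the one above, here is a minimal Python sketch using the standard library's header parser. The helper name split_content_type is an assumption made for this example.

```python
from email.message import Message


def split_content_type(value):
    """Return (media_type, charset) for a Content-type header value."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_type(), msg.get_content_charset()


media_type, charset = split_content_type("text/plain;charset=x-euc-jp")
print(media_type)
print(charset)
```

A header without the modifier yields no charset, in which case a browser would fall back to its default.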

This is known to work with Netscape 2.0 for X11. The charset parameter overrides the language selection and automatically selects the correct font. Netscape 2.0 (X11) currently uses these locales. Netscape 3.0 does things slightly differently; this page lists the currently (3.0b4) understood languages and charsets.

Other browsers, however, mostly misinterpret the charset parameter and treat the document as a binary file.

Netscape 3.0 understands this parameter in a META tag, e.g.

<META HTTP-EQUIV="Content-type" CONTENT="text/html;charset=x-euc-jp">
which will probably be ignored by other browsers, and so may be safe to use (assuming the server does not parse HTTP-EQUIV tags into real HTTP headers). The META generator will generate a few of these pairs.
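As an illustration of how a program could read the charset from such a META tag, the following Python sketch scans markup for the first META element whose HTTP-EQUIV is Content-type. The class name MetaCharsetFinder is an assumption, and the sketch says nothing about how Netscape itself implements this.

```python
from email.message import Message
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Record the charset named by the first META HTTP-EQUIV Content-type tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        equiv = (attributes.get("http-equiv") or "").lower()
        content = attributes.get("content")
        if equiv == "content-type" and content:
            # Reuse the mail header parser to pull out the charset parameter.
            msg = Message()
            msg["Content-Type"] = content
            self.charset = msg.get_content_charset()


finder = MetaCharsetFinder()
finder.feed('<META HTTP-EQUIV="Content-type" CONTENT="text/html;charset=x-euc-jp">')
print(finder.charset)
```

A document with no such tag leaves the charset unset, which corresponds to a browser keeping its default.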

Language Samples - text/html with and without charset


Multilingual Resources

See also
