Using Multiple Languages in HTML
There are three considerations for presenting HTML in languages other than
English. First, the document must be delivered in the desired natural language
(such as English or French) and dialect (such as US or British English).
Second, the document must be presented in the correct character set. This
is a requirement for most languages outside Western Europe (Russian, Japanese, etc.). Third,
the document must be presented with the correct directionality. This is a
consideration for languages customarily written right-to-left, such as Hebrew
and Arabic, or top-to-bottom, such as traditional Japanese.
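The Hebrew samples further down come in two storage orders: visual (characters stored in the order they appear on screen, left to right) and implicit or logical (characters stored in reading order, with the browser reordering them). For a single line of pure Hebrew letters the difference is just a reversal. The Python sketch below assumes exactly that restricted case; the helper name logical_to_visual is hypothetical, and real mixed text with digits or Latin words needs the full Unicode bidirectional algorithm.

```python
import unicodedata


def logical_to_visual(line: str) -> str:
    """Reorder one line of pure Hebrew text from logical to visual order.

    Assumption: only strong right-to-left letters and spaces are accepted;
    digits, Latin text and most punctuation need the full bidi algorithm.
    """
    for ch in line:
        if unicodedata.bidirectional(ch) not in ("R", "WS"):
            raise ValueError(f"unsupported character {ch!r}")
    # The first letter read sits at the right edge of the screen,
    # so in left-to-right (visual) storage it comes last.
    return line[::-1]


logical = "שלום עולם"
visual = logical_to_visual(logical)
# Both orders fit in ISO 8859-8; only the byte sequence differs.
print(logical.encode("iso8859_8"))
print(visual.encode("iso8859_8"))
```

This is why the same charset label (iso-8859-8) can describe two differently ordered files: the charset says which bytes mean which letters, not in what order they are stored.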
The default charset in HTML is ISO-8859-1 (ISO Latin-1), an 8-bit Western
European character set. HTML also defines a set of character entity references
(such as &ccedil;, which produces ç) that let a 7-bit HTML file represent
ISO-8859-1 characters for presentation. Western European languages are
therefore already supported in both charset and directionality. To deliver
the required language, newer browsers and servers perform content negotiation.
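The round trip between 7-bit entity references and ISO-8859-1 bytes can be illustrated with Python's standard html module. This is a small sketch; the two helper names are hypothetical, and html.unescape also resolves markup escapes such as &lt;, so it is meant for text content rather than whole documents.

```python
import html


def entities_to_latin1(source: str) -> bytes:
    """Resolve character references in 7-bit HTML text, then encode as ISO-8859-1."""
    return html.unescape(source).encode("iso-8859-1")


def latin1_to_7bit(text: str) -> str:
    """Reverse direction: replace each non-ASCII character with a numeric reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(entities_to_latin1("Fran&ccedil;ais"))  # b'Fran\xe7ais'
print(latin1_to_7bit("Français"))             # Fran&#231;ais
```

The named reference &ccedil; and the numeric reference &#231; both stand for the single ISO-8859-1 byte 0xE7.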
For languages whose character set differs from English, an extended MIME
type may be used that includes a charset parameter. In many character
sets, 7-bit US-ASCII is a subset of the 8-bit charset, so the HTML
structure still reads normally: all HTML tags are written in 7-bit ASCII.
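Whether a charset keeps the markup readable can be checked directly: printable ASCII must encode to the same bytes, and the national text must use only bytes of 0x80 and above, so it can never be mistaken for a < or >. The Python sketch below tests this for codecs corresponding to several of the charsets sampled later; the sample words and the tags_survive helper are illustrative assumptions. ISO-2022-JP is included as a counterexample: it is a 7-bit encoding whose two-byte characters reuse ASCII byte values, which is why the property holds only in "many" charsets.

```python
# Python codec name -> a short sample of text in that script (assumed examples).
SAMPLES = {
    "iso8859_5": "Привет",
    "koi8_r": "Привет",
    "cp1251": "Привет",
    "iso8859_7": "Γεια",
    "iso8859_8": "שלום",
    "euc_jp": "日本語",
    "gb2312": "中文",
    "big5": "中文",
    "euc_kr": "한국어",
    "iso2022_jp": "日本語",
}

PRINTABLE_ASCII = "".join(chr(i) for i in range(0x20, 0x7F))


def tags_survive(codec: str, sample: str) -> bool:
    """True if ASCII encodes unchanged and the sample text uses only 8-bit bytes."""
    ascii_unchanged = PRINTABLE_ASCII.encode(codec) == PRINTABLE_ASCII.encode("ascii")
    text_is_8bit = all(b >= 0x80 for b in sample.encode(codec))
    return ascii_unchanged and text_is_8bit


for codec, sample in SAMPLES.items():
    print(codec, tags_survive(codec, sample))
```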
Content Negotiation
Content negotiation uses features of servers such as
Apache
to serve a document based on natural language. The browser sends
an Accept-Language request header (exposed to CGI scripts as
HTTP_ACCEPT_LANGUAGE), and the server uses a type-map (.var) file
to find the correct file.
See how to make a Multilingual Web server
for a discussion of type-map files.
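The browser side of this exchange is a header such as "en-ca,fr;q=0.5,de", a list of language ranges with optional q weights. The Python sketch below parses such a value and picks the best language from those a server has available. It is a simplified model, not Apache's matching code: the helper names are hypothetical, a range like en is assumed to match en-GB, and a q of 0 is treated as an explicit refusal of that exact tag only.

```python
from typing import List, Optional, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Split an Accept-Language value into (language-range, q) pairs, best first."""
    ranges = []
    for part in header.split(","):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        q = 1.0  # a range without q= counts as fully acceptable
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((tag, q))
    # sorted() is stable, so equal weights keep the browser's own order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def language_matches(lang_range: str, variant: str) -> bool:
    """A range matches itself, any longer tag under it (en matches en-gb), or '*'."""
    variant = variant.lower()
    return lang_range in ("*", variant) or variant.startswith(lang_range + "-")


def best_language(header: str, available: List[str]) -> Optional[str]:
    """Return the first available language matching the highest-weighted range."""
    ranges = parse_accept_language(header)
    refused = {r for r, q in ranges if q <= 0}  # q=0 means "not acceptable"
    for lang_range, q in ranges:
        if q <= 0:
            continue
        for lang in available:
            if lang.lower() not in refused and language_matches(lang_range, lang):
                return lang
    return None


print(best_language("fr;q=0.8, de;q=0.9", ["en-US", "fr", "de"]))  # de
```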
Language Samples - Content Negotiated
Samples using Content-Negotiation
Here is the full Var file.
You can try the interaction of your HTTP_ACCEPT_LANGUAGE variable with
these different .var files:
- en-CA, en-GB, en-US, fr, de: Var file
- en-CA, en-GB, en-US, fr, de, using qs to prefer en-CA: Var file
- en-CA, en-GB, en-US, fr, de, using qs to prefer en-GB: Var file
- en-CA, en-GB, en-US, fr, de, using qs to prefer en-US: Var file
- en-CA, en-GB, en-US, fr, de, using qs to prefer French (fr): Var file
- fr, de: Var file
- default to multiple languages: Var file
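The qs ("quality of source") value in a type map is why some of these files favour one variant whatever the browser prefers. The Python sketch below is a simplified model, assuming every variant has the same media type and that qs is compared before the browser's language weight, roughly following the order Apache documents for its negotiation algorithm; it ignores the other negotiation dimensions. The Variant class, choose_variant, the file names, and the 0.9 qs value are all hypothetical, since the actual qs values in the sample var files are not shown here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Variant:
    uri: str          # hypothetical file name, as listed in a type map
    language: str     # the variant's Content-language
    qs: float = 1.0   # source quality; 1.0 when the type map omits it


def choose_variant(variants: List[Variant], language_q: Dict[str, float]) -> Variant:
    """Simplified ranking: highest qs wins, then the browser's language weight.

    language_q holds already-parsed Accept-Language weights, e.g. {"fr": 1.0}.
    Languages the browser did not list count as 0.0; ties keep type-map order.
    """
    return max(variants, key=lambda v: (v.qs, language_q.get(v.language, 0.0)))


plain = [Variant(f"index.{lang}.html", lang) for lang in ("en-CA", "en-GB", "en-US", "fr", "de")]
prefers_en_ca = [
    Variant(v.uri, v.language, 1.0 if v.language == "en-CA" else 0.9) for v in plain
]

french_reader = {"fr": 1.0, "de": 0.5}
print(choose_variant(plain, french_reader).language)          # fr
print(choose_variant(prefers_en_ca, french_reader).language)  # en-CA
```

In this model a French-speaking browser gets the French page from the plain map, but the en-CA page once the map gives en-CA a higher qs.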
Character sets
A charset modifier may be appended to the Content-type header, like this:
Content-type: text/plain;charset=x-euc-jp
This header may be generated with Apache's send-as-is (asis) feature, enabled in
srm.conf.
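A send-as-is file carries its own HTTP headers, followed by a blank line and then the body, which the server passes through unchanged. The Python sketch below builds such a file for an EUC-JP page, assuming the server maps the .asis extension to the send-as-is handler; make_asis and the japanese.asis file name are hypothetical. Note that the MIME label (x-euc-jp) and Python's codec name (euc_jp) are different strings for the same encoding.

```python
def make_asis(html_body: str, charset_label: str, codec: str) -> bytes:
    """Build a send-as-is file: HTTP headers, a blank line, then the encoded body."""
    headers = f"Content-type: text/html; charset={charset_label}\n\n"
    # Headers stay 7-bit ASCII; only the body uses the national encoding.
    return headers.encode("ascii") + html_body.encode(codec)


page = "<html><body><p>日本語</p></body></html>"
data = make_asis(page, "x-euc-jp", "euc_jp")
# Writing `data` in binary mode to, say, japanese.asis (a hypothetical name)
# would let the server send the Content-type header exactly as written.
```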
In Netscape, View > Document Info shows the charset of the current document.
The charset parameter is known to work with Netscape 2.0 for X11, where it overrides
the user's language selection and automatically selects the correct font.
Netscape 2.0 (X11) currently uses these locales.
Netscape 3.0 does things slightly differently;
this page lists the languages and charsets understood by the current beta (3.0b4).
Most other browsers, however, misinterpret the charset parameter and treat
the document as a binary file.
Netscape 3.0 understands this parameter in a META tag, e.g.
<META HTTP-EQUIV="Content-type" CONTENT="text/html;charset=x-euc-jp">
Other browsers will probably ignore this tag, so it may be safe to use
(assuming the server does not parse HTTP-EQUIV values
into real HTTP headers). The META generator
will generate a few of these pairs.
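Because the META tag is written in 7-bit markup, a client can locate it before it knows the document's charset. The Python sketch below does this by decoding the start of the file as ISO-8859-1, which never fails and leaves every ASCII byte intact. It assumes an ASCII-compatible encoding and a tag within the first kilobyte; meta_charset is a hypothetical helper, and a regex scan like this is a heuristic rather than a full HTML parser.

```python
import re
from typing import Optional

META_CONTENT_TYPE = re.compile(
    r"""<meta\s[^>]*http-equiv\s*=\s*["']?content-type["']?[^>]*>""",
    re.IGNORECASE,
)
CHARSET_PARAM = re.compile(r"""charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)


def meta_charset(document: bytes, limit: int = 1024) -> Optional[str]:
    """Return the charset named in a Content-type META tag, or None."""
    # ISO-8859-1 maps every byte to a character, so decoding cannot fail,
    # and the 7-bit markup comes through unchanged.
    head = document[:limit].decode("iso-8859-1")
    for tag in META_CONTENT_TYPE.findall(head):
        match = CHARSET_PARAM.search(tag)
        if match:
            return match.group(1).lower()
    return None


doc = (b'<META HTTP-EQUIV="Content-type" CONTENT="text/html;charset=x-euc-jp">'
       + "日本語".encode("euc_jp"))
print(meta_charset(doc))  # x-euc-jp
```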
Language Samples - text/html with and without charset
- French (ISO Latin-1 / ISO 8859-1),
with charset=iso-8859-1,
using META tag
- français.html
- German (ISO Latin-1 / ISO 8859-1),
with charset=iso-8859-1,
using META tag
- Russian (ISO 8859-5, with GIF),
with charset=iso-8859-5,
using META tag
- Russian (KOI-8),
with charset=koi8,
using META tag
- Russian (CP1251),
with charset=cp1251,
using META tag
- Greek (ISO 8859-7, with GIF),
with charset=iso-8859-7,
using META tag
- Hebrew (ISO 8859-8) in Visual Directionality,
with charset=iso-8859-8,
using META tag
- Hebrew (ISO 8859-8) in Implicit Directionality,
with charset=iso-8859-8
- Chinese (GB 2312 / GB-encoding),
with charset=GB2312,
using META tag
- Chinese (GB 2312 / HZ-encoding),
with charset=hz-gb-2312,
using META tag
- Chinese (Big5),
with charset=Big5,
using META tag
- Korean (KSC 5601),
with charset=ksc_5601
(using META tag),
with charset=euc-kr
(using META tag)
- Japanese (JIS X 0208),
with charset=iso-2022-jp,
using META tag
- Japanese (EUC-JP),
with charset=x-euc-jp,
using META tag
- Japanese (JIS),
with charset=iso-2022-jp,
using META tag
- Japanese (Shift-JIS),
with charset=x-sjis,
using META tag
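A client reading these samples has to translate each charset label into a decoder. The Python sketch below uses a hand-written table from the labels listed above to Python codec names, falling back to ISO-8859-1 as the HTML default. The table itself is an assumption: for example, koi8 is read as KOI8-R and ksc_5601 as EUC-KR, and decode_page is a hypothetical helper.

```python
# Assumed mapping from the charset labels used in the samples to Python codecs.
LABEL_TO_CODEC = {
    "iso-8859-1": "latin_1",
    "iso-8859-5": "iso8859_5",
    "koi8": "koi8_r",
    "cp1251": "cp1251",
    "iso-8859-7": "iso8859_7",
    "iso-8859-8": "iso8859_8",
    "gb2312": "gb2312",
    "hz-gb-2312": "hz",
    "big5": "big5",
    "ksc_5601": "euc_kr",
    "euc-kr": "euc_kr",
    "iso-2022-jp": "iso2022_jp",
    "x-euc-jp": "euc_jp",
    "x-sjis": "shift_jis",
}


def decode_page(body: bytes, charset_label: str) -> str:
    """Decode a page using its charset label; unknown labels fall back to ISO-8859-1."""
    codec = LABEL_TO_CODEC.get(charset_label.strip().lower(), "latin_1")
    return body.decode(codec)


print(decode_page("中文".encode("hz"), "hz-gb-2312"))  # 中文
```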