20 Aug 2024 - tsp
Last update 20 Aug 2024
So who hasn’t had this problem? You may start with a simple webpage or blog, but as it grows, adding translated versions becomes more and more appealing. Many webpages solve this by employing a full-blown content management system that regenerates every page on each request and fills the different text elements, including their translations, from a database. Others use some kind of server-side scripting to decide which language variant to deliver. But HTTP has a built-in mechanism to decide which content types and which languages to deliver to a client: browsers announce the data types they support and allow users to configure their preferred languages, though this is unfortunately not well known to many users.
The following blog article describes how one can add language negotiation to an existing static site in a very simple manner while still allowing users to select the language.
During each request a browser sends request headers such as Accept and Accept-Language. The Accept header specifies which formats the client supports, for example to allow the server to deliver images in a format supported by the browser. The Accept-Language header lists the languages configured by the user in order of preference, with assigned weights. A typical Accept-Language header may look like the following:
Accept-Language: de, en;q=0.8, it;q=0.7
This means the user prefers content in German (de) with a weight of q=1 (an entry without an explicit q value defaults to 1). If this is not available they would like to get English content (en) with a weight of 0.8, and if this is also not available, content in Italian (it) with a weight of 0.7.
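To make the weighting concrete, here is a minimal Python sketch that turns such a header into a list of language tags ordered by preference. It is not part of the Apache setup described below, and the function name parse_accept_language is made up for illustration. It assumes the simple header syntax shown above: it does not implement language-range matching, and a * wildcard is simply treated like any other tag.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return the language tags of an Accept-Language header, best first.

    Entries without an explicit q value default to 1.0, entries with q=0
    mean "not acceptable" and are dropped. Python's sort is stable (also
    with reverse=True), so entries with equal weights keep their order.
    """
    weighted = []
    for entry in header.split(","):
        entry = entry.strip()
        if not entry:
            continue
        tag, *params = (part.strip() for part in entry.split(";"))
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if q > 0:
            weighted.append((tag.lower(), q))
    weighted.sort(key=lambda item: item[1], reverse=True)
    return [tag for tag, _ in weighted]


if __name__ == "__main__":
    print(parse_accept_language("de, en;q=0.8, it;q=0.7"))  # ['de', 'en', 'it']
```

Sorting by weight rather than trusting the order in the header is the safer choice, since the specification does not require clients to list entries in descending order.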
If the server supports delivery in different languages, it has to tell clients - and especially proxy servers - that it delivers different content depending on this header. This is what the Vary header is for. With each response the server includes this header, listing the request header fields whose values have to match before a cache may return a stored response to another client. If the header looks like
Vary: header1, header2
then the values of the request header fields header1 and header2 influence which content is delivered. If Vary is set to *, a cache can never reuse the stored response for a later request without asking the server, since any factor, including ones outside the HTTP headers, may influence the generated content. When only performing language negotiation, Vary should be set to Accept-Language.
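How a shared cache uses Vary can be sketched in Python. This is a deliberately simplified model under stated assumptions: the function names vary_key and can_reuse are hypothetical, header values are compared verbatim (real caches may normalise them, for example by ignoring whitespace differences), and freshness, request method and URI, which a real cache also checks, are left out.

```python
from typing import Mapping, Optional


def vary_key(vary: str, request_headers: Mapping[str, str]) -> Optional[tuple]:
    """Return the request header values a cache must compare for this Vary value.

    Returns None for "Vary: *": such a stored response never matches a later
    request, so the cache has to forward that request to the server.
    """
    names = sorted({name.strip().lower() for name in vary.split(",") if name.strip()})
    if "*" in names:
        return None
    # Header field names are case-insensitive; normalise them for the lookup.
    headers = {key.lower(): value for key, value in request_headers.items()}
    # A missing header is recorded as None so it differs from an empty value.
    return tuple((name, headers.get(name)) for name in names)


def can_reuse(vary: str, stored_request: Mapping[str, str],
              new_request: Mapping[str, str]) -> bool:
    """True if a response stored for stored_request may be served for new_request."""
    stored = vary_key(vary, stored_request)
    return stored is not None and stored == vary_key(vary, new_request)
```

With Vary: Accept-Language, a German and an Italian visitor therefore never receive each other's cached copy, while headers not listed in Vary (such as User-Agent) do not prevent reuse.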
So how does one implement this on a static site? In my case the webpage is generated using Jekyll, as described in a previous blog article, and I already had a lot of content served from the webroot /.
The idea is to serve the different translations from different URIs. For example, the page /test.html contains the original version in plain English. The German translation should be available at /translations/de/test.html, a French one at /translations/fr/test.html. When a user requests /test.html, the server should check whether a translation is available in the user's preferred language. If so, the translation is delivered, otherwise the original English version. In addition, users should be able to override this and select any of the translations at /translations/XXX/test.html - including the original English version by accessing /translations/en/test.html, even though the original English article is stored at /test.html.
This involves two rewrite rules that cover three cases:

- Requests for URIs starting with /translations/en. In this case just strip the prefix and terminate any further rewrites.
- Requests for URIs starting with /translations/XXX, where XXX is any supported language. In this case just allow the request.
- All other requests whose Accept-Language header names one of the supported languages as the preferred one. In this case also check whether the translated file exists, and if so, rewrite into the /translations/XXX/ directory.

The following directory structure reflects the content of the webpage; for Jekyll it's the layout inside the _site directory:
/usr/www/www.example.com/www/
|-- index.html
|-- 2024/
|   |-- 01/
|   |   |-- 01-example.html
|   |   |-- 02-example2.html
|-- translations/
    |-- de/
    |   |-- index.html
    |   |-- 2024/
    |   |   |-- 01/
    |   |   |   |-- 01-example.html
    |-- it/
        |-- 2024/
            |-- 01/
                |-- 02-example2.html
The root directory contains the original version, in this example the English one.
URI | Original (English) version | Explicit English version | German version | Italian version |
---|---|---|---|---|
/index.html | /index.html | /translations/en/index.html | /translations/de/index.html | |
/2024/01/01-example.html | /2024/01/01-example.html | /translations/en/2024/01/01-example.html | /translations/de/2024/01/01-example.html | |
/2024/01/02-example2.html | /2024/01/02-example2.html | /translations/en/2024/01/02-example2.html | | /translations/it/2024/01/02-example2.html |
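The table can also be derived mechanically from the directory tree, which may come in handy when generating links that let readers switch languages. A minimal Python sketch, assuming the files are given as a hypothetical in-memory set of paths relative to the document root (a real build step would walk the _site directory instead) and that English is the default language; the names SITE_FILES and language_variants are made up for illustration:

```python
from collections.abc import Set

DEFAULT_LANGUAGE = "en"

# Hypothetical file list mirroring the example tree, relative to DocumentRoot.
SITE_FILES = frozenset({
    "index.html",
    "2024/01/01-example.html",
    "2024/01/02-example2.html",
    "translations/de/index.html",
    "translations/de/2024/01/01-example.html",
    "translations/it/2024/01/02-example2.html",
})


def language_variants(uri: str, files: Set[str] = SITE_FILES) -> dict[str, str]:
    """Map language codes to the explicit URIs of all versions of a page.

    The original in the root directory is reachable as /translations/en/...
    via the English rewrite rule; other languages need a real file below
    translations/<lang>/.
    """
    relative = uri.lstrip("/")
    variants = {}
    if relative in files:
        variants[DEFAULT_LANGUAGE] = f"/translations/{DEFAULT_LANGUAGE}/{relative}"
    for path in sorted(files):
        # e.g. "translations/de/index.html" -> ["translations", "de", "index.html"]
        parts = path.split("/", 2)
        if len(parts) == 3 and parts[0] == "translations" and parts[2] == relative:
            variants[parts[1]] = "/" + path
    return variants


if __name__ == "__main__":
    for page in ("/index.html", "/2024/01/01-example.html", "/2024/01/02-example2.html"):
        print(page, language_variants(page))
```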
The following modules are required:

- mod_headers to set the Vary response header.
- mod_rewrite to actually rewrite the URIs for the internal requests.

If they are not enabled, one can either load the modules in httpd.conf (for Apache 2.4 on FreeBSD, for example, in /usr/local/etc/apache24/httpd.conf):
LoadModule headers_module libexec/apache24/mod_headers.so
LoadModule rewrite_module libexec/apache24/mod_rewrite.so
or enable them using the a2enmod command on Debian-based systems:
sudo a2enmod headers rewrite
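Now one can configure the rewrite rules as well as the Vary header in the VirtualHost configuration, in a Directory section or in the corresponding .htaccess file. The rules below are written for the VirtualHost (server) context: in a Directory section or .htaccess file the RewriteRule pattern is matched against the path relative to that directory, i.e. without the leading slash, and [L] only ends the current pass through the rules, so they would have to be adapted there (for example by using [END] instead of [L]).
The following example shows the configuration inside a VirtualHost section: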
<VirtualHost *:443>
    ServerName www.example.com
    ServerAdmin complains@example.com
    DocumentRoot /usr/www/www.example.com/www/
    # Set the Vary header to signal that we deliver
    # different content based on the Accept-Language request header
    # and enable the RewriteEngine
    Header set Vary "Accept-Language"
    RewriteEngine On
    # In case the request URI points to the English translation we
    # rewrite back to the document root
    RewriteCond %{REQUEST_URI} ^/translations/en/(.*) [NC]
    RewriteRule ^/translations/en/(.*) /$1 [L]
    # We rewrite in case:
    # * We don't already point to a translation (not starting with /translations/LANG/)
    # * A supported language comes first in the Accept-Language header
    # * The translated file exists
    #
    # In case nothing applies we don't rewrite but serve from the
    # root directory (which also holds our default language)
    RewriteCond %{REQUEST_URI} !^/translations/(de|it)/ [NC]
    RewriteCond %{HTTP:Accept-Language} ^(de|it) [NC]
    RewriteCond %{DOCUMENT_ROOT}/translations/%1%{REQUEST_URI} -f
    RewriteRule ^/(.*)$ /translations/%1/$1 [L]
</VirtualHost>
Let's take a look at how all of this works. First we use Header set to set the Vary header to Accept-Language in our responses, informing clients and proxies that we deliver different content based on the Accept-Language header. After that we enable the rewrite engine with RewriteEngine On.
The structure of rewrite rules is simple: a RewriteRule can be preceded by a sequence of RewriteCond directives that specify conditions for it. The rule is executed only if its own pattern matches and all of its conditions apply. A quick recap of the most common flags for rewrite conditions and rules:

- [L] stops processing if the rule matches. If it is not specified, the following conditions and rules are processed as well.
- [NC] makes the match case-insensitive.
- [C] chains rules together: if a rule matches, the next one is processed as usual, but if it does not match, the chained rules that follow are skipped as well.
- [CO] sets a cookie. This flag requires 3 parameters and supports 5 optional ones: [CO=NAME:VALUE:DOMAIN:lifetime:path:secure:httponly:samesite].

Take a look at the official documentation for more information on those flags.
The first rule matches all requests for translations into our default language, in this case en:
RewriteCond %{REQUEST_URI} ^/translations/en/(.*) [NC]
RewriteRule ^/translations/en/(.*) /$1 [L]
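The condition states that all REQUEST_URIs starting with the string /translations/en/ should match; the remainder of the string is captured by the parentheses (.*). The RewriteRule repeats the same pattern, and its capture is referenced in the substitution as $1 (captures from a RewriteCond would be referenced as %1 instead). If the rule matches, we rewrite those URIs by simply stripping the /translations/en/ part, so /translations/en/test.html becomes /test.html. Then processing stops.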
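The regular expression part of this rule behaves the same way in Python's re module, which makes it easy to try out which URIs the rule catches. A small illustrative sketch; the function name strip_english_prefix is hypothetical, and only the pattern and substitution are modelled, not Apache's request handling:

```python
import re
from typing import Optional

# Same pattern as the first RewriteRule; [NC] corresponds to re.IGNORECASE.
EXPLICIT_ENGLISH = re.compile(r"^/translations/en/(.*)", re.IGNORECASE)


def strip_english_prefix(uri: str) -> Optional[str]:
    """Apply the substitution /$1 if the pattern matches, else return None."""
    match = EXPLICIT_ENGLISH.match(uri)
    if match is None:
        return None
    # Apache's $1 is the first capture group of the RewriteRule pattern.
    return "/" + match.group(1)


if __name__ == "__main__":
    print(strip_english_prefix("/translations/en/2024/01/01-example.html"))  # /2024/01/01-example.html
    print(strip_english_prefix("/translations/de/index.html"))  # None
```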
The second rule is the most complex one where the magic happens:
RewriteCond %{REQUEST_URI} !^/translations/(de|it)/ [NC]
RewriteCond %{HTTP:Accept-Language} ^(de|it) [NC]
RewriteCond %{DOCUMENT_ROOT}/translations/%1%{REQUEST_URI} -f
RewriteRule ^/(.*)$ /translations/%1/$1 [L]
The first condition matches all URIs that do not start with one of our supported translation subdirectories, /translations/de/ and /translations/it/. We do this so that requests which already target a specific translation are not rewritten. The second condition requires the Accept-Language header to start with one of our supported languages, i.e. the language listed first (normally the one with the highest priority) has to be supported; the captured language code is available as %1. Note that this checks only the first entry and ignores the q weights, so a header such as fr, de;q=0.9 falls through to the English original. The third condition checks whether the REQUEST_URI is available under the translation matching the Accept-Language header; the -f flag checks whether the file exists. If all conditions apply, we prepend the /translations/LANG/ prefix to our REQUEST_URI and stop processing.
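The decision made by these three conditions can be mimicked in a few lines of Python to check which language a given header ends up with. This is a sketch under stated assumptions: the function name negotiate and the EXISTING set (standing in for the -f file test and mirroring the example tree) are hypothetical, and Apache's actual request processing is not modelled.

```python
import re
from collections.abc import Set
from typing import Optional

SUPPORTED = "de|it"
# First condition (negated): the URI must not already point into a translation.
TRANSLATED_URI = re.compile(rf"^/translations/({SUPPORTED})/", re.IGNORECASE)
# Second condition: the Accept-Language header must *start* with a supported language.
LEADING_LANGUAGE = re.compile(rf"^({SUPPORTED})", re.IGNORECASE)


def negotiate(uri: str, accept_language: str, existing: Set[str]) -> Optional[str]:
    """Mimic the second rule and return the rewritten URI, or None.

    `existing` stands in for the -f test: it contains the URI paths (relative
    to DocumentRoot, with a leading slash) of all files that exist.
    """
    if TRANSLATED_URI.match(uri):
        return None
    match = LEADING_LANGUAGE.match(accept_language)
    if match is None:
        return None
    # %1 from the second condition followed by %{REQUEST_URI}
    candidate = f"/translations/{match.group(1)}{uri}"
    return candidate if candidate in existing else None


# Hypothetical stand-in for the translated files of the example tree.
EXISTING = {
    "/translations/de/index.html",
    "/translations/de/2024/01/01-example.html",
    "/translations/it/2024/01/02-example2.html",
}

if __name__ == "__main__":
    print(negotiate("/index.html", "de, en;q=0.8, it;q=0.7", EXISTING))  # /translations/de/index.html
    print(negotiate("/2024/01/02-example2.html", "de, it;q=0.8", EXISTING))  # None
```

The second call shows a limitation of matching only the start of the header: a reader who prefers German but also accepts Italian gets the English original for /2024/01/02-example2.html, although an Italian translation exists. Handling that case would require evaluating all entries of the header, which is beyond what these three conditions do.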
After reloading the webserver configuration using
apachectl graceful
the negotiation mechanism works as expected. Users receive the language that matches their browser settings, as long as they have not selected a translation manually. Unfortunately there is no stock way in Jekyll of generating the same directory hierarchy for translations as for the _posts folder. One can implement this behavior using a plugin; more on this will follow soon.
Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)
This webpage is also available via TOR at http://rh6v563nt2dnxd5h2vhhqkudmyvjaevgiv77c62xflas52d5omtkxuid.onion/