Language negotiation for static website using Apache httpd

20 Aug 2024 - tsp
Last update 20 Aug 2024

So who hasn’t had this problem? You start with a simple webpage or blog, but as it grows, adding translated versions becomes more appealing. Many websites solve this by employing a full-blown content management system that regenerates every page on each request and fills the text elements from a database that holds the different translations. Others use some kind of server-side scripting to decide which language variant to deliver. But HTTP has a built-in mechanism to negotiate which content types and which languages are delivered to a client, and browsers allow users to configure their preferred data types and languages - though this is unfortunately not well known to many users.

The following blog article describes how one can add language negotiation to an existing static site in a very simple manner while still allowing users to select the language.

How content and language negotiation works

During each request a browser sends request headers such as Accept and Accept-Language. The Accept header specifies which media types the client supports - for example to allow the server to deliver images in a format the browser can display. The Accept-Language header lists the languages configured by the user in order of preference, with assigned weights. A typical Accept-Language header may look like the following:

Accept-Language: de, en;q=0.8, it;q=0.7

This means the user prefers content in German (de), which has the implicit default weight of q=1 because no weight is given. If this is not available they would like to get English content (en) with a weight of 0.8, and if this is also not available, content in Italian (it) with a weight of 0.7.
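To make the weighting more tangible, the following is a minimal Python sketch of how such a header could be parsed into an ordered preference list. The function name parse_accept_language is made up for this illustration, and treating malformed weights as 0 is an assumption of the sketch, not something taken from the server configuration in this article.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Return (language, weight) pairs sorted by descending weight."""
    entries = []
    for position, item in enumerate(header.split(",")):
        parts = [part.strip() for part in item.split(";")]
        language = parts[0].lower()
        if not language:
            continue
        weight = 1.0  # a missing q parameter means q=1
        for parameter in parts[1:]:
            name, _, value = parameter.partition("=")
            if name.strip().lower() == "q":
                try:
                    weight = float(value)
                except ValueError:
                    # Assumption of this sketch: malformed weights count as 0.
                    weight = 0.0
        entries.append((language, weight, position))
    # Highest weight first; entries with equal weight keep their header order.
    entries.sort(key=lambda entry: (-entry[1], entry[2]))
    return [(language, weight) for language, weight, _ in entries]


print(parse_accept_language("de, en;q=0.8, it;q=0.7"))
# [('de', 1.0), ('en', 0.8), ('it', 0.7)]
```

Sorting by weight instead of trusting the order of the header matters for clients that do not list their most preferred language first.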

In case the server supports delivery in different languages, it has to tell the clients - and especially proxy servers - that it delivers different content depending on this header. This is what the Vary header is for. The server includes this header in each response; it lists the request header fields that have to match before a proxy is allowed to return cached content to another client. Consider a header such as:

Vary: header1, header2

In this case the values of the request header fields header1 and header2 influence which content is delivered. In case Vary is set to * the response is effectively uncacheable, since any factor - also outside the HTTP headers - may influence the content generated. When only performing language negotiation, Vary should be set to Accept-Language.
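The effect of Vary on a cache can be sketched in a few lines of Python. This is a simplified model and not how any particular proxy is implemented: the function cache_key is hypothetical, and real caches apply further normalization rules.

```python
from typing import Optional


def cache_key(url: str, request_headers: dict, vary: str) -> Optional[tuple]:
    """Build a cache key from the URL and the request headers named in Vary.

    Returns None when Vary is "*", meaning the stored response must not be
    reused for another request.
    """
    fields = [field.strip().lower() for field in vary.split(",") if field.strip()]
    if "*" in fields:
        return None
    # Header field names are case-insensitive, so compare them in lower case.
    headers = {name.lower(): value.strip() for name, value in request_headers.items()}
    return (url, tuple((field, headers.get(field, "")) for field in sorted(fields)))


german = cache_key("/test.html", {"Accept-Language": "de"}, "Accept-Language")
english = cache_key("/test.html", {"Accept-Language": "en"}, "Accept-Language")
print(german != english)  # two languages never share one cache entry
print(cache_key("/test.html", {"Accept-Language": "de"}, "*"))
```

Without the Vary header both requests would map to the same key, and a proxy could hand the German page to a reader who asked for English.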

The idea

So how does one implement this in a static site? In my case the webpage is generated using Jekyll as described in a previous blog article, and I already had much content that was served at the webroot /. The idea is to serve the different translations from different URIs. For example the page /test.html contains the original version in plain English. The German translation should be available at /translations/de/test.html, a French one at /translations/fr/test.html. When a user requests /test.html the server should check if there is a translation available in the user’s preferred language. If so the translation should be delivered, else the original English version. In addition the user should be able to override this and select any of the translations at /translations/XXX/test.html - including the original English version by accessing /translations/en/test.html, even though the original English article is located at /test.html.

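This will involve two rewrites:

* Requests for the explicit English version, for example /translations/en/test.html, are rewritten back to the original at /test.html.
* Requests that do not already target a translation are rewritten to /translations/LANG/test.html when the user prefers a supported language LANG and the translated file exists.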

The used directory structure

The following directory structure reflects the content of the webpage - for Jekyll it’s the layout inside the _site directory:

/usr/www/www.example.com/www/
|-- index.html
|-- 2024/
|   |-- 01/
|   |   |-- 01-example.html
|   |   |-- 02-example2.html
|-- translations/
    |-- de/
    |   |-- index.html
    |   |-- 2024/
    |   |   |-- 01/
    |   |   |   |-- 01-example.html
    |-- it/
        |-- 2024/
            |-- 01/
                |-- 02-example2.html

The root directory contains the original - in the example case the English - version.

| URI | Original (English) version | Explicit English version | German version | Italian version |
| --- | --- | --- | --- | --- |
| /index.html | /index.html | /translations/en/index.html | /translations/de/index.html | - |
| /2024/01/01-example.html | /2024/01/01-example.html | /translations/en/2024/01/01-example.html | /translations/de/2024/01/01-example.html | - |
| /2024/01/02-example2.html | /2024/01/02-example2.html | /translations/en/2024/01/02-example2.html | - | /translations/it/2024/01/02-example2.html |
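The mapping above can also be derived mechanically from the directory layout, for example to generate a language switcher on each page. The following Python sketch does this for the example tree; the names FILES, DEFAULT_LANGUAGE and translations_for are assumptions made for this illustration and are not part of Jekyll or Apache.

```python
from pathlib import PurePosixPath

# Files of the example tree, relative to the document root.
FILES = {
    "index.html",
    "2024/01/01-example.html",
    "2024/01/02-example2.html",
    "translations/de/index.html",
    "translations/de/2024/01/01-example.html",
    "translations/it/2024/01/02-example2.html",
}
DEFAULT_LANGUAGE = "en"


def translations_for(uri: str, files: set[str]) -> dict[str, str]:
    """Map each available language to the URI that selects it explicitly."""
    page = uri.lstrip("/")
    result = {}
    if page in files:
        # The original is always reachable through its explicit alias.
        result[DEFAULT_LANGUAGE] = f"/translations/{DEFAULT_LANGUAGE}/{page}"
    for name in sorted(files):
        parts = PurePosixPath(name).parts
        # translations/<lang>/<page> marks a translated variant of <page>.
        if len(parts) > 2 and parts[0] == "translations" and "/".join(parts[2:]) == page:
            result[parts[1]] = "/" + name
    return result


for uri in ("/index.html", "/2024/01/01-example.html", "/2024/01/02-example2.html"):
    print(uri, translations_for(uri, FILES))
```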

Apache configuration

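The following modules are required:

* mod_rewrite for the RewriteEngine, RewriteCond and RewriteRule directives
* mod_headers for the Header directive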

If they are not enabled yet, one can either load the modules in httpd.conf (for Apache 2.4 on FreeBSD for example in /usr/local/etc/apache24/httpd.conf):

LoadModule rewrite_module libexec/apache24/mod_rewrite.so
LoadModule headers_module libexec/apache24/mod_headers.so

or, on Debian-based systems, enable them using the a2enmod command:

sudo a2enmod rewrite
sudo a2enmod headers

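Now one can configure the rewrite rules as well as the Vary header in the VirtualHost configuration, in a Directory section or in the corresponding .htaccess. Note that in Directory sections and .htaccess files the RewriteRule pattern is matched against the path relative to that directory, without the leading slash, so the patterns shown here would have to be adapted. The following example is a configuration in a VirtualHost configuration: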

<VirtualHost *:443>
    ServerName www.example.com
    ServerAdmin complains@example.com
    DocumentRoot /usr/www/www.example.com/www/

    # Set the Vary header to signal we deliver
    # different content based on Accept-Language request headers
    # and enable RewriteEngine

    Header set Vary "Accept-Language"

    RewriteEngine On

    # In case our request URI goes to the English translation we
    # rewrite back to the document root

    RewriteCond %{REQUEST_URI} ^/translations/en/(.*) [NC]
    RewriteRule ^/translations/en/(.*) /$1 [L]

    # We rewrite in case:
    # * We don't already point to a translation (not starting with /translations/LANG/)
    # * The Accept-Language header starts with a supported language
    # * The translated file exists
    #
    # In case any of these conditions does not apply we don't rewrite but
    # try to serve from the root directory (that is also our default language)

    RewriteCond %{REQUEST_URI} !^/translations/(de|it)/ [NC]
    RewriteCond %{HTTP:Accept-Language} ^(de|it) [NC]
    RewriteCond %{DOCUMENT_ROOT}/translations/%1%{REQUEST_URI} -f
    RewriteRule ^(.*)$ /translations/%1/$1 [L]
</VirtualHost>

Let’s take a look at how all of this works. First we use Header set to set the Vary header of all responses to Accept-Language. This informs clients and proxies that we are delivering different content based on the Accept-Language header.

After that we enable the rewrite engine by setting RewriteEngine On.

The structure of rewrite rules is simple - they are composed of a sequence of RewriteCond directives that specify conditions for the RewriteRule that follows them. There can be one or more of those conditions. If all of them apply, the RewriteRule is executed. Recall the flags used for the rewrite conditions and rules:

* NC (nocase) makes the pattern match case-insensitively.
* L (last) stops the processing of further rules once this rule has been applied.

Take a look at the official documentation for more information on those flags.

The first rule that has been used matches all requests to translations in our default language - in this case en.

RewriteCond %{REQUEST_URI} ^/translations/en/(.*) [NC]
RewriteRule ^/translations/en/(.*) /$1 [L]

The condition matches every REQUEST_URI that starts with the string /translations/en/. The RewriteRule pattern matches the same prefix and captures the remainder of the path in the group (.*), which the substitution references as $1 (groups captured by a RewriteCond would be referenced as %1 instead). If the rule applies we rewrite the URI by simply stripping the /translations/en/ prefix. Then processing stops because of the L flag.

The second rule is the most complex one where the magic happens:

RewriteCond %{REQUEST_URI} !^/translations/(de|it)/ [NC]
RewriteCond %{HTTP:Accept-Language} ^(de|it) [NC]
RewriteCond %{DOCUMENT_ROOT}/translations/%1%{REQUEST_URI} -f
RewriteRule ^(.*)$ /translations/%1/$1 [L]

The first condition matches all URIs that do not start with one of our supported translation subdirectories /translations/de/ and /translations/it/. We do this to not rewrite requests that already target a specific translation. The second condition requires the Accept-Language header to start with one of our supported languages, so only the first entry of the header is inspected and the weights are ignored; this relies on browsers listing the most preferred language first. The matched language is available as %1 afterwards. The third condition checks if the REQUEST_URI is available under the translation matching the Accept-Language header - the -f test checks if the file exists. In case all conditions apply we prepend the /translations/LANG/ prefix to our REQUEST_URI and stop processing.
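To check one’s understanding of the two rules without touching a server, their decision logic can be mimicked in a few lines of Python. This is only a sketch of the logic and not of mod_rewrite itself: the function rewrite and the set FILES are made up for this illustration, and the set stands in for the -f file test.

```python
import re

SUPPORTED = r"(de|it)"

# Stand-in for the document root; the -f test becomes a set lookup.
FILES = {
    "/index.html",
    "/2024/01/02-example2.html",
    "/translations/de/index.html",
    "/translations/it/2024/01/02-example2.html",
}


def rewrite(request_uri: str, accept_language: str, files: set[str]) -> str:
    """Return the path that would be served after applying both rules."""
    # Rule 1: the explicit default-language alias maps back to the root.
    match = re.match(r"^/translations/en/(.*)", request_uri)
    if match:
        return "/" + match.group(1)

    # Rule 2, condition 1: leave requests for an explicit translation alone.
    if re.match(rf"^/translations/{SUPPORTED}/", request_uri, re.IGNORECASE):
        return request_uri
    # Condition 2: only the start of Accept-Language is inspected.
    language = re.match(rf"^{SUPPORTED}", accept_language, re.IGNORECASE)
    if not language:
        return request_uri
    # Condition 3: the translated file has to exist.
    candidate = f"/translations/{language.group(1)}{request_uri}"
    return candidate if candidate in files else request_uri


print(rewrite("/index.html", "de, en;q=0.8", FILES))   # /translations/de/index.html
print(rewrite("/index.html", "it", FILES))             # /index.html
print(rewrite("/translations/en/index.html", "de", FILES))  # /index.html
print(rewrite("/index.html", "fr, de;q=0.9", FILES))   # /index.html
```

The last call shows the limitation of matching only the start of the header: German would be acceptable to this user, but since French is listed first the English original is served.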

After reloading the webserver configuration using

apachectl graceful

the negotiation mechanism works as expected. Users receive the language that matches their browser settings as well as possible - as long as a translation is not selected manually. Unfortunately Jekyll offers no stock way of generating the same directory hierarchy as the _posts folder for the translations. One can implement this behavior using a plugin - more on this will follow soon.


Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)
