Multi-lingual support in software development

One of the biggest challenges in software development is multi-lingual support. Sure, you could create a separate version of the application for each language and translate the bits of text where translation is needed. But this is not practical, as you would then have to maintain a multitude of versions.

In this post I hope to shed light on some of the tasks involved in making your application multi-lingual.

Resource files and gettext

In a multi-lingual application, texts are taken from resource files rather than being hard-coded. A resource file could be a simple key-value list of the texts used throughout the application, stored in a separate file per language, which is then included globally. For example, in PHP one could do the following in the English resource file:

abstract class Resource
{
	public static $texts = array(
		'BTN_OK' => 'OK',
		'BTN_CANCEL' => 'Cancel',
		'NAV_ARTICLES' => 'Articles'
	);
}

And the following in the Dutch resource file:

abstract class Resource
{
	public static $texts = array(
		'BTN_OK' => 'OK',
		'BTN_CANCEL' => 'Annuleren',
		'NAV_ARTICLES' => 'Artikelen'
	);
}

Simple, isn’t it? Now imagine you are creating a new feature that needs a bunch of new texts: you’d have to add them to every one of these files, every time. Not so maintainable.

My preferred method to store text resources is using gettext, which is a common system for translating texts. It collects text strings from the source code files, and puts them in a certain format in a .po file.

It can be annoying to edit the .po file directly, but thankfully there is Poedit, a tool to manage .po files. You can configure it to search through certain source code directories for text strings, tell it how to detect text strings (in which function calls they are wrapped), and add custom parameters to the xgettext command.
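To give an idea of what the calling side can look like, here is a minimal sketch using PHP's gettext extension. The text domain name myapp, the locale directory and the Dutch locale are assumptions made for this illustration. Without a compiled translation file the lookup simply returns the key itself.

```php
<?php
// Hypothetical text domain and directory holding the compiled translation files.
$domain = 'myapp';
$localeDir = __DIR__ . '/locale';

if (extension_loaded('gettext')) {
    // Select the Dutch locale (it must be installed on the system to take effect).
    putenv('LC_ALL=nl_NL.UTF-8');
    setlocale(LC_ALL, 'nl_NL.UTF-8');

    // Tell gettext where the translations for this domain live.
    bindtextdomain($domain, $localeDir);
    bind_textdomain_codeset($domain, 'UTF-8');
    textdomain($domain);

    // Returns the translation, or the key itself when none is found.
    $label = gettext('BTN_CANCEL');
} else {
    // Fallback when the extension is missing: show the key.
    $label = 'BTN_CANCEL';
}

echo $label, PHP_EOL;
```

The gettext() call is the kind of function call you would tell Poedit to look for when it scans the source code.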

Tip: use keys which almost uniquely identify the given text. You may re-use them for common texts, but be wary that they sometimes translate differently depending on the graphical user interface context. For example, “Edit” can be in a menu, but also on a button. In English, they are the same “Edit” text, but in Japanese for example, one may use 編集 (henshuu) for the menu item, but use 編集する (henshuu suru) for the button text, to denote it is an action. In the source code you would then use for example MENU_EDIT as the key for the menu version, and BTN_EDIT for the button version.
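As a small illustration of this tip, the sketch below uses a plain array as a stand-in for a Japanese resource file. The helper name t() and the fallback to the key are assumptions for the example, not part of any library.

```php
<?php
// Stand-in for a Japanese resource file: same English text, two different keys.
$texts = [
    'MENU_EDIT' => '編集',     // menu item
    'BTN_EDIT'  => '編集する', // button, phrased as an action
];

// Hypothetical lookup helper: returns the text for a key,
// or the key itself when no translation exists yet.
function t(array $texts, string $key): string
{
    return $texts[$key] ?? $key;
}

echo t($texts, 'MENU_EDIT'), PHP_EOL;
echo t($texts, 'BTN_EDIT'), PHP_EOL;
```

Falling back to the key makes missing translations easy to spot on screen.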

Plural forms

Another challenge with multi-lingual support is translating plural forms correctly. Most languages have two forms, one for singular and one for plural. A common example: “1 result” and “8 results”. In Dutch that would be “1 resultaat” and “8 resultaten”. The zero value uses the plural form in most languages: you don’t say “0 result” but “0 results”. Some languages, such as Japanese or Chinese, have only one form: they don’t have a special form for plural. In Japanese you would use 検索結果:1件, which also works for 検索結果:64件. Some languages, though, have 3 (Polish, Romanian, Russian), 4 (Slovenian, Welsh), 5 (Irish Gaelic) or even 6 (Arabic) plural forms! [source] And, to my utter astonishment, these forms need a separate key for the translation string! Ouch.

(Maybe it was just a combination of factors with my projects that I had to supply a separate key for each plural form, but it would make much more sense to see it accepting just one key, and it would correctly select the plural form entry for the given number…)
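To make the idea of one key with several plural forms concrete, here is a rough sketch in plain PHP. The function names and the array layout are made up for this example. The selection rules follow the plural formulas published in the gettext documentation for these languages, and the Polish strings are my own illustration.

```php
<?php
// Returns the index of the plural form to use for the number $n.
function pluralIndex(string $lang, int $n): int
{
    switch ($lang) {
        case 'ja': // one form only
            return 0;
        case 'pl': // three forms
            if ($n === 1) {
                return 0;
            }
            $mod10 = $n % 10;
            $mod100 = $n % 100;
            if ($mod10 >= 2 && $mod10 <= 4 && ($mod100 < 10 || $mod100 >= 20)) {
                return 1;
            }
            return 2;
        default: // two forms (English, Dutch): singular only for exactly 1
            return $n !== 1 ? 1 : 0;
    }
}

// One entry per language, holding all of its plural forms in order.
$resultForms = [
    'en' => ['%d result', '%d results'],
    'nl' => ['%d resultaat', '%d resultaten'],
    'ja' => ['検索結果:%d件'],
    'pl' => ['%d wynik', '%d wyniki', '%d wyników'],
];

function formatResults(array $forms, string $lang, int $n): string
{
    return sprintf($forms[$lang][pluralIndex($lang, $n)], $n);
}

echo formatResults($resultForms, 'en', 0), PHP_EOL;
echo formatResults($resultForms, 'pl', 22), PHP_EOL;
```

The calling code only passes the number; which form is used is decided per language in one place.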

Database design

If your application also uses a database, then you will also have to deal with separate records for each language. For example, if you have a table for categories, you will have to split the text fields into a separate table.

[Figure: Table splitting for multi-lingual support. The category table is split into two tables.]

In this example, the translatable fields label and description have been separated into a new table, which is related to the main category table by the category_id field. The lang field tells the system what language the given label and description values are in.
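A minimal sketch of this split, using PDO with an in-memory SQLite database so it can be tried without a server (this assumes the pdo_sqlite extension is available). The table name category_text, the column sizes and the sample rows are assumptions for the example. Only category_id, lang, label and description come from the design described here.

```php
<?php
// Assumption: pdo_sqlite is installed; any SQL database would work the same way.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Main table: language-independent fields only.
$db->exec('CREATE TABLE category (
    id INTEGER PRIMARY KEY
)');

// Translation table (hypothetical name): one row per category per language.
$db->exec('CREATE TABLE category_text (
    category_id INTEGER NOT NULL REFERENCES category(id),
    lang        VARCHAR(10) NOT NULL,
    label       VARCHAR(100) NOT NULL,
    description TEXT,
    PRIMARY KEY (category_id, lang)
)');

$db->exec('INSERT INTO category (id) VALUES (1)');
$insert = $db->prepare(
    'INSERT INTO category_text (category_id, lang, label, description) VALUES (?, ?, ?, ?)'
);
$insert->execute([1, 'en', 'Articles', 'All published articles']);
$insert->execute([1, 'nl', 'Artikelen', 'Alle gepubliceerde artikelen']);

// Fetch a category with its texts in the requested language, or null if missing.
function fetchCategory(PDO $db, int $id, string $lang): ?array
{
    $stmt = $db->prepare(
        'SELECT c.id, t.lang, t.label, t.description
         FROM category c
         JOIN category_text t ON t.category_id = c.id
         WHERE c.id = ? AND t.lang = ?'
    );
    $stmt->execute([$id, $lang]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

$category = fetchCategory($db, 1, 'nl');
echo $category['label'], PHP_EOL;
```

The composite primary key on category_id and lang keeps each category to at most one translation per language.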

Tip: I recommend using the two-letter ISO 639-1 codes as language identifiers. If you also need to distinguish between scripts, for example the different Chinese scripts, reserve a few extra characters in the lang field for the script code: use zh_Hans for the simplified version and zh_Hant for the traditional version.

Presentation: flags or not?

How are you going to tell your visitors that your application is also available in other languages? For computer and mobile applications, the language will most probably be set in the application’s settings screen, but for websites it is a bit different.

Mostly, you would make the application select the language of the browser automatically, but you will also want to give users the chance to change the language, for example when they are accessing the website from an internet cafe in a foreign country. This language selection control needs to be in a clear place but not too intrusive, which makes it a challenging problem for designers.
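Browsers announce the preferred languages in the Accept-Language request header. The sketch below shows one way to pick a supported language from it. The function name and the list of supported languages are assumptions, and it deliberately keeps only the primary language code, so script or region parts such as Hans or BE are ignored.

```php
<?php
// Pick the best supported language from an Accept-Language header value,
// e.g. "nl-BE,nl;q=0.9,en;q=0.8". Falls back to $default when nothing matches.
function pickLanguage(string $header, array $supported, string $default): string
{
    $candidates = [];
    foreach (explode(',', $header) as $position => $part) {
        $pieces = explode(';', trim($part));
        $tag = strtolower(trim($pieces[0]));
        if ($tag === '' || $tag === '*') {
            continue;
        }
        // The quality value defaults to 1.0 when no q parameter is given.
        $q = 1.0;
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (strpos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        if ($q <= 0) {
            continue; // q=0 means "not acceptable"
        }
        $candidates[] = ['lang' => explode('-', $tag)[0], 'q' => $q, 'pos' => $position];
    }

    // Highest quality first; ties keep the order of the header.
    usort($candidates, function (array $a, array $b): int {
        if ($a['q'] !== $b['q']) {
            return $b['q'] <=> $a['q'];
        }
        return $a['pos'] <=> $b['pos'];
    });

    foreach ($candidates as $candidate) {
        if (in_array($candidate['lang'], $supported, true)) {
            return $candidate['lang'];
        }
    }
    return $default;
}

$header = $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '';
echo pickLanguage($header, ['en', 'nl', 'ja'], 'en'), PHP_EOL;
```

A language the user picked explicitly should still take precedence over whatever this returns.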

Most people use flags. I strongly recommend against it. There are many articles out there explaining why it is a bad idea. In short: flags represent countries, not languages. A prime example is English: should we represent it with the American flag, the British flag, the Canadian flag, or the Australian or New Zealand flag?

The best approach would be to just display the available languages in their respective languages, like Wikipedia does. It is good practice to have a small pull-down somewhere at the top, and perhaps additionally a list of languages in the footer at the bottom.
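A small sketch of such a pull-down, with every language named in its own language. The list of languages and the function name are made up for the example.

```php
<?php
// Hypothetical list of supported languages, each named in its own language.
$languages = [
    'en' => 'English',
    'nl' => 'Nederlands',
    'ja' => '日本語',
];

// Build a <select> element with the current language preselected.
function renderLanguageSelect(array $languages, string $current): string
{
    $options = '';
    foreach ($languages as $code => $name) {
        $options .= sprintf(
            '<option value="%s" lang="%s"%s>%s</option>',
            htmlspecialchars($code, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($code, ENT_QUOTES, 'UTF-8'),
            $code === $current ? ' selected' : '',
            htmlspecialchars($name, ENT_QUOTES, 'UTF-8')
        );
    }
    return '<select name="lang">' . $options . '</select>';
}

echo renderLanguageSelect($languages, 'nl'), PHP_EOL;
```

The lang attribute on each option tells browsers and screen readers which language the name itself is written in.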

SEO practices and social media

Most website programmers store the language in a session variable. While this works well for your visitors, it does not work well for search engine robots. Furthermore, we cannot ignore the ubiquity of social media. When people share translated content from your page on social media, they will notice that Facebook or Twitter grabbed the content in the default language. This is because the background requests from Facebook or Twitter do not share the same session as the sharing user.

The best solution to this problem is to put the language code in the URL. You could put it in a query parameter, like http://example.org/page.php?lang=en, but I have found that this does not work well with search engine robots. The better solution is to put the language code as the first path item in your URL: http://example.org/en/page.php. If such a change would cause a lot of code refactoring, your other option is to create a sub-domain for the given language: http://en.example.org/page.php. Either way, social media services grabbing content from that URL will also get the content in the given language.
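As a rough sketch of the path-based variant, the function below takes the language code off the front of the URL path and returns it together with the remaining path. The function name, the list of supported languages and the fallback behaviour are assumptions for the example.

```php
<?php
// Split a URL into [language code, remaining path].
// When the first path item is not a supported language, $default is used
// and the path is returned unchanged.
function splitLanguageFromPath(string $url, array $supported, string $default): array
{
    $path = (string) parse_url($url, PHP_URL_PATH);
    $segments = explode('/', ltrim($path, '/'));

    if ($segments[0] !== '' && in_array($segments[0], $supported, true)) {
        $lang = array_shift($segments);
        return [$lang, '/' . implode('/', $segments)];
    }
    return [$default, '/' . implode('/', $segments)];
}

list($lang, $page) = splitLanguageFromPath('http://example.org/nl/page.php', ['en', 'nl'], 'en');
echo $lang, ' ', $page, PHP_EOL;
```

Because the language is part of the URL itself, a crawler or a social media preview request gets the same language as the visitor who shared the link, without needing any session.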

4 thoughts on “Multi-lingual support in software development”

  1. Great article. However, the best way to store the language in the URL is to set up a different DOMAIN name for each version: mydomain.com, mydomain.it, etc., since one of the first things search engines do is look at the domain extension in order to identify the language and locale of a website.

    Best,

    Tony

    1. Tony, thanks for your comment! Indeed, you could use the TLD (top-level domain) to distinguish between versions, but then we have the same country vs language problem again. The country TLD signifies a country, not a language. For example, how would you tell the system by the “.be” TLD for Belgium, whether to display French or Dutch (Flemish)? And what about Switzerland? I believe they speak German, French and Italian there. Or Canada: English or French? So it does not really solve the multi-language problem.

      Also, buying extra domains gets a bit expensive, while it is free to add sub-domains. So you need to have the budget for that 🙂

      I still think the way Wikipedia does it is the best way.
