28 February 2010

Misconceptions about Java Internationalization

Let me start with a joke:
What do you call someone who speaks three languages?
Trilingual.
What do you call someone who speaks two languages?
Bilingual.
What do you call someone who speaks one language?
American.
To be fair to Americans, even most of us multilingual Europeans tend to be biased when it comes to internationalization, tacitly assuming that text is written left to right and can be sorted from A to Z.

Most Java developers are familiar with resource bundles backed by properties files. The basics can be found in the Internationalization Trail of the Java Tutorial. Multilingual Java applications often come with a set of properties files, e.g.
  • MyApp_de_AT.properties
  • MyApp_de.properties
  • MyApp_es.properties
  • MyApp.properties
where MyApp.properties contains the "default" message resources in English, MyApp_de.properties and MyApp_es.properties contain the German and Spanish resources, respectively, and MyApp_de_AT.properties contains some country-specific variants for the Austrian flavour of German. Usually, files for country-specific variants are sparsely populated, containing only those properties that actually differ from the general language version, like Jänner (de_AT) vs. Januar (de) vs. January (en).
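To make the fallback between such files concrete, here is a minimal, self-contained sketch (Java 8 or later). It writes three of the files into a temporary directory and loads them through a dedicated class loader, so nothing has to be on the classpath. The keys `month.january` and `month.february` and the class name `SparseBundleDemo` are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.ResourceBundle;

class SparseBundleDemo {

    // Writes one properties file; the content used below is plain ASCII.
    static void write(Path dir, String name, String content) throws IOException {
        Files.write(dir.resolve(name), content.getBytes(StandardCharsets.ISO_8859_1));
    }

    // Creates the sample files in a temporary directory and returns a class
    // loader (without parent) that finds resources only in that directory.
    static ClassLoader setUp() {
        try {
            Path dir = Files.createTempDirectory("i18n");
            write(dir, "MyApp.properties", "month.january=January\nmonth.february=February\n");
            write(dir, "MyApp_de.properties", "month.january=Januar\nmonth.february=Februar\n");
            // Sparse country-specific file: only the key that differs in Austria.
            // The umlaut is stored as a properties Unicode escape.
            write(dir, "MyApp_de_AT.properties", "month.january=J\\u00e4nner\n");
            return new URLClassLoader(new URL[] { dir.toUri().toURL() }, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String lookup(ClassLoader loader, String key, Locale locale) {
        return ResourceBundle.getBundle("MyApp", locale, loader).getString(key);
    }

    public static void main(String[] args) {
        ClassLoader loader = setUp();
        Locale austria = Locale.forLanguageTag("de-AT");
        System.out.println(lookup(loader, "month.january", austria));  // found in MyApp_de_AT
        System.out.println(lookup(loader, "month.february", austria)); // Februar, inherited from MyApp_de
    }
}
```

The first lookup is answered by the sparse MyApp_de_AT.properties; the second falls through to its parent MyApp_de.properties, because the Austrian file does not define that key.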

However, with a set of files like this you may be surprised to get a German string even when you request a resource for an English locale.

"Assume nothing" is a sound principle for robust software development, and in particular you should not assume that English is the default or fallback language. In fact, when no file matches the requested locale, ResourceBundle.getBundle() tries the system default locale before it falls back to the base file, and the default locale is derived from the host environment.

See the documentation for ResourceBundle.getBundle() and Locale.getDefault() for more details.

So when the default locale of your system is de_DE and you request a resource for locale en_US, the lookup order for the properties files is
  1. MyApp_en_US.properties
  2. MyApp_en.properties
  3. MyApp_de_DE.properties
  4. MyApp_de.properties
  5. MyApp.properties
Hence, ResourceBundle.getString() will return a German string from MyApp_de.properties: the first three files do not exist, and the English defaults in MyApp.properties come only after the German resources in this sequence.
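The surprise can be reproduced with a short sketch along the same lines (Java 8 or later; the key `greeting` and the class name `FallbackSurprise` are hypothetical). It simulates a German host by temporarily switching the JVM default locale with Locale.setDefault() and then requests en_US, with no MyApp_en.properties present.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.ResourceBundle;

class FallbackSurprise {

    // Creates MyApp.properties (English) and MyApp_de.properties (German),
    // but deliberately no MyApp_en.properties.
    static ClassLoader setUp() {
        try {
            Path dir = Files.createTempDirectory("i18n");
            Files.write(dir.resolve("MyApp.properties"),
                    "greeting=Hello\n".getBytes(StandardCharsets.ISO_8859_1));
            Files.write(dir.resolve("MyApp_de.properties"),
                    "greeting=Hallo\n".getBytes(StandardCharsets.ISO_8859_1));
            return new URLClassLoader(new URL[] { dir.toUri().toURL() }, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks up "greeting" for the requested locale while the JVM default
    // locale is temporarily set to jvmDefault; the old default is restored.
    static String greeting(ClassLoader loader, Locale requested, Locale jvmDefault) {
        Locale saved = Locale.getDefault();
        Locale.setDefault(jvmDefault);
        try {
            return ResourceBundle.getBundle("MyApp", requested, loader).getString("greeting");
        } finally {
            Locale.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        // en_US is requested, but MyApp_en_US and MyApp_en do not exist, so the
        // lookup continues with the default locale de_DE and finds MyApp_de.
        System.out.println(greeting(setUp(), Locale.US, Locale.GERMANY)); // Hallo
    }
}
```

With Locale.US as the simulated default instead, the same request returns "Hello" from MyApp.properties, since the requested locale is then the default and there is no other locale to try before the base file. Locale.setDefault() changes global state for the whole JVM, which is why the sketch restores the previous value.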

There are two solutions:
  1. As a user, set your default locale to en when launching the application.
  2. As a developer, make sure to provide a properties file for locale en (which may be empty).
The method for changing the default locale depends on your Java VM and your operating system. Setting the system property user.language may work on some platforms, but not with the Sun JDK 1.6.0 under Linux. Instead, you need to set the environment variable LANG before launching the Java VM.
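If you are unsure which default locale your VM actually ended up with after setting LANG or user.language, a tiny diagnostic can tell you. This is only a sketch; the class name is arbitrary, and it simply prints Locale.getDefault() together with the standard user.language and user.country system properties.

```java
import java.util.Locale;

class DefaultLocaleCheck {

    // Summarizes what the VM derived from the host environment
    // and/or from -Duser.language / -Duser.country on the command line.
    static String describe() {
        return "default=" + Locale.getDefault()
                + " user.language=" + System.getProperty("user.language")
                + " user.country=" + System.getProperty("user.country");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once with LANG=en_US.UTF-8 in the environment and once with -Duser.language=en on the java command line shows which of the two settings your particular VM and platform honour.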

The preferred solution is the second one, of course. Even when MyApp_en.properties is empty, it will be picked up as the entry point for resource lookup. If a given key cannot be found in this file, the parent file MyApp.properties will be used as fallback, which is exactly the desired behaviour.
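The effect of the empty file can be checked with the same kind of sketch (Java 8 or later; the key `greeting` and the class name `EmptyEnglishBundle` are again hypothetical). With the JVM default still simulated as de_DE, the request for en_US now stops at MyApp_en.properties, and the missing key is inherited from MyApp.properties.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.ResourceBundle;

class EmptyEnglishBundle {

    // English base file, German file, and an EMPTY MyApp_en.properties.
    static ClassLoader setUp() {
        try {
            Path dir = Files.createTempDirectory("i18n");
            Files.write(dir.resolve("MyApp.properties"),
                    "greeting=Hello\n".getBytes(StandardCharsets.ISO_8859_1));
            Files.write(dir.resolve("MyApp_de.properties"),
                    "greeting=Hallo\n".getBytes(StandardCharsets.ISO_8859_1));
            Files.write(dir.resolve("MyApp_en.properties"), new byte[0]);
            return new URLClassLoader(new URL[] { dir.toUri().toURL() }, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads the bundle while the JVM default locale is temporarily jvmDefault.
    static ResourceBundle bundle(ClassLoader loader, Locale requested, Locale jvmDefault) {
        Locale saved = Locale.getDefault();
        Locale.setDefault(jvmDefault);
        try {
            return ResourceBundle.getBundle("MyApp", requested, loader);
        } finally {
            Locale.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        ResourceBundle b = bundle(setUp(), Locale.US, Locale.GERMANY);
        System.out.println(b.getLocale());          // en: MyApp_en is the entry point
        System.out.println(b.getString("greeting")); // Hello, inherited from MyApp.properties
    }
}
```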
