or: Why Rudolph Is More Than Just a Shiny Nose
Dunder sat, glumly staring at the computer screen.
“What’s up, Dunder?” asked Rudolph, entering the stable and shaking off the snow from his antlers.
“Well,” Dunder replied, “I’ve just finished coding the new reindeer intranet Santa Claus asked me to do. You know how he likes to appear to be at the cutting edge, talking incessantly about Web 2.0, AJAX, rounded corners; he even spooked Comet recently by talking about him as if he were some pushy web server.
“I’ve managed to keep him happy, whilst also keeping it usable, accessible, and gleaming — and I’m still on the back row of the sleigh! But anyway, given the elves will be the ones using the site, and they come from all over the world, the site is in multiple languages. Which is great, except when it comes to the preview JavaScript I’ve written for the reindeer order form. Here, have a look…”
As he said that, he brought up the order form in French on the screen. (Same in English).
“Looks good,” said Rudolph.
“But if I add some items,” said Dunder, “the preview appears in English, as it’s hard-coded in the JavaScript. I don’t want separate code for each language, as that’s just silly — I thought about just having if statements, but that doesn’t scale at all…”
“And there’s more: you aren’t displaying large numbers in French properly, either,” added Rudolph, who had been playing with the form and looking at part of the source code:

    function update_text() {
        var hay = getValue('hay');
        var carrots = getValue('carrots');
        var bells = getValue('bells');
        var total = 50 * bells + 30 * hay + 10 * carrots;
        // The English sentence is built by concatenation, hard-coded here
        var out = 'You are ordering '
            + pretty_num(hay) + ' bushel' + pluralise(hay) + ' of hay, '
            + pretty_num(carrots) + ' carrot' + pluralise(carrots)
            + ', and ' + pretty_num(bells) + ' shiny bell' + pluralise(bells)
            + ', at a total cost of <strong>' + pretty_num(total)
            + '</strong> gold pieces. Thank you.';
        document.getElementById('preview').innerHTML = out;
    }

    // Insert a comma between each group of three digits
    function pretty_num(n) {
        n += '';
        var o = '';
        var i; // declared so the loop counter is not an implicit global
        for (i = n.length; i > 3; i -= 3) {
            o = ',' + n.slice(i - 3, i) + o;
        }
        o = n.slice(0, i) + o;
        return o;
    }

    // Return the English plural suffix for n items
    function pluralise(n) {
        if (n != 1) return 's';
        return '';
    }
“Oh, botheration!” cried Dunder. “This is just so complicated.”
“It doesn’t have to be,” said Rudolph, “you just have to think about things in a slightly different way from what you’re used to. As this is only a simple example, we won’t be able to cover all possibilities, but for starters, we need some way of providing different information to the script depending on the language. We’ll create a global i18n object, say, and fill it with the correct language information. The first variable we’ll need will be a thousands separator, and then we can change the pretty_num function to use that instead:
    function pretty_num(n) {
        n += '';
        var o = '';
        var i; // declared so the loop counter is not an implicit global
        for (i = n.length; i > 3; i -= 3) {
            // Use the separator for the current language
            o = i18n.thousands_sep + n.slice(i - 3, i) + o;
        }
        o = n.slice(0, i) + o;
        return o;
    }
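As a rough illustration of the separator idea, here is a minimal sketch that formats the same number with two settings objects. The names i18n_en, i18n_fr and pretty_num_with are hypothetical and not part of the order form code, and the space used as the French separator is an assumption made for the example.

```javascript
// Hypothetical per-language settings; only the thousands_sep key comes from the article.
var i18n_en = { thousands_sep: ',' };
var i18n_fr = { thousands_sep: ' ' }; // assumption: French groups digits with a space

// Same grouping logic as pretty_num, but the settings are passed in
// so that two languages can be compared side by side.
function pretty_num_with(settings, n) {
    n += '';
    var o = '';
    var i;
    for (i = n.length; i > 3; i -= 3) {
        o = settings.thousands_sep + n.slice(i - 3, i) + o;
    }
    return n.slice(0, i) + o;
}

console.log(pretty_num_with(i18n_en, 1234567)); // 1,234,567
console.log(pretty_num_with(i18n_fr, 1234567)); // 1 234 567
console.log(pretty_num_with(i18n_fr, 999));     // 999
```

In the article's version there is a single global i18n object per page, so the extra parameter is not needed; it is only used here to show both outputs at once.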
“The i18n object will also contain our translations, which we will access through a function called _() — that’s just an underscore. Other languages have a function of the same name doing the same thing. It’s very simple:
    function _(s) {
        if (typeof(i18n) != 'undefined' && i18n[s]) {
            return i18n[s];
        }
        return s;
    }
“So if a translation is available and provided, we’ll use that; otherwise we’ll default to the string provided — which is helpful if the translation begins to lag behind the site’s text at all, as at least something will be output.”
“Got it,” said Dunder. “_('Hello Dunder') will give the translation of that string if one exists, or ‘Hello Dunder’ if not.”
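The fallback behaviour Dunder describes can be checked with a small sketch. The i18n object below is a hypothetical, partly translated table invented for this example; only the lookup function follows the one shown above.

```javascript
// Hypothetical, partly translated table: only 'Thank you.' has a translation.
var i18n = {
    thousands_sep: ' ',
    'Thank you.': 'Merci.',
    '%s carrot': '' // entry exists but has not been translated yet
};

// Lookup with fallback to the original string, as in the article.
function _(s) {
    if (typeof(i18n) != 'undefined' && i18n[s]) {
        return i18n[s];
    }
    return s;
}

console.log(_('Thank you.'));   // Merci.
console.log(_('%s carrot'));    // %s carrot (blank entry falls back)
console.log(_('Hello Dunder')); // Hello Dunder (missing entry falls back)
```

Because an empty string is falsy, a blank entry behaves the same as a missing one, which is what lets a half-finished translation file still produce readable output.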
“Exactly. Moving on, your plural function breaks even in English if we have a word where the plural doesn’t add an s — like ‘children’.”
“You’re right,” said Dunder. “How did I miss that?”
“No harm done. Better to provide both singular and plural words to the function and let it decide which to use, performing any translation as well:
    function pluralise(s, p, n) {
        if (n != 1) return _(p);
        return _(s);
    }
“We’d have to provide different functions for different languages as we employed more elves and got more complicated — for example, in Polish, the word ‘file’ pluralises like this: 1 plik, 2-4 pliki, 5-21 plików, 22-24 pliki, 25-31 plików, and so on.” (More information on plural forms)
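To make the Polish case concrete, here is a minimal sketch of a three-form plural chooser. The function names polish_plural_index and pluralise_pl are hypothetical, and the rule is written to match the pattern Rudolph lists (1, then numbers ending in 2-4 except 12-14, then everything else); it is not taken from the order form code.

```javascript
// Returns 0, 1 or 2: the index of the Polish plural form to use for n.
function polish_plural_index(n) {
    if (n == 1) return 0;
    var last = n % 10;
    var lastTwo = n % 100;
    // Numbers ending in 2, 3 or 4 take the second form, except 12, 13 and 14
    if (last >= 2 && last <= 4 && (lastTwo < 10 || lastTwo >= 20)) return 1;
    return 2;
}

// forms is [form for 1, form for 2-4, form for 5 and up]
function pluralise_pl(forms, n) {
    return forms[polish_plural_index(n)];
}

var plik = ['plik', 'pliki', 'plików'];
console.log(pluralise_pl(plik, 1));  // plik
console.log(pluralise_pl(plik, 3));  // pliki
console.log(pluralise_pl(plik, 12)); // plików
console.log(pluralise_pl(plik, 22)); // pliki
console.log(pluralise_pl(plik, 25)); // plików
```

A language with more forms would need the translation file to carry both the list of forms and the rule for choosing between them, rather than a fixed singular and plural pair.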
“Gosh!”
“Next, as different languages have different word orders, we must stop using concatenation to construct sentences, as it would be impossible for other languages to fit in; we have to keep coherent strings together. Let’s rewrite your update function, and then go through it:
    function update_text() {
        var hay = getValue('hay');
        var carrots = getValue('carrots');
        var bells = getValue('bells');
        var total = 50 * bells + 30 * hay + 10 * carrots;
        hay = sprintf(pluralise('%s bushel of hay', '%s bushels of hay', hay), pretty_num(hay));
        carrots = sprintf(pluralise('%s carrot', '%s carrots', carrots), pretty_num(carrots));
        bells = sprintf(pluralise('%s shiny bell', '%s shiny bells', bells), pretty_num(bells));
        var list = sprintf(_('%s, %s, and %s'), hay, carrots, bells);
        var out = sprintf(_('You are ordering %s, at a total cost of <strong>%s</strong> gold pieces.'),
            list, pretty_num(total));
        out += ' ';
        out += _('Thank you.');
        document.getElementById('preview').innerHTML = out;
    }

“sprintf is a function in many other languages that, given a format string and some variables, slots the variables into place within the string. JavaScript doesn’t have such a function, so we’ll write our own. Again, keep it simple for now, only integers and strings; I’m sure more complete ones can be found on the internet.
    function sprintf(s) {
        var bits = s.split('%');
        var out = bits[0];
        var re = /^([ds])(.*)$/;
        var p; // declared so the match result is not an implicit global
        for (var i = 1; i < bits.length; i++) {
            p = re.exec(bits[i]);
            // Skip anything that is not %d or %s, or has no argument
            if (!p || arguments[i] == null) continue;
            if (p[1] == 'd') {
                out += parseInt(arguments[i], 10);
            } else if (p[1] == 's') {
                out += arguments[i];
            }
            out += p[2];
        }
        return out;
    }
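Since word order is the reason for using format strings at all, one possible extension is to let a translation pick its arguments by position. The sketch below is an assumption-laden variant, not the function above: sprintf_positional is a hypothetical name, it uses the "%2$s" notation found in other languages' sprintf functions, and it handles only %s and %d.

```javascript
// Sketch only: supports %s, %d and the positional forms %1$s, %2$d, ...
function sprintf_positional(format) {
    var args = arguments;
    var next = 1; // index of the next argument for plain %s / %d placeholders
    return format.replace(/%(?:(\d+)\$)?([ds])/g, function (match, position, type) {
        var index = position ? parseInt(position, 10) : next++;
        var value = args[index];
        if (value == null) return ''; // missing argument: emit nothing
        return type == 'd' ? String(parseInt(value, 10)) : String(value);
    });
}

console.log(sprintf_positional('%s of %s', 'two', 'five'));   // two of five
console.log(sprintf_positional('%2$s, then %1$s', 'a', 'b')); // b, then a
console.log(sprintf_positional('%1$s and %1$s', 'x'));        // x and x
```

With something like this, a translator could reorder or repeat the placeholders in a translated string without any change to the calling code.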
“Lastly, we need to create one file for each language, containing our i18n object, and then include that from the relevant HTML. Here’s what a blank translation file would look like for your order form:
    var i18n = {
        thousands_sep: ',',
        "%s bushel of hay": '',
        "%s bushels of hay": '',
        "%s carrot": '',
        "%s carrots": '',
        "%s shiny bell": '',
        "%s shiny bells": '',
        "%s, %s, and %s": '',
        "You are ordering %s, at a total cost of <strong>%s</strong> gold pieces.": '',
        "Thank you.": ''
    };
“If you implement this across the intranet, you’ll want to investigate the xgettext program, which can automatically extract all strings that need translating from all sorts of code files into a standard .po file (I think Python mode works best for JavaScript). You can then use a different program to take the translated .po file and automatically create the language-specific JavaScript files for us.” (e.g. German .po file for PledgeBank, mySociety’s .po-.js script, example output)
With a flourish, Rudolph finished editing. “And there we go, localised JavaScript in English, French, or German, all using the same main code.”
“Thanks so much, Rudolph!” said Dunder.
“I’m not just a pretty nose!” Rudolph quipped. “Oh, and one last thing — please comment liberally explaining the context of strings you use. Your translator will thank you, probably at the same time as they point out the four hundred places you’ve done something in code that only works in your language and no-one else’s…”
Thanks to Tim Morley and Edmund Grimley Evans for the French and German translations respectively.


Comments
08/12/2007
You know this, but it’s worth pointing out that using the English version as the key to translation has a couple of problems. Firstly, it’s fairly verbose, but more importantly you can wind up with places where the same phrase should be used in English, but different phrases in another language. At that point, you have to start getting cute with the exact string you use, or fall back on tokens; it’s often easier to manage as tokens throughout (and solves the verbosity problem to an extent). Writing tools to manage the translations isn’t difficult: for one system I wrote a short Python script to keep track of the hashes of the ‘primary’ translation from tokens to English (all the developers wrote English), so you’d know when a translation was out of date or missing. It could pull out the new things to translate, and it wouldn’t be difficult to extend it to bundle them off to a translator semi-automatically.
You’ll get collisions using English as the token when you run into homographs – English has a disturbing number of them. For instance, if you had a site of fun things to do with the family, you might want to label ‘Fair’, ‘Circus’ and ‘Tractor competition’. But you’d also want weather forecasts for the days they were on, and might want ‘Rain’, ‘Calm’ or ‘Fair’.
08/12/2007
This is a very good introduction. I realize you said at the outset that, by necessity, you couldn’t cover every detail, but I thought people might be interested in this brief observation about your sprintf function. The way you have currently coded it, you’re restricted to putting the variables in the string in a specific order. That probably won’t be a problem in the particular application that you present, but in general it can lead to translators having to use odd-sounding non-standard sentence formations to coerce the variables into the order they’re provided in. What would be better is a system where instead of specifying a string like, “You are ordering %s, at a total cost of <strong>%s</strong> gold pieces.”, you could instead say, “You are ordering %{1:s}, at a total cost of <strong>%{2:s}</strong> gold pieces.” The numbers in the variable markers represent which variable from the list goes where. That way, translators can put the variables in wherever they need to, even multiple times per variable if they want.
08/12/2007
James: Sure; as I use gettext, I found it easiest for my JavaScript to behave similarly to and be more integrated with that, but as you say, token mapping would be possible too.
Rory: Thanks :-) I’d use “%2$s” as that’s what most other sprintf() functions use. As I – sorry, Rudolph – says, there are quite a few fuller JavaScript sprintf() functions on the internet – e.g. Ash Searle’s and alexei’s both include argument reordering.
08/12/2007
Very interesting introduction indeed!
James, to disambiguate translation strings, at Skyrock we use what you could call “locale redundancy.”
The principle: you have a base language (in your case, English) with strings like “fair (adj.)” that is itself translated to the same language without the disambiguation. This helps for translations too; for example, if we have a string like “Their posts:”, we’re unsure about languages with genders for the plural pronouns, so we have both “(m) Their posts:” and “(f) Their posts:” in our l10n database.
09/12/2007
An interesting read, however wherever possible I’d suggest trying to generate your look up object (i18n) server side which allows you to take advantage of existing localisation mechanisms. By doing this you could avoid having to write much of the code described above. Generally I’d include the look up object in the HTML head (assuming few translations) or in a separate dynamically generated file (with suitable HTTP headers set to ensure caching).
Of course, some applications will require this kind of manipulation to be done client side, but I’m guessing they’re the exception rather than the rule.
09/12/2007
Ed: I’m afraid I don’t follow you. Are you suggesting that upon every change to the stock ordering form, a JavaScript request is made to the server to generate the correct summary text? Of course, everything that is being done server-side has its translations done server-side as well; this is simply about adding i18n to client-side code (which, as we all know, will be added as progressive enhancement :) ). I can’t see what code you could avoid having to write. Obviously, you can e.g. generate your JavaScript automatically from translation .po files; that’s what I do, and what Rudolph suggested in the paragraph before he finished editing.
10/12/2007
Michel – neat :-). That has the advantage of being easier for translators to work with than pure tokens, and (with some care and cleverness around longer strings) probably isn’t much more verbose.
12/12/2007
@Mathew about Ed:
I think he means creating the JavaScript with a dynamic language such as PHP.
Say you create your JS files with markers in place that get run through gettext on the server side and are presented as a “normal” JS file to the browser.
16/01/2008
Hi guys, I have an alternative suggestion for you.
There is also a gettext implementation for JavaScript. It works very nicely for us.
Apparently it also has sprintf-like support, but I haven’t tested it yet.
I’ve written a blog post about it: http://wallsoft.blogspot.com/2008/01/gettext-for-php-and-javascript.html
The library can be found here: http://code.google.com/p/gettext-js/