Xbench: translation memory and quality assurance for free

A practical guide for building a translation memory with free tools and making translations faster and more consistent.

A common feeling in game localization is that assisted translation tools are too complex and expensive for our needs: they are overkill. That’s not necessarily true. There are now free and open translation memory tools that can be learned in minutes and give measurable benefits in turnaround, quality and consistency.

One of these is ApSIC Xbench, a free concordancer and QA tool which had a pivotal role in our team. I will show step by step how it can help your daily work. If you want to follow along, you just need to download and install these two free programs.

ApSIC Xbench: the tool itself - http://www.apsic.com/en/products_xbench.html

(Update: ApSIC has since added a paid version, but the free 2.9 build is fine for most people, especially if you don’t need Unicode support.)

TMBuilder: transforms Excel files into TMX usable by the above - http://ankudowicz.com/tmbuilder/


We’ve all been there. You clearly remember having already translated a word or a sentence (and it was a good translation too!) but you just forgot how.

It may be in the same file. So you need to stop translating, try one or two searches, then finally copy your translation, find out where you left the text and paste.

Or it may be in a past translation. So you dig through your hard disk to find it, then check inside the files, then maybe don’t find it at all and give up, bitter and sad. In both cases, the process slowly gnaws at your time and energy.

Enter Xbench. Let’s say that we have already translated the first part of our friend Venetica and that we are ready to start the next.

Game texts are commonly split into many small strings, and that’s very good for translation memories, because it creates short, accurate entries. All we need to do is take the old translations and put them in a two-column file like this:

Aligning legacy translations
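If the old translations live in scripts or databases rather than in a spreadsheet, the two-column file can also be produced programmatically. The following is only a minimal sketch: the sample strings (apart from Darkstreets/Callefosca) and the function name are made up for illustration, and it writes tab-separated text that Excel can open, not a native Excel file.

```python
import csv
import io

# Hypothetical legacy strings; only Darkstreets/Callefosca comes from the article.
legacy_pairs = [
    ("Darkstreets", "Callefosca"),
    ("New game", "Nuova partita"),
    ("Press any button\nto continue", "Premi un pulsante\nper continuare"),
]

def write_two_column(pairs, handle):
    """Write (source, target) pairs as tab-separated rows, one pair per row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for source, target in pairs:
        # Fields containing line breaks are quoted automatically by the csv module.
        writer.writerow([source.strip(), target.strip()])

# Written to an in-memory buffer here; pass an open file instead to save to disk.
buffer = io.StringIO()
write_two_column(legacy_pairs, buffer)
table_text = buffer.getvalue()
```

One source string per row, with its translation next to it, is all the structure the later steps rely on.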

Then, we just need to convert it into a format supported by Xbench. The simplest option is to save it as "Unicode Text" or "CSV (Comma delimited)" directly from Excel, but this may create some problems with line breaks. That’s why we usually prefer TMX, which is solid and widely supported. The process is very simple. Save the Excel file and start TMBuilder. Then you just set the source and target languages ("English US" and "Italian" in our case) and the output format (TMX). You can leave the other options alone.

Converting into TMX
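To get a feeling for what ends up inside a TMX file, here is a rough sketch that builds one from source/target pairs with Python's standard library. It is not what TMBuilder does internally; the language codes (`en-US`, `it-IT`), the header values and the function name are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_tmx(pairs, source_lang="en-US", target_lang="it-IT"):
    """Return a minimal TMX 1.4 document (as a string) for (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    # Header values are placeholders for this sketch.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en-US",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per row
        for lang, text in ((source_lang, source), (target_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")

tmx_text = build_tmx([("Darkstreets", "Callefosca")])
```

Each row of the two-column file becomes one `tu` element holding a `tuv` per language, which is why line breaks inside a string no longer cause trouble.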

Now simply fire up Xbench, select Project>New, set the format to TMX, load the file we just created and we’re done!

So, let’s say that you have another case of amnesia. You are pretty sure that you translated Darkstreets before, but how was it in Italian? "Strade buie"? "Strade oscure"?

Well, just select the source words you are looking for ("Darkstreets"), then press Ctrl+Alt+Ins (or configure another combination, if you prefer). Xbench will pop up and show all the sentences that contain it ("Callefosca"!). You are then free to copy and paste what you need. The system works from any application. A process that took minutes of your time is now instantaneous, and can browse through tens of thousands of words!
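Conceptually, a concordance search is little more than a case-insensitive lookup over all the source strings. The sketch below illustrates the idea only; it is not how Xbench is implemented, and the sample sentences are invented for the example.

```python
# Invented sample memory; only the Darkstreets/Callefosca pairing is from the article.
memory = [
    ("Darkstreets", "Callefosca"),
    ("Welcome to Darkstreets", "Benvenuto a Callefosca"),
    ("New game", "Nuova partita"),
]

def concordance(pairs, query):
    """Return every (source, target) pair whose source contains the query."""
    needle = query.casefold()
    return [(s, t) for s, t in pairs if needle in s.casefold()]

matches = concordance(memory, "darkstreets")
for source, target in matches:
    print(f"{source} -> {target}")
```

Because the whole sentence is shown next to its translation, you see the term in context rather than just a dictionary entry.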

One step beyond: priority and key terms

Xbench allows you to load multiple translation memories and give each a priority. Let’s say that we have a TMX with last week’s translation, finalized and confirmed by the client, and another one with the work we did yesterday, still a bit rough and in progress. We want the most reliable results to appear first, with the rest showing only after those are exhausted, and that’s precisely what Xbench allows you to do. Each translation memory can be set as high, medium or low priority (and its order of appearance can be tweaked even further within each class). It may not seem like much, but having results appear in the order we want, color coded by how much we trust them, makes the process even faster and second nature (and that’s what we want, when the Xbench window is called up hundreds of times per day!)

Setting TMX priorities

Among our translation memories, there’s one we really want to stand out, and that’s our glossary. No problem: we select the "Key terms" box and all its results will have a star next to them, so they are clearly recognizable.
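The effect of priorities and key terms can be pictured as a sorted search over several memories. This is a simplified model under our own assumptions, not Xbench's actual logic: the `Memory` class, the memory names and the `*` marker are all hypothetical.

```python
from dataclasses import dataclass

PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}

@dataclass
class Memory:
    name: str
    priority: str            # "high", "medium" or "low"
    entries: list            # (source, target) pairs
    key_terms: bool = False  # glossary-style memory, marked with a star

def search(memories, query):
    """Return hits from all memories, most trusted memories first."""
    needle = query.casefold()
    hits = []
    # sorted() is stable, so the list order is preserved within each priority class.
    for memory in sorted(memories, key=lambda m: PRIORITY_RANK[m.priority]):
        marker = "*" if memory.key_terms else " "
        for source, target in memory.entries:
            if needle in source.casefold():
                hits.append((marker, memory.name, source, target))
    return hits

memories = [
    Memory("yesterday (draft)", "low", [("Darkstreets", "Callebuia")]),
    Memory("last week (approved)", "high",
           [("Welcome to Darkstreets", "Benvenuto a Callefosca")]),
    Memory("glossary", "high", [("Darkstreets", "Callefosca")], key_terms=True),
]
hits = search(memories, "darkstreets")
```

The draft memory still contributes its rough "Callebuia", but only after the approved translation and the starred glossary entry.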

The next level: quality assurance

We like our glossary. We worked hard to shape it and all the terms inside it are just how they should be. Clients like their glossaries too. Think about platform terminology: use the wrong translation for the Xbox 360’s buttons, use the wrong verb to describe a Wiimote shake, dare to call the PlayStation 3 a console (and not a system) and the game you translate might be refused publication. All of a sudden, it’s not only a matter of style, it’s what makes your work viable.

Xbench can help. To start, set one or more TMX files as "Key terms" like we did above. Then prepare the text to be checked, converting it to TMX if needed, and add it to the project. The only difference is that, instead of "Key terms", you set it as "Ongoing translation". Done! Now you just need to select the "QA" tab and start the analysis with "Check ongoing translation".

I know, the first impact is a bit discouraging, but soldier on! It’s a bit like Word’s spellchecker: most of the results are false positives, but the real mistakes it flags are pure gold, as a normal human would never find them at all! Among the categories you can find:

  • Inconsistency in target: you have translated Darkstreets both as Callebuia and Callefosca. Take a look and make them consistent one way or the other.
  • Inconsistency in source: you have translated both Darkstreets and Backstreets as Callefosca. Take a look and make them consistent if needed. However, the source itself may be the inconsistent one (maybe they just wrote Darkstreets as Dark Streets); in that case you can leave it alone.
  • Numeric mismatch: the English text contains the number "2", but the Italian doesn’t. Often it’s just because it was written in word form ("due"), other times because you got the wrong number (and you might want to fix this ASAP).
  • Tag mismatch: check and fix.
  • Double space: check and fix.
  • KeyTerm Mismatch: Xbench checks the translation (ongoing translation) against the glossary (key terms) and lists here all instances that don’t match. Like before, there are many false positives, but just think how much time and effort it saves you!
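Most of these checks are simple comparisons between source and target, which is why they are so fast and also why they produce false positives. The sketch below is our own approximation of four of them, not Xbench's code; the function names and sample strings are invented, and tag checking is left out for brevity.

```python
import re
from collections import defaultdict

def find_inconsistencies(pairs):
    """Return (same source, different targets), (same target, different sources)."""
    by_source, by_target = defaultdict(set), defaultdict(set)
    for source, target in pairs:
        by_source[source].add(target)
        by_target[target].add(source)
    in_target = {s: sorted(ts) for s, ts in by_source.items() if len(ts) > 1}
    in_source = {t: sorted(ss) for t, ss in by_target.items() if len(ss) > 1}
    return in_target, in_source

def numeric_mismatches(pairs):
    """Flag pairs whose digit sequences differ between source and target."""
    digits = lambda text: sorted(re.findall(r"\d+", text))
    return [(s, t) for s, t in pairs if digits(s) != digits(t)]

def double_spaces(pairs):
    """Flag targets containing two consecutive spaces."""
    return [(s, t) for s, t in pairs if "  " in t]

def key_term_mismatches(pairs, glossary):
    """Flag pairs where a glossary term is in the source but its translation is not in the target."""
    issues = []
    for source, target in pairs:
        for term, translation in glossary.items():
            if term.casefold() in source.casefold() and translation.casefold() not in target.casefold():
                issues.append((source, target, term))
    return issues

# Invented ongoing translation, seeded with one problem of each kind.
ongoing = [
    ("Darkstreets", "Callebuia"),
    ("Darkstreets", "Callefosca"),
    ("Backstreets", "Callefosca"),
    ("Collect 2 keys", "Raccogli due chiavi"),
    ("Go to Darkstreets", "Vai a  Strade buie"),
]
glossary = {"Darkstreets": "Callefosca"}

in_target, in_source = find_inconsistencies(ongoing)
numbers = numeric_mismatches(ongoing)
spaces = double_spaces(ongoing)
terms = key_term_mismatches(ongoing, glossary)
```

Note how "Raccogli due chiavi" is reported as a numeric mismatch even though it is correct: that is exactly the kind of false positive described above.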

Once you get familiar with the process, you can add a final check. After the translation has been checked thoroughly and all the Key Term mismatches are fixed, create a new Xbench project with only the translation to be checked. Then flag it both as Ongoing translation and Key terms and start QA.

I know, it doesn’t sound very logical, but it’s a very powerful tool.

Imagine that you translated a menu option name at line 12 and another translator translated a string that mentions it at line 96000. The other translator didn’t know your translation because the translation memory wasn’t shared yet, and the first QA didn’t take it into consideration because it’s not a glossary term (after all, it’s just a menu title that appears twice in the whole project).

Your game may contain a tutorial that calls a menu by the wrong name! Shame on you! With the option above, Xbench will pick it up. The menu title will be considered a Key Term, and when the rogue description calls it by the wrong name, it will be dutifully reported.
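One way to picture this final check is a glossary derived from the translation itself: every short string is treated as a term, and every longer string that mentions it must contain its translation. This is only our guess at the principle, sketched with invented strings and an arbitrary three-word limit; Xbench's real matching rules may differ.

```python
def self_check(pairs, max_words=3):
    """Use short segments of the translation as key terms against the rest of it."""
    # Assumption: short strings (menu titles, item names) behave like terms.
    terms = {s: t for s, t in pairs if len(s.split()) <= max_words}
    issues = []
    for source, target in pairs:
        for term, translation in terms.items():
            if source == term:
                continue  # do not check a term against itself
            if term.casefold() in source.casefold() and translation.casefold() not in target.casefold():
                issues.append((term, translation, source, target))
    return issues

# Invented strings: a menu title translated by one person, mentioned by another.
translation = [
    ("Inventory", "Inventario"),
    ("Save game", "Salva partita"),
    ("Choose Save game before you quit.", "Scegli Salva partita prima di uscire."),
    ("Open the Inventory to equip your sword.", "Apri lo Zaino per equipaggiare la spada."),
]
issues = self_check(translation)
```

Here the tutorial line that calls the Inventory "Zaino" is reported, while the line that uses "Salva partita" consistently passes.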

Good work!

Update (September 2014) You can see the content of this and previous posts inside our "Joe Freelancer VS the Mammoth Game Translation" presentation for Localization World 2011. We don’t really use these manual techniques today, as they are fully automated within memoQ, but I think this can still be useful for understanding what is really going on behind the scenes (and if you need to get a project off the ground immediately and without any budget)


Alain Dellepiane @gloc247 29 May 2011
Alain is the founder of team GLOC.
