Global Content Management

Feb 07, 2006

Globalization, Internationalization, Localization and Translation are all terms that describe making content readable in the world marketplace. Our CM Professionals community has chapters around the world and hopes to translate its Web site navigation and some content into several languages for outreach to new countries, but the challenges are formidable. So, what is the best way to manage a content globalization project?

First and foremost, you must learn to write for translation, which means writing simply, clearly and, above all, for reuse. That requires tight control of your vocabulary: not just words, but the phrases, whole sentences and paragraphs that are critical to your message. Simplified Technical English (STE) provides a set of guidelines for clear writing that lends itself to translation.

Keep in mind that localizing content is expensive, and the cost rises in direct proportion to the number of supported languages. If you can reuse key terms and important phrases, you won't have to pay for new translations. The promise of Translation Memory (TM) tools like TRADOS and SDLX is "never translate the same sentence twice." Terms or segments of text (usually sentences) are stored with their translations in a database. When a translator enters new text, the TM tool looks for several kinds of "matches": perfect 100% matches, "fuzzy" matches that share only some of the words, and in-context exact (ICE) matches, in which the surrounding sentences match as well.
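To make the idea of exact and fuzzy matches concrete, here is a minimal sketch in Python. It is not how TRADOS or SDLX work internally; the tiny in-memory "database," the lookup function and the 75% threshold are all assumptions made for illustration, and character-level similarity from the standard difflib module stands in for whatever scoring a commercial tool uses. ICE matching, which also compares neighboring sentences, is left out.

```python
from difflib import SequenceMatcher

# Toy translation memory: source segment -> stored translation.
# The entries are invented sample data, not from any real TM.
MEMORY = {
    "Click here to continue.": "Cliquez ici pour continuer.",
    "Save your changes.": "Enregistrez vos modifications.",
}


def lookup(segment, memory=MEMORY, threshold=0.75):
    """Return (match_type, score, source, translation) for the best match.

    Returns None when nothing in the memory scores at or above the
    threshold (an assumed cut-off, chosen only for this example).
    """
    # A perfect 100% match: the segment is already in the memory.
    if segment in memory:
        return ("exact", 1.0, segment, memory[segment])

    # Otherwise score every stored segment and keep the closest one.
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_source, best_score = source, score

    if best_source is not None and best_score >= threshold:
        return ("fuzzy", best_score, best_source, memory[best_source])
    return None


print(lookup("Click here to continue."))      # an exact match
print(lookup("Click here to continue now."))  # a fuzzy match
print(lookup("Quarterly revenue grew."))      # no usable match
```

A fuzzy match only suggests a starting point. The translator still has to adapt the stored translation to the new sentence, which is why controlled, repeatable wording pays off.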

Man v. Machine
Another option, very attractive at first glance, is Machine Translation (MT). Ever since computers broke cryptographic codes in World War II, programmers have seen translation as a similar problem, one that would be solved any day. But the complexities, nuances, and fundamental ambiguity of all human languages (not just notoriously ambiguous English) still baffle computer-aided translation (CAT) experts. Today, they speak more modestly about a machine assisting human translators with their work, or humans assisting the machine by clarifying ambiguities in real time as it attempts to translate.

Many Web sites offer MT as a free promotional service. Best known may be AltaVista's Babel Fish (powered by SYSTRAN), an SDL-powered service, and now Google, whose toolbar also incorporates word translations. Language Engineering Company (LEC) offers paid subscriptions to its Web-based translation services.

Even the most preliminary efforts to use MT will expose its weaknesses. One way to see how absurd MT can be is to "round trip" a translation. Ask the MT system to translate, and then request a back translation to the source language. If you get back what you put in, you are very lucky and you might even have a useful translation. However, it may not mean precisely what you want it to mean in the target language. To know this, you need a native speaker.

The best way to find translators with expertise in your industry is to use a professional translators association. The largest of these is ProZ (I'm a member). You can submit samples of text to be translated and judge the quality of candidate translators by having expert reviewers comment on the translation quality using its unique KudoZ rating system. You will almost always need both a translator and an independent reviewer.
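Returning to the round-trip test, here is a minimal sketch in Python. No real MT service is called: the two word-for-word lookup tables are toy stand-ins invented for this example, and the names round_trip and word_for_word are hypothetical. In practice you would pass in functions that call whichever MT system you are evaluating.

```python
# Toy "MT engines": word-for-word tables invented for illustration.
# Two English words deliberately share one French rendering.
EN_FR = {"the": "la", "big": "grande", "tall": "grande", "house": "maison"}
FR_EN = {"la": "the", "grande": "big", "maison": "house"}


def word_for_word(table):
    """Build a translate(text) function that substitutes word by word."""
    def translate(text):
        return " ".join(table.get(word, word) for word in text.lower().split())
    return translate


def round_trip(text, forward, backward):
    """Translate forward, then back; report whether the source survived."""
    target = forward(text)
    back = backward(target)
    survived = back.strip().lower() == text.strip().lower()
    return target, back, survived


to_french = word_for_word(EN_FR)
to_english = word_for_word(FR_EN)

print(round_trip("The big house", to_french, to_english))
print(round_trip("The tall house", to_french, to_english))
```

In this toy setup both sentences produce the same French text, but only the first comes back unchanged. A failed round trip flags a problem; a successful one still says nothing about whether the target text reads correctly, which is where the native speaker comes in.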

XML in Any Language
Once you have your translations, you need to keep them stored in a centralized tool that facilitates reuse. The Localization Industry Standards Association (LISA) has developed a Translation Memory eXchange standard (TMX) and the Organization for the Advancement of Structured Information Standards (OASIS) has developed a related XML standard called XML Localisation Interchange File Format (XLIFF).

Whichever format you choose (both are XML, so files can be converted from one to the other using XSLT), you need something to read and write the files. Heartsome has developed cross-platform, Java-based editors for both formats at a fraction of the cost of standard TM tools (which are Windows only). Keeping your TM files in XML makes you independent of any particular tool supplier. TM industry leader TRADOS fought the move to open XML files. Giant SDL International, which backed TMX and XLIFF strongly, last year acquired TRADOS, which had over 70% of the TM market to SDL's 20%.
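Tool independence is easy to demonstrate, because an XML translation memory can be read with nothing more than a standard XML parser. The sketch below uses Python's built-in xml.etree.ElementTree. It assumes a TMX 1.4-style layout, in which each translation unit (tu) holds one variant (tuv) per language, marked with xml:lang, and the text sits in a seg element. The sample file content and the load_units function are invented for illustration, and inline markup inside segments is simply flattened to text.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny hand-written sample in TMX 1.4 style (invented data).
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="sentence" o-tmf="none" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Click here</seg></tuv>
      <tuv xml:lang="fr"><seg>Cliquez ici</seg></tuv>
      <tuv xml:lang="es"><seg>Haga clic aquí</seg></tuv>
    </tu>
  </body>
</tmx>"""


def load_units(tmx_text):
    """Return one {language: segment text} dict per translation unit."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() flattens any inline markup inside the segment.
                unit[tuv.get(XML_LANG)] = "".join(seg.itertext())
        units.append(unit)
    return units


for unit in load_units(SAMPLE_TMX):
    print(unit)
```

Files exported by a real TM tool carry more metadata than this sample, but the same handful of lines would still get at the segments, whichever vendor wrote the file.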

Somebody Help
AuthorAssistant is an intriguing new tool from SDL to help writers write more consistently and reuse existing corporate text assets. The goal is to leverage TM matching technology even for work in the original language. Imagine that as writers type, a huge database of already-written content is analyzed and provides "matches" to prompt authors to write the same thing again, instead of reinventing the phrase. A small firm called PreciseTerm offers similar tools using a Web service. 

SDL's Translation Management System and Idiom Technologies' WorldServer are enterprise-level tools to optimize globalization projects, keeping track of translators and language service providers, providing spreadsheet analyses of localization costs for a project, sending email notifications to translators, etc. But their six-figure costs put these tools out of reach for our small CM Pros Web site.

Even with the best tools and translators, a native speaker may not always know current technical jargon. GlossPost offers a terrific collection of thousands of bilingual and multilingual glossaries on specific subjects. I'm attempting to collect a multilingual glossary of common terms for Web and Internet use. How do you say "click here" in French and Spanish? "Cliquez ici" and "Haga clic aquí" will get you by. CM Pros and I need phrases in more languages. We'll post the results to a resource Web site for localizers. Now, how do you say "help us" in ten languages?