Aspiration Paper: Open Translation Tools

Aspiration has published a paper entitled "Open Translation Tools: Disruptive Potential to Broaden Access to Knowledge", documenting learnings and outcomes from the first-ever Open Translation Tools Convergence. The event brought together two passionate communities: those creating open source software tools to support translating open content, and those with a need for better tools to support translation of the open content they create.

In addition to the paper, participants were interviewed for the event video Voices From Open Translation 2007, sharing their views on the open translation movement and reflecting on how they envision the field evolving in the future. In the 10-minute piece, developers and content creators discuss the mandates for open content and open source, and the natural marriage between the two communities of practice.

OTT07 participants also mapped out and categorized almost 50 open translation tools, and the results are now published using Aspiration's Social Source Commons platform. Have a look, and if we missed any tools you know about, please add them!

Open Translation Tools 2007 was co-organized by Aspiration and Multimedia Institute (MI2), and was supported by the generosity of the Open Society Institute, with additional support provided by TechSoup.

The paper documents the state of open translation tools, as mapped out by participants at the event. The full executive summary from the paper is included below.

Executive Summary

The first-ever Open Translation Tools Convergence (OTT07) took place from 29 November to 1 December, 2007 in Zagreb, Croatia. The event brought together two passionate communities: those creating open source software tools to support translating open content, and those with a need for better tools to support translation of the open content they create. “Open content” was interpreted to encompass a range of resource types available under open licenses such as Creative Commons (CC) and the GNU Free Documentation License (FDL), ranging from books to manuals to documents to blog posts to multimedia. “Open translation tool” was interpreted to encompass any piece of software which supports or performs language translation, and which is distributed under a free or open source software (FOSS) license. This paper describes the learnings and outcomes from the event.

Open Translation Tools was co-organized by Aspiration and Multimedia Institute (MI2), and was supported by the generosity of the Open Society Institute, with additional support for participant travel provided by TechSoup. The OTT07 agenda was collaboratively developed by participants and event organizers in the time leading up to and during the gathering, and the proceedings were directed using Aspiration's collaborative approach to event facilitation.

The event focused on a relatively specific category of translation tools: those which support or enable the human translation of text content. Participants engaged in two parallel paths of learning and mapping: one team documented available open translation tools and technologies, categorizing and differentiating the available offerings, while another team oversaw the enumeration of a set of “use cases” which describe how publishers of open content want to be able to translate and manage that content.

While not intended to be comprehensive, the use cases documented at OTT07 span a broad range of scenarios and needs, and serve as an excellent sampler of the types of translation requirements which exist in the open content community and beyond. The use cases were grouped into 7 categories: content translation, multimedia translation, translation workflow, machine translation, interpreting, content rendering, and localization. In parallel with enumerating use cases, participants assembled a collection of considerations to be addressed when choosing a translation strategy; those are included later in this paper. The complete set of open translation use cases can be found in Appendix I.

The field of translation is in a state of transition, and software tools to support language translation are evolving with corresponding rapidity. Increasingly available internet resources are quickly expanding the possible and the practical when it comes to translating content, and processes and business models which have remained relatively staid for decades are being rethought. The state of open translation tool offerings reflects the same flux. Real-time access to a global network of translation services and talent is a resource only now beginning to be leveraged by the translation industry, and upstart multilingual projects on the internet are pushing the state of the art by treating translation as an exercise in distributed problem solving. In addition, most open translation tools are only now beginning to incorporate substantial workflow into the software, tracking user roles, permissions and detailed state information for each translation project. From RSS-enabled platforms like Worldwide Lexicon, which automates translation requests and submissions, to crowd-sourced tools like dotSUB, which, though not open source, still employs an open approach to data and translation for subtitling digital videos, open translation tools are demonstrating their ability not only to keep pace with but also to outpace their closed and proprietary counterparts.

The goal at OTT07 was to take a snapshot of the open translation tools that were available, and in turn to analyze the gaps. Participants engaged in a range of brainstorming and knowledge capture activities; collaborative mapping of available tools served as a prelude to a discussion of what a “dream translation tool” might provide. Tools were grouped into 7 categories: PO and XLIFF localization editors, translation workflow, subtitling, machine translation, translation memory, dictionary and glossary, and wiki translation. As with almost any collection of software tools, these categories blur and overlap on a tool-by-tool basis; the categories are somewhat arbitrary and many tools fall into more than one. A complete listing of open translation tools mapped out at the event is provided as Appendix II of this paper.

A primary point of discussion at OTT07 was “what's missing” in open translation tools. While many issues were identified, two functionality gaps surfaced repeatedly: workflow support for managing and tracking the broad range of translation tasks and processes, and distributed translation with memory aggregation, which would let remote translators contribute translations to sites of their choosing and have those translation mappings stored for use in future translations. Another frequently identified gap was the lack of integration and interoperability between tools; different communities have their own toolsets, but it is difficult for a translation project to make coherent use of a complete toolset. Interoperability issues requiring further attention in the open translation tools ecosystem include common APIs that enable tools to share data and requests, plugins for web content management systems (CMSs) to export content into PO files, better tool integration features including shared glossaries, common user interfaces and subsystems, rich file import/export, and finally generic code libraries for common feature requirements.
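To make the CMS-to-PO gap more concrete, here is a minimal sketch of what such an export plugin might produce. It is not taken from any tool discussed at OTT07: the function names and the idea of keying content by a CMS identifier are hypothetical. It assumes content identifiers contain no spaces (they are written as gettext reference comments) and merges identical source strings so each msgid appears only once, as PO files require.

```python
def po_escape(text: str) -> str:
    """Escape a string for use inside a double-quoted PO literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\t", "\\t"))


def export_po(segments: dict[str, str]) -> str:
    """Render CMS content as a minimal PO template (hypothetical plugin output).

    `segments` maps a hypothetical content identifier (e.g. "page/about:title")
    to its source text. msgstr is left empty for translators to fill in.
    """
    # Header entry declaring the character set.
    lines = ['msgid ""', 'msgstr ""', '"Content-Type: text/plain; charset=UTF-8\\n"', ""]

    # Group identifiers by source text so duplicate msgids are merged.
    grouped: dict[str, list[str]] = {}
    for ref, text in segments.items():
        grouped.setdefault(text, []).append(ref)

    for text, refs in grouped.items():
        lines.append("#: " + " ".join(refs))  # reference comment(s)
        lines.append(f'msgid "{po_escape(text)}"')
        lines.append('msgstr ""')
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(export_po({"page/about:title": "About us",
                     "page/home:intro": 'Say "hi"\nthen go'}))
```

A file like this can then be handed to any PO editor, which is the kind of hand-off between a CMS and a translation tool that participants wanted to see standardized.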

As part of the tool mapping exercises at OTT07, participants were asked to envision what their “dream translation tool” might look like. The idea was to specify a feature set for a tool which does not yet exist, but which would meet the broadest range of translation needs in terms of features, supported workflows, and business models. This was a purely theoretical exercise; participants generally agreed that large monolithic tools were not the right course for the future, and that a small, distributed set of tools that work well together was the recommended path for better supporting open translation efforts. The resulting work of software fiction and vision is described starting on page 22.

Several significant emerging tools and technology trends were highlighted during the proceedings. One of the most compelling innovations shared at OTT07 was the use of RSS (Really Simple Syndication) by projects like Worldwide Lexicon to syndicate and distribute translation requests. While the focus of OTT07 was on human translation and the software tools that support those processes, machine translation (MT) was often discussed for its substantial role in the human translation process; participants agreed that MT would not be able to generate publication-quality translations in the foreseeable future, but that with the increasing availability of translation corpora, MT would play an increasingly important role in supporting the work of human translators and their translation tools. One of the most valuable but arguably under-developed aspects of open translation is “translation memory” (TM): the practice of storing translated sentence pairs, one in the source language and one in a target language, so that they can be reused. Such memories are rarely shared or collectively maintained, and are instead constantly reinvented. Ironically, FOSS projects do not always take advantage of standards for storing translations; by adopting standard file formats and other data standards, such projects could contribute to shared translation memories and create major forward momentum for better translation and localization.
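The idea of a translation memory can be illustrated with a minimal sketch. This is not the design of any tool mentioned at OTT07: the class name, the 0.75 similarity threshold, and the use of Python's difflib ratio for "fuzzy" matching are illustrative assumptions. Production TM tools use more sophisticated segment matching and typically exchange memories in a standard format such as TMX.

```python
from difflib import SequenceMatcher
from typing import Optional


class TranslationMemory:
    """Toy translation memory: stores (source, target) sentence pairs for one
    language pair and looks up earlier translations by exact or fuzzy match."""

    def __init__(self, source_lang: str, target_lang: str):
        self.source_lang = source_lang
        self.target_lang = target_lang
        self._pairs: dict[str, str] = {}

    @staticmethod
    def _normalize(text: str) -> str:
        # Case-fold and collapse whitespace so trivial differences still match.
        return " ".join(text.lower().split())

    def add(self, source: str, target: str) -> None:
        """Store a translated sentence pair."""
        self._pairs[self._normalize(source)] = target

    def lookup(self, source: str, threshold: float = 0.75) -> Optional[tuple[str, float]]:
        """Return (target, similarity) for the best stored match, or None."""
        key = self._normalize(source)
        if key in self._pairs:
            return self._pairs[key], 1.0  # exact match
        best: Optional[tuple[str, float]] = None
        for stored, target in self._pairs.items():
            score = SequenceMatcher(None, key, stored).ratio()
            if score >= threshold and (best is None or score > best[1]):
                best = (target, score)
        return best


if __name__ == "__main__":
    tm = TranslationMemory("en", "fr")
    tm.add("The file was saved.", "Le fichier a été enregistré.")
    tm.add("The file was not saved.", "Le fichier n'a pas été enregistré.")
    print(tm.lookup("The file was saved!"))  # fuzzy match on the first pair
```

A fuzzy hit is only a suggestion for a human translator to review, which matches the report's framing of tools that support, rather than replace, human translation. Sharing such memories would amount to exchanging the stored pairs in a common format.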

A number of related discussions took place during the three days of OTT07. One focus was on the lack of visibility for existing tools; while there are compelling open translation tools and technologies available, very few users are aware of their existence. Licensing for open content translation was also a central issue; translation of open content raises several questions about how the source content is licensed, especially with regard to the creation of derivative works. Also, the significance of regional and cultural issues in translation work cannot be overstated; as norms and values vary, a range of secondary connotations and associations must be considered in crafting appropriate translations. These topics surfaced in a number of sessions on the agenda.

OTT07 generated a range of collaboration ideas and emerging plans. The most exciting project idea presented was to collaboratively author an Open Translation Book that would provide an overview of the emerging field and associated issues, while also explaining how to use open source translation tools to implement various open content translation use cases. An online community called Translator Commons was proposed as a destination where practitioners of open translation could go to find support and guidance for their practice, as well as resources such as regional and language-pair-specific translation knowledge and style guides. “Corpora Commons” was proposed as an initiative to support open source machine translation by aggregating translation memories from documents translated by the United Nations and individual governments, which hold copious volumes of translated text in a range of language pairs. And participants at OTT07 called out API (Application Programming Interface) design and adoption as a critical objective in growing the open translation movement. While many tools use standard data formats such as PO, XLIFF, and TMX, very few expose APIs which would allow other tools to easily transfer data or invoke services remotely.

The event also engendered collaborations between participating projects. Meedan and Worldwide Lexicon (WWL) got better acquainted at OTT07, and are now partnered and seeking funding to build social-network-oriented translation tools which will tap the distributed translation talent spread across the internet. The Inkscape project and FLOSS Manuals first met at OTT07, and hosted a Book Sprint sponsored by Google in July 2008 in Paris.
