The CASMACAT Project (2012-2014) Built The Next Generation Translator's Workbench.

The CASMACAT project (2012-2014) built the next generation translator's workbench to improve productivity, quality, and work practices in the translation industry.

We carried out cognitive studies of actual, unaltered translator behaviour based on key logging and eye tracking. We examined the acquired data to see how interfaces with enriched information were used, to determine translator types and styles, and to build a cognitive model of the translation process.

Based on insights gained in the cognitive studies, we developed novel types of assistance to human translators and integrated them into a new workbench, consisting of an editor, a server, and analysis and visualisation tools. The workbench was designed in a modular fashion and can be combined with existing computer-aided translation tools.

We developed new types of assistance along the following lines:

  • Interactive translation prediction, where the CASMACAT workbench suggests to the human translator how to complete the translation. We adapted the existing interactive machine translation paradigm by adding input modalities, especially electronic pens, and by basing the suggestions on better exploitation of novel statistical machine translation models, such as ones based on syntactic structure.
  • Interactive editing, where the CASMACAT workbench provides additional information about the confidence of its assistance, integrates translation memories, and assists authoring and reviewing.
  • Adaptive translation models, where the CASMACAT workbench learns from the interaction with the human translator by updating and adapting its models instantly based on the translation choices of the user.

We demonstrated the workbench's effectiveness in extensive field tests in the real-life practice of a translation agency. We also reached out to the wider language service industry and online volunteer translation platforms. The outcome of the CASMACAT project is available as open source software to industry, academia, and individual end users.

Jesús González won the 2014 LRC Best Thesis Award for his work on CASMACAT.

WP1: User Interface Studies, Cognitive and User Modelling (CBS)

This WP lays the empirical foundations for the development of the CASMACAT workbench. A series of experiments will establish basic facts about translator behaviour in computer-aided translation, investigating the usefulness of visualisation options in post-editing and interactive translation, for different types of text and for translators with different degrees of expertise. We will investigate individual differences in translation behaviour, in particular translator types and translation styles. These experimental results inform the design of the CASMACAT editor (WP5); they will also be used to develop a computational cognitive model of the translation process. The cognitive model will be extended into user models which capture the differences between translators.

WP2: Interactive Translation Prediction (UPVLC)

On the one hand, current statistical MT systems are still far from achieving high-quality translation. On the other hand, conventional post-editing and computer-assisted translation do not take full advantage of current MT technology. To close this gap, interactive machine translation (IMT) systems have been proposed with the aim of increasing translation productivity by incorporating human correction activities into the machine translation process itself.

Our goal in this project is to develop an innovative computer-assisted framework based on the IMT approach that will allow for the construction of systems that produce high-quality results by placing a human operator at the centre of the production process. The IMT paradigm embeds a statistical MT engine within an interactive editing environment. The human serves as the guarantor of high quality; the role of the automated systems is to ensure increased productivity by proposing well-formed extensions to the current target text, which the operator may then accept, correct or ignore. Interactivity allows the system to take advantage of the human-validated portion of the text to improve the accuracy of subsequent predictions.
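
The prediction step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the MT engine exposes a scored n-best list of full translations; the function name `suggest_completion` and the toy data are hypothetical and are not part of the CASMACAT software. A real IMT system would typically search a much larger hypothesis space than a short n-best list.

```python
from typing import List, Optional, Tuple


def suggest_completion(prefix: str, nbest: List[Tuple[str, float]]) -> Optional[str]:
    """Propose an extension of the human-validated prefix.

    `nbest` is a hypothetical list of (translation, score) pairs, where a
    higher score means a better hypothesis. Returns the remainder of the
    best-scoring hypothesis that starts with `prefix`, or None if no
    hypothesis is compatible with what the translator has typed.
    """
    # Keep only hypotheses consistent with the validated prefix.
    matching = [(text, score) for text, score in nbest if text.startswith(prefix)]
    if not matching:
        return None
    best_text, _ = max(matching, key=lambda cand: cand[1])
    # Only the part after the prefix is offered as a suggestion.
    return best_text[len(prefix):]


# Toy n-best list (invented for this example).
nbest = [
    ("the house is small", -1.2),
    ("the home is small", -1.5),
    ("the house is little", -2.0),
]
print(suggest_completion("the ho", nbest))   # use is small
print(suggest_completion("the hom", nbest))  # e is small
```

Each keystroke that deviates from the current suggestion shrinks the set of compatible hypotheses, which is how the validated prefix steers later predictions in this simplified setting.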

WP3: Interactive Editing (UEdin)

The most straightforward integration of statistical machine translation into the human translation workflow is to task the human translator with post-editing the fully automatic output of a machine translation system. This work process is becoming more pervasive in the translation industry. At the same time, human translators tasked with post-editing machine translation output generally dislike this activity and report higher strain and cognitive load. In this work package, we will develop new methods to aid editing of translations with the goal of not only increasing productivity but also making the task less demanding.

WP4: Adaptive Translation Models (UPVLC)

The CASMACAT workbench will be a production system that will be trained on an existing set of stored translations and that will be used to create new translations in an IMT framework. Therefore, the models used will be estimated with the same techniques as for regular SMT. In addition, techniques for domain adaptation that have been proposed for SMT can also be applied to IMT. However, human interaction offers another unique opportunity to improve the performance of IMT systems by tuning the translation models. In each iteration, the text that results from the user's keystrokes correcting the system's suggestion, together with the corresponding aligned source segments, can generally be converted into fresh training data, useful for adapting the system to a changing environment.
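
As a rough illustration of this idea, the sketch below updates a toy word-level lexicon incrementally from validated, aligned word pairs. It is a simplification under stated assumptions: the class `AdaptiveLexicon`, its methods, and the example words are hypothetical, the word alignments are assumed to be given, and relative-frequency counts stand in for the statistical translation models that would actually be adapted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


class AdaptiveLexicon:
    """Toy translation lexicon updated after every validated segment."""

    def __init__(self) -> None:
        # counts[source_word][target_word] = number of times the pair was seen
        self.counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def update(self, aligned_pairs: Iterable[Tuple[str, str]]) -> None:
        """Add (source word, target word) pairs from a user-validated segment."""
        for src, tgt in aligned_pairs:
            self.counts[src][tgt] += 1

    def prob(self, src: str, tgt: str) -> float:
        """Relative-frequency estimate of p(tgt | src); 0.0 if src is unseen."""
        options = self.counts.get(src, {})
        total = sum(options.values())
        return options.get(tgt, 0) / total if total else 0.0

    def best(self, src: str) -> Optional[str]:
        """Most frequent translation of src so far, or None if src is unseen."""
        options = self.counts.get(src)
        if not options:
            return None
        return max(options, key=options.get)


lexicon = AdaptiveLexicon()
# Pairs taken from previously stored translations (invented data).
lexicon.update([("casa", "house"), ("casa", "house"), ("casa", "home")])
# The translator now repeatedly corrects the suggestion to "home".
lexicon.update([("casa", "home"), ("casa", "home")])
print(lexicon.best("casa"))
```

Because every update only increments counts, the model can be refreshed after each segment without retraining from scratch, which is the property that matters for adapting during an interactive session.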

WP5: Integration (CBS)

All the methods developed by the CASMACAT project will be integrated into a new translator's workbench, the CASMACAT workbench. Its development will take place throughout the project in three distinct stages: specification of requirements, implementation of core functionality, and integration of novel methods. While the CASMACAT workbench will be developed from scratch, it builds on the experience of the partners in implementing computer-aided translation software to study human translation processes.

WP6: Evaluation (CS)

In this work package, we will expose the CASMACAT workbench to a wider community of users and engage the localization industry to gain wider adoption. We will carry out field trials to study the use of the workbench in a real-world environment, and promote it to community translation platforms and the language service industry.

WP7: Dissemination (CS)

The CASMACAT workbench will be of great interest to human translators, whether they work as freelancers or are organized by language service providers. The different types of potential users will be reached by different means. Freelance translators will be interested in hands-on tutorials on how to install and use the workbench. Larger language service providers will be interested in best practices for using the novel features of the workbench, and will need to integrate the technology into their existing infrastructure. For them, we will organize workshops where early adopters can share their experience.