Translationese — A brief introduction
Translationese refers to the systematic patterns that distinguish translated text from text originally written in the same language. It is observed mostly in human translations. The differences may concern the syntax, semantics, and discourse of the translation, and translators tend to make translations more similar to each other. You have probably come across translationese without spotting it, because the changes are often subtle. Here’s a quick example: a native speaker from the US would say that something costs “ten-fifty”, but a non-native translator might say that it costs “ten dollars fifty cents”. This is perfectly correct but unidiomatic, and it is the kind of choice a non-native speaker might make.
Translationese is categorized into 5 types: Source Interference, Normalization, Implicitation, Explicitation, and Simplification.
Source Interference occurs when a translation replicates patterns that are typical of the source language but rare in the target language. In other words, the source language shines through the translation. [3] For example,
Original: Den hvite mannen knipser
Translated: The white man clicks.
Here, the verb “knipser” in the Norwegian sentence means to click, and it’s implicit that it means clicking a camera. However, the translated English sentence does not really imply clicking a camera. Hence, the source language shines through the translation.
Normalization occurs when a translator tries to normalize the sentence to match the patterns of the target language. For instance, a translator tends to be more conservative while translating. It has been reported that translators use ‘that’ more often than authors of original English texts, since ‘that’ is usually omitted in an informal setting and occurs more in formal texts along with some formal and less common verbs. [1][2]
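The observation about ‘that’ lends itself to a quick corpus check. The following is a minimal sketch, not the method used in [1] or [2]: it simply counts how often the token "that" occurs per 1,000 tokens of a text. The function name `that_rate` and the two sample sentences are invented for illustration, and the count includes every use of "that" (demonstrative, relative, and so on), not only the optional reporting ‘that’.

```python
import re

def that_rate(text, per=1000):
    """Return occurrences of the token 'that' per `per` tokens of text."""
    # Lowercase and split into simple word tokens (letters and apostrophes).
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return per * tokens.count("that") / len(tokens)

# Hypothetical sample sentences, invented for illustration only.
original_like = "She said she was tired and I think he knew it."
translated_like = "She said that she was tired and I think that he knew it."

print(that_rate(original_like))
print(that_rate(translated_like))
```

On real data, the rate would be compared between a corpus of translated English and a corpus of original English of similar genre and size; two sentences are far too little to conclude anything.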
Implicitation occurs when the translated text is more implicit than the corresponding source text. It is the non-verbalization of information that a person might or might not be able to infer. For example,
Original: Go out — Translated: Sortir
Original: Come out — Translated: Sortir
This is an example of English-to-French translation: “go out” and “come out” can both be translated as “sortir”, and a native French speaker infers the intended meaning from context.
Explicitation occurs when a translated text is more explicit than the corresponding source text. It is basically saying something that a person might have understood anyway. For example,
Sentence: the animal didn’t cross the street because it was too tired.
Explicit Sentence: the animal didn’t cross the street because the animal was too tired.
In the above example, the reader can infer that it is the animal that is tired; however, a translator might spell this out explicitly.
Simplification occurs when a translator simplifies the translation. For example, Vanderauwera [4] found in her study that in English translations of Dutch novels, potentially ambiguous pronouns are replaced by more precisely identifiable forms. She also reports that “where quotation marks fail to distinguish a person’s speech or thought in the source text, they are almost invariably restored in the target text”. Simplification is also sometimes considered a type of explicitation since it raises the level of explicitness by resolving ambiguity.
It is unclear what exactly causes translationese, and the patterns vary according to the language pair under consideration. While these patterns are difficult for humans to notice most of the time, they tend to be identified more reliably by machine learning models.
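To make the idea of a model identifying translationese more concrete, here is a toy sketch. It is not the approach of any of the cited papers: it represents each text by the relative frequency of a few function words and assigns a new text to the class whose average feature vector (centroid) is closest. The word list, the function names, and the four training sentences are all assumptions made up for this illustration.

```python
import re
from collections import Counter

# Small, hand-picked function-word list (an assumption, not a standard set).
FUNCTION_WORDS = ["that", "the", "of", "which", "however", "therefore"]

def features(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]

def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train(labelled_texts):
    """Map each label to the centroid of its texts' feature vectors."""
    by_label = {}
    for text, label in labelled_texts:
        by_label.setdefault(label, []).append(features(text))
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def predict(model, text):
    """Return the label whose centroid is closest (squared Euclidean)."""
    vec = features(text)

    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))

    return min(model, key=lambda label: dist(model[label]))

# Invented toy data; real experiments use large corpora, not four sentences.
train_data = [
    ("He said that he believed that the plan would work.", "translated"),
    ("It is the case that the results of the study are clear.", "translated"),
    ("He said he believed the plan would work.", "original"),
    ("Turns out the study results are clear.", "original"),
]
model = train(train_data)
print(predict(model, "She thinks that the idea of the project is good."))
```

With so little data the prediction only reflects how the toy sentences were written. The point is the shape of the pipeline: turn each text into surface-level features, then let a model separate the two classes.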
But is translationese a performance barrier?
Cross-lingual NLP tasks depend heavily on human translations. A lot of research has shown that translationese patterns affect performance on many tasks. For instance, source sentences that are themselves the result of translation have been found to be easier to translate [5]. Cross-lingual models are also seen to suffer performance-wise when translationese is present. Hence, there has been a lot of work on identifying and reducing translationese [6][7][8].
However, one paper [9] argues that machine translation systems, especially neural MT (NMT) systems, focus so much on performance and robustness that they overlook other important phenomena. NMT systems have been found to “hallucinate” content that is unfaithful to the input document. This behavior was first detected in the abstractive summarization task [10]. In open-ended generation, models are expected to produce text that is more human-like and thus contains more content-related words. This differs from document summarization, where the model is expected to be factual and faithful to the input document. Additionally, models are usually agnostic to artifacts of the training data (e.g. reference divergence). These factors make them vulnerable to hallucinations. Depending on the task, NMT models are therefore sometimes required to stay faithful to the source language, which means allowing source interference in the model’s output. The researchers found that NMT systems have a strong tendency towards robustness rather than faithfulness, and hence give significantly less weight to the translationese phenomenon.
So the question is: do we want translationese or not? We won’t know for sure until we know our use case. Previous research has shown that we need a trade-off between robustness and faithfulness, which opens up the possibility of developing systems that autonomously select whichever property best suits a problem. We also want reliable methods to identify these patterns, and NMT systems are currently state of the art in translationese classification. But again, why is NMT better at identifying these patterns? Translationese raises numerous questions because it is such a subtle and intriguing phenomenon, which makes it an exciting research problem with many opportunities.
That’s it for this brief intro to translationese. I hope you found it interesting! I’d like to hear your feedback and comments. You can also contact me on LinkedIn.
References
[1] Olohan & Baker (2000). Reporting that in translated English. Evidence for subconscious processes of explicitation?
[2] Becher, V. (2011). Explicitation and implicitation in translation. A corpus-based study of English-German and German-English translations of business texts.
[3] Baker, M. (2019). Corpus Linguistics and Translation Studies: Implications and applications.
[4] Vanderauwera (2022). Dutch novels translated into English: The transformation of a minority literature.
[5] Toral et al. (2018). Attaining the unattainable? Reassessing claims of human parity in neural machine translation.
[6] Chowdhury et al. (2022). Towards Debiasing Translation Artifacts.
[7] Vanmassenhove et al. (2021). Machine translationese: Effects of algorithmic bias on linguistic complexity in machine translation.
[8] Rabinovich & Wintner (2015). Unsupervised identification of translationese.
[9] Parthasarathi et al. (2021). Sometimes We Want Translationese.
[10] Maynez et al. (2020). On Faithfulness and Factuality in Abstractive Summarization.