Google Translate vs. DeepL: A quantitative evaluation of close-language pair translation (French to English)

  • Ahmad Yulianto, Faculty of Languages and Arts, Universitas Negeri Semarang (UNNES), Indonesia
  • Rina Supriatnaningsih, Faculty of Languages and Arts, Universitas Negeri Semarang (UNNES), Indonesia
Keywords: assessment, evaluation, human text, machine translation, metric


Machine translation has improved in quality and works best when applied to language pairs from the same language family. This research aimed to assess the quality of Google Translate (GT) and DeepL in terms of accuracy and readability. GT and DeepL translations from French to English of the playscript En attendant Godot were evaluated, with the English Original version (EO) of the text serving as the reference. Two quantitative methods were employed: manual assessment with the SAE J2450 translation metric and automatic assessment with the Coh-Metrix tool. The manual assessment shows that both GT and DeepL outputs passed, scoring 84 and 99.04 respectively; according to the CdT Rubric, a translation is good when it scores 80 to 99 points. In the Coh-Metrix results, GT and DeepL scores varied. Statistical analysis with ANOVA shows that GT and DeepL are not significantly different from EO: the mean score is 99.69 for EO, 100.4 for GT, and 100.78 for DeepL. In conclusion, DeepL scores higher in the manual assessment, indicative of its accuracy, while GT and DeepL are more or less the same in the Coh-Metrix assessment. In terms of readability, DeepL offers better reading ease, as indicated by the Flesch Reading Ease, Flesch-Kincaid Grade Level, and Coh-Metrix Readability formulas, all of which favor DeepL. Despite these statistical results, GT and DeepL still need to improve in many respects, such as world knowledge and the ability to resolve lexical and structural ambiguities.
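For readers unfamiliar with the two Flesch measures named in the abstract, the following is a minimal sketch of the standard published formulas. It is not the authors' pipeline (the study reports using the Coh-Metrix tool); the function names and the example counts are hypothetical, and the word, sentence, and syllable counts are assumed to be supplied by the caller rather than computed from text.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease: higher scores mean easier text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level: approximate US school grade."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for illustration only (not data from the study).
example = {"words": 100, "sentences": 5, "syllables": 150}
ease = flesch_reading_ease(**example)
grade = flesch_kincaid_grade(**example)
print(f"Reading Ease: {ease:.2f}, Grade Level: {grade:.2f}")
```

Both measures depend only on average sentence length and average syllables per word, so a translation with shorter sentences and shorter words scores as easier to read on both.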




