The Korean Association for the Study of English Language and Linguistics

[ Article ]

Korea Journal of English Language and Linguistics - Vol. 24, No. 0, pp.257-276

ISSN: 1598-1398 (Print) 2586-7474 (Online)

Print publication date 31 Jan 2024

Received 20 Jan 2024 Revised 27 Feb 2024 Accepted 25 Mar 2024

DOI: https://doi.org/10.15738/kjell.24..202404.257

A Corpus-based Multilingual Comparison of AI-based Machine Translations

Cuilin Liu ; Se-Eun Jhang ; Homin Park ; Hyunjong Hahm

(First author) PhD student, National Korea Maritime & Ocean University, Claire1182904043@outlook.com
(Corresponding author) Professor, National Korea Maritime & Ocean University, jhang@kmou.ac.kr
(Co-author) Researcher, Electronics and Telecommunications Research Institute, hominpark@etri.re.kr
(Co-author) Associate Professor, University of Guam, hhahm@triton.uog.edu

© 2024 KASELL All rights reserved
This is an open-access article distributed under the terms of the Creative Commons License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The present study investigates whether, and to what extent, the corpus-linguistic measure type-token ratio (TTR) is valid for identifying the quality of translations produced by different AI-based machine translation (MT) systems. Specifically, this study examined discourse-level discrepancies among the MT outputs generated by Google Translate, DeepL and ChatGPT 3.5, utilizing a self-compiled multilingual corpus of English translations for the short story Eveline in Korean and Chinese. For this purpose, we calculated the TTR separately for different text segments within a moving span of running word-tokens and visualized the results with a two-dimensional approach. In addition, to verify the validity of this TTR method in predicting the differing quality of the three MT systems, we drew on three metrics that are commonly used to evaluate MT quality: Bilingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit Ordering (METEOR), and Recall-Oriented Understudy for Gisting Evaluation (ROUGE). The paper demonstrated the validity of TTR graphs in assessing the quality of a particular MT system. The findings corroborate the argument in previous studies that AI-based MT produces lower lexical diversity and information density.
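As a rough illustration of the moving-span idea described above, the following Python sketch computes one TTR value per window of consecutive word-tokens. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `tokenize` and `moving_ttr`, the simple regular-expression tokenizer, and the span length used in the example are all hypothetical choices made here for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens.

    Assumption: a token is a run of letters or apostrophes; the study
    may have tokenized differently.
    """
    return re.findall(r"[a-z']+", text.lower())


def moving_ttr(tokens, span):
    """Return the type-token ratio of every window of `span` tokens.

    The window advances one token at a time, so a text of n tokens
    yields n - span + 1 values (or none if the text is too short).
    """
    if span <= 0:
        raise ValueError("span must be positive")
    if len(tokens) < span:
        return []

    # Count the word types inside the first window.
    counts = {}
    for tok in tokens[:span]:
        counts[tok] = counts.get(tok, 0) + 1
    ratios = [len(counts) / span]

    # Slide the window: drop the oldest token, add the newest one.
    for i in range(span, len(tokens)):
        old = tokens[i - span]
        counts[old] -= 1
        if counts[old] == 0:
            del counts[old]
        new = tokens[i]
        counts[new] = counts.get(new, 0) + 1
        ratios.append(len(counts) / span)
    return ratios


if __name__ == "__main__":
    # Toy sentence and span chosen only for demonstration.
    sample = "the cat sat on the mat and the dog sat on the rug"
    for position, ratio in enumerate(moving_ttr(tokenize(sample), span=5)):
        print(position, ratio)
```

Plotting the returned values against window position gives a two-dimensional profile of the kind the abstract refers to; how the span length was chosen in the study is not stated here, so any value passed to `span` is an assumption.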

Keywords:

type-token ratio (TTR), span, text structure, machine translation, Google Translate, DeepL, ChatGPT 3.5

Acknowledgments

The first draft of this paper was presented orally at the KACL-KASELL Summer Joint Conference, Korea University on June 3, 2023. Some parts of the first draft were also presented orally at the Joint Forum between Korea Maritime & Ocean University and The University of Kitakyushu held at The University of Kitakyushu, Japan on July 27, 2023.

References

Baker, M. 1993. Corpus linguistics and translation studies: Implications and applications. In M. Baker, G. Francis and E. Tognini-Bonelli, eds., Text and Technology, 233-250. Amsterdam: John Benjamins. [https://doi.org/10.1075/z.64.15bak]
Banerjee, S. and A. Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In J. Goldstein, A. Lavie, C. Lin and C. Voss, eds., Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, 65-72. Michigan: Association for Computational Linguistics.
Bentivogli, L., A. Bisazza, M. Cettolo and M. Federico. 2016. Neural versus phrase-based machine translation quality: A case study. In J. Su, K. Duh and X. Carreras, eds., Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP, 257-267. Austin: Association for Computational Linguistics. [https://doi.org/10.18653/v1/D16-1025]
Brezina, V. 2018. Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press. [https://doi.org/10.1017/9781316410899]
Brglez, M. and Š. Vintar. 2022. Lexical diversity in statistical and neural machine translation. Information 13(2), 93-107. [https://doi.org/10.3390/info13020093]
Broeder, P., G. Extra and R. van Hout. 1993. Richness and variety in the developing lexicon. In C. Perdue, ed., Adult Language Acquisition: Cross-linguistic Perspectives. Vol. I: Field Methods, 145-163. Cambridge: Cambridge University Press.
Castilho, S., N. Resende and R. Mitkov. 2019. What influences the features of post-editese? A preliminary study. In Proceedings of the Human-Informed Translation and Interpreting Technology Workshop (HiT-IT 2019), 19-27. Shoumen, Bulgaria: Incoma Ltd. [https://doi.org/10.26615/issn.2683-0078.2019_003]
Chafe, W. 1987. Cognitive constraints on information flow. In R. Tomlin, ed., Coherence and Grounding in Discourse, 21-51. Amsterdam: John Benjamins. [https://doi.org/10.1075/tsl.11.03cha]
Chafe, W. 1994. Discourse, Consciousness, and Time: The Flow and Displacement of Conscious Experience in Speaking and Writing. Chicago: The University of Chicago Press.
Costa, Â., W. Ling, T. Luís, R. Correia and L. Coheur. 2015. A linguistically motivated taxonomy for machine translation error analysis. Machine Translation 29(2), 127-161. [https://doi.org/10.1007/s10590-015-9169-0]
Engber, C. 1995. The relationship of lexical proficiency to the quality of ESL compositions. Journal of Second Language Writing 4(2), 138-155. [https://doi.org/10.1016/1060-3743(95)90004-7]
Ferris, D. 2011. Treatment of Error in Second Language Student Writing. Michigan: University of Michigan Press. [https://doi.org/10.3998/mpub.2173290]
Francis, W. N. and H. Kučera. 1982. Frequency Analysis of English Usage: Lexicon and Grammar. Boston: Houghton Mifflin.
Hart, C. 1969. Eveline. In C. Hart, ed., James Joyce’s ‘Dubliners’, 48-52. London: Faber and Faber.
Ilisei, I.-N. 2012. A Machine Learning Approach to the Identification of Translation Language: An Inquiry into Translationese Learning Models. Doctoral dissertation, University of Wolverhampton, England, UK.
Kruger, H. 2012. A corpus-based study of the mediation effect in translated and edited language. Target. International Journal of Translation Studies 24(2), 355-388. [https://doi.org/10.1075/target.24.2.07kru]
Laufer, B. and P. Nation. 1995. Vocabulary size and use: lexical richness in L2 written production. Applied Linguistics 16(3), 307-322. [https://doi.org/10.1093/applin/16.3.307]
Lee, S. M. and N. Briggs. 2021. Effects of using machine translation to mediate the revision process of Korean university students' academic writing. ReCALL 33(1), 18-33. [https://doi.org/10.1017/S0958344020000191]
Lin, C.-Y. 2004. ROUGE: A package for automatic evaluation of summaries. In Proceedings of Workshop Text Summarization Branches Out, 74-81. Barcelona: Association for Computational Linguistics.
Malvern, D., B. Richards, N. Chipere and P. Durán. 2004. Lexical Diversity and Language Development: Quantification and Assessment. Basingstoke: Palgrave Macmillan. [https://doi.org/10.1057/9780230511804]
McCarthy, P. M. 2005. An Assessment of the Range and Usefulness of Lexical Diversity Measures and the Potential of the Measure of Textual, Lexical Diversity (MTLD). Doctoral dissertation, The University of Memphis, Memphis, TN, USA.
O’Halloran, K. 2007. The subconscious in James Joyce’s ‘Eveline’: A corpus stylistic analysis that chews on the ‘Fish hook’. Language and Literature 16(3), 227-244. [https://doi.org/10.1177/0963947007072847]
O’Loughlin, K. 1995. Lexical density in candidate output on direct and semi-indirect versions of an oral proficiency test. Language Testing 12(2), 217-237. [https://doi.org/10.1177/026553229501200205]
Papineni, K., S. Roukos, T. Ward and W. Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In P. Isabelle, E. Charniak and D. Lin, eds., Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), 311-318. Association for Computational Linguistics. [https://doi.org/10.3115/1073083.1073135]
Roberts, N., D. Liang, G. Neubig and Z. C. Lipton. 2020. Decoding and diversity in machine translation. arXiv:2011.13477 [cs.CL].
Rubino, R., E. Lapshinova-Koltunski and J. van Genabith. 2016. Information density and quality estimation features as translationese indicators for human translation classification. In K. Knight, A. Nenkova and O. Rambow, eds., Proceedings of NAACL-HLT 2016, 960-970. Association for Computational Linguistics. [https://doi.org/10.18653/v1/N16-1110]
Sakoe, H. and S. Chiba. 1978. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 26(1), 43-49. [https://doi.org/10.1109/TASSP.1978.1163055]
Shin, D. and Y. V. Chon. 2023. Second language learners’ post-editing strategies for machine translation errors. Language Learning & Technology 27(1), 1-25.
Stubbs, M. 2001. Words and Phrases: Corpus Studies of Lexical Semantics. Massachusetts: Blackwell.
Teich, E. 2003. Cross-linguistic Variation in System and Text: A Methodology for the Investigation of Translations and Comparable Texts. Berlin: Mouton de Gruyter. [https://doi.org/10.1515/9783110896541]
Toral, A. 2019. Post-editese: An exacerbated translationese. In M. Forcada, A. Way, B. Haddow and R. Sennrich, eds., Proceedings of Machine Translation Summit XVII: Research Track, 273-281. Dublin: European Association for Machine Translation.
Tuldava, J. 1998. Probleme und Methoden der quantitativ-systemischen Lexikologie. [Translated from Russian original (1987)]. Trier: Wissenschaftlicher Verlag Trier.
Vanmassenhove, E., D. Shterionov and A. Way. 2019. Lost in translation: Loss and decay of linguistic richness in machine translation. In M. Forcada, A. Way, B. Haddow and R. Sennrich, eds., Proceedings of Machine Translation Summit XVII: Research Track, 222-232. Dublin: European Association for Machine Translation.
Vanmassenhove, E., D. Shterionov and M. Gwilliam. 2021. Machine translationese: Effects of algorithmic bias on linguistic complexity in machine translation. In P. Merlo, J. Tiedemann and R. Tsarfaty, eds., Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, 2203-2213. Association for Computational Linguistics. [https://doi.org/10.18653/v1/2021.eacl-main.188]
Vermeer, A. 2000. Coming to grips with lexical richness in spontaneous speech data. Language Testing 17(1), 65-84. [https://doi.org/10.1177/026553220001700103]
Youmans, G. 1991. A new tool for discourse analysis: The vocabulary management profile. Language 67(4), 763-789. [https://doi.org/10.2307/415076]
Yule, G. U. 1944. The Statistical Study of Literary Vocabulary. Cambridge: Cambridge University Press.