The Korean Association for the Study of English Language and Linguistics

[ Article ]

Korea Journal of English Language and Linguistics - Vol. 25, No. 0, pp.955-981

ISSN: 1598-1398 (Print) 2586-7474 (Online)

Print publication date 31 Jan 2025

Received 18 Apr 2025 Revised 13 May 2025 Accepted 30 Jun 2025

DOI: https://doi.org/10.15738/kjell.25..202507.955

Automated Analysis of ESL Interaction Tasks Using ChatGPT

Thomas Dillon

Professor, Foreign Language Education Centre, Daegu Catholic University, 13-13, Hayang-ro, Hayang-eup, Gyeongsan-si, Gyeongsangbuk-do, 38430, Korea. dillon@cu.ac.kr

© 2025 KASELL All rights reserved
This is an open-access article distributed under the terms of the Creative Commons License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This study explores the utility of ChatGPT, a large language model (LLM), for automated linguistic analysis in English as a Second Language (ESL) contexts. It examines whether ChatGPT can generate quantitative metrics, analyze learner prompts, assess vocabulary exposure, and evaluate questioning strategies. Ninety-nine CEFR A1 learners completed two structured chat tasks with ChatGPT. The resulting transcripts were analyzed using structured prompts and calibration procedures within ChatGPT-4o. The model generated quantitative metrics (e.g., word counts, question types, sentence complexity) and qualitative classifications (e.g., vocabulary themes, follow-on question types) and formatted them as .csv output; these results were partially verified through human-in-the-loop review. The transcript analysis indicates that ChatGPT effectively produces useful quantitative data, including measures of sentence complexity and prompting skills, and offers qualitative analysis of vocabulary exposure and investigative themes. Analysis of questioning skills revealed students’ use of ‘Wh’ words and their follow-on inquiry patterns. Despite these strengths, ChatGPT was inconsistent in its analysis, suggesting the need for teacher oversight. Recommendations include training educators in prompt-based analysis, guiding students in metric interpretation, and further validating LLM-generated data.
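As a minimal illustration of how model-generated counts might be spot-checked during human-in-the-loop review, the following Python sketch computes a few simple per-turn metrics locally and writes them as CSV. It is not the procedure used in the study: the function names, the column names, the word-splitting rule, and the decision to count "how" alongside the 'Wh' words are all assumptions made for this example.

```python
import csv
import io
import re

# Assumption: "how" is counted together with the Wh-words; the study's own
# definition of 'Wh' words may differ.
WH_WORDS = ("what", "when", "where", "which", "who", "whom", "whose", "why", "how")


def turn_metrics(turn: str) -> dict:
    """Return simple counts for one learner turn (hypothetical helper)."""
    # Assumption: a "word" is a run of ASCII letters or straight apostrophes.
    words = re.findall(r"[A-Za-z']+", turn)
    lowered = [w.lower() for w in words]
    return {
        "word_count": len(words),
        "question_marks": turn.count("?"),
        "wh_words": sum(1 for w in lowered if w in WH_WORDS),
    }


def metrics_to_csv(turns: list[str]) -> str:
    """Write one CSV row per turn; column names are illustrative only."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["turn", "word_count", "question_marks", "wh_words"],
    )
    writer.writeheader()
    for index, turn in enumerate(turns, start=1):
        writer.writerow({"turn": index, **turn_metrics(turn)})
    return buffer.getvalue()


if __name__ == "__main__":
    sample_turns = ["What is your favorite food?", "I like pizza. Why do you like it?"]
    print(metrics_to_csv(sample_turns))
```

Counts produced this way could be compared row by row with the .csv output returned by the model, which gives a reviewer a concrete place to look when the two disagree.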

Keywords:

ChatGPT, automatic grading, vocabulary exposure, task-based learning, interaction analysis, prompt literacy, questioning skills, metrics, AI in education, student interaction

Acknowledgments

This work was supported by the Daegu Catholic University Research Fund (20206001).
