The Korean Association for the Study of English Language and Linguistics

[ Article ]

Korea Journal of English Language and Linguistics - Vol. 25, No. 0, pp.661-685

ISSN: 1598-1398 (Print) 2586-7474 (Online)

Print publication date 31 Jan 2025

Received 08 Mar 2025 Revised 21 Apr 2025 Accepted 13 May 2025

DOI: https://doi.org/10.15738/kjell.25..202505.661

Investigating the Impact of Interlocutor Type on English Oral Proficiency Interviews: A Comparative Analysis of Chatbot and Human Interlocutors

Yongkook Won ; Sunhee Kim

(First author) Visiting Researcher, Center for Educational Research, Seoul National University, purgatorio@snu.ac.kr
(Corresponding author) Associate Professor, Department of French Language Education; Adjunct Professor, Learning Sciences Research Institute, Seoul National University, sunhkim@snu.ac.kr

© 2025 KASELL All rights reserved
This is an open-access article distributed under the terms of the Creative Commons License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Voice chatbots have recently emerged as substitutes for human interlocutors in eliciting spoken responses during English oral proficiency interviews. This study examines how interlocutor type affects both fluency and holistic scores. Data were collected from 32 Korean college students, yielding 128 audio recordings across four distinct topics of varying complexity, with each topic administered via both chatbot and human interlocutors. Fluency features were analyzed using Praat software, while the fluency and holistic scores assigned by two raters were analyzed using many-facet Rasch measurement (MFRM). Results from Friedman and Wilcoxon tests indicate that both task complexity and interlocutor type influence temporal measures, although task complexity exerts a stronger effect on dysfluency measures. MFRM analyses further show that the chatbot and human interlocutors differ significantly in difficulty for fluency scoring but not for holistic scoring. Overall, these findings highlight both the potential and the limitations of employing chatbot interlocutors in place of human interlocutors in oral proficiency interviews.

Keywords: oral proficiency interview, artificial intelligence, English as a foreign language, speaking tests, chatbot
