Korean Journal of English Language and Linguistics, Vol. 26, pp. 127-141
Abbreviation: KASELL
ISSN: 1598-1398 (Print); 2586-7474 (Online)
Print publication date: 31 January 2026
Received 10 October 2025; Revised 12 January 2026; Accepted 13 January 2026
DOI: https://doi.org/10.15738/kjell.26..202601.127
Probing Good-Enough Processing in Large Language Models with a Paraphrasing Task
Jonghyun Lee and Jeong-Ah Shin
(First author) Assistant Professor, English Studies Major, Division of Global Studies, Korea University Sejong Campus, 2511 Sejong-ro, Jochiwon-eup, Sejong, Korea (j-lee@korea.ac.kr)
(Corresponding author) Professor, Department of English Language and Literature, Dongguk University, 30, Pildong-ro 1-gil, Jung-gu, Seoul, Korea (jashin@dgu.ac.kr)
© 2026 KASELL. All rights reserved. This is an open-access article distributed under the terms of the Creative Commons License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
This study investigates whether large language models (LLMs) exhibit human-like ‘good-enough’ processing patterns in syntactic comprehension or instead achieve mechanically accurate interpretation. Previous research using forced-choice question-answering paradigms found that LLMs, like humans, display incomplete syntactic reanalysis when processing garden-path sentences. However, those patterns might have been methodological artifacts rather than genuine processing characteristics, since direct questioning could bias models toward their initial misinterpretations. To address this limitation, we employed a paraphrasing task, following Patson et al. (2009), which requires comprehensive sentence reformulation rather than a binary response. We tested GPT-3.5 and GPT-4 on 24 garden-path sentences containing Optionally Transitive (OT) and Reflexive Absolute Transitive (RAT) verbs. Good-enough processing patterns persisted across both paradigms: the models continued to exhibit partial reanalysis in garden-path conditions even when generating full paraphrases, confirming that the previously observed error patterns reflect genuine syntactic processing characteristics rather than experimental artifacts. Notably, GPT-4 performed better on the paraphrasing task than in the forced-choice experiments, suggesting task-dependent variation in processing depth. Both models exhibited human-like incomplete processing despite their substantial computational resources, indicating that their pattern-matching mechanisms favor processing shortcuts over complete syntactic interpretation. These findings show that LLMs, like humans, engage in good-enough processing, with performance varying systematically across task formats.
Keywords: large language models, garden-path sentences, good-enough processing, syntactic processing, paraphrasing task, ChatGPT
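As a concrete illustration of the paradigm, the sketch below shows how a paraphrasing probe of this kind could be run against GPT-3.5 and GPT-4 through the OpenAI chat completions API. It is a minimal, hypothetical reconstruction, not the study's released materials: the prompt wording, the example item (a classic Optionally Transitive garden-path sentence from this literature, cf. Christianson et al. 2001), and the keyword-based check for a lingering misinterpretation are all illustrative assumptions.

```python
# Hypothetical sketch of a paraphrasing probe for garden-path sentences.
# Prompt wording, model choices, and the scoring heuristic are illustrative
# assumptions; they are not the study's actual materials or coding scheme.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A classic Optionally Transitive (OT) garden-path item from the
# good-enough literature (cf. Christianson et al. 2001). On the correct
# parse, the deer ran into the woods while the man hunted (intransitively).
GARDEN_PATH = "While the man hunted the deer ran into the woods."

PROMPT = (
    "Paraphrase the following sentence in your own words, preserving "
    "its meaning as accurately as you can:\n\n{sentence}"
)

def get_paraphrase(sentence: str, model: str) -> str:
    """Elicit a free paraphrase rather than a forced-choice answer."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # keep output near-deterministic for scoring
        messages=[{"role": "user", "content": PROMPT.format(sentence=sentence)}],
    )
    return response.choices[0].message.content.strip()

def misinterpretation_lingers(paraphrase: str) -> bool:
    """Crude lexical stand-in for hand-coding: does the paraphrase still
    assert that the man hunted the deer (the garden-path misreading)?"""
    text = paraphrase.lower()
    return "hunted the deer" in text or "hunting the deer" in text

if __name__ == "__main__":
    for model in ("gpt-3.5-turbo", "gpt-4"):
        paraphrase = get_paraphrase(GARDEN_PATH, model)
        print(f"{model}: {paraphrase}")
        print(f"  lingering misinterpretation? {misinterpretation_lingers(paraphrase)}")
```

In the study itself, paraphrases would presumably be coded by hand for whether the initial misparse lingers; the lexical check above is only a stand-in for that judgment.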
This work was supported by a Korea University Grant and by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2025S1A5A8005032).
References

1. Achiam, J., S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, ... and B. McGrew. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774.
2. Ardoin, T., Y. Cai and G. Wunder. 2025. Where confabulation lives: Latent feature discovery in LLMs. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 29801-29825.
3. Bacon, G. and T. Regier. 2019. Does BERT agree? Evaluating knowledge of structure dependence through agreement relations. arXiv preprint arXiv:1908.09892.
4. Bender, E. M., T. Gebru, A. McMillan-Major and S. Shmitchell. 2021. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610-623.
5. Bernardy, J. P. and S. Lappin. 2017. Using deep neural networks to learn syntactic agreement. Linguistic Issues in Language Technology 15(2), 1-15.
6. Bever, T. G. 1970. The cognitive basis for linguistic structures. In J. R. Hayes, ed., Cognition and the Development of Language, 279-362. New York: Wiley and Sons.
7. Brown, T., B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, ... and D. Amodei. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33, 1877-1901.
8. Bubeck, S., V. Chandrasekaran, R. Eldan, J. Gehrke, E. Horvitz, E. Kamar, ... and Y. Zhang. 2023. Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv preprint arXiv:2303.12712.
9. Caplan, D. 2010. Task effects on BOLD signal correlates of implicit syntactic processing. Language and Cognitive Processes 25(6), 866-901.
10. Caplan, D., E. Chen and G. Waters. 2008. Task-dependent and task-independent neurovascular responses to syntactic processing. Cortex 44(3), 257-275.
11. Chaves, R. P. 2020. What don’t RNN language models learn about filler-gap dependencies? Proceedings of the Society for Computation in Linguistics 3(1), 20-30.
12. Chowdhury, S. A. and R. Zamparelli. 2018. RNN simulations of grammaticality judgments on long-distance dependencies. In Proceedings of the 27th International Conference on Computational Linguistics, 133-144.
13. Christianson, K., A. Hollingworth, J. Halliwell and F. Ferreira. 2001. Thematic roles assigned along the garden path linger. Cognitive Psychology 42(4), 368-407.
14. Cleary, A. M., A. J. Ryals and J. S. Nomi. 2018. Dependent measures in memory research: From free recall to recognition. In Handbook of Research Methods in Human Memory, 19-35. New York: Routledge.
15. Davis, E. and G. Marcus. 2015. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM 58(9), 92-103.
16. Dreyfus, H. L. 1972. What Computers Can’t Do: The Limits of Artificial Intelligence. New York: Harper and Row.
17. Farquhar, S., J. Kossen, L. Kuhn and Y. Gal. 2024. Detecting hallucinations in large language models using semantic entropy. Nature 630, 625-630.
18. Ferreira, F., K. G. D. Bailey and V. Ferraro. 2002. Good-enough representations in language comprehension. Current Directions in Psychological Science 11(1), 11-15.
19. Ferreira, F. and C. Clifton, Jr. 1986. The independence of syntactic processing. Journal of Memory and Language 25(3), 348-368.
20. Ferreira, F. and N. D. Patson. 2007. The ‘good enough’ approach to language comprehension. Language and Linguistics Compass 1(1-2), 71-83.
21. Frances, C. 2024. Good-enough language processing: A satisficing approach to language comprehension and production. Language and Linguistics Compass 18(1), e12513.
22. Franck, J., S. Colonna and L. Rizzi. 2015. Task-dependency and structure-dependency in number interference effects in sentence comprehension. Frontiers in Psychology 6, 349.
23. Frazier, L. and K. Rayner. 1982. Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology 14(2), 178-210.
24. Futrell, R., E. Wilcox, T. Morita and R. Levy. 2018. RNNs as psycholinguistic subjects: Syntactic state and grammatical dependency. arXiv preprint arXiv:1809.01329.
25. Gibson, E. 1998. Linguistic complexity: Locality of syntactic dependencies. Cognition 68(1), 1-76.
26. Gibson, E. 2000. The dependency locality theory: A distance-based theory of linguistic complexity. In Y. Miyashita, A. Marantz and W. O’Neil, eds., Image, Language, Brain: Papers from the First Mind Articulation Project Symposium, 95-126. Cambridge, MA: MIT Press.
27. Gilbert, R. A., M. H. Davis, M. G. Gaskell and J. M. Rodd. 2021. The relationship between sentence comprehension and lexical-semantic retuning. Journal of Memory and Language 116, 104188.
28. Goldberg, Y. 2019. Assessing BERT’s syntactic abilities. arXiv preprint arXiv:1901.05287.
29. Gulordava, K., P. Bojanowski, É. Grave, T. Linzen and M. Baroni. 2018. Colorless green recurrent networks dream hierarchically. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 1195-1205.
30. Hauser, M. D., N. Chomsky and W. T. Fitch. 2002. The faculty of language: What is it, who has it, and how did it evolve? Science 298(5598), 1569-1579.
31. Hu, J., J. Gauthier, P. Qian, E. Wilcox and R. Levy. 2020. A systematic assessment of syntactic generalization in neural language models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 1725-1744.
32. Lampinen, A. K., I. Dasgupta, S. C. Chan, H. R. Sheahan, A. Creswell, D. Kumaran, ... and F. Hill. 2024. Language models, like humans, show content effects on reasoning tasks. PNAS Nexus 3(7), pgae233.
33. Lee, J. and J.-A. Shin. 2023. Decoding BERT’s internal processing of garden-path structures through attention maps. Korean Journal of English Language and Linguistics 23, 461-481.
34. Lee, J. and J.-A. Shin. 2025. Good-enough but more error-prone: Garden-path processing in GPT models. Linguistic Research 42(3), 539-579.
35. Lee, J., J.-A. Shin and M. K. Park. 2022. (AL)BERT down the garden path: Psycholinguistic experiments for pre-trained language models. Korean Journal of English Language and Linguistics 22, 1033-1050.
36. Levy, R. 2008. Expectation-based syntactic comprehension. Cognition 106(3), 1126-1177.
37. Linzen, T. and N. Leonard. 2018. Distinct patterns of syntactic agreement errors in recurrent networks and humans. In Proceedings of the 40th Annual Conference of the Cognitive Science Society, 690-695.
38. Linzen, T., E. Dupoux and Y. Goldberg. 2016. Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics 4, 521-535.
39. Loftus, E. F. and J. E. Pickrell. 1995. The formation of false memories. Psychiatric Annals 25(12), 720-725.
40. Madaan, L., D. Esiobu, P. Stenetorp, B. Plank and D. Hupkes. 2025. Lost in inference: Rediscovering the role of natural language inference for large language models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 9229-9242.
41. MacDonald, M. C. 2013. How language production shapes language form and comprehension. Frontiers in Psychology 4, 226.
42. MacDonald, M. C., M. A. Just and P. A. Carpenter. 1992. Working memory constraints on the processing of syntactic ambiguity. Cognitive Psychology 24(1), 56-98.
43. MacDonald, M. C., N. J. Pearlmutter and M. S. Seidenberg. 1994. The lexical nature of syntactic ambiguity resolution. Psychological Review 101(4), 676-703.
44. Marvin, R. and T. Linzen. 2018. Targeted syntactic evaluation of language models. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 1192-1202.
45. Osterhout, L. and P. J. Holcomb. 1992. Event-related brain potentials elicited by syntactic anomaly. Journal of Memory and Language 31(6), 785-806.
46. Patson, N. D., E. S. Darowski, N. Moon and F. Ferreira. 2009. Lingering misinterpretations in garden-path sentences: Evidence from a paraphrasing task. Journal of Experimental Psychology: Learning, Memory, and Cognition 35(1), 280-285.
47. Qian, Z., S. Garnsey and K. Christianson. 2018. A comparison of online and offline measures of good-enough processing in garden-path sentences. Language, Cognition and Neuroscience 33(2), 227-254.
48. Radford, A., J. Wu, R. Child, D. Luan, D. Amodei and I. Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1(8), 9.
49. Rayner, K. and L. Frazier. 1987. Parsing temporarily ambiguous complements. Quarterly Journal of Experimental Psychology 39A(4), 657-673.
50. Salverda, A. P., M. Brown and M. K. Tanenhaus. 2011. A goal-based perspective on eye movements in visual world studies. Acta Psychologica 137(2), 172-180.
51. Searle, J. R. 1980. Minds, brains, and programs. Behavioral and Brain Sciences 3(3), 417-424.
52. Sharma, M., M. Tong, T. Korbak, D. Duvenaud, A. Askell, S. Bowman, E. Durmus, Z. Hatfield-Dodds, S. Johnston, S. Kravec, T. Maxwell, S. McCandlish, K. Ndousse, O. Rausch, N. Schiefer, D. Yan, M. Zhang and E. Perez. 2024. Towards understanding sycophancy in language models. In Proceedings of the International Conference on Learning Representations 2024, 110-144.
53. Shultz, T. R., J. M. Wise and A. S. Nobandegani. 2025. Text understanding in GPT-4 versus humans. Royal Society Open Science 12(2), 241313.
54. Slattery, T. J., P. Sturt, K. Christianson, M. Yoshida and F. Ferreira. 2013. Lingering misinterpretations of garden path sentences arise from competing syntactic representations. Journal of Memory and Language 69(2), 104-120.
55. Tian, Y., T. Huang, M. Liu, D. Jiang, A. Spangher, M. Chen, ... and N. Peng. 2024. Are large language models capable of generating human-level narratives? In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 17659-17681.
56. Trueswell, J. C., M. K. Tanenhaus and S. M. Garnsey. 1994. Semantic influences on parsing: Use of thematic role information in syntactic ambiguity resolution. Journal of Memory and Language 33(3), 285-318.
57. Turpin, M., J. Michael, E. Perez and S. Bowman. 2023. Language models don’t always say what they think: Unfaithful explanations in chain-of-thought prompting. Advances in Neural Information Processing Systems 36, 74952-74965.
58. van Schijndel, M. and T. Linzen. 2018. Modeling garden path effects without explicit hierarchical syntax. In Proceedings of the 40th Annual Meeting of the Cognitive Science Society, 2603-2608.
59. Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, ... and I. Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30.
60. Wilcox, E., R. Levy and R. Futrell. 2019. Hierarchical representation in neural language models: Suppression and recovery of expectations. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 181-190.
61. Wilcox, E., R. Levy, T. Morita and R. Futrell. 2018. What do RNN language models learn about filler-gap dependencies? In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 211-221.