The Korean Association for the Study of English Language and Linguistics
[ Article ]
Korea Journal of English Language and Linguistics - Vol. 22, No. 0, pp. 1033-1050
ISSN: 1598-1398 (Print) 2586-7474 (Online)
Print publication date 31 Oct 2022
Received: 03 Sep 2022; Revised: 25 Sep 2022; Accepted: 30 Sep 2022
DOI: https://doi.org/10.15738/kjell.22..202210.1033

(AL)BERT Down the Garden Path: Psycholinguistic Experiments for Pre-trained Language Models

Jonghyun Lee ; Jeong-Ah Shin ; Myung-Kwan Park
(First author) Graduate Student (PhD), Dept. of English Language and Literature, Seoul National University, museeq@snu.ac.kr
(Corresponding author) Professor, Division of English Language and Literature, Dongguk University, jashin@dongguk.edu
(Co-author) Professor, Division of English Language and Literature, Dongguk University, korgen2003@naver.com


© 2022 KASELL. All rights reserved.
This is an open-access article distributed under the terms of the Creative Commons License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This study compared the syntactic capabilities of several neural language models (LMs), including Transformers (BERT and ALBERT) and an LSTM, and investigated whether they exhibit human-like syntactic representations, using a targeted evaluation approach: a method that evaluates the syntactic processing ability of LMs with sentences designed for psycholinguistic experiments. Employing garden-path structures under several linguistic manipulations, we assessed whether the LMs detect temporary ungrammaticality and exploit linguistic cues such as plausibility, transitivity, and morphology. The results showed that both the Transformers and the LSTM used several linguistic cues for incremental syntactic processing, comparable to human syntactic processing; they differed, however, in whether and how they used each cue. Overall, the Transformers showed more human-like syntactic representations than the LSTM, given their higher sensitivity to plausibility and their ability to retain information from previous words. Meanwhile, a smaller number of parameters (as in ALBERT relative to BERT) did not appear to undermine LM performance, contrary to what previous studies predicted. Through these findings, this research seeks to contribute to a better understanding of the syntactic processing of neural language models as well as of human language processing.
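As a concrete illustration of the targeted evaluation approach, the following sketch (ours, not the authors' code; the model name, test item, and single-subword target are illustrative assumptions) compares a masked LM's surprisal at the disambiguating verb of a temporarily ambiguous garden-path sentence with its surprisal in a comma-disambiguated control, using the Hugging Face transformers library.

    import math

    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    # Illustrative model choice; "albert-base-v2" can be substituted directly.
    MODEL = "bert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

    def surprisal(sentence: str, target: str) -> float:
        """Surprisal, in bits, of `target` at the mask position of `sentence`."""
        target_ids = tokenizer(target, add_special_tokens=False).input_ids
        assert len(target_ids) == 1, "sketch assumes a single-subword target"
        enc = tokenizer(sentence, return_tensors="pt")
        mask_pos = (enc.input_ids[0] == tokenizer.mask_token_id).nonzero()[0, 0]
        with torch.no_grad():
            logits = model(**enc).logits[0, mask_pos]
        log_prob = torch.log_softmax(logits, dim=-1)[target_ids[0]].item()
        return -log_prob / math.log(2)

    # Classic garden-path item: without the comma, "the deer" is temporarily
    # parsed as the object of "hunted", so the disambiguating verb should be
    # harder (higher surprisal) in the ambiguous condition.
    m = tokenizer.mask_token
    ambiguous = f"While the man hunted the deer {m} into the woods."
    control = f"While the man hunted, the deer {m} into the woods."

    effect = surprisal(ambiguous, "ran") - surprisal(control, "ran")
    print(f"Garden-path effect at 'ran': {effect:.2f} bits")

A positive effect, i.e., higher surprisal at the disambiguating verb in the ambiguous condition, is the LM analogue of the human garden-path penalty. Note that masked prediction conditions on the right context as well, so it only approximates the strictly incremental, left-to-right surprisal typically computed for LSTM LMs.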

Keywords:

targeted evaluation approach, transformers, garden-path structure, natural language processing, psycholinguistics

Acknowledgments

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2018S1A5A2A03031616).
