On Pronoun Prediction in the L2 Neural Language Model
Abstract
In recent years, artificial neural network language models (LMs) have achieved remarkable success in tasks involving sentence processing. However, despite the advantages of pre-trained neural LMs, our understanding of the specific syntactic knowledge these models acquire remains limited. This study investigates whether an L2 neural LM trained on L2 English learners’ textbooks can acquire syntactic knowledge similar to that of humans. Specifically, we examine the L2 LM’s ability to predict pronouns within the framework of previous experiments conducted with L1 humans and L1 LMs. Our focus is on pronominal coreference, a phenomenon that has been extensively investigated in psycholinguistics. This research extends existing studies by exploring whether the L2 LM can learn Binding Condition B, a fundamental constraint on pronominal coreference. We replicate several previous experiments and examine the L2 LM’s capacity to exhibit human-like pronominal agreement effects. Consistent with the findings of Davis (2022), we provide further evidence that, like L1 LMs, the L2 LM fails to fully capture the range of behaviors associated with Binding Condition B that L1 humans exhibit. Overall, neural LMs struggle to recognize the full spectrum of Binding Condition B and capture aspects of it only in specific contexts.
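As an illustration of the surprisal-based pronoun prediction referred to above, the sketch below shows how pronoun surprisal can be computed with a pre-trained causal LM. It is a minimal example, not the authors’ setup: GPT-2 stands in for the L2 LM trained on learners’ textbooks, and the test sentences are hypothetical.

```python
# Minimal sketch (not the authors' code): surprisal of a pronoun continuation
# under a pre-trained causal LM. GPT-2 and the example sentences are stand-ins
# for the paper's L2 LM and materials.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def pronoun_surprisal(context: str, pronoun: str) -> float:
    """Surprisal in bits, -log2 P(pronoun | context), following Hale (2001) and Levy (2008)."""
    ids = tokenizer(context, return_tensors="pt").input_ids
    pron_ids = tokenizer(" " + pronoun, add_special_tokens=False).input_ids
    log_p = 0.0
    for tok in pron_ids:  # sum over subword tokens if the pronoun is split
        with torch.no_grad():
            next_logits = model(ids).logits[0, -1]  # logits for the next token
        log_p += torch.log_softmax(next_logits, dim=-1)[tok].item()
        ids = torch.cat([ids, torch.tensor([[tok]])], dim=1)
    return -log_p / math.log(2)

# Binding Condition B blocks coreference between the embedded object pronoun
# and the local subject "the girl", so "her" cannot refer to the girl. A model
# sensitive to the condition should show the kind of gender-based surprisal
# contrast probed in the experiments.
print(pronoun_surprisal("The boy said that the girl admired", "him"))
print(pronoun_surprisal("The boy said that the girl admired", "her"))
```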
Keywords:
neural language model, binding condition, pronoun, surprisal, coreference
Acknowledgments
This work was supported by the Dongguk University Research Fund of 2022 (S-2022-G0001-00134).
References
- Badecker, W. and Straub, K. 2002. The processing role of structural constraints on interpretation of pronouns and anaphors. Journal of Experimental Psychology: Learning, Memory, and Cognition 28(4), 748. [https://doi.org/10.1037/0278-7393.28.4.748]
- Bhattacharya, D. and van Schijndel, M. 2020. Filler-gaps that neural networks fail to generalize. Proceedings of the 24th Conference on Computational Natural Language Learning, 486-495. [https://doi.org/10.18653/v1/2020.conll-1.39]
- Chomsky, N. 1981. Lectures on Government and Binding: The Pisa lectures. De Gruyter Mouton.
- Chow, W. Y., Lewis, S. and Phillips, C. 2014. Immediate sensitivity to structural constraints in pronoun resolution. Frontiers in Psychology 5, 630. [https://doi.org/10.3389/fpsyg.2014.00630]
- Choi, S. J., Park, M. K. and Kim, E. 2021. How are Korean neural language models 'surprised' layer wisely?. Journal of Language Sciences 28(4), 301-317. [https://doi.org/10.14384/kals.2021.28.4.301]
- Choi, S. J. and Park, M. K. 2022a. An L2 neural language model of adaptation to dative alternation in English. The Journal of Modern British & American Language & Literature 40(1), 143-159.
- Choi, S. J. and Park, M. K. 2022b. Syntactic priming by L2 LSTM language models. The Journal of Studies in Language 22, 547-562.
- Choi, S. J. and Park, M. K. 2022c. An L2 neural language model of adaptation. Korean Journal of English Language and Linguistics 37(4), 475-489.
- Choi, S. J. and Park, M. K. 2022d. Syntactic priming in the L2 neural language model. The Journal of Linguistic Science 103, 81-104. [https://doi.org/10.21296/jls.2022.12.103.81]
- Clifton, C., Kennison, S. M. and Albrecht, J. E. 1997. Reading the words her, his, him: Implications for parsing principles based on frequency and on structure. Journal of Memory and Language 36(2), 276-292. [https://doi.org/10.1006/jmla.1996.2499]
- Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q. V. and Salakhutdinov, R. 2019. Transformer-XL: Attentive language models beyond a fixed-length context. https://doi.org/10.48550/arXiv.1901.02860 [https://doi.org/10.18653/v1/P19-1285]
- Davis, F. L. 2022. On the Limitations of Data: Mismatches between Neural Models of Language and Humans. Doctoral dissertation, Cornell University, Ithaca, NY, USA.
- Devlin, J., Chang, M. W., Lee, K. and Toutanova, K. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. https://doi.org/10.48550/arXiv.1810.04805
- Gulordava, K., Bojanowski, P., Grave, E., Linzen, T. and Baroni, M. 2018. Colorless green recurrent networks dream hierarchically. https://doi.org/10.48550/arXiv.1803.11138 [https://doi.org/10.18653/v1/N18-1108]
- Hale, J. 2001. A probabilistic Earley parser as a psycholinguistic model. Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics, 1-8. [https://doi.org/10.3115/1073336.1073357]
- Hu, J., Gauthier, J., Qian, P., Wilcox, E. and Levy, R. P. 2020. A systematic assessment of syntactic generalization in neural language models. https://doi.org/10.48550/arXiv.2005.03692 [https://doi.org/10.18653/v1/2020.acl-main.158]
- Jumelet, J. and Hupkes, D. 2018. Do language models understand anything? On the ability of LSTMs to understand negative polarity items. https://doi.org/10.48550/arXiv.1808.10627 [https://doi.org/10.18653/v1/W18-5424]
- Kennison, S. M. 2003. Comprehending the pronouns her, him, and his: Implications for theories of referential processing. Journal of Memory and Language 49(3), 335-352. [https://doi.org/10.1016/S0749-596X(03)00071-8]
- Kim, E. 2020. The ability of L2 LSTM language models to learn the filler-gap dependency. Journal of the Korea Society of Computer and Information 25(11), 27-40.
- Kim, E., Montrul, S. and Yoon, J. 2015. The on-line processing of binding principles in second language acquisition: Evidence from eye tracking. Applied Psycholinguistics 36(6), 1317-1374. [https://doi.org/10.1017/S0142716414000307]
- Kush, D. and Dillon, B. 2021. Principle B constrains the processing of cataphora: Evidence for syntactic and discourse predictions. Journal of Memory and Language 120, 104254. [https://doi.org/10.1016/j.jml.2021.104254]
- Levy, R. 2008. Expectation-based syntactic comprehension. Cognition 106(3), 1126-1177. [https://doi.org/10.1016/j.cognition.2007.05.006]
- Linzen, T. and Leonard, B. 2018. Distinct patterns of syntactic agreement errors in recurrent networks and humans. https://doi.org/10.48550/arXiv.1807.06882
- Liu, X., He, P., Chen, W. and Gao, J. 2019. Improving multi-task deep neural networks via knowledge distillation for natural language understanding. https://doi.org/10.48550/arXiv.1904.09482
- Marvin, R. and Linzen, T. 2018. Targeted syntactic evaluation of language models. https://doi.org/10.48550/arXiv.1808.09031 [https://doi.org/10.18653/v1/D18-1151]
- Merity, S., Xiong, C., Bradbury, J. and Socher, R. 2016. Pointer sentinel mixture models. https://doi.org/10.48550/arXiv.1609.07843
- Nicol, J. L. 1988. Coreference Processing during Sentence Comprehension. Doctoral dissertation, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Nicol, J. and Swinney, D. 1989. The role of structure in coreference assignment during sentence comprehension. Journal of Psycholinguistic Research 18, 5-19. [https://doi.org/10.1007/BF01069043]
- Radford, A., Wu, J., Child, R., Luan, D., Amodei, D. and Sutskever, I. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1(8), 9.
- Seo, H. J. and Shin, J. A. 2016. L2 processing of English pronouns and reflexives: Evidence from eye-movements. Korean Journal of English Language and Linguistics 16(4), 879-901. [https://doi.org/10.15738/kjell.16.4.201612.879]
- Smith, N. J. and Levy, R. 2013. The effect of word predictability on reading time is logarithmic. Cognition 128(3), 302-319. [https://doi.org/10.1016/j.cognition.2013.02.013]
- Van Gompel, R. P. and Liversedge, S. P. 2003. The influence of morphological information on cataphoric pronoun assignment. Journal of Experimental Psychology: Learning, Memory, and Cognition 29(1), 128. [https://doi.org/10.1037/0278-7393.29.1.128]
- Van Schijndel, M. and Linzen, T. 2018. Modeling garden path effects without explicit hierarchical syntax. Proceedings of the Annual Meeting of the Cognitive Science Society: Changing Minds, CogSci 2018, 2603-2608.
- Van Schijndel, M. and Linzen, T. 2021. Single-stage prediction models do not explain the magnitude of syntactic disambiguation difficulty. Cognitive Science 45(6), 1-31. [https://doi.org/10.1111/cogs.12988]
- Warstadt, A., Parrish, A., Liu, H., Mohananey, A., Peng, W., Wang, S. F. and Bowman, S. R. 2020. BLiMP: The benchmark of linguistic minimal pairs for English. Transactions of the Association for Computational Linguistics 8, 377-392. [https://doi.org/10.1162/tacl_a_00321]
- Wilcox, E., Levy, R., Morita, T. and Futrell, R. 2018. What do RNN language models learn about filler-gap dependencies?. https://doi.org/10.48550/arXiv.1809.00042 [https://doi.org/10.18653/v1/W18-5423]
- Wilcox, E., Levy, R. and Futrell, R. 2019. What syntactic structures block dependencies in RNN language models?. https://doi.org/10.48550/arXiv.1905.10431