Learning the distribution of English -al and -ar suffixes using deep neural networks
© 2023 KASELL. All rights reserved.
This is an open-access article distributed under the terms of the Creative Commons License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
This study used an ensemble of recurrent and convolutional neural networks, labeled deep neural networks (DNN), to learn the variable distribution of the English suffixes -al and -ar. The DNN's predictions were compared against those of the maximum entropy phonotactic learner (PL). An examination of 1,479 adjectives suffixed with -al or -ar revealed that the suffix always dissimilated from a stem-final liquid (e.g., solar, plural): the suffix was -ar if the stem-final segment was /l/, and conversely -al occurred after /r/. The suffixes varied if the stem-final segment was not a liquid (e.g., local, lunar). The learning results showed that the DNN achieved higher classification accuracy (97.3%) than the PL (89.4%); the PL assigned probabilities to unattested word forms that were higher than or equal to the probabilities of the attested forms in 10.5% of the test data. The DNN successfully learned the variable distribution patterns of the suffixes observed in the training data. The probability the DNN assigned to the suffix -al also effectively captured the gradient distance effects of liquids on liquid dissimilation and segmental blocking, and the model learned the sigmoid curve commonly observed in linguistic data.
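As a concrete illustration of the architecture the abstract describes, the sketch below combines a recurrent branch and a convolutional branch over character embeddings of a stem and outputs probabilities for -al versus -ar. This is a minimal sketch in PyTorch, not the study's actual implementation: the class name SuffixEnsemble, the layer sizes, and the filter widths are illustrative assumptions; only the overall design (an RNN-CNN ensemble for binary suffix classification) follows the abstract.

```python
# Minimal illustrative sketch (not the authors' code): a character-level
# RNN + CNN ensemble for classifying whether a stem takes -al or -ar.
# All hyperparameters here are assumptions, not the study's settings.
import torch
import torch.nn as nn

class SuffixEnsemble(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hidden=64,
                 n_filters=32, widths=(2, 3, 4), n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Recurrent branch: bidirectional LSTM over the segment sequence.
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        # Convolutional branch: n-gram filters over the same embeddings.
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, w) for w in widths])
        self.out = nn.Linear(2 * hidden + n_filters * len(widths), n_classes)

    def forward(self, x):                 # x: (batch, seq_len) char indices
        e = self.emb(x)                   # (batch, seq_len, emb_dim)
        _, (h, _) = self.lstm(e)          # h: (2, batch, hidden)
        rnn_feat = torch.cat([h[0], h[1]], dim=1)
        c = e.transpose(1, 2)             # (batch, emb_dim, seq_len)
        cnn_feat = torch.cat(
            [torch.relu(conv(c)).max(dim=2).values for conv in self.convs],
            dim=1)
        return self.out(torch.cat([rnn_feat, cnn_feat], dim=1))

# Usage: predict suffix probabilities for a padded batch of encoded stems.
model = SuffixEnsemble(vocab_size=30)
logits = model(torch.randint(1, 30, (8, 10)))   # 8 stems, length 10
probs = torch.softmax(logits, dim=1)            # column 0 ~ -al, column 1 ~ -ar
```

A softmax over the two logits yields P(-al) and P(-ar) for each stem; probabilities of this kind are what the study compares against those assigned by the maximum entropy phonotactic learner.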
Keywords:
English suffix, deep neural networks, liquid dissimilation, lateral dissimilation, classification
References
- Albright, A. 2009. Feature-based generalization as a source of gradient acceptability. Phonology 26, 9-41. [https://doi.org/10.1017/S0952675709001705]
- Beguš, G. 2020. Generative adversarial phonology: modeling unsupervised phonetic and phonological learning with neural networks. Frontiers in Artificial Intelligence 3, Article 44. [https://doi.org/10.3389/frai.2020.00044]
- Bennett, W. G. 2015. The Phonology of Consonants: Harmony, Dissimilation, and Correspondence. Cambridge: Cambridge University Press. [https://doi.org/10.1017/CBO9781139683586]
- Breiman, L. 2001. Random forests. Machine Learning 45, 5-32. [https://doi.org/10.1023/A:1010933404324]
- Carstairs-McCarthy, A. 2018. An Introduction to English Morphology. Edinburgh: Edinburgh University Press.
- Cho, H. 2021. Predicting the gender of Korean personal names using fastText. Studies in Phonetics, Phonology and Morphology 27(3), 483-500.
- Chomsky, N. and M. Halle. 1968. The Sound Pattern of English. Cambridge: The MIT Press.
- Cortes, C. and V. Vapnik. 1995. Support-vector networks. Machine Learning 20, 273-297. [https://doi.org/10.1007/BF00994018]
- Daland, R., B. Hayes, J. White, M. Garellek, A. Davis and I. Normann. 2011. Explaining sonority projection effects. Phonology 28, 197-234. [https://doi.org/10.1017/S0952675711000145]
- Della Pietra, S., V. J. Della Pietra and J. D. Lafferty. 1997. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 380-393. [https://doi.org/10.1109/34.588021]
- Devlin, J., M.-W. Chang, K. Lee and K. Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
- Domingos, P. M. 1999. MetaCost: a general method for making classifiers cost-sensitive. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 155-164, San Diego, California. [https://doi.org/10.1145/312129.312220]
- Dressler, W. 1971. An alleged case of non-chronological rule insertion. Linguistic Inquiry 2, 597-599.
- Fisher, R. A. 1936. The use of multiple measurements in taxonomic problems. Annals of Eugenics 7, 179-188. [https://doi.org/10.1111/j.1469-1809.1936.tb02137.x]
- Goldsmith, J. 1979. Autosegmental Phonology. New York: Garland.
- Goldwater, S. and M. Johnson. 2003. Learning OT constraint rankings using a maximum entropy model. In Proceedings of the Stockholm Workshop on Variation within Optimality Theory, ed. by J. Spenader, A. Eriksson, and O. Dahl, 111–120. Stockholm: Stockholm University, Department of Linguistics.
- Goodfellow, I., Y. Bengio and A. Courville. 2016. Deep Learning. Cambridge: The MIT Press.
- Hayes, B. 2022. Deriving the Wug-shaped curve: A criterion for assessing formal theories of linguistic variation. Annual Review of Linguistics 8, 473-494. [https://doi.org/10.1146/annurev-linguistics-031220-013128]
- Hayes, B. and C. Wilson. 2008. A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry 39, 379-440. [https://doi.org/10.1162/ling.2008.39.3.379]
- Haykin, S. 1994. Neural Networks: A Comprehensive Foundation. 2nd ed. Delhi: Pearson Education.
- Hochreiter, S. and J. Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8), 1735-1780. [https://doi.org/10.1162/neco.1997.9.8.1735]
- Howard, J. and S. Ruder. 2018. Universal Language Model Fine-tuning for Text Classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Volume 1 (Long Papers), 328-339, Melbourne, Australia. Association for Computational Linguistics. [https://doi.org/10.18653/v1/P18-1031]
- Kenstowicz, M. 1994. Phonology in Generative Grammar. Oxford: Blackwell.
- Kim, K. H. 2021. Simple Neural Text Classification. Available online at https://github.com/kh-kim/simple-ntc
- Kim, Y. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1746–1751, Doha, Qatar. Association for Computational Linguistics. [https://doi.org/10.3115/v1/D14-1181]
- Krizhevsky, A., I. Sutskever and G. E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems, Volume 1, 1097-1105. Lake Tahoe, Nevada.
- LeCun, Y., L. Bottou, Y. Bengio and P. Haffner. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278-2324. [https://doi.org/10.1109/5.726791]
- Linzen, T. 2019. What can linguistics and deep learning contribute to each other? Response to Pater. Language 95(1), e99-e108. [https://doi.org/10.1353/lan.2019.0015]
- Louviere, J., D. Hensher, J. Swait and W. Adamowicz. 2000. Stated Choice Methods: Analysis and Applications. Cambridge: Cambridge University Press. [https://doi.org/10.1017/CBO9780511753831]
- Mahowald, K. 2023. A discerning several thousand judgments: GPT-3 rates the article + adjective + numeral + noun construction. arXiv:2301.12564 [cs.CL]. [https://doi.org/10.18653/v1/2023.eacl-main.20]
- Mayer, C. and M. Nelson. 2020. Phonotactic learning with neural language models. In Proceedings of the Society for Computation in Linguistics 3(1), 149-159. https://aclanthology.org/2020.scil-1.36.pdf
- McCallum, A. and K. Nigam. 1998. A Comparison of Event Models for Naive Bayes Text Classification. In Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, 41-48.
- McFadden, D. 1974. Conditional logit analysis of qualitative choice behavior. In P. Zarembka, ed., Frontiers in Econometrics, 105-142. New York: Academic Press.
- Mikolov, T., K. Chen, G. Corrado and J. Dean. 2013. Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [cs.CL].
- Mirea, N. and K. Bicknell. 2019. Using LSTMs to assess the obligatoriness of phonological distinctive features for phonotactic learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 1595-1605, Florence, Italy. Association for Computational Linguistics. [https://doi.org/10.18653/v1/P19-1155]
- Park, K., S. You and S. Song. 2021. Not yet as native as native speakers: comparing deep learning predictions and human judgments. English Language and Linguistics 26(1), 199-228.
- Pater, J. 2019. Generative linguistics and neural networks at 60: Foundation, friction, and fusion. Language 95(1), e41-e74. [https://doi.org/10.1353/lan.2019.0009]
- Petersen, E. and C. Potts. 2023. Lexical Semantics with Large Language Models: A Case Study of English “break”. In Findings of the Association for Computational Linguistics: EACL 2023, 490-511, Dubrovnik, Croatia. Association for Computational Linguistics. [https://doi.org/10.18653/v1/2023.findings-eacl.36]
- Potts, C., J. Pater, K. Jesney, R. Bhatt and M. Becker. 2010. Harmonic Grammar with linear programming: From linear systems to linguistic typology. Phonology 27(1), 1-41. [https://doi.org/10.1017/S0952675710000047]
- Schuster, M. and K. K. Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45(11), 2673-2681. [https://doi.org/10.1109/78.650093]
- Stanton, J. 2017. Segmental blocking in dissimilation: an argument for co-occurrence constraints. In Proceedings of the Annual Meetings on Phonology 2016, Washington, DC. [https://doi.org/10.3765/amp.v4i0.3972]
- Sundermeyer, M., H. Ney and R. Schlüter. 2015. From feedforward to recurrent LSTM neural networks for language modeling. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(3), 517-529. [https://doi.org/10.1109/TASLP.2015.2400218]
- Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser and I. Polosukhin. 2017. Attention is all you need. In Proceedings of 31st Conference on Neural Information Processing Systems (NIPS 2017), 5998-6008, Long Beach, CA.
- Wilcox, E. G., R. Futrell and R. Levy. 2023. Using computational models to test syntactic learnability. Linguistic Inquiry 2023, 1-88. [https://doi.org/10.1162/ling_a_00491]
- Zhang, L., S. Wang and B. Liu. 2018. Deep Learning for Sentiment Analysis: A Survey. arXiv:1801.07883 [cs.CL]. [https://doi.org/10.1002/widm.1253]
- Zymet, J. 2015. Distance-Based Decay in Long-Distance Phonological Processes. In Proceedings of the 32nd West Coast Conference on Formal Linguistics, 72-81, Somerville, MA.