The Korean Association for the Study of English Language and Linguistics
[ Article ]
Korea Journal of English Language and Linguistics - Vol. 22, No. 0, pp.19-39
ISSN: 1598-1398 (Print) 2586-7474 (Online)
Print publication date 31 Jan 2022
Received 22 Dec 2021 Revised 24 Jan 2022 Accepted 30 Jan 2022
DOI: https://doi.org/10.15738/kjell.22..202201.19

Assessing Nativelikeness of Korean College Students’ English Writing Using fastText

Hyesun Cho
Associate Professor, Dept. of Education, Graduate School of Education, Dankook Univ. hscho@dankook.ac.kr


© 2022 KASELL All rights reserved
This is an open-access article distributed under the terms of the Creative Commons License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Neural-network models have recently been used to assess the nativelikeness of English sentences written by native and nonnative speakers. This study assesses the nativelikeness of Korean EFL college students’ English writing using fastText, a neural-network text classifier that exploits subword information. The training data consisted of English sentences drawn from corpora of native speakers of English and of Korean EFL college students. The test sentences came from English writing assignments by Korean EFL college students. fastText performed well on the binary classification of sentences as native or nonnative, achieving high accuracy in less than a minute. Sentences classified as native with a high probability tended to contain fewer grammatical and plausibility errors than those classified as nonnative. For the test sentences, correcting grammatical errors (involving articles, number, subject-verb agreement, and voice) had weaker effects on classification than correcting plausibility errors (word choice), which is consistent with the previous literature. This suggests that fastText is more sensitive to plausibility errors than to grammatical errors, whose detection requires knowledge of hierarchical syntactic structure.

Keywords: English writing, nativelikeness, Korean EFL learners, fastText, deep learning, neural networks, plausibility, grammaticality
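
The classification setup described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. It shows two things that are documented features of fastText: the `__label__` prefix used in supervised training files, and the boundary-marked character n-grams that make up its subword information. The label names `native` and `nonnative`, the helper function names, and the hyperparameters `minn=3` and `maxn=6` are assumptions made for illustration; the abstract does not report the settings actually used. The final function assumes the `fasttext` Python package is installed and imports it lazily, so the helper functions run without it.

```python
from pathlib import Path

LABEL_PREFIX = "__label__"  # fastText's default label prefix in training files


def char_ngrams(word, n_min=3, n_max=6):
    """Return character n-grams of a word wrapped in '<' and '>' markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def to_fasttext_line(label, sentence):
    """Format one example as '__label__<label> <sentence>' on a single line."""
    return f"{LABEL_PREFIX}{label} {' '.join(sentence.split())}"


def write_training_file(examples, path):
    """Write (label, sentence) pairs, one example per line."""
    lines = [to_fasttext_line(label, sent) for label, sent in examples]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def train_and_score(train_path, sentence):
    """Train a binary classifier and score one sentence.

    Requires the `fasttext` package. The minn/maxn values are illustrative
    assumptions, not the settings used in the study.
    """
    import fasttext  # lazy import: the helpers above do not need it

    model = fasttext.train_supervised(input=str(train_path), minn=3, maxn=6)
    labels, probs = model.predict(sentence)
    return labels[0], float(probs[0])
```

With `n_min=3` and `n_max=3`, `char_ngrams("where")` yields `<wh`, `whe`, `her`, `ere`, `re>`. A training file written by `write_training_file` from native and learner sentences can then be passed to `train_and_score`, which returns the predicted label together with its probability for a test sentence.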
