
Can Linguistic Features Predict Placement Decisions? An Exploratory Study on Integrated Summary and Argumentative Writing Samples
© 2025 KASELL All rights reserved
This is an open-access article distributed under the terms of the Creative Commons License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
This study explores how integrated summary and argumentative essays written by learners at different placement levels can be classified using computational indices of cohesion, lexical sophistication, and syntactic complexity. To this end, essays written by 989 test-takers, whose prior TOEFL scores ranged from 71 to 99, were analyzed with the computational tool Coh-Metrix 3.0. The essays were grouped into B (lowest), C, D, and P (highest) levels based on raters' placement decisions. A Random Forest analysis was then conducted to predict placement levels, with selected Coh-Metrix indices as predictor variables. For summary writing, the five most important predictors of an individual's placement level were word familiarity, age of acquisition, word meaningfulness, mean number of words before the main verb, and sentence syntax similarity. For argumentative writing, the five most important predictors were word familiarity, lexical diversity, number of modifiers per noun phrase, word meaningfulness, and age of acquisition. Although classification performance was modest overall, the models achieved higher precision and recall for P-level essays than for B-, C-, and D-level essays. This suggests that the models had difficulty distinguishing among essays written by students within a narrow proficiency range. Nonetheless, genre-sensitive trends emerged. Argumentative essays showed greater lexical diversity and syntactic elaboration, whereas summary essays showed more gradual increases in sentence complexity across placement levels. These findings suggest that statistical models may capture linguistic patterns associated with placement levels and can offer complementary insights for placement decisions and instructional support.
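The per-level precision and recall comparison above can be made concrete with a small sketch. It is not the authors' analysis; per the abstract, they used a Random Forest and worked from real placement data. This sketch uses only the Python standard library and assumes two aligned lists of actual and predicted placement levels. It computes, for each level, precision = TP / (TP + FP) and recall = TP / (TP + FN). The helper name `per_class_precision_recall` and the toy labels are hypothetical and do not come from the study.

```python
from collections import Counter

LEVELS = ["B", "C", "D", "P"]  # placement levels, lowest to highest


def per_class_precision_recall(y_true, y_pred, labels=LEVELS):
    """Return {level: (precision, recall)} for each placement level.

    Hypothetical helper for illustration only.
    precision = TP / (TP + FP): of essays predicted at a level, the share truly at it.
    recall    = TP / (TP + FN): of essays truly at a level, the share predicted at it.
    A level with no predictions (or no true cases) scores 0.0 instead of raising.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # True positives per level: essays whose predicted level matches the actual one.
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)  # TP + FP per level
    actual = Counter(y_true)     # TP + FN per level
    scores = {}
    for level in labels:
        precision = tp[level] / predicted[level] if predicted[level] else 0.0
        recall = tp[level] / actual[level] if actual[level] else 0.0
        scores[level] = (precision, recall)
    return scores


if __name__ == "__main__":
    # Toy labels for illustration only; these are NOT the study's data.
    y_true = ["B", "B", "C", "C", "D", "D", "P", "P"]
    y_pred = ["C", "B", "B", "D", "C", "D", "P", "P"]
    for level, (p, r) in per_class_precision_recall(y_true, y_pred).items():
        print(f"{level}: precision={p:.2f} recall={r:.2f}")
```

In this toy run, swaps between adjacent levels (B with C, C with D) pull B and D down to 0.50 and C to 0.00 on both measures, while P stays at 1.00. This illustrates how a narrow proficiency band among the lower levels can depress their scores. The numbers themselves are illustrative only.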
Keywords:
placement test, Coh-Metrix, second language writing, lexical sophistication, syntactic complexity, cohesion, language features, Random Forest