
Dynamic Cues in Vowel Classification: A Discriminant Analysis of Conversational Speech Corpus
© 2025 KASELL All rights reserved
This is an open-access article distributed under the terms of the Creative Commons License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
This paper asks whether vowel inherent spectral change (VISC), that is, the dynamic cues of vowels, is an essential feature for vowel classification in natural speech. To answer this question, vowel classification models were trained and tested on vowels from the Buckeye Corpus of conversational speech using quadratic discriminant analysis, a machine learning technique. Three models were evaluated: the steady-state (one-point) model and two trajectory models, the two-point and three-point models. The one-point model samples the spectral features of a vowel at a single point of its duration, while the two-point and three-point models sample the features at two and three points, respectively. Various combinations of sampled points and predictors (F0, F1, F2, and F3) were analyzed, and the combinations with the best classification accuracy were compared across the models. The results showed that the steady-state model achieved its highest classification accuracy when the spectral features and fundamental frequency were sampled at 50% of vowel duration, while the trajectory models performed best when sampled at 30% and 70% (two-point model) and at 10%, 50%, and 90% (three-point model). For all models, classification performance was highest when all parameters (F0, F1, F2, F3) were included. Compared across models, the trajectory models outperformed the steady-state model. In addition, including vowel duration as a parameter improved classification accuracy for specific vowels. This paper thus provides additional evidence for the role of VISC in vowel classification, reports detailed classification results for each vowel, identifies the misclassified vowels, and offers insights for vowel classification models.
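As an illustration of the approach described above, the sketch below shows how a three-point trajectory model of this kind might be trained and evaluated in R with quadratic discriminant analysis (MASS::qda) and the caret package cited in the references. It is a minimal sketch under stated assumptions, not the authors' code: the data file, the column names (e.g., f1_10 for F1 at 10% of vowel duration), and the 80/20 train-test split are hypothetical choices for illustration only.

```r
## Minimal illustrative sketch (not the authors' code). Assumes a data frame of
## vowel tokens with a 'vowel' label and F0-F3 measured at 10%, 50%, and 90%
## of vowel duration in columns named f0_10, f1_10, ..., f3_90 (hypothetical).
library(MASS)    # qda(): quadratic discriminant analysis
library(caret)   # createDataPartition(), confusionMatrix()

set.seed(1)
vowels <- read.csv("buckeye_vowel_measurements.csv")   # hypothetical file name

## Predictors for the three-point trajectory model: F0-F3 at 10%, 50%, and 90%
predictors <- as.vector(outer(paste0("f", 0:3), c("_10", "_50", "_90"), paste0))

## 80/20 train-test split, stratified by vowel category
train_idx <- createDataPartition(vowels$vowel, p = 0.8, list = FALSE)
train <- vowels[train_idx, ]
test  <- vowels[-train_idx, ]

## Fit the QDA classifier and predict vowel categories for the held-out tokens
fit  <- qda(reformulate(predictors, response = "vowel"), data = train)
pred <- predict(fit, test)$class

## Overall accuracy plus a per-vowel confusion matrix, which shows
## which specific vowels are misclassified and as what
confusionMatrix(pred, factor(test$vowel, levels = levels(pred)))
```

The one-point and two-point models follow the same pattern, differing only in which sampled-point columns enter the predictor set; adding a vowel-duration column to the predictors corresponds to the duration parameter discussed in the abstract.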
Keywords:
vowel classification, vowel dynamics, discriminant analysis, conversational speech corpus

Acknowledgments
This work was supported by a Research Grant of Pukyong National University (2023).
References
- Adank, P., R. Van Hout and R. Smits. 2004. An acoustic description of the vowels of Northern and Southern Standard Dutch. The Journal of the Acoustical Society of America 116(3), 1729-1738. [https://doi.org/10.1121/1.1779271]
- Almurashi, W., J. Al-Tamimi and G. Khattab. 2020. Static and dynamic cues in vowel production in Hijazi Arabic. The Journal of the Acoustical Society of America 147(4), 2917-2927. [https://doi.org/10.1121/10.0001004]
- Almurashi, W., J. Al-Tamimi and G. Khattab. 2024. Dynamic specification of vowels in Hijazi Arabic. Phonetica 81(2), 185-220. [https://doi.org/10.1515/phon-2023-0013]
- Boersma, P. and D. Weenink. 2023. Praat: Doing phonetics by computer [Computer program].
- Harrington, J. and S. Cassidy. 1994. Dynamic and target theories of vowel classification: Evidence from monophthongs and diphthongs in Australian English. Language and Speech 37(4), 357-373. [https://doi.org/10.1177/002383099403700402]
- Hillenbrand, J. and R. T. Gayvert. 1993. Vowel classification based on fundamental frequency and formant frequencies. Journal of Speech, Language, and Hearing Research 36(4), 694-700. [https://doi.org/10.1044/jshr.3604.694]
- Hillenbrand, J., L. A. Getty, M. J. Clark and K. Wheeler. 1995. Acoustic characteristics of American English vowels. The Journal of the Acoustical Society of America 97(5), 3099-3111. [https://doi.org/10.1121/1.411872]
- Hillenbrand, J. 2013. Static and dynamic approaches to vowel perception. In G. S. Morrison and P. F. Assmann, eds., Vowel Inherent Spectral Change, 9-30. Springer. [https://doi.org/10.1007/978-3-642-14209-3_2]
- Hillenbrand, J., M. J. Clark and T. M. Nearey. 2001. Effects of consonant environment on vowel formant patterns. The Journal of the Acoustical Society of America 109(2), 748-763. [https://doi.org/10.1121/1.1337959]
- Hong, S. 2021. Roles of temporal patterns of vowel-intrinsic cues in model identification of Korean vowels in spontaneous speech. Studies in Phonetics, Phonology, and Morphology 27(2), 321-351.
- Hong, S. 2023. Roles of dynamic patterns of lower formants, vowel identity, and gender in predicting postvocalic consonant place in Korean spontaneous speech. Studies in Phonetics, Phonology, and Morphology 29(2), 211-246.
- Kelleher, J. D., B. Mac Namee and A. D’Arcy. 2020. Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies (second edition). The MIT Press.
- Kuhn, M. 2008. Building predictive models in R using the caret package. Journal of Statistical Software 28(5), 1-26. [https://doi.org/10.18637/jss.v028.i05]
- Morrison, G. S. 2013. Theories of vowel inherent spectral change. In G. S. Morrison and P. F. Assmann, eds., Vowel Inherent Spectral Change, 31-48. Springer. [https://doi.org/10.1007/978-3-642-14209-3]
- Nearey, T. M. and P. F. Assmann. 1986. Modeling the role of inherent spectral change in vowel identification. The Journal of the Acoustical Society of America 80(5), 1297-1308. [https://doi.org/10.1121/1.394433]
- Neel, A. T. 2004. Formant detail needed for vowel identification. Acoustics Research Letters Online 5(4), 125-131. [https://doi.org/10.1121/1.1764452]
- Peterson, G. E. and H. L. Barney. 1952. Control methods used in a study of the vowels. The Journal of the Acoustical Society of America 24(2), 175-184. [https://doi.org/10.1121/1.1906875]
- Pitt, M. A., L. Dilley, K. Johnson, S. Kiesling, W. Raymond, E. Hume and E. Fosler-Lussier. 2007. Buckeye corpus of conversational speech (2nd release). [www.buckeyecorpus.osu.edu]
- Pitt, M. A., K. Johnson, E. Hume, S. Kiesling and W. Raymond. 2005. The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability. Speech Communication 45(1), 89-95. [https://doi.org/10.1016/j.specom.2004.09.001]
- R Core Team. 2023. R: A language and environment for statistical computing. [www.R-project.org]
- Watson, C. I. and J. Harrington. 1999. Acoustic evidence for dynamic formant trajectories in Australian English vowels. The Journal of the Acoustical Society of America 106(1), 458-468. [https://doi.org/10.1121/1.427069]
- Whalen, D. H. and A. G. Levitt. 1995. The universality of intrinsic F0 of vowels. Journal of Phonetics 23(3), 349-366. [https://doi.org/10.1016/S0095-4470(95)80165-0]
- Yoon, K. 2021. Praat & Scripting. Book Kyul.
- Zahorian, S. A. and A. J. Jagharghi. 1993. Spectral-shape features versus formants as acoustic correlates for vowels. The Journal of the Acoustical Society of America 94(4), 1966-1982. [https://doi.org/10.1121/1.407520]