Print publication date 31 Jan 2024
Received 29 Jul 2024 Revised 02 Sep 2024 Accepted 23 Sep 2024

AI 기반 영문학 텍스트 분석: 제인 오스틴의 소설 엠마를 중심으로 한 자연어처리 연구

AI-powered text analysis of English literature: A Natural Language Processing (NLP) study centered on Jane Austen’s novel Emma
Young-kyo Oh
This study conducts a data-driven text analysis of Jane Austen’s novel Emma (1815). Text analysis is a research method that uses Natural Language Processing (NLP) techniques to extract meaningful information from large-scale unstructured text data and to uncover new meaning and insights at the contextual level by considering the relationships between the text and its words. Its main techniques include text network analysis, topic modeling, and sentiment analysis. We applied these NLP techniques to Emma, a representative work of English literature. First, we used term frequency (TF) and term frequency-inverse document frequency (TF-IDF) analysis to determine the relative importance of words in the novel. Second, to examine the relationships between characters, we carried out co-occurrence and network centrality analysis as part of a text network analysis. Third, we applied topic modeling with Latent Dirichlet Allocation (LDA) to classify the novel, which consists of three volumes and 55 chapters, into four topics. Finally, we conducted sentiment analysis to quantify the degree of positivity and negativity in each volume as a sentiment score. By objectively quantifying the literary content and character relationships in a 19th-century English novel, this study aims to help non-literature majors understand and appreciate classic English texts and, further, to draw implications for teaching English text comprehension effectively.
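As a rough illustration of the first two steps named above (TF/TF-IDF weighting and character co-occurrence counting), the following minimal sketch uses only the Python standard library. It is not the study's actual pipeline: the function names, the `ln(N / df)` form of the inverse document frequency, and the toy sentences standing in for chapters are all assumptions made for this example.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequency(tokens):
    """Relative frequency of each token within a single document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def tf_idf(docs):
    """TF-IDF per document, assuming idf = ln(N / df) with no smoothing."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    return [
        {word: tf * math.log(n_docs / df[word])
         for word, tf in term_frequency(tokens).items()}
        for tokens in tokenized
    ]


def cooccurrence(docs, names):
    """Count the documents in which each pair of names appears together."""
    pairs = Counter()
    for doc in docs:
        present = sorted(set(tokenize(doc)) & names)
        pairs.update(combinations(present, 2))
    return pairs


# Made-up sentences standing in for chapters (not quotations from the novel).
chapters = [
    "Emma and Harriet walked to Highbury",
    "Emma spoke with Knightley about Harriet",
    "Knightley visited Hartfield",
]
scores = tf_idf(chapters)
edges = cooccurrence(chapters, {"emma", "harriet", "knightley"})
```

In this sketch a word that occurs in every document receives a TF-IDF weight of zero, which is why the weighting highlights words distinctive to a chapter or volume. The pair counts in `edges` could serve as edge weights in a character network, on which centrality measures would then be computed.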


text analysis, Emma, text network analysis, topic modeling, sentiment analysis


