Improving the performance of text classification models using keyword extraction and data augmentation techniques

핵심어 추출 및 데이터 증강기법을 이용한 텍스트 분류 모델 성능 개선

Basic article information

Type
Academic journal
Author
Lee Kangchul (Jeonbuk National University), Ahn Jeong Yong (Jeonbuk National University)
Journal
Journal of The Korean Data Analysis Society (The Korean Data Analysis Society), Vol. 24, No. 5, KCI Accredited Journal
Published
2022.10
Pages
1719-1731 (13 pages)
DOI
10.37727/jkdas.2022.24.5.1719

Abstract

Topic modeling aims to identify and categorize the topics latent in documents, and is useful for exploring the core topics of each document and the characteristics of those topics. However, one problem in interpreting the topics this technique produces is that common terms often appear near the top of multiple topics, making it hard to extract keywords that identify each topic. Another weakness is that information can be lost when synonyms are excluded from the keywords, and that high performance often depends on the size and quality of the data. To address these problems, we propose a method that uses relevance and word embedding techniques to extract keywords. In addition, we use EDA (easy data augmentation) techniques to increase the size of the data, and then apply the KoBERT model to boost performance on text classification tasks. The data analysis showed that the discriminating keywords made it possible to grasp the specific characteristics of the topics. The results also showed that the text classification model achieved higher accuracy with the augmented data sets than with the original data sets, with scores of 0.94 and 0.85, respectively.
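
The abstract names EDA but does not describe the operations the authors applied. As a rough illustration only, the sketch below shows two of the operations commonly grouped under EDA, random swap and random deletion. The function names, the whitespace tokenization, and the parameters n_aug, alpha, and seed are assumptions made for this example and are not taken from the paper. Korean text would normally be split with a morphological analyzer instead, and the synonym-based EDA operations are omitted because they require a thesaurus.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random position pairs exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        # Pick two distinct positions and exchange their tokens.
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    # Never return an empty sentence: fall back to one random token.
    return kept if kept else [rng.choice(tokens)]


def augment(sentence, n_aug=4, alpha=0.1, seed=0):
    """Generate n_aug variants, alternating random swap and random deletion.

    alpha (hypothetical parameter) sets both the share of tokens swapped
    and the per-token deletion probability.
    """
    rng = random.Random(seed)
    tokens = sentence.split()  # assumption: whitespace tokenization
    n_swaps = max(1, int(alpha * len(tokens)))
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            new_tokens = random_swap(tokens, n_swaps, rng)
        else:
            new_tokens = random_deletion(tokens, alpha, rng)
        variants.append(" ".join(new_tokens))
    return variants
```

Each variant would keep the label of its source sentence, so a call such as augment(text, n_aug=4) yields up to four additional training examples per original example before the classifier is fine-tuned.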
