Thesis information

Material type
Thesis (dissertation)
Author
최순영 (Korea University, Graduate School of Korea University)
Advisor
임희석
Publication year
2019
Copyright
Korea University theses are protected by copyright.


Abstract · Keywords

With advances in information and communication technology, documents on the Internet, such as online statutes, news articles, and product reviews, have proliferated, drawing growing attention to natural language processing (NLP). More recently, demand has risen for NLP tasks performed with machine learning algorithms, neural networks in particular. To feed text into most machine learning algorithms, including neural networks, the text must first be represented as a sequence of vectors. Methods for representing text meaningfully in this way are called word embeddings; representative word-embedding approaches include Word2Vec, GloVe, and FastText. For resource-lean languages, however, a monolingual word embedding can be of poor quality, and inferring word meaning in a multilingual context is difficult. Cross-lingual word embedding techniques address these problems by enabling knowledge transfer between resource-rich and resource-lean languages and by supporting multilingual semantics, which is why research on bilingual word embeddings has recently attracted attention. Nevertheless, bilingual word embedding research on parallel-aligned corpora of Korean paired with another language has not been active, because large, high-quality corpora are hard to obtain; local bilingual word embeddings usable in a specific domain are rarer still. Moreover, when building bilingual word embeddings, translation pairs often fail to correspond one-to-one in word count. In this thesis, 868,163 Korean-English paragraphs of Korean legal text were crawled for local word embedding, and three linking strategies are proposed. These strategies resolve the irregular-correspondence problem above and improve the quality of translation pairs in the paragraph-aligned corpus, achieving twice the performance of the global bilingual word embedding baseline. In addition, FastText, an embedding method similar to Word2Vec but with the subword concept added, was applied to compare performance.
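As the abstract notes, FastText differs from Word2Vec by adding subword information: each word is represented as a bag of character n-grams (FastText's defaults are n = 3 to 6, with `<`/`>` boundary markers), so rare or unseen words still receive a vector. A minimal sketch of that n-gram extraction, where the helper name `char_ngrams` is illustrative rather than taken from the thesis:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams FastText would use for `word`.

    The word is wrapped in boundary markers so prefixes and suffixes
    get distinct n-grams; the full marked word is also included as its
    own token, matching FastText's subword scheme.
    """
    marked = f"<{word}>"
    grams = {marked}
    for n in range(n_min, n_max + 1):
        grams.update(marked[i:i + n] for i in range(len(marked) - n + 1))
    return grams

# A vector for an out-of-vocabulary word can then be built by averaging
# the vectors of its n-grams, which is why FastText can embed unseen
# words that Word2Vec must treat as unknown.
print(sorted(char_ngrams("where", 3, 3)))
# → ['<wh', '<where>', 'ere', 'her', 're>', 'whe']
```

This subword sharing is particularly relevant for morphologically rich languages such as Korean, where many surface forms of a word would otherwise be out of vocabulary.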

Table of Contents

1. Introduction
  1.1 Background
  1.2 Contributions
  1.3 Organization of the thesis
2. Related Work
  2.1 Word Embedding
    2.1.1 Word2Vec
    2.1.2 GloVe
    2.1.3 Subword Information (FastText)
  2.2 Cross-lingual Word Embedding
3. Bilingual Word Embedding System Using Legal Data
  3.1 Architecture of the local bilingual word embedding system
  3.2 Proposed models
    3.2.1 Random Match
    3.2.2 Single Match Greedy Intersect
    3.2.3 Multiple Match Greedy Intersect
4. Experimental Setup
  4.1 Data
  4.2 Preprocessing
  4.3 Method
5. Results and Analysis
  5.1 Bilingual word embedding with Word2Vec
  5.2 Bilingual word embedding with FastText
  5.3 Word2Vec vs. FastText performance
6. Conclusion and Future Work
References
