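Basic Thesis Information

Document type
Thesis
Author

정다해 (Sangmyung University, Graduate School of Sangmyung University)

Advisor
정진우
Year of publication
2016
Copyright
Theses of Sangmyung University are protected by copyright.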


Abstract · Keywords

Speaker Diarization in Recorded Telephone Call Files

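The speaker diarization method proposed in this thesis proceeds as follows. First, to separate the intervals without speech (silence) from the intervals with speech (non-silence) in an audio file (*.wav), an envelope graph is extracted from the time signal of the first 30 seconds of the file, and a threshold between silence and non-silence is derived from it. Second, the threshold is used to isolate the speech intervals of the entire audio file, and segmentation is performed in two passes. Third, each segment is transformed with the FFT (Fast Fourier Transform), converting the time signal into a frequency signal. Fourth, MFCC (Mel-Frequency Cepstral Coefficient) analysis is applied to the frequency signal obtained from the FFT to extract 12 meaningful frequency features. Fifth, the 12 MFCC features are used to model each segment as a Gaussian model, and these models are combined to form a GMM (Gaussian Mixture Model). Sixth, EM clustering is performed on the basis of the GMM to separate the two speakers.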
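The first two steps (a threshold taken from the first 30 seconds, then speech/silence separation over the whole file) might be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how the threshold is derived from the envelope, so the rule used here (a fixed fraction `ratio` of the envelope peak) is hypothetical, as are the function names and the window length `win`. The second segmentation pass mentioned in the abstract is not reproduced.

```python
import math

def envelope(samples, win):
    """Peak envelope: largest absolute amplitude in each non-overlapping window."""
    return [max(abs(s) for s in samples[i:i + win])
            for i in range(0, len(samples), win)]

def estimate_threshold(samples, rate, win, head_seconds=30.0, ratio=0.1):
    """Threshold from the first `head_seconds` of audio.

    ASSUMPTION: the threshold is `ratio` times the envelope peak; the
    abstract does not state the rule actually used.
    """
    head = samples[:int(head_seconds * rate)]
    return ratio * max(envelope(head, win))

def speech_segments(samples, rate, win=160, ratio=0.1):
    """Return (start_sec, end_sec) intervals whose envelope exceeds the threshold."""
    threshold = estimate_threshold(samples, rate, win, ratio=ratio)
    segments, start = [], None
    for k, level in enumerate(envelope(samples, win)):
        if level > threshold and start is None:
            start = k                      # speech begins in window k
        elif level <= threshold and start is not None:
            segments.append((start * win / rate, k * win / rate))
            start = None                   # speech ended before window k
    if start is not None:                  # file ends while speech is active
        segments.append((start * win / rate, len(samples) / rate))
    return segments

# Synthetic 8 kHz signal: 1 s near-silence, 1 s of a 200 Hz tone, 1 s near-silence.
rate = 8000
quiet = [0.001] * rate
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(rate)]
segments = speech_segments(quiet + tone + quiet, rate)
print(segments)  # [(1.0, 2.0)]
```

On real telephone recordings the samples would first be read from the *.wav file, and a fixed-fraction rule like this one would probably need tuning against line noise.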
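To compute the error rate of the diarization result, a ground truth was created by listening to the audio files and recording the times at which each speaker actually spoke, and this was compared with the output of the algorithm. The resulting DER (Diarization Error Rate) was within 5% on average.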
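As a rough illustration of how a DER can be computed from hand-labeled and automatic segments, the sketch below uses the common definition (missed speech, false alarm, and speaker confusion time divided by total reference speech time), evaluated on 10 ms frames. It is an assumption that the thesis used this definition; the abstract does not say. The sketch ignores overlapping speech and scoring collars, and the names `der` and `label_at` are hypothetical.

```python
from itertools import permutations

def label_at(segments, t):
    """Speaker label active at time t, or None if t falls in silence."""
    for start, end, speaker in segments:
        if start <= t < end:
            return speaker
    return None

def der(reference, hypothesis, step=0.01):
    """Frame-based DER; segments are (start_sec, end_sec, speaker) tuples."""
    end = max(e for _, e, _ in reference + hypothesis)
    n = int(round(end / step))
    ref = [label_at(reference, (i + 0.5) * step) for i in range(n)]
    hyp = [label_at(hypothesis, (i + 0.5) * step) for i in range(n)]
    total = sum(r is not None for r in ref)
    if total == 0:
        raise ValueError("reference contains no speech")
    ref_spk = sorted({r for r in ref if r is not None})
    hyp_spk = sorted({h for h in hyp if h is not None})
    # Cluster labels are arbitrary, so try every mapping of hypothesis
    # speakers onto reference speakers and keep the one with fewest errors.
    targets = ref_spk + [None] * max(0, len(hyp_spk) - len(ref_spk))
    best = None
    for perm in permutations(targets, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, perm))
        errors = 0
        for r, h in zip(ref, hyp):
            if r is None and h is None:
                continue                   # silence in both: not scored
            if r is None or h is None or mapping[h] != r:
                errors += 1                # false alarm, miss, or confusion
        best = errors if best is None else min(best, errors)
    return best / total

# Made-up example: the hypothesis places the speaker change 0.2 s too late.
reference = [(0.0, 5.0, "A"), (5.0, 10.0, "B")]
hypothesis = [(0.0, 5.2, "s1"), (5.2, 10.0, "s2")]
score = der(reference, hypothesis)
print(round(score, 4))  # 0.02
```

Trying every label mapping is affordable here because the task involves only two speakers; with many speakers an assignment algorithm would be the more usual choice.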

Table of Contents

Contents
List of Tables
List of Figures
Abstract in Korean
1. Introduction
2. A Study of Speaker Diarization
2.1. Units for Speaker Diarization
2.2. Features and Representation of Speech
2.3. DFT vs FFT
2.4. Clustering Techniques
3. Existing Speaker Diarization Algorithms
3.1. Speaker Diarization Algorithms
3.2. Limitations of Existing Speaker Diarization Algorithms
4. Proposed Speaker Diarization Algorithm
4.1. Silence Interval Detection
4.2. Segmentation
4.3. MFCC (Mel-Frequency Cepstral Coefficients)
4.5. GMM (Gaussian Mixture Model)
4.6. EM (Expectation-Maximization) Clustering
5. Results
6. Conclusion
References
ABSTRACT
