Thesis Information

Material Type
Thesis (dissertation)

Author
강병옥 (Chungbuk National University, Chungbuk National University Graduate School)

Advisor
권오욱

Year of Publication
2018

Copyright
Theses of Chungbuk National University are protected by copyright.

Abstract · Keywords

This paper describes studies conducted on acoustic models robust to variations in speakers and speaking styles for automatic speech recognition (ASR) systems.
With regard to acoustic models robust to variation in speaking styles, we analyze the characteristics of Korean atypical spontaneous speech and attempt to improve automatic speech recognition performance. Atypical spontaneous speech is a speaking style used in everyday life that includes phenomena such as abnormal speaking rate, ambiguous pronunciation, lengthening, repetition/correction, filled pauses, aborted articulation, and colloquial expressions. These phenomena are believed to be the main cause of speech recognition errors. In this paper, the linguistic and acoustic characteristics common to atypical spontaneous speech are identified, and the effects of these attributes on speech recognition errors are analyzed. Through a quantitative analysis of speech recognition errors in tagged Korean atypical spontaneous speech data, we find that, among the various attributes of atypical spontaneous speech, acoustic characteristics such as ambiguous pronunciation and abnormal speaking rate are the main causes of performance degradation in speech recognition systems. Based on these analysis results, we apply a two-step approach to the acoustic model and achieve an average error rate reduction (ERR) of 49.0% compared to the baseline model. Through this stepwise process of improving the acoustic model, we analyze the changes in speech recognition error patterns and the proportion of the error rate attributable to each tagged attribute of Korean atypical spontaneous speech.
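Here ERR denotes the relative reduction of the word error rate (WER) with respect to the baseline. A minimal sketch of that computation, using hypothetical WER values chosen only to illustrate a 49.0% reduction (the numbers are not taken from the thesis):

```python
def error_rate_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Relative error rate reduction (ERR) of an improved WER over a baseline WER."""
    return (baseline_wer - improved_wer) / baseline_wer

# Illustrative numbers only: a baseline WER of 30.0% lowered to 15.3%
# corresponds to a 49.0% relative error rate reduction.
print(f"{error_rate_reduction(30.0, 15.3):.1%}")  # 49.0%
```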
To improve the performance of acoustic models robust to speaker variation, we propose a new method for combining multiple acoustic models in Gaussian mixture model (GMM) space, which merges several sub-acoustic models to generate an optimal GMM-based acoustic model for a new target speech recognition task. After forming a large pool of states with multiple acoustic characteristics by gathering HMM states from several individually trained sub-models, an acoustic model with states optimal for the new task is generated by combining pairs of states that occupy a similar acoustic space using the proposed algorithm. To evaluate the proposed acoustic modeling method, we perform experiments on a non-native English speech recognition task for English-as-a-second-language learning systems and on a noise-robust speech recognition task for car navigation systems. For the non-native speech recognition task, we generate a combined native and non-native acoustic model. The experimental results show that the proposed method of combining native and non-native models in GMM space achieves an average ERR of 12.2% compared to the conventional method. For the noise-robust speech recognition task for car navigation systems, we apply the proposed method to combine an acoustic model trained on clean speech data with an acoustic model trained on noise-added speech data. The proposed method achieves ERRs of 7.5% to 20.6% under various driving conditions compared to the conventional multi-condition training method.
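The following sketch illustrates the general idea of pairing HMM states from two sub-models that occupy a similar acoustic space and pooling their mixture components. It is not the thesis's algorithm: the thesis uses a log-likelihood-based criterion (Section 4.2.2) and re-adjusts state and GMM parameter weights afterwards (Section 4.2.4), whereas this sketch substitutes a symmetric KL divergence between moment-matched Gaussians as a stand-in similarity measure; the data layout and the threshold are hypothetical.

```python
import numpy as np

def gmm_moment_match(weights, means, variances):
    """Collapse a diagonal-covariance GMM into one Gaussian by moment matching."""
    mu = np.sum(weights[:, None] * means, axis=0)
    var = np.sum(weights[:, None] * (variances + means ** 2), axis=0) - mu ** 2
    return mu, var

def sym_kl_diag(mu1, var1, mu2, var2):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    kl12 = 0.5 * np.sum(np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)
    kl21 = 0.5 * np.sum(np.log(var1 / var2) + (var2 + (mu1 - mu2) ** 2) / var1 - 1.0)
    return kl12 + kl21

def combine_state_pools(pool_a, pool_b, threshold=5.0):
    """Greedily pair states from two sub-models whose GMMs lie close in acoustic
    space and pool their components; unmatched states pass through unchanged.
    Each state is a dict with 'weights' (K,), 'means' (K, D), 'vars' (K, D)."""
    combined, used = [], set()
    for sa in pool_a:
        mu_a, var_a = gmm_moment_match(sa["weights"], sa["means"], sa["vars"])
        best_j, best_d = None, threshold
        for j, sb in enumerate(pool_b):
            if j in used:
                continue
            mu_b, var_b = gmm_moment_match(sb["weights"], sb["means"], sb["vars"])
            d = sym_kl_diag(mu_a, var_a, mu_b, var_b)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is None:
            combined.append(sa)
            continue
        sb = pool_b[best_j]
        used.add(best_j)
        combined.append({  # merged state: equal prior on each sub-model's components
            "weights": np.concatenate([sa["weights"], sb["weights"]]) * 0.5,
            "means": np.concatenate([sa["means"], sb["means"]]),
            "vars": np.concatenate([sa["vars"], sb["vars"]]),
        })
    combined.extend(sb for j, sb in enumerate(pool_b) if j not in used)
    return combined
```

The subsequent weight re-adjustment and the likelihood computation over adaptation data described in the thesis are omitted from this sketch.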
After the success of deep neural network (DNN)-based acoustic models applied to large vocabulary continuous speech recognition (LVCSR) was first reported, most speech recognition systems currently in service have adopted DNN-based acoustic models. By extending the previous method of combining multiple acoustic models in GMM space, we propose a new method for training DNN-based acoustic models for non-native speech recognition. The proposed method consists of determining multi-set state clusters with various acoustic properties, training a DNN-based acoustic model with the pre-determined multi-set state clusters as output nodes, and recognizing input speech with the trained acoustic model. In the proposed method, the hidden nodes of the DNN are shared, but the output nodes are separated to accommodate the different acoustic properties of native and non-native speech. On an English speech recognition task, the proposed method achieves ERRs of 2.1% and 14.5% for Korean and English speakers, respectively, compared to the conventional DNN-based acoustic model, which already has a high level of speech recognition accuracy.
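As a rough sketch of the shared-hidden, multi-set-output structure described above, the module below shares its hidden layers between native and non-native speech while keeping a separate output layer per state-cluster set. The feature dimension, layer sizes, state-cluster counts, and the choice of PyTorch are illustrative assumptions; the abstract does not specify them.

```python
import torch
import torch.nn as nn

class MultiSetDNNAcousticModel(nn.Module):
    """Shared hidden layers with one output layer per state-cluster set
    (hypothetical dimensions; for illustration only)."""
    def __init__(self, feat_dim=440, hidden_dim=1024, n_hidden=5,
                 n_native_states=4000, n_nonnative_states=4000):
        super().__init__()
        layers, in_dim = [], feat_dim
        for _ in range(n_hidden):
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
            in_dim = hidden_dim
        self.shared = nn.Sequential(*layers)                              # shared hidden nodes
        self.native_head = nn.Linear(hidden_dim, n_native_states)         # native state clusters
        self.nonnative_head = nn.Linear(hidden_dim, n_nonnative_states)   # non-native state clusters

    def forward(self, x):
        h = self.shared(x)
        # Logits over both state sets; a decoder could use either head
        # depending on the expected speaker group.
        return self.native_head(h), self.nonnative_head(h)

# Quick shape check on a batch of spliced feature vectors.
model = MultiSetDNNAcousticModel()
native_logits, nonnative_logits = model(torch.randn(8, 440))
print(native_logits.shape, nonnative_logits.shape)  # (8, 4000) (8, 4000)
```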

Table of Contents

Ⅰ. Introduction
1.1 Research Background
1.2 Research Content
1.3 Organization of the Thesis
Ⅱ. Background Theory
2.1 Overview of Speech Recognition Systems
2.2 Overview of Acoustic Models for Speech Recognition
2.2.1 GMM-HMM-Based Acoustic Models
2.2.2 DNN-HMM-Based Acoustic Models
Ⅲ. Speech Recognition Robust to Speaking-Style Variation
3.1 Conversational Speech Recognition
3.2 Characteristics of Conversational Speech
3.3 Analysis of Conversational Speech Recognition Errors
3.4 Improving Conversational Speech Recognition Performance
3.5 Improving a DNN-Based Speech Recognition System for Recorded Speech Data
3.5.1 Improving the DNN-Based Acoustic Model
3.5.2 Optimization for Simultaneous Multi-Channel Processing
3.6 Summary
Ⅳ. DNN-Based Speech Recognition Robust to Non-Native Speakers
4.1 Research Overview and Organization
4.2 Combining Multiple Acoustic Models in GMM Space
4.2.1 Training Sub-Acoustic Models
4.2.2 Log-Likelihood Computation for State Combination
4.2.3 Combining States Occupying a Similar Acoustic Space
4.2.4 Adjusting State and GMM Parameter Weights
4.3 DNN Acoustic Models Based on Multi-Set State Clusters
4.3.1 DNN Training Method
4.3.2 Recognition Method
4.4 Experimental Results and Discussion
4.4.1 Speech Recognition for Native and Non-Native Speakers
4.4.2 DNN-Based Speech Recognition for Native and Non-Native Speakers
4.4.3 Speech Recognition for Car Navigation Systems
4.5 Summary
Ⅴ. Conclusion
5.1 Summary of Research
5.2 Future Work
Appendix A. DNN Architecture
Appendix B. DNN Training
References
