Analyzing technological structure and trends of artificial intelligence: Using patent and open source project data

이왕재

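Document type: Thesis

Author: 이왕재 (Seoul National University of Science and Technology, Graduate School)

Advisor: 이학연

Publication year: 2021

Copyright: Theses of Seoul National University of Science and Technology are protected by copyright.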

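Research history of this thesis (3)

2021

Analyzing technological structure and trends of artificial intelligence: Using patent and open source project data

이왕재, 2021.01, thesis

2020

Nowcasting of artificial intelligence technology: Focusing on open source project networks

이왕재, 이학연, Proceedings of the Korean Institute of Industrial Engineers Fall Conference, 2020.11, conference paper

Analysis of artificial intelligence technology development trends using GitHub open source project data

이왕재, 이학연, Journal of the Korean Institute of Industrial Engineers, 2020.10, journal article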

Abstract and keywords

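Since John McCarthy first used the term "artificial intelligence (AI)" in 1955, AI has grown rapidly alongside advances in computing and now has a major influence on industry as a whole and on everyday human life. As a core technology that will shape humanity's future, AI is developing in connection with major industries and is attracting attention as an important growth engine for companies and nations. Global companies and countries around the world are investing heavily in research and development to secure AI technology and raise their technological capabilities, and they compete fiercely in the market with products and services.
Accordingly, worldwide interest in recently developed AI technologies has increased sharply. Various analyses of AI technology have been attempted, but many relied on qualitative approaches, which are limited by their dependence on the researcher's subjective views. There have also been quantitative studies based on paper and patent data, but these are limited in how well they reflect the newest AI technologies, because papers and patents take a long time to be published and registered, which makes it difficult to capture a field that changes substantially even over short periods. In addition, most existing studies of AI technology identified only overall technology trends and did not provide detailed analyses by company or by technology.
This thesis therefore carries out a data-driven quantitative analysis to identify the structure of AI technology and empirically examines the AI technologies that have recently been researched and developed. LDA (latent Dirichlet allocation) topic modeling was applied to corporate patent data to analyze the AI technologies that global IT companies have developed and secured, and to identify promising and declining technologies. In particular, the technological strengths of each global IT company were analyzed to identify the leading companies in AI. To capture detailed trends in the most recent AI development, GitHub project data were used, and network analysis was applied with a new approach called developer coupling. This made it possible to identify AI technologies as they are being developed in real time, and a technology matrix based on project characteristics was applied to analyze the features of each technology type.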
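The first study uses patent data to identify the structure of AI technology and to analyze the AI capabilities of major global IT companies. A total of 2,589 corporate AI patents registered with the United States Patent and Trademark Office (USPTO) from 2007 to 2017 were collected. The title and abstract text of the collected patents was cleaned, LDA topic modeling was performed, and 20 technology topics in the AI field were derived. The derived topics were classified to define the AI technology structure, and changes in each topic's share by year were analyzed to identify promising and declining topics. The results show that companies concentrate their AI research and development on Computer Vision, Data Analysis, Motion Control, and Machine Learning rather than on Language Understanding and Speech Technology. In particular, as interest in autonomous driving and smart homes has grown, Computer Vision and Motion Control technologies such as object detection and device control have grown markedly, and Data Analysis and Machine Learning technologies such as search techniques and predictive analytics also appear promising. The analysis of major companies' AI capabilities shows that Microsoft is strong in Machine Learning, IBM in Language Understanding, Google in Data Analysis, and Amazon in Speech Technology. Google holds more strengths in the promising AI technologies that have grown rapidly in recent years, which confirms Google as a new leader in the AI field.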
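The yearly topic-share analysis in the first study can be illustrated with a small sketch. It assumes each patent already has a topic distribution (for example from an LDA model) and labels a topic by the sign of a least-squares slope fitted to its mean yearly share. The slope criterion, the function names, and the toy data are assumptions made for illustration; the abstract does not state which rule the thesis uses to separate promising from declining topics.

```python
from collections import defaultdict


def yearly_topic_share(docs):
    """docs: iterable of (year, topic_weights), where topic_weights sums to 1.

    Returns {year: [mean share of each topic among that year's documents]}.
    """
    sums = {}
    counts = defaultdict(int)
    for year, weights in docs:
        if year not in sums:
            sums[year] = [0.0] * len(weights)
        for k, w in enumerate(weights):
            sums[year][k] += w
        counts[year] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def classify_topics(docs, eps=1e-9):
    """Label each topic by the trend of its yearly share (assumed rule)."""
    shares = yearly_topic_share(docs)
    years = sorted(shares)
    n_topics = len(shares[years[0]])
    labels = []
    for k in range(n_topics):
        s = slope(years, [shares[y][k] for y in years])
        if s > eps:
            labels.append("promising")
        elif s < -eps:
            labels.append("declining")
        else:
            labels.append("stable")
    return labels


# Hypothetical toy input: one document per year, two topics.
toy_docs = [
    (2015, [0.7, 0.3]),
    (2016, [0.5, 0.5]),
    (2017, [0.2, 0.8]),
]
toy_labels = classify_topics(toy_docs)
```

With real data, `docs` would hold one entry per patent, so each year's share is an average over all patents registered in that year.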
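The second study uses GitHub project data, where software development proceeds in real time, derives AI technologies through network analysis, and analyzes technology types by presenting a technology matrix. First, information on 40,122 AI projects on GitHub from 2013 to 2019 was collected into a database to analyze the yearly trend in AI project creation, and 224 major projects were then selected based on criteria such as the number of participating developers. A network was built from the relationships among developers participating in the projects, and centrality analysis identified the projects with the greatest influence on AI development. Network clustering yielded seven AI technology areas: Computer Vision, AI Content Curation, AI Classifier, Machine Learning, Natural Language Processing, AutoML, and Lightweight Deep Learning. In particular, a new approach called developer coupling was proposed for analyzing the relationships between projects in the network analysis. Among the areas derived by clustering, Computer Vision accounted for the largest share, and new technologies such as AutoML and Lightweight Deep Learning were found to be under active development. In addition, an AI technology matrix was constructed from the numbers of stars and commits, and projects and technologies were analyzed in four types: Hot, Maniac, Potential, and Mature/Untouchable. In the Hot type, Computer Vision and Natural Language Processing were identified as technologies that are both popular and actively developed; they are widely used in popular products and services such as autonomous driving and AI assistants. The Maniac type, which has high developer participation regardless of popularity, featured Machine Learning and Lightweight Deep Learning, centered on projects from companies such as Google, Intel, and IBM. The Mature or Untouchable type, which covers technologies that have reached maturity or are highly advanced, featured AI Content Curation, AI Classifier, and AutoML. Finally, several AI technologies ranked highly in the Potential type, which has high latent growth potential, indicating that a wide variety of AI technologies are being developed.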
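Developer coupling can be sketched by analogy with bibliographic coupling: two projects are linked when the same developers contribute to both. The example below assumes the link weight is simply the number of shared developers and uses weighted degree as one simple centrality measure. The exact weighting, any normalization, and the centrality measures used in the thesis are not given in the abstract, so these choices and the project names are illustrative assumptions.

```python
from itertools import combinations


def developer_coupling(project_devs):
    """project_devs: {project_name: set of developer ids}.

    Returns {(project_a, project_b): number of shared developers}
    for every pair that shares at least one developer.
    """
    edges = {}
    for a, b in combinations(sorted(project_devs), 2):
        shared = len(project_devs[a] & project_devs[b])
        if shared:
            edges[(a, b)] = shared
    return edges


def weighted_degree(edges):
    """Sum of coupling weights on each project's links."""
    degree = {}
    for (a, b), w in edges.items():
        degree[a] = degree.get(a, 0) + w
        degree[b] = degree.get(b, 0) + w
    return degree


# Hypothetical projects and contributors.
toy_projects = {
    "vision-lib": {"ann", "bob", "cho"},
    "nlp-kit": {"bob", "cho", "dee"},
    "automl-x": {"dee"},
    "solo": {"eve"},
}
toy_edges = developer_coupling(toy_projects)
toy_degree = weighted_degree(toy_edges)
```

Here `nlp-kit` ends up with the highest weighted degree because it shares developers with two other projects, while `solo` has no links and does not appear in the degree table.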
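The data-driven research methods used in this thesis, including topic modeling and network analysis, offer an approach for empirically analyzing the structure of AI technology and research and development trends. The data-driven analysis of corporate technology capabilities and the technology-type analysis using the technology matrix are also expected to be useful not only for formulating corporate AI technology strategy but also for establishing national AI research policy.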

Since John McCarthy first used the term “artificial intelligence (AI)” in 1955, this type of technology has rapidly developed alongside advancements in computers, and it currently has a significant influence on industry and human life. AI, a core technology that will lead the future of mankind, is developing in parallel with various industries and is attracting attention as a key growth engine for companies. Globally, industries and countries are intensively investing in research and development (R&D) to secure AI technology and increase their technological capabilities. This has led to a fiercely competitive market of AI products and services.
Accordingly, interest in recently developed AI technology is increasing worldwide. Various analyses of AI technology have been conducted, but many adopted a qualitative approach, which is limited by its dependence on the subjective opinions of the researcher. Quantitative studies based on paper and patent data have also been conducted, but they have difficulty reflecting new and ongoing AI technology, because papers and patents take a long time to be published and registered and therefore cannot fully capture a technology that changes substantially within a short period. In addition, many AI technology studies identified only overall technology trends, and detailed analyses by company or technology have not been conducted.
In this study, we performed a data-based quantitative analysis to better understand the structure of AI technology and, by doing so, present an empirical understanding of AI technology that has recently been researched and developed. By applying the latent Dirichlet allocation (LDA) topic-modeling technique to corporate patent data, we analyzed the state of the AI technologies being developed and secured by global information technology (IT) companies and identified promising and declining technologies. In particular, by analyzing the technological strengths of each global IT company, we identified the leading companies in AI. In addition, GitHub project data were used to capture detailed trends in the AI technology currently being developed. We identified AI technologies that are being developed in real time as open source by conducting network analysis with a newly proposed method called “developer coupling”, and we analyzed the characteristics of each technology type by presenting a technology matrix.
In the first study, the AI technology structure was identified using patent data, and the AI technology capabilities of major global IT companies were analyzed. A total of 2,589 corporate AI patents registered with the United States Patent and Trademark Office from 2007 to 2017 were collected and used. Topic modeling was performed on the cleaned title and abstract text of the collected patents, and 20 topics in the field of AI were derived. By classifying the derived topics, the technology structure of AI was defined, and the change in each topic's share by year was analyzed to distinguish promising from declining topics. The analysis shows that companies focus their AI R&D on Computer Vision, Data Analysis, Motion Control, and Machine Learning rather than on Language Understanding and Speech Technology. In particular, with growing interest in autonomous driving and smart homes, Computer Vision and Motion Control technologies, such as object detection and device control, have grown remarkably. Data Analysis and Machine Learning technologies, such as search techniques and predictive analytics, were also identified as promising. The analysis of the AI technology capabilities of major companies shows that Microsoft is strong in Machine Learning, International Business Machines (IBM) in Language Understanding, Google in Data Analysis, and Amazon in Speech Technology. Google holds more strengths in the promising AI technologies that have grown rapidly in recent years, confirming its status as a new leader in the field of AI.
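One way to read the company capability analysis is as a comparison of topic shares per company. The sketch below assumes each patent carries an assignee and a topic distribution, sums the topic weights per company, normalizes them to shares, and reports the topic with the largest share. The aggregation rule, the function names, and the company labels are hypothetical; the thesis may define company strength differently.

```python
def company_topic_share(patents, n_topics):
    """patents: iterable of (company, topic_weights).

    Returns {company: [share of each topic in that company's patents]}.
    """
    totals = {}
    for company, weights in patents:
        acc = totals.setdefault(company, [0.0] * n_topics)
        for k, w in enumerate(weights):
            acc[k] += w
    return {c: [v / sum(acc) for v in acc] for c, acc in totals.items()}


def strongest_topic(shares, topic_names):
    """Map each company to the name of its largest-share topic."""
    result = {}
    for company, s in shares.items():
        best = max(range(len(s)), key=lambda k: s[k])
        result[company] = topic_names[best]
    return result


# Hypothetical toy input: two topics, two placeholder companies.
toy_topics = ["Machine Learning", "Speech Technology"]
toy_patents = [
    ("CompanyA", [0.8, 0.2]),
    ("CompanyA", [0.6, 0.4]),
    ("CompanyB", [0.1, 0.9]),
]
toy_shares = company_topic_share(toy_patents, n_topics=2)
toy_strengths = strongest_topic(toy_shares, toy_topics)
```

Comparing a company's share for a topic against the share across all companies would be a natural refinement, since a raw maximum favors topics that are large everywhere.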
In the second study, GitHub project data, where software development proceeds in real time, were used. AI technologies were derived through network analysis, and technology types were analyzed by presenting a technology matrix. First, information on 40,122 AI projects on GitHub from 2013 to 2019 was collected, and a database was created to analyze the yearly trend in project creation. Among them, 224 major projects were selected according to the number of participating developers and other factors. A network was built from the relationships among developers participating in the projects, and projects with significant influence on AI development were identified using centrality analysis. By applying network clustering, seven AI technology areas were derived: Computer Vision, AI Content Curation, AI Classifiers, Machine Learning, Natural Language Processing, AutoML (automated machine learning), and Lightweight Deep Learning. In particular, a new measure called “developer coupling” was proposed to analyze the relationships between projects in the network analysis. Among the technology areas derived from the network analysis, Computer Vision accounted for the largest share, and newly emerging technologies such as AutoML and Lightweight Deep Learning were confirmed to be under active development. Additionally, an AI technology matrix was constructed from the numbers of stars and commits, and projects and technologies were analyzed in four types: hot, maniac, potential, and mature or untouchable. Computer Vision and Natural Language Processing fall into the hot type: they are popular and actively developed, and they are widely used in popular products and services such as autonomous driving and AI assistants. Machine Learning and Lightweight Deep Learning were prevalent in the maniac type, which has high developer participation regardless of popularity; projects from companies such as Google, Intel, and IBM were at its center. AI Content Curation, AI Classifiers, and AutoML ranked highly in the mature or untouchable type, which covers mature or highly advanced technologies. Finally, several AI technologies ranked highly in the potential type, which has high growth potential, indicating that a wide variety of AI technologies are being developed.
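The four-type technology matrix can be illustrated as a two-by-two split on stars and commits. This sketch treats stars as popularity and commits as development activity, and it uses the median of each measure as the cut-off. The abstract supports the reading of hot (popular and active) and maniac (active regardless of popularity); placing mature or untouchable at high stars with low commits, placing potential at low stars with low commits, and using a median threshold are all assumptions, as are the project names.

```python
from statistics import median


def technology_matrix(projects):
    """projects: {name: (stars, commits)}.

    Returns {name: type}, splitting on the median of stars and of commits.
    """
    star_cut = median(stars for stars, _ in projects.values())
    commit_cut = median(commits for _, commits in projects.values())
    labels = {}
    for name, (stars, commits) in projects.items():
        popular = stars > star_cut      # assumed proxy for popularity
        active = commits > commit_cut   # assumed proxy for development activity
        if popular and active:
            labels[name] = "Hot"
        elif active:
            labels[name] = "Maniac"
        elif popular:
            labels[name] = "Mature/Untouchable"  # assumed quadrant
        else:
            labels[name] = "Potential"           # assumed quadrant
    return labels


# Hypothetical projects with (stars, commits).
toy_repos = {
    "proj-a": (9000, 5000),
    "proj-b": (300, 4000),
    "proj-c": (8000, 200),
    "proj-d": (100, 50),
}
toy_types = technology_matrix(toy_repos)
```

A technology area could then be profiled by counting how many of its projects fall into each type.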
The data-based topic-modeling and network analysis approach used in this thesis presents a method for empirically analyzing the structure of AI technology and R&D trends. In addition, the data-based analysis of corporate technology capabilities and the technology type analysis using the technology matrix performed in this study are expected to be useful for planning corporate AI technology strategies and establishing AI research policies at the national level.

#ArtificialIntelligence #TechnologyStructure #TechnologyTrends #TopicModeling #NetworkAnalysis #Patent #OpenSource

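Ⅰ. Introduction
1. Research background and necessity
2. Research objectives
3. Research structure and scope
Ⅱ. Background of artificial intelligence technology
1. History of artificial intelligence
1) The beginning of AI research
2) Development and stagnation of AI
3) The leap forward of AI
2. Current state of artificial intelligence
1) Concept of AI
2) AI industry
3) AI strategy
3. Studies analyzing AI technology
1) Approaches to AI
2) Qualitative studies of AI technology
3) Quantitative studies of AI technology
Ⅲ. Patent-based analysis of AI technology
1. Overview
2. Previous research and theoretical background
1) Previous research
2) Patent text mining
3) LDA topic modeling
3. Research method
1) Research framework
2) Data collection
3) Topic modeling
4. AI technology structure
1) AI technology topics
2) Technology structure analysis
3) Promising and declining topics
5. AI technology capabilities
1) Analysis of core technologies by company
2) Analysis of leading companies by field
6. Summary
Ⅳ. Project-based analysis of AI technology
1. Overview
2. Previous research and theoretical background
1) Previous research
2) Open source projects
3) Bibliographic coupling
4) Network analysis methodology
5) Network clustering
3. Research method
1) Research framework
2) Data collection
3) Network construction and clustering
4. AI project network
1) Network construction
2) Network centrality analysis
3) Technology cluster analysis
5. AI technology matrix
1) Matrix overview
2) Technology type analysis
6. Summary
Ⅴ. Conclusion
1. Summary and implications
2. Limitations and future research
References
Appendices
Appendix 1. Topic shares of global IT companies
Appendix 2. Number of patents by topic for global IT companies
Appendix 3. Projects by type in the AI technology matrix
Abstract (English)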
