A Comparative Study on Data Crawling and Extraction of Climate Change Issues Using Machine Learning Technique
Do-Yeon Kim1, Dae-Yong Jin2, Kuk-Jin Han3, Seong-Taek Park4

1Do-Yeon Kim, Research Fellow, Department of Korea Environment Institute, Sicheong-Daero, Sejong-Si, Republic of Korea, East Asian.

2Dae-Yong Jin, Research Fellow, Department of Korea Environment Institute, Sicheong-Daero, Sejong-Si, Republic of Korea, East Asian.

3Kuk-Jin Han, Research Fellow, Department of Korea Environment Institute, Sicheong-Daero, Sejong-Si, Republic of Korea, East Asian.

4Seong-Taek Park, Professor, Sungkyunkwan University, Seonggyungwan-Ro, Seoul, Republic of Korea, East Asian.

Manuscript received on 10 June 2019 | Revised Manuscript received on 17 June 2019 | Manuscript Published on 22 June 2019 | PP: 1062-1066 | Volume-8 Issue-8S2 June 2019 | Retrieval Number: H11800688S219/19©BEIESP

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open-access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Ongoing climate change causes problems across many fields, including society, culture, and the economy. Its effects are growing over time, and national attention is increasing. It is therefore necessary to understand the various issues involved and to improve climate change policy. Analyzing media data generated in real time with text mining techniques makes it possible to understand a wide range of climate change issues. In this comparative study, we therefore collect news articles related to climate change, identify issues using text mining, and examine complex information through detailed analysis that considers the characteristics of the text. We crawled news related to climate change issues and analyzed related keywords in terms of cause, result (phenomenon), and response. First, we extracted news related to climate change using a keyword-based document extraction method and a Latent Dirichlet Allocation (LDA)-based document extraction method. In addition, we propose four related-keyword analysis methods that use Word2Vec, a word embedding method, and a keyword-frequency-based method. The methods proposed in this study are expected to be applicable to extracting and analyzing data on other specific issues, not only climate change.

Keywords: Climate Change, Machine Learning, Natural Language Processing, LDA, Word2Vec.
Scope of the Article: Machine Learning
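
Illustrative sketch: the keyword-based document extraction and keyword-frequency analysis mentioned in the abstract could look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' implementation. The seed keywords, the sample articles, and the function names (tokenize, extract_documents, related_keyword_counts) are all hypothetical, and the LDA-based and Word2Vec-based variants are not shown.

```python
import re
from collections import Counter

# Hypothetical seed keywords; the abstract does not list the ones actually used.
SEED_KEYWORDS = {"climate", "warming", "emissions"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_documents(articles, keywords=SEED_KEYWORDS, min_hits=1):
    """Keep articles containing at least `min_hits` distinct seed keywords."""
    selected = []
    for article in articles:
        hits = keywords & set(tokenize(article))
        if len(hits) >= min_hits:
            selected.append(article)
    return selected


def related_keyword_counts(articles, keywords=SEED_KEYWORDS):
    """Count how often each non-seed token occurs across the given articles."""
    counts = Counter()
    for article in articles:
        counts.update(t for t in tokenize(article) if t not in keywords)
    return counts


# Made-up sample articles, for illustration only.
articles = [
    "Global warming drives drought and drought damages crops.",
    "The city council approved a new stadium budget.",
    "Carbon emissions policy targets drought relief.",
]

climate_articles = extract_documents(articles)
top_related = related_keyword_counts(climate_articles).most_common(1)
```

Here the second article is filtered out because it contains no seed keyword, and the most frequent remaining token in the retained articles serves as a candidate related keyword. A realistic pipeline would also remove stop words and would need language-appropriate tokenization for the news corpus.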