Analyze K-Value Selected Method of K-Means Clustering Algorithm to Clustering Province Based on Disease Case
Septian Wulandari

Septian Wulandari, Program Studi Informatika, Universitas Indraprasta PGRI Jakarta, Indonesia.

Manuscript received on 09 January 2020 | Revised Manuscript received on 05 February 2020 | Manuscript Published on 20 February 2020 | PP: 121-124 | Volume-9 Issue-3S January 2020 | Retrieval Number: C10280193S20/2020©BEIESP | DOI: 10.35940/ijitee.C1028.0193S20

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open-access article under the CC-BY-NC-ND license (

Abstract: Disease cases throughout Indonesia have increased, as seen from the Indeks Pembangunan Kesehatan Masyarakat (IPKM), the Public Health Development Index. Globalization increases human mobility across provinces, which accelerates the spread of epidemics that could pose a threat to Indonesia. Swift government action is needed to reduce the level of disease outbreaks, and that action must also be accurately targeted. The data are 2015 disease case records covering the 34 provinces of Indonesia, taken from the Central Statistics Agency of Indonesia. In K-Means clustering, the K-value must be determined because it affects the convergence results. To address this, this research analyzes three methods for selecting the K-value: the Silhouette, Elbow, and Gap Statistics methods. Testing the three methods gave execution times of 13.09 s for Silhouette, 14.76 s for Elbow, and 20.28 s for Gap Statistics. The Silhouette method was therefore chosen; it produces 2 optimal clusters, a low-level cluster (C1) and a high-level cluster (C2). A correlation matrix was computed to understand the relationship between the diseases, and a value of 0.88 indicates a strong linear correlation between Pneumonia and Pulmonary TB. The relationship between these two variables was then modeled by fitting a linear equation. Based on disease cases, cluster C1 contains 32 provinces and cluster C2 contains 2 provinces, West Java and East Java. The clustering results can serve as input for the Indonesian government in tackling disease cases across all provinces in Indonesia.
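The abstract does not show how the K-value is selected in practice, so the following is only a minimal sketch of silhouette-based selection, not the author's implementation. It assumes plain Lloyd's K-Means and Euclidean distance, and it runs on a small synthetic two-group data set rather than the 2015 provincial data. The names `kmeans`, `silhouette_score`, `points`, `scores`, and `best_k` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-Means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct points as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = sorted(set(labels))
    total = 0.0
    for i, p in enumerate(points):
        # a: mean distance to the other members of the point's own cluster.
        own = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if j != i and labels[j] == labels[i]
        ]
        if not own:
            continue  # singleton cluster contributes 0 by convention
        a = sum(own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic toy data (an assumption, not the paper's data): two separated grids.
low = [(float(x), float(y)) for x in range(4) for y in range(4)]
high = [(x + 20.0, y + 20.0) for x in range(4) for y in range(4)]
points = low + high

# Score each candidate K and keep the one with the highest mean silhouette.
scores = {k: silhouette_score(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On the real data, the input would be one feature vector of disease counts per province, and scaling the features first would likely matter because K-Means is sensitive to magnitude.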

Keywords: Data Mining; Disease; K-Value; K-Means; Clustering.
Scope of the Article: Clustering