Development of Research Proposal Selection Based on Domain Ontology using K-Means Categorical Clustering
Iyswarya E.1, Balamurugan M.2, Vinoth Kumar N.J.3

1Iyswarya E., Research Scholar, School of Computer Science, Engineering and Applications, Khajamalai Campus, Bharathidasan University, Tiruchirappalli, India.
2Balamurugan M., Professor and Head, School of Computer Science, Engineering and Applications, Khajamalai Campus, Bharathidasan University, Tiruchirappalli, India.
3Vinoth Kumar N.J., Assistant Professor, Department of Electrical and Electronics Engineering, Government Polytechnic College, Nagercoil, India.
Manuscript received on February 10, 2020. | Revised Manuscript received on February 22, 2020. | Manuscript published on March 10, 2020. | PP: 1193-1199 | Volume-9 Issue-5, March 2020. | Retrieval Number: E2807039520/2020©BEIESP | DOI: 10.35940/ijitee.E2807.039520
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license.

Abstract: With rapid research progress in many fields, the selection of research proposals has become an important task for research funding agencies and organizations. When only a few proposals are received, they are easy to cluster and the selection process is unproblematic. As the number of proposals grows, clustering and selecting them becomes complicated. In current systems, proposals are grouped manually or by similarity of subject discipline, which yields irrelevant results in some cases. The main goal of this work is to develop an enhanced system for selecting research proposals based on a domain ontology, where the ontology serves as the search criterion for the topics of the proposals. The proposed system helps to select proposal topics systematically, without manual intervention. In this paper, an algorithm called Scikit-learn K-means Multiclass Document Clustering (SKMDC) is proposed to group each subject discipline according to its sub-topics and sub-domains. The k-means clustering technique is applied to categorical data. Because categorical data cannot be used directly in the k-means algorithm, the LabelEncoder method is used to encode the text data as numerical values, and the dimensionality of the dataset is reduced using Principal Component Analysis. This paper also overcomes a weakness of k-means, namely that the number of clusters must be specified in advance: the optimal number of clusters is determined using the Elbow Curve method and cross-validated through Silhouette Score analysis.
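The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' SKMDC implementation: the toy proposal records, the helper names encode_categorical and scan_cluster_counts, the choice of two principal components, and the candidate range of 2 to 6 clusters are all assumptions made for illustration. Only LabelEncoder, PCA, KMeans and silhouette_score are actual scikit-learn components.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: (discipline, sub-domain, topic) for each proposal.
proposals = [
    ("Computer Science", "Machine Learning", "Clustering"),
    ("Computer Science", "Machine Learning", "Classification"),
    ("Computer Science", "Networks", "Routing"),
    ("Computer Science", "Networks", "Security"),
    ("Physics", "Optics", "Lasers"),
    ("Physics", "Optics", "Imaging"),
    ("Physics", "Quantum", "Computing"),
    ("Physics", "Quantum", "Sensing"),
    ("Biology", "Genetics", "Sequencing"),
    ("Biology", "Genetics", "Editing"),
    ("Biology", "Ecology", "Conservation"),
    ("Biology", "Ecology", "Modelling"),
]


def encode_categorical(rows):
    """Encode each categorical column as integer codes with LabelEncoder."""
    columns = [list(col) for col in zip(*rows)]
    encoded = [LabelEncoder().fit_transform(col) for col in columns]
    return np.column_stack(encoded).astype(float)


def scan_cluster_counts(X, k_values, seed=0):
    """Fit k-means for each k; record inertia (elbow) and silhouette score."""
    inertias, silhouettes = {}, {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        inertias[k] = model.inertia_
        silhouettes[k] = silhouette_score(X, model.labels_)
    return inertias, silhouettes


# Step 1: text categories -> numeric codes.
X = encode_categorical(proposals)

# Step 2: reduce dimensionality (2 components is an assumption).
X_reduced = PCA(n_components=2).fit_transform(X)

# Step 3: scan candidate cluster counts (range is an assumption).
inertias, silhouettes = scan_cluster_counts(X_reduced, range(2, 7))

# Step 4: here the silhouette score picks k; the inertia values are what
# an elbow plot would show and can be inspected to confirm the choice.
best_k = max(silhouettes, key=silhouettes.get)
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X_reduced)

for k in sorted(inertias):
    print(f"k={k}  inertia={inertias[k]:.2f}  silhouette={silhouettes[k]:.3f}")
print("selected k:", best_k)
print("cluster labels:", labels.tolist())
```

One caveat of this design is that LabelEncoder assigns integer codes in sorted order of the category names, so distances between codes carry no semantic meaning. The sketch follows the encoding named in the abstract and does not claim that it is the best encoding for categorical clustering.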
Keywords: k-means Clustering, Principal Component Analysis, Elbow Curve, Silhouette Score, LabelEncoder, Research Proposal Selection, Domain Ontology
Scope of the Article: Clustering