Improving the Process of Identifying Internally Displaced Persons Using Big Data Technologies
Hakima Fathi Mahamoud¹, Raja Rajeswari Ponnusamy², Ho Ming Kang³, Jacob Sow Tian You⁴

¹Hakima Fathi Mahamoud, School of Computing, Asia Pacific University of Technology and Innovation, Malaysia.

²Raja Rajeswari Ponnusamy, School of Mathematics, Actuaries and Quantitative Studies, Asia Pacific University of Technology and Innovation, Malaysia.

³Ho Ming Kang, School of Mathematics, Actuaries and Quantitative Studies, Asia Pacific University of Technology and Innovation, Malaysia.

⁴Jacob Sow Tian You, School of Media, Art and Design, Asia Pacific University of Technology and Innovation, Malaysia.

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open-access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: This data-driven project aims to improve the identification of internal displacement, that is, displacement caused by conflict, violence, or disasters within an internationally recognized state border. Using a training set with pre-defined categories, the project addresses document classification and information retrieval through supervised machine learning. The research has three core objectives. First, to eliminate non-relevant documents by filtering out those that are not in English or that do not provide information on human mobility related to internal displacement. Second, to tag documents according to the themes that the Internal Displacement Monitoring Centre (IDMC) uses to monitor the causes of internal displacement, notably conflict/violence or disasters. Third, to extract vital displacement information reported in online sources, such as location and displacement figures. Because the training set is composed mainly of textual documents, pre-processing relies chiefly on natural language processing annotators; a Support Vector Machine is then trained for tagging and a Multinomial Naïve Bayes model for information extraction. Finally, after parameter tuning and training, the performance of the two models was measured on two separate test sets, one for tagging and the other for information retrieval. On the provided dataset, the results were 95.83% for classification and 81% for information retrieval.

Keywords: Document Classification, Information Retrieval, Support Vector Machine, Multinomial Naïve Bayes.
Scope of the Article: Classification
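
To make the supervised text classification idea in the abstract concrete, the following is a minimal sketch of a Multinomial Naïve Bayes classifier with Laplace smoothing, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract reports a Support Vector Machine for tagging and Multinomial Naïve Bayes for information extraction, whereas this sketch applies Naïve Bayes to a toy conflict/disaster tagging task only because it is compact enough to show without external libraries. The class name, the whitespace tokenizer, and the six training sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real NLP pre-processing)."""
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, doc):
        """Return the unnormalised log posterior of each class for one document."""
        scores = {}
        vocab_size = len(self.vocab)
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / self.total_docs)  # log prior
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # skip words never seen during training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Hypothetical toy training set, not the project's data.
TRAIN_DOCS = [
    "floods displaced thousands of families in the region",
    "earthquake destroyed homes and forced residents to evacuate",
    "cyclone damage left villagers homeless",
    "armed clashes forced civilians to flee the town",
    "fighting between militias displaced villagers",
    "violence and attacks drove families from their homes",
]
TRAIN_LABELS = ["disaster"] * 3 + ["conflict"] * 3

model = MultinomialNB().fit(TRAIN_DOCS, TRAIN_LABELS)
print(model.predict("floods and cyclone damage"))
print(model.predict("armed militias attacks"))
```

A real pipeline would replace the whitespace tokenizer with proper pre-processing and evaluate on a held-out test set, as the abstract describes.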

ES2124017519