Keyword Extraction from Arabic Text using the Page Rank Algorithm
Meran M. Al Hadidi1, Muath Alzghool2, Hasan Muaidi3
1S.Dhanasekar*, Vellore Institute of Technology , Chennai Campus, Chennai (Tamil Nadu), India.
Manuscript received on September 16, 2019. | Revised Manuscript received on 24 September, 2019. | Manuscript published on October 10, 2019. | PP: 3495-3504 | Volume-8 Issue-12, October 2019. | Retrieval Number: L26141081219/2019©BEIESP | DOI: 10.35940/ijitee.L2614.1081219
Open Access | Ethics and Policies | Cite | Mendeley | Indexing and Abstracting
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: This paper describes how keywords are extracted from Arabic text using the page rank algorithm, by constructing a graph whose vertices are formed by candidate words that are extracted from the title and the abstract of a given Arabic text after applying a tagging filter to that text. Next, a co-occurrence relation is applied to draw the edges between the vertices within specified window sizes. Then, the page rank algorithm is applied to the graph to rank the importance of each keyword. Finally, the vertices are sorted in descending order by their page rank scores and the tokens with highest scores are chosen as the keywords. Several experiments were conducted on a dataset that consisted of 100 Arabic academic articles for training and 50 for testing. The results were evaluated by using precision, recall, and the F-measure. The maximum recall achieved on the dataset was 63%, as not all the manually identified keywords and keyphrases existed in the article abstracts and titles. The proposed method achieved 25% of recall, which is acceptable as it is comparable to that of a method in the literature that was applied to an English language testing dataset that consisted of 500 English documents, which achieved 42% of recall where the maximum recall percentage of the testing dataset was 78%. Despite the difficulties and challenges in searching for keywords in the Arabic language and using fewer documents in the Arabic testing dataset than in the English, it can be concluded that the proposed keyword and keyphrase extraction system using the page rank algorithm works well.
Keywords: Text Processing, Arabic Language, TextRank Algorithm, Keyword Extraction
Scope of the Article: Algorithm Engineering