Metric to Determine Language Complexity Using Dictionary Method Percentage Retrieval
Devasish Pal1, N.V. Ganapathi Raju2, Gautam Pal3
1Dr Devasish Pal, Department of IT, MJCET, Hyderabad, India.
2Dr N V Ganapathi Raju, Department of CSE, GRIET, Hyderabad, India.
3Mr Gautam Pal, Department of Intelematics, Melbourne, Australia.
Manuscript received on 30 June 2019 | Revised Manuscript received on 05 July 2019 | Manuscript published on 30 July 2019 | PP: 2547-2551 | Volume-8 Issue-9, July 2019 | Retrieval Number: I8223078919/19©BEIESP | DOI: 10.35940/ijitee.I8223.078919
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Communication over computer networks was previously limited to English text in ASCII. With the introduction of Unicode, computer communication became possible for text in all languages, which generated interest in the field of language processing. Various studies have been carried out on language processing and its complexity issues. Several metrics, such as lexical density, morphological density, and semantics, have been used to determine language complexity, but their results are not consistent: a language that appears most complex under one metric does not appear so under another. This paper introduces a new metric for language complexity that is consistent and has proven results. It draws on a concept from network security: using the dictionary method, the percentage retrieval of an encrypted text is calculated for a given encryption algorithm, a fixed-length key, a fixed corpus size, etc. The lower the percentage retrieval, the greater the security and the language complexity. The results are compared with language-complexity results obtained independently, on the basis of morphological and lexical density, for various Indian languages by research scholars of Central University, Hyderabad. The pattern they observed across eight Indian languages and the percentage retrieval for the same languages in this work are identical, which validates this work. Hence it can be concluded that as the percentage retrieval decreases, security increases for the sample text data considered, and the complexity of that particular language increases proportionately. Sample data encryption was carried out using the substitution method.
Keywords: Language Complexity, Dictionary File, Coded File, Morphology, Lexical, Percentage Retrieval
Scope of the Article: Natural Language Processing
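The percentage-retrieval idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the attack procedure, so everything below is an assumption made for illustration: the function names (substitute, pattern, percentage_retrieval), the use of a monoalphabetic substitution cipher, and the rule that a coded word counts as retrieved only when exactly one dictionary word shares its letter-repetition pattern and that word is the original. It is not the authors' implementation.

```python
from collections import defaultdict


def substitute(text, alphabet, key):
    """Monoalphabetic substitution: the i-th symbol of `alphabet`
    is replaced by the i-th symbol of `key` (assumed cipher)."""
    if len(alphabet) != len(key) or set(alphabet) != set(key):
        raise ValueError("key must be a permutation of alphabet")
    return text.translate(str.maketrans(alphabet, key))


def pattern(word):
    """Letter-repetition pattern, e.g. 'hello' -> (0, 1, 2, 2, 3).
    A substitution cipher preserves this pattern."""
    first_seen = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in word)


def percentage_retrieval(plain_words, coded_words, dictionary):
    """Share (in percent) of coded words recovered by a dictionary lookup.

    Assumed rule: a coded word is retrieved when exactly one dictionary
    word has the same pattern and that word is the original plaintext.
    """
    if not plain_words:
        raise ValueError("corpus must not be empty")
    by_pattern = defaultdict(set)
    for word in dictionary:
        by_pattern[pattern(word)].add(word)
    retrieved = 0
    for plain, coded in zip(plain_words, coded_words):
        candidates = by_pattern.get(pattern(coded), set())
        if candidates == {plain}:
            retrieved += 1
    return 100.0 * retrieved / len(plain_words)


# Hypothetical toy data: a three-word corpus and a five-word dictionary file.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
KEY = "qwertyuiopasdfghjklzxcvbnm"  # fixed-length key, a permutation of ALPHABET
plain_words = "hello world level".split()
coded_words = substitute(" ".join(plain_words), ALPHABET, KEY).split()
dictionary = {"hello", "world", "level", "jelly", "words"}

score = percentage_retrieval(plain_words, coded_words, dictionary)
print(f"percentage retrieval: {score:.2f}%")
```

In this toy run only "level" has a pattern that is unique in the dictionary ("hello" collides with "jelly", "world" with "words"), so one of the three words is retrieved. Under the paper's reading, a language whose words yield a lower percentage retrieval for the same key length and corpus size would be ranked as more complex. Because the pattern is computed per character, the same sketch accepts Unicode text, although real Indian-language scripts would need a decision about how combining marks are segmented.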