Devnagari Script Categorization by Utilizing CNN and KNN
Sarika T. Deokate1, Nilesh J. Uke2
1Sarika T. Deokate, Research Scholar, Department of Computer Engineering, JJTU (Rajasthan), India.
2Dr. Nilesh J. Uke, Principal, Department of Computer Engineering, TAE Pune (Maharashtra), India.
Manuscript received on 07 March 2019 | Revised Manuscript received on 20 March 2019 | Manuscript published on 30 March 2019 | PP: 1136-1140 | Volume-8 Issue-5, March 2019 | Retrieval Number: E3138038519/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: In the present digital era, considerable work is devoted to analysing, categorizing, and processing documents digitally. In this work we introduce a model for categorizing the Devnagari script (Marathi). We developed the approach for type-printed manuscripts and also tested it on some handwritten manuscripts. Because noisy images rarely yield a good final result, noise is removed first using a Gaussian approach together with Otsu's thresholding method. Once pre-processing is complete, the content of the manuscript image is segmented. Lines are extracted with morphological operations, which gave good results for the manuscripts. Individual words are then extracted from the line images using the horizontal projection approach. To extract the characters and symbols from the words, we combine shirorekha (header line) removal with a vertical projection scheme, and a tolerance factor is estimated so that the characters are segmented cleanly. KNN and CNN classifiers are used to categorize the characters, and results are reported for varying dataset sizes. The KNN classifier achieved an average precision of 96% when evaluated on the test and training datasets with k = 1 to 5. The CNN classifier achieved 100% precision on the training and validation datasets. The recognized letters are then post-processed by mapping them to Unicode; this transformation is simplified by the construction and labeling of our own dataset. This study can be extended to assist visually impaired and disabled people. Future work will concentrate on a generalized approach for both type-printed and handwritten manuscripts.
Keywords: CNN, Image Enhancement, Document Analysis, KNN.
Scope of the Article: Distributed Mobile Applications Utilizing IoT
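
The character segmentation step described in the abstract (shirorekha removal followed by a vertical projection, with a tolerance factor) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `band_ratio` heuristic for locating the header line, and the reading of the tolerance factor as a maximum gap width to bridge are all assumptions made for illustration. The word image is assumed to be already binarized, with 1 for ink and 0 for background.

```python
def ink_runs(profile, tolerance=0):
    """Return (start, end) pairs, end exclusive, of runs that contain ink.

    Gaps of at most `tolerance` empty positions are bridged, so a small
    break inside one glyph does not split it (assumed meaning of the
    tolerance factor; the abstract does not define it).
    """
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] <= tolerance:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return merged


def remove_shirorekha(word, band_ratio=0.8):
    """Blank every row whose ink count is close to that of the densest row.

    Hypothetical heuristic: the header line is assumed to be the row band
    with the highest horizontal projection.
    """
    row_sums = [sum(row) for row in word]
    peak = max(row_sums)
    if peak == 0:
        return [row[:] for row in word]
    return [
        [0] * len(row) if total >= band_ratio * peak else row[:]
        for row, total in zip(word, row_sums)
    ]


def split_characters(word, tolerance=0):
    """Column ranges of characters: header removal, then vertical projection."""
    stripped = remove_shirorekha(word)
    column_sums = [sum(column) for column in zip(*stripped)]
    return ink_runs(column_sums, tolerance)


# Toy word image: row 0 is the header line joining two glyphs.
# The first glyph has a one-column break (column 1) below the header.
word = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 0, 1, 0],
]

print(split_characters(word, tolerance=0))  # first glyph is over-segmented
print(split_characters(word, tolerance=1))  # one-column break is bridged
```

With no tolerance the broken first glyph is split into two pieces; allowing a one-column gap keeps it whole while still separating it from the second glyph, which is the kind of trade-off a tolerance factor has to balance.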