Object Detection and Tracking using YOLO v3 Framework for Increased Resolution Video
Shaikh Shakil Abdul Rajjak¹, Abdul Kadir Kureshi²

¹Shaikh Shakil A*, Department of Electronics Engineering, Pravara Rural Engineering College, Loni, India.
²Dr. A. K. Kureshi, Director, Maulana Mukhtar Ahmad Nadvi Technical Campus, Malegaon, India.
Manuscript received on March 15, 2020. | Revised Manuscript received on March 25, 2020. | Manuscript published on April 10, 2020. | PP: 118-125 | Volume-9 Issue-6, April 2020. | Retrieval Number: E3038039520/2020©BEIESP | DOI: 10.35940/ijitee.E3038.049620
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The proposed system detects and tracks vehicles in high-resolution video. It detects objects (vehicles) and recognizes each object by comparing its features with the features of objects stored in a database. If the features match, the object is tracked. The implementation has two phases, offline and online. In the offline phase, training images are passed to the feature extractor and then to the YOLO v3 model for training, and a weight file is generated from the pre-trained YOLO v3 model. In the online phase, real-time video is passed to the feature extractor, and the extracted features are applied to the pre-trained YOLO v3 model, whose other input is the weight file produced in the offline phase. The YOLO v3 model processes the features extracted from each video frame using this weight file and outputs the classified image. The YOLO v3 implementation uses the Darknet-53 backbone together with Keras and the OpenCV, TensorFlow, and NumPy libraries. The system runs on a PC with an Intel Pentium G500 processor and 8 GB of memory under Windows 7. Tested on the PASCAL VOC dataset, the system achieves an accuracy of 80%, a precision of 80%, a recall of 100%, an F1-score of 88%, an mAP of 76.7%, and 0.018%. The system is implemented in Python 3.6.0 and was also tested on real-time video at 1280×720 and 1920×1080 resolutions. The execution time for one video frame at 1280×720 (HD) and 1920×1080 (FHD) is 1.840 seconds and 4.414808 seconds, respectively, with an accuracy of 80%.
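The reported F1-score and per-frame timings can be cross-checked with a few lines of arithmetic. The Python sketch below computes F1 as the harmonic mean of precision and recall, derives accuracy, precision, recall, and F1 from confusion-matrix counts, and converts per-frame execution time into frames per second. The helper names (`f1_score`, `metrics_from_counts`, `frames_per_second`) and the confusion-matrix counts are hypothetical illustrations chosen only to be consistent with the reported percentages; they are not the authors' code or data.

```python
from typing import Dict


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int = 0) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


def frames_per_second(seconds_per_frame: float) -> float:
    """Throughput implied by a per-frame processing time."""
    return 1.0 / seconds_per_frame


if __name__ == "__main__":
    # Precision and recall as reported in the abstract.
    print(f"F1 = {f1_score(0.80, 1.00):.3f}")  # 0.889

    # Hypothetical counts (NOT from the paper) that reproduce the reported
    # accuracy 0.8, precision 0.8 and recall 1.0: 8 TP, 2 FP, 0 FN.
    print(metrics_from_counts(tp=8, fp=2, fn=0))

    # Per-frame execution times reported in the abstract (seconds).
    timings = {"1280x720 (HD)": 1.840, "1920x1080 (FHD)": 4.414808}
    for name, seconds in timings.items():
        print(f"{name}: {frames_per_second(seconds):.3f} frames/s")
```

With precision 0.80 and recall 1.00, F1 evaluates to about 0.889, consistent with the reported 88% if the value is truncated. The timings correspond to roughly 0.54 frames/s at HD and 0.23 frames/s at FHD; an FHD frame has 2.25 times as many pixels as an HD frame and took about 2.4 times as long to process.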
Scope of the Article: Patterns and frameworks