Body Joints and Trajectory Guided 3D Deep Convolutional Descriptors for Human Activity Identification
N. Srilakshmi1, N. Radha2
1N. Srilakshmi*, Ph.D Scholar, Department of Computer Science, PSGR Krishnammal College for Women, Coimbatore, Tamil Nadu, India.
2Dr. N. Radha, Department of Computer Science, PSGR Krishnammal College for Women, Coimbatore, Tamil Nadu, India.
Manuscript received on September 14, 2019. | Revised Manuscript received on 23 September, 2019. | Manuscript published on October 10, 2019. | PP: 1016-1021 | Volume-8 Issue-12, October 2019. | Retrieval Number: K19850981119/2019©BEIESP | DOI: 10.35940/ijitee.K1985.1081219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Human Activity Identification (HAI) in videos is one of the most active research fields in computer vision. Among the various HAI techniques, Joints-pooled 3D-Deep convolutional Descriptors (JDD) have achieved effective performance by learning the body joints and capturing the spatiotemporal characteristics concurrently. However, estimating the locations of body joints on a large-scale dataset is time-consuming, and the computational cost of the skeleton estimation algorithm is high. The recognition accuracy of traditional approaches also needs to be improved by considering body joints and trajectory points together. Therefore, the key goal of this work is to improve the recognition accuracy using optical flow integrated with a two-stream bilinear model, namely Joints and Trajectory-pooled 3D-Deep convolutional Descriptors (JTDD). In this model, the optical flow (trajectory points) between video frames is also extracted at the body joint positions as input to the proposed JTDD. To this end, two streams of the Convolutional 3D network (C3D), combined through a bilinear product, are used to extract the features, generate the joint descriptors for video sequences and capture the spatiotemporal features. The whole network is then trained end-to-end based on the two-stream bilinear C3D model to obtain the video descriptors. These video descriptors are classified by a linear Support Vector Machine (SVM) to recognize human activities. By using both body joints and trajectory points, action recognition is achieved efficiently. Finally, the recognition accuracy of the JTDD and JDD models is compared.
Keywords: HAI, Body Joints, Optical Flow, JDD, JTDD, C3D, SVM
Scope of the Article: 3D Printing
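The joint pooling and bilinear product described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the tensor shapes, the random stand-ins for the C3D feature maps, and the signed square-root and L2 normalization steps are all assumptions made for illustration. The sketch samples each stream's feature map at the body joint positions and combines the two streams by summing the outer products of their channel vectors.

```python
import numpy as np


def pool_at_joints(feature_map, joints):
    """Sample channel activations at each joint position (hypothetical helper).

    feature_map: array of shape (C, T, H, W), e.g. one C3D conv layer output.
    joints: integer array of shape (T, J, 2) holding (row, col) per joint,
            already scaled to feature-map coordinates (an assumption).
    Returns an array of shape (T * J, C).
    """
    _, T, H, W = feature_map.shape
    num_joints = joints.shape[1]
    t_idx = np.repeat(np.arange(T), num_joints)
    rows = np.clip(joints[..., 0].ravel(), 0, H - 1)
    cols = np.clip(joints[..., 1].ravel(), 0, W - 1)
    # Advanced indexing yields (C, T * J); transpose to one row per joint.
    return feature_map[:, t_idx, rows, cols].T


def joint_bilinear_descriptor(appearance, motion, joints):
    """Combine two streams with a bilinear product at the joint positions."""
    a = pool_at_joints(appearance, joints)  # (N, Ca)
    m = pool_at_joints(motion, joints)      # (N, Cm)
    # Sum over all joints of the outer product of the two channel vectors.
    bilinear = a.T @ m                      # (Ca, Cm)
    v = bilinear.ravel()
    # Signed square root and L2 normalization: assumed post-processing,
    # common for bilinear features but not stated in the abstract.
    v = np.sign(v) * np.sqrt(np.abs(v))
    return v / (np.linalg.norm(v) + 1e-12)


# Random stand-ins for the two streams (RGB frames and optical flow).
rng = np.random.default_rng(0)
appearance = rng.standard_normal((8, 4, 7, 7))  # 8 channels, 4 time steps
motion = rng.standard_normal((6, 4, 7, 7))      # 6 channels, 4 time steps
joints = rng.integers(0, 7, size=(4, 15, 2))    # 15 joints per time step

descriptor = joint_bilinear_descriptor(appearance, motion, joints)
print(descriptor.shape)  # (48,)
```

In a full pipeline, one such descriptor per video would be passed to a linear classifier; scikit-learn's LinearSVC is one possible choice, although the abstract does not name a specific library.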