Efficiency of Various Time-Frequency Representations in Deep Neural Network based Passive Sonar Target Classifiers
Suraj Kamal1, Satheesh Chandran C.2, Supriya M.H.3

1Suraj Kamal*, Department of Electronics, Cochin University of Science and Technology, Kochi, India.
2Satheesh Chandran C., Department of Electronics, Cochin University of Science and Technology, Kochi, India.
3Supriya M.H., Department of Electronics, Cochin University of Science and Technology, Kochi, India.
Manuscript received on January 14, 2020. | Revised Manuscript received on January 27, 2020. | Manuscript published on February 10, 2020. | PP: 1908-1918 | Volume-9 Issue-4, February 2020. | Retrieval Number: D1662029420/2020©BEIESP | DOI: 10.35940/ijitee.D1662.029420
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Passive acoustic target classification is an exceptionally challenging problem because of the complex phenomena associated with the channel and the relatively low Signal-to-Noise Ratio (SNR) imposed by the pervasive ambient noise field. Inspired by the success of Deep Neural Networks (DNNs) on many such hard problems, this work employs a network designed specifically for target recognition. Although deep neural networks can learn characteristic features or representations directly from raw observations, domain-specific intermediate representations can reduce both the computational requirements and the sample complexity needed to achieve an acceptable prediction error rate. As sonar target records are essentially time series, spectro-temporal representations can make the intricate relationship between temporal and spectral components more explicit. In a passive sonar target recognition scenario, most of the defining spectral components reside in the lower part of the spectrum, so a nonlinear, dilated spectral scale that emphasizes low frequencies is highly desirable. This can be readily achieved with a filterbank-based time-frequency decomposition, which allows more filters to be positioned in the frequency ranges of interest. In this work, the performance of time-frequency representations initialized at various frequency scales is rigorously analyzed, both independently and in combination. A convolutional neural network based spectro-temporal feature learner forms the initial layers, while a deep stack of Long Short-Term Memory (LSTM) layers with residual connections learns the intricate temporal relationships hidden in the intermediate representations.
The experimental results show that, in the single-feature configuration, a linear-scale spectrogram achieves accuracies of 92.4% and 90.2% on the validation and test sets, respectively, whereas the gammatone spectrogram attains approximately 96.7% and 96.1% on the same sets. In a multi-feature setup, however, the accuracy reaches up to 97.3% and 96.6%, respectively, which indicates that a combination of properly initialized intermediate representations can improve the classification performance significantly.
Keywords: Deep Neural Network, Passive Sonar, Residual LSTM, Time-Frequency Representations.
Scope of the Article: Neural Network
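
Illustrative note: the abstract's remark that a filterbank allows more filters to be placed at low frequencies can be made concrete with a small sketch. The code below is not the authors' implementation. It assumes that the gammatone centre frequencies are spaced uniformly on the Glasberg-Moore ERB-rate scale, a common choice for gammatone filterbanks that the abstract does not confirm, and it uses a hypothetical band (10 Hz to 8 kHz), filter count (64), and low-frequency cutoff (1 kHz) chosen only for illustration.

```python
import math


def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the Glasberg-Moore ERB-rate scale."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def erb_spaced_centres(f_min, f_max, n_filters):
    """Centre frequencies spaced uniformly on the ERB-rate scale."""
    lo, hi = hz_to_erb_rate(f_min), hz_to_erb_rate(f_max)
    step = (hi - lo) / (n_filters - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_filters)]


def linear_centres(f_min, f_max, n_filters):
    """Centre frequencies spaced uniformly in Hz (linear scale)."""
    step = (f_max - f_min) / (n_filters - 1)
    return [f_min + i * step for i in range(n_filters)]


def count_below(centres, cutoff_hz):
    """Number of centre frequencies strictly below cutoff_hz."""
    return sum(1 for f in centres if f < cutoff_hz)


# Hypothetical settings, not taken from the paper.
F_MIN, F_MAX, N_FILTERS, CUTOFF = 10.0, 8000.0, 64, 1000.0

erb = erb_spaced_centres(F_MIN, F_MAX, N_FILTERS)
lin = linear_centres(F_MIN, F_MAX, N_FILTERS)

print("ERB-spaced filters below cutoff:", count_below(erb, CUTOFF))
print("Linear filters below cutoff:    ", count_below(lin, CUTOFF))
```

Under these assumptions, the ERB-spaced bank devotes a much larger share of its filters to the region below the cutoff than the linear bank does, which is the low-frequency emphasis the abstract describes.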