An Efficient Model for TV broadcast Audio Classification through InceptionV3 and ResNet50
Kamatchy B.1, P. Dhanalakshmi2

1Kamatchy B. *, Research Scholar, Department of Computer and Information Science, Annamalai University, India.
2Dr. P. Dhanalakshmi, Professor, Department of Computer Science and Engineering, Annamalai University, India.
Manuscript received on February 10, 2020. | Revised Manuscript received on February 26, 2020. | Manuscript published on March 10, 2020. | PP: 2234-2238 | Volume-9 Issue-5, March 2020. | Retrieval Number: E2984039520/2020©BEIESP | DOI: 10.35940/ijitee.E2984.039520
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (

Abstract: Audio classification and recognition is one of the challenging tasks incorporated into many recent applications and devices. Audio data is a crucial input for tasks such as emotion detection after post-surgical issues, classification of voice sequences, classification of random voice data, surveillance, and speaker detection. Most audio data inherently contains environmental or instrumental noise, so extracting distinctive features from the audio is important for identifying the speaker effectively. Such an approach is evaluated here. This research focuses on classifying TV broadcast audio by type using a novel approach. The design evaluates five categories of audio data (advertisement, news, songs, cartoon, and sports) collected using a TV tuner card. The proposed design uses Python as the development environment. The audio samples are converted to images using spectrograms, and transfer learning is then applied to the pre-trained models ResNet50 and InceptionV3 to extract deep features and classify the audio data. InceptionV3 is compared with ResNet50 to determine which gives the higher classification accuracy. The pre-trained models were originally trained on the ImageNet dataset and are reused here to train the audio classification model quickly and with high accuracy on the training set. The proposed model achieves 94% accuracy with InceptionV3, compared with 93% with ResNet50.
Keywords: Audio Classification, Spectrograms, Inception modeling, Broadcast audio classification.
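The pipeline described in the abstract (audio to spectrogram image, then transfer learning on an ImageNet-pretrained network) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the frame length, hop size, Hann window, grayscale-to-RGB replication, frozen backbone, and single softmax layer are all choices made here for the example, and the function names are hypothetical. The abstract does not specify any of these details.

```python
import numpy as np

# The five categories named in the abstract.
CLASSES = ["advertisement", "news", "songs", "cartoon", "sports"]


def log_spectrogram(signal, frame_len=512, hop=256):
    """Return a log-magnitude spectrogram of shape (freq_bins, frames).

    frame_len and hop are assumed values, not taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop: i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (frames, frame_len // 2 + 1)
    # Convert to decibels; the small constant avoids log(0).
    return 20.0 * np.log10(magnitude + 1e-10).T


def spectrogram_to_image(spec):
    """Scale a spectrogram to 0..255 and repeat it into three channels."""
    lo, hi = spec.min(), spec.max()
    scaled = (spec - lo) / (hi - lo) if hi > lo else np.zeros_like(spec)
    gray = np.round(scaled * 255.0).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)


def build_transfer_model(num_classes=len(CLASSES), weights="imagenet"):
    """Assumed transfer-learning setup: frozen InceptionV3 plus a softmax head."""
    import tensorflow as tf  # imported lazily; only this step needs TensorFlow

    base = tf.keras.applications.InceptionV3(
        weights=weights, include_top=False,
        input_shape=(299, 299, 3), pooling="avg",
    )
    base.trainable = False  # reuse the pretrained features as-is
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(base.output)
    return tf.keras.Model(inputs=base.input, outputs=outputs)
```

Before such an image is passed to the network, it would need to be resized to the model's input size (299x299 for the Keras InceptionV3, 224x224 for ResNet50) and run through the matching `preprocess_input` function. Swapping `InceptionV3` for `tf.keras.applications.ResNet50` with a 224x224 input would give the second model in the comparison.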
Keywords: Internet of Things, Energy Auditing, Cloud Computing, Sugar Industry, Process Control
Scope of the Article: Renewable Energy Technology