Voice Activity Detection Using Weighted K-Means Thresholding Algorithm
Alimi Sheriff¹, Yussuff I. O. Abayomi²
¹Alimi Sheriff, Department of Computer Science, Babcock University, Ilishan Remo (Ogun State), Nigeria.
²Yussuff I. O. Abayomi, Associate Professor, Department of Electronic and Computer Engineering, Lagos State University, Epe (Lagos), Nigeria.
Manuscript received on 15 January 2025 | First Revised Manuscript received on 18 January 2025 | Second Revised Manuscript received on 17 February 2025 | Manuscript Accepted on 15 March 2025 | Manuscript published on 30 March 2025 | PP: 1-7 | Volume-14 Issue-4, March 2025 | Retrieval Number: 100.1/ijitee.D105114040325 | DOI: 10.35940/ijitee.D1051.14040325
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Voice activity detection (VAD) separates the speech segments of an audio signal from its silent segments. It benefits many speech-processing applications, such as speech recognition and speaker verification, because it improves their performance and efficiency. In this study, the K-means clustering algorithm was extended into a thresholding algorithm, termed K-means weighted thresholding, and used to discriminate voiced (speech) segments from silent segments in audio signals. The voice signal was divided into frames of 2048 samples, and the spectral power of each frame served as input to the extended K-means algorithm, which computes a threshold value. Any frame whose spectral power is greater than or equal to the threshold is considered part of a voice segment; otherwise, it is tagged as a silent frame. The implemented voice activity detection system achieved a true acceptance rate (sensitivity) of 100%, a false acceptance rate of 0.025%, a true rejection rate (specificity) of 100%, a false rejection rate (miss rate) of 0%, and a classification accuracy of 99.97%.
Keywords: K-Means, Thresholding Algorithm, Voice Activity Detection.
Scope of the Article: Artificial Intelligence and Methods
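The pipeline outlined in the abstract (2048-sample frames, per-frame spectral power, a K-means-derived threshold, and a greater-than-or-equal comparison) can be illustrated with the minimal Python sketch below. It is not the authors' implementation. The abstract does not give the weighting formula, so the `weight` parameter and the rule `low + weight * (high - low)` are assumptions made purely for illustration, as are the non-overlapping frames, the use of the mean squared FFT magnitude as "spectral power", and all function names.

```python
import numpy as np

FRAME_LEN = 2048  # frame size stated in the abstract


def frame_spectral_power(signal, frame_len=FRAME_LEN):
    """Split a 1-D signal into frames and return each frame's spectral power.

    ASSUMPTION: frames do not overlap, and "spectral power" is taken to be
    the mean squared magnitude of the frame's real FFT.
    """
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    frames = np.asarray(signal[: n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    spectrum = np.fft.rfft(frames, axis=1)
    return np.mean(np.abs(spectrum) ** 2, axis=1)


def kmeans_two_centroids(values, n_iter=100):
    """Plain 1-D K-means with k=2; returns (low_centroid, high_centroid)."""
    low, high = float(values.min()), float(values.max())
    for _ in range(n_iter):
        # Assign each value to the nearer of the two centroids.
        is_high = np.abs(values - high) < np.abs(values - low)
        if is_high.all() or not is_high.any():
            break  # one cluster is empty; keep the current centroids
        new_low = float(values[~is_high].mean())
        new_high = float(values[is_high].mean())
        if new_low == low and new_high == high:
            break  # converged
        low, high = new_low, new_high
    return low, high


def weighted_threshold(values, weight=0.5):
    """ASSUMPTION: the threshold lies between the two centroids, at a
    position set by a hypothetical `weight` in [0, 1]. The paper's actual
    weighting scheme is not specified in the abstract."""
    low, high = kmeans_two_centroids(values)
    return low + weight * (high - low)


def detect_voice(signal, weight=0.5, frame_len=FRAME_LEN):
    """Return a boolean array: True for frames tagged as voice."""
    power = frame_spectral_power(signal, frame_len)
    threshold = weighted_threshold(power, weight)
    # As in the abstract: power >= threshold means voice, else silence.
    return power >= threshold
```

Under these assumptions, calling `detect_voice(samples)` on a one-dimensional array of audio samples yields one boolean per complete 2048-sample frame, and lowering `weight` moves the threshold toward the low-power centroid so that more frames are tagged as voice.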