I30250789S319 - International Journal of Innovative Technology and Exploring Engineering (IJITEE)

Text Independent Speaker Identification with Prosody Features in Presence of Noise
S.M. Jagdale¹, A.A. Shinde², J.S. Chitode³

¹S.M. Jagdale, Ph.D. Research Scholar, Bharati Vidyapeeth (Deemed to be University) COE, Pune, Maharashtra, India.

²A.A. Shinde, Department of Electronics, Bharati Vidyapeeth (Deemed to be University) COE, Pune, Maharashtra, India.

³J.S. Chitode, Department of Electronics, Bharati Vidyapeeth (Deemed to be University) COE, Pune, Maharashtra, India.

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open-access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Automatically recognizing a speaker's metadata, beyond his or her identity alone, is a challenging task, and such metadata conveys rich behavioral characteristics of a person. Most work in speaker recognition has been done on low-level spectral features. These give good accuracy with low error, but they ignore other information about the speaker, and their performance degrades under spectral, session, and channel variations. State-of-the-art systems for text-independent speaker identification use Mel-frequency cepstral coefficients (MFCCs) as the main features. Such systems generally perform very well under clean conditions and acceptably under matched conditions. Under mismatched conditions, however, performance deteriorates significantly. One of the principal reasons is the nature of low-level features: being spectral, they are susceptible to spectral variations due to noise and channel effects. Prosodic features have been used successfully under these variations as well as in the presence of noise. In this paper, a multi-SNR environment is considered. Recognition accuracy is calculated at SNR levels of 15 dB, 25 dB, and 35 dB, and results are also tested with different types of noise: traffic noise, cockpit noise, babble noise, and fan noise. It is found that combining prosodic features such as pitch, energy, and formants gives improved performance.
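
The multi-SNR setup described in the abstract can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the authors' implementation: the abstract does not state how noise was mixed or how the prosodic features were extracted. The function names `mix_at_snr`, `frame_energy`, and `frame_pitch` are hypothetical, the pitch estimator is a plain autocorrelation peak picker chosen for brevity, and formant estimation is omitted.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the target SNR in dB.

    Assumes both inputs are 1-D float arrays and that the noise is not silent.
    """
    # Repeat or truncate the noise so it covers the whole utterance.
    noise = np.resize(noise, speech.shape)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(p_speech / (gain^2 * p_noise)); solve for gain.
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise


def frame_energy(frame):
    """Short-time energy of one frame (sum of squared samples)."""
    return float(np.sum(frame ** 2))


def frame_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Rough pitch estimate in Hz from the autocorrelation peak.

    The search range fmin..fmax is an assumed range for speech, and the
    frame must be longer than sr / fmin samples.
    """
    frame = frame - np.mean(frame)
    # Keep non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sr / fmax)  # shortest period considered, in samples
    hi = int(sr / fmin)  # longest period considered, in samples
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sr / lag


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    speech = np.sin(2 * np.pi * 200.0 * t)  # synthetic stand-in for voiced speech
    noise = np.random.default_rng(0).standard_normal(sr)  # stand-in for recorded noise
    for snr_db in (15, 25, 35):  # SNR levels named in the abstract
        noisy = mix_at_snr(speech, noise, snr_db)
        frame = noisy[:640]  # one 40 ms frame
        print(snr_db, frame_energy(frame), frame_pitch(frame, sr))
```

In an actual experiment, the synthetic tone and white noise would be replaced by recorded utterances and recordings of traffic, cockpit, babble, or fan noise, and the per-frame features would feed whatever classifier the study uses.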

Keywords: Prosodic, Spectral, MFCC, SNR
Scope of the Article: Computer Vision
