Keywords
Speech Recognition, PNCC, MFCC, KNN, DTW, SVM
Document Type
Research Paper
Abstract
Speech recognition is widely used in robot control and automation. Nevertheless, its use in robots is limited by its susceptibility to background noise. This paper proposes a speech recognition algorithm for controlling robots in noisy environments. The proposed algorithm is based on Power-Normalized Cepstral Coefficients (PNCC), a noise-resistant feature extraction technique, with a modified K-Nearest Neighbors (KNN) classifier that uses Dynamic Time Warping (DTW). The new KNN-DTW classifier integrates weighted KNN and DTW. The algorithm was chosen on the basis of experiments comparing PNCC and Mel-frequency cepstral coefficient (MFCC) feature extraction with different classifiers, namely KNN-DTW, two types of KNN (weighted KNN and Medium-KNN), and two types of Support Vector Machine (SVM) (linear SVM and quadratic SVM). Accuracy was evaluated on the audio-visual data corpus UOTletters, which includes 30 speakers, 26 English letters, and 1560 utterances. The database is split 50% for training and 50% for testing. In a noise-free environment, the accuracy of the proposed algorithm reached 100%. Moreover, the proposed algorithm demonstrates greater noise immunity across all five noise levels, with an average accuracy difference of 13.67% compared to baseline algorithms.
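The classifier idea described in the abstract, a weighted KNN vote over DTW distances, can be sketched as follows. This is only an illustrative sketch and not the authors' implementation: the inverse-distance weighting, the Euclidean frame distance, the value of k, and the function names `dtw_distance` and `weighted_knn_dtw` are all assumptions, and the toy one-dimensional sequences stand in for real PNCC feature frames.

```python
import numpy as np
from collections import defaultdict


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of shape (frames, coeffs)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed local cost: Euclidean distance between two frames.
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return float(cost[n, m])


def weighted_knn_dtw(query, templates, labels, k=3, eps=1e-9):
    """Label a query by an inverse-distance-weighted vote of its k nearest templates."""
    dists = np.array([dtw_distance(query, t) for t in templates])
    nearest = np.argsort(dists)[:k]
    votes = defaultdict(float)
    for idx in nearest:
        # Assumed weighting: closer templates get larger votes.
        votes[labels[idx]] += 1.0 / (dists[idx] + eps)
    return max(votes, key=votes.get)


# Toy templates of different lengths (hypothetical data, one coefficient per frame).
templates = [
    np.array([[0.0], [0.0], [1.0], [1.0]]),
    np.array([[0.0], [1.0], [1.0]]),
    np.array([[5.0], [5.0], [6.0]]),
    np.array([[5.0], [6.0], [6.0], [6.0]]),
]
labels = ["A", "A", "B", "B"]
query = np.array([[0.0], [0.0], [0.0], [1.0]])
predicted = weighted_knn_dtw(query, templates, labels, k=3)
```

DTW lets utterances of different lengths be compared directly, which is why it is paired with KNN here rather than a fixed-length distance.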
Highlights
The proposed algorithm is based on PNCC feature extraction with a new classifier, Weighted-KNN-DTW.
The Weighted-KNN-DTW classifier is a modified combination of weighted KNN and DTW.
The accuracy of the proposed algorithm was measured at different levels of white noise (20 dB, 15 dB, 10 dB, and 5 dB).
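A noise-level evaluation of this kind needs test signals corrupted at a chosen signal-to-noise ratio. The sketch below shows one common way to add white Gaussian noise at a target SNR in dB. It is an assumption about the procedure, not a description of how the authors generated their noisy data. The function name `add_white_noise`, the sine-wave test signal, and the sample rate are hypothetical.

```python
import numpy as np


def add_white_noise(signal, snr_db, rng=None):
    """Return signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


# Hypothetical one-second test tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 440.0 * t)

rng = np.random.default_rng(0)
noisy_versions = {snr: add_white_noise(clean, snr, rng) for snr in (20, 15, 10, 5)}
```

Lower SNR values mean stronger noise, so the 5 dB case is the hardest of the four listed levels.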
Recommended Citation
Safi, Mohammed and Abbas, Eyad (2023) "Speech Recognition Algorithm in a Noisy Environment Based on Power Normalized Cepstral Coefficient and Modified Weighted-KNN," Engineering and Technology Journal: Vol. 41: Iss. 8, Article 6.
DOI: https://doi.org/10.30684/etj.2023.140643.1469
DOI
10.30684/etj.2023.140643.1469
First Page
1107
Last Page
1117





