Abstract:
This work presents a system for Speaker Identification (SI). SI is the task of recognizing a speaker from his or her speech signal. A key requirement for SI is a system able to extract and learn discriminative, relevant features for classification. Most research on SI has shown the effectiveness of Perceptual Linear Prediction (PLP) and Mel-Frequency Cepstral Coefficients (MFCC). Nevertheless, these extraction techniques exhibit identification errors when the speech signal is complex. To overcome this problem, this study proposes two feature extraction techniques. The first uses Mel-Frequency Energy Coefficients (MFEC); the second is a hybrid approach in which a Convolutional Neural Network (CNN) applied to the MFEC serves as a feature extractor. SI was performed on the features derived from the speech signals of the Voxforge database using two classifiers, namely CNN and eXtreme Gradient Boosting (XGBoost). The proposed hybrid XGBoost-CNN model achieved an accuracy of 99.45%, demonstrating the effectiveness of this combination for SI. Moreover, a comparative study revealed that the proposed model provides promising results and outperforms existing methods in the literature on the Voxforge database.
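The following is a minimal sketch of the hybrid pipeline summarized above, assuming librosa for MFEC (log Mel filter-bank energies), Keras for the CNN feature extractor, and the xgboost library for classification; the layer sizes, frame counts, and placeholder data are illustrative assumptions, not the exact configuration or Voxforge preprocessing used in the paper.

```python
# Hypothetical sketch of the MFEC -> CNN feature extractor -> XGBoost pipeline.
# Shapes and hyper-parameters are assumed for illustration only.
import numpy as np
import librosa
from tensorflow.keras import layers, models
from xgboost import XGBClassifier

N_MELS, N_FRAMES = 40, 100  # assumed MFEC map size per utterance


def mfec(wav_path, sr=16000):
    """Log Mel filter-bank energies (MFEC): a Mel spectrogram without the DCT step of MFCC."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS)
    logmel = librosa.power_to_db(mel)
    # Crop/pad to a fixed number of frames so the CNN input shape is constant.
    return librosa.util.fix_length(logmel, size=N_FRAMES, axis=1).astype(np.float32)


def build_cnn(n_speakers):
    """Small CNN; the penultimate dense layer serves as the learned feature extractor."""
    inp = layers.Input(shape=(N_MELS, N_FRAMES, 1))
    x = layers.Conv2D(32, 3, activation="relu")(inp)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    feat = layers.Dense(128, activation="relu", name="embedding")(x)
    out = layers.Dense(n_speakers, activation="softmax")(feat)
    return models.Model(inp, out)


if __name__ == "__main__":
    # Random placeholder data standing in for Voxforge MFEC maps and speaker labels.
    n_speakers, n_utts = 5, 200
    X = np.random.randn(n_utts, N_MELS, N_FRAMES, 1).astype(np.float32)
    y = np.random.randint(0, n_speakers, size=n_utts)

    cnn = build_cnn(n_speakers)
    cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    cnn.fit(X, y, epochs=2, batch_size=32, verbose=0)

    # Hybrid step: reuse the CNN's embedding layer as a feature extractor for XGBoost.
    embedder = models.Model(cnn.input, cnn.get_layer("embedding").output)
    feats = embedder.predict(X, verbose=0)
    clf = XGBClassifier(n_estimators=200, max_depth=6)
    clf.fit(feats, y)
    print("train accuracy:", clf.score(feats, y))
```

In this sketch, feeding the CNN's learned embedding to a gradient-boosted tree classifier corresponds to the XGBoost-CNN combination referred to in the abstract, while the MFEC-only variant would pass the log Mel energy maps to each classifier directly.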