Abstract:
Although machine learning techniques have been used in Assamese Language Automatic Speech Recognition (ALASR) systems over the last five years, applications of Convolutional Neural Networks (CNNs) in ALASR remain very limited. The present study introduces a CNN-enabled ALASR system built on a dataset of 35 isolated Assamese words spoken in five primary emotions (Normal, Angry, Happy, Sad, and Fear) by five native male and five native female speakers. In the experiments, Mel-Frequency Cepstral Coefficients (MFCCs), Spectral Centroid (SC), Zero-Crossing Rate (ZCR), Chroma Frequencies (CF), Spectral Roll-Off (SRO), and intensity were extracted and analyzed using a CNN with convolution and max-pooling layers. For comparison, a Feed-Forward Artificial Neural Network (FFANN) was also applied to ALASR. The CNN achieved an accuracy of 98.4%, outperforming the FFANN, which achieved 86.4%.
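Several of the frame-level features named above can be illustrated with a short NumPy sketch. It is a minimal illustration under stated assumptions, not the study's actual feature pipeline: the function names, the Hann window, the 85% roll-off threshold, and the use of root-mean-square (RMS) energy as a stand-in for "intensity" are all hypothetical choices. MFCCs and chroma are omitted because they require mel or chroma filter banks, which audio libraries such as librosa provide.

```python
import numpy as np


def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs whose signs differ.
    signs = np.signbit(frame)
    return float(np.mean(signs[1:] != signs[:-1]))


def magnitude_spectrum(frame, sr):
    # One-sided magnitude spectrum of a Hann-windowed frame (window choice is an assumption).
    window = np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return freqs, mag


def spectral_centroid(frame, sr):
    # Magnitude-weighted mean frequency in Hz; the small epsilon avoids division by zero on silence.
    freqs, mag = magnitude_spectrum(frame, sr)
    return float(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))


def spectral_rolloff(frame, sr, roll_percent=0.85):
    # Lowest frequency below which `roll_percent` of the total spectral magnitude lies.
    # The 0.85 default is a common convention, assumed here rather than taken from the study.
    freqs, mag = magnitude_spectrum(frame, sr)
    cumulative = np.cumsum(mag)
    idx = int(np.searchsorted(cumulative, roll_percent * cumulative[-1]))
    return float(freqs[idx])


def rms_intensity(frame):
    # RMS energy of the frame, used here as a simple proxy for intensity (assumption).
    return float(np.sqrt(np.mean(frame ** 2)))


if __name__ == "__main__":
    sr = 16000
    n = np.arange(1024)
    tone = np.sin(2 * np.pi * 1000 * n / sr + 0.3)  # synthetic 1 kHz test frame
    print(zero_crossing_rate(tone), spectral_centroid(tone, sr),
          spectral_rolloff(tone, sr), rms_intensity(tone))
```

For a pure 1 kHz tone sampled at 16 kHz, the zero-crossing rate is close to 0.125 crossings per sample, and both the centroid and roll-off sit near 1000 Hz. In a recognizer, features such as these would typically be computed per frame and stacked into a feature matrix before being passed to the CNN or FFANN.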