Abstract:
This research presents the segmentation of single-syllable sounds for speech recognition using an artificial neural
network that combines key features extracted from speech signals in the time and frequency domains. The approach first
divides the speech signal into frames using the short-time energy waveform. Pitch markers are then extracted from each
frame and used as reference points to split the frame into sections. Each section is analyzed with a window search to
identify the positions and amplitudes of local minima and maxima, as well as the maximum slope values, which serve as
the key features in the time domain. In the frequency domain, Mel-frequency cepstral coefficients (MFCCs) serve as
additional key features. The two sets of key features are combined and fed into the artificial neural network for
recognition. The study also compares recognition performance when the time-domain and frequency-domain features are
fed into the network separately versus combined. The results show that an artificial neural network with two input
layers (MFCC and time-domain features) and shared hidden layers achieves the highest recognition accuracy: 96.97%,
and 88.43% on blind tests.
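
To make the two-input architecture described above more concrete, the following is a minimal sketch, assuming a Keras-style dense network; the feature dimensions, layer sizes, and number of syllable classes are illustrative assumptions and do not reflect the authors' actual configuration.

```python
# Sketch (not the authors' implementation) of a network with two input layers,
# one for MFCC features and one for time-domain features, joined by shared
# hidden layers. All sizes below are assumed for illustration only.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

N_MFCC = 13        # assumed number of cepstral coefficients
N_TIME_FEATS = 8   # assumed number of time-domain features per section
N_CLASSES = 10     # assumed number of single-syllable classes

# Two separate input layers, one per feature domain.
mfcc_in = keras.Input(shape=(N_MFCC,), name="mfcc_features")
time_in = keras.Input(shape=(N_TIME_FEATS,), name="time_domain_features")

# Concatenate the two feature vectors and pass them through shared hidden layers.
x = layers.Concatenate()([mfcc_in, time_in])
x = layers.Dense(64, activation="relu")(x)
x = layers.Dense(32, activation="relu")(x)
out = layers.Dense(N_CLASSES, activation="softmax")(x)

model = keras.Model(inputs=[mfcc_in, time_in], outputs=out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Dummy data illustrating the expected input shapes.
X_mfcc = np.random.rand(100, N_MFCC).astype("float32")
X_time = np.random.rand(100, N_TIME_FEATS).astype("float32")
y = np.random.randint(0, N_CLASSES, size=(100,))
model.fit([X_mfcc, X_time], y, epochs=2, batch_size=16, verbose=0)
```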