END-TO-END SPEECH RECOGNITION FOR GURAGIGNA LANGUAGE USING DEEP LEARNING TECHNIQUES

Date

2025-08

Publisher

Wolkite University

Abstract

Speech recognition entails converting long sequences of acoustic features into shorter sequences of discrete symbols, such as words or phonemes. The task is complicated by varying sequence lengths and by uncertainty about where each output symbol falls in the input, which makes traditional classifiers impractical. Current automated systems struggle with speaker-independent continuous speech, particularly in low-resource languages like Guragigna, where the Cheha dialect poses additional challenges due to its purely spoken nature and lack of a rigid grammatical structure. To address these issues, this research develops an end-to-end speech recognition model using deep learning techniques, specifically a hybrid CNN-BiGRU architecture combined with CTC and attention mechanisms. This approach aims to improve alignment and robustness in noisy environments. To train and test the model, a text and speech corpus was compiled from several sources, including Wolkite FM and the Old and New Testaments. Experimental results indicate that the CNN-BiGRU model achieves a Word Error Rate (WER) of 2.5%, showing improved generalization. In addition, four recurrent neural network models (LSTM, BiLSTM, GRU, and BiGRU) were evaluated, each configured with 1024 hidden units and trained with the Adam optimizer for 50 epochs. The BiGRU model outperformed the others, achieving an accuracy of 97.50%, while the LSTM, BiLSTM, and GRU models achieved maximum accuracies of 95.99%, 96.92%, and 96.25%, respectively. The successful implementation of this end-to-end speech recognition system advances communication technologies for low-resource languages and improves accessibility for diverse linguistic communities. The findings underscore the effectiveness of deep learning methods for speech recognition in challenging linguistic contexts.
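
The abstract reports results as Word Error Rate. As a minimal sketch of how that metric is conventionally computed (word-level edit distance divided by the number of reference words), the Python below may help; it is not taken from the thesis. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and the thesis may normalize or tokenize Guragigna text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization by whitespace
    is an assumption, not the thesis's documented procedure.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a WER of 2.5% corresponds to roughly one word error per 40 reference words. The reported 97.50% accuracy for BiGRU would match 100% minus that WER, although the abstract does not state how accuracy was derived.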

Keywords

Automatic Speech Recognition, NLP, Deep Learning, LSTM, BiLSTM, GRU, BiGRU, RNN, CNN
