DEVELOPING PART OF SPEECH TAGGING MODEL FOR DAWUROOTSUWA LANGUAGE USING DEEP LEARNING APPROACH
Date
2025-08
Publisher
Wolkite University
Abstract
A part-of-speech (POS) tagger is a program that reads text in a given language and assigns a part of speech, such as noun, verb, or adjective, to each word and other token in the text. Our review of previous studies on POS tagging for other languages found several gaps, such as high variance between training and testing datasets and the loss of morphological information indicating gender, person, number, and other features when prepositions and conjunctions are segmented. To address these issues, we developed a POS tagger for the Dawurotsuwa language using RNN-based deep learning approaches (RNN, LSTM, GRU, Bi-LSTM, and Bi-GRU) with pre-trained FastText embeddings. To achieve this, we collected a dataset of 1,251 sentences comprising 19,897 tokens and 7,067 unique words, establishing a robust foundation for the NLP corpus. Our methodology includes comprehensive data collection, corpus development, data preprocessing, and the application of deep learning techniques to train and evaluate models. Through experimentation, we evaluated various deep learning models for their effectiveness in POS tagging for Dawurotsuwa, including sequential and non-sequential recurrent neural networks with pre-trained FastText embeddings, as well as LSTM, GRU, Bi-LSTM, and Bi-GRU models with FastText embeddings. The sequential Bi-GRU model delivered the best performance, achieving 97.43% accuracy on training data, 97.57% on validation data, and 97.79% on testing data. The model's loss values were 0.09 for training, 0.08 for validation, and 0.07 for testing, with weighted-average precision, recall, and F1-score of 94%, 93%, and 93%, respectively, underscoring the model's robustness. These findings indicate that deep learning models, especially Bi-GRU with FastText embeddings, significantly improve POS tagging performance for under-resourced languages such as Dawurotsuwa.
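For readers unfamiliar with the weighted-average precision, recall, and F1-score mentioned above, the following is a minimal sketch of how such scores are commonly computed for a tagger: each tag's score is weighted by its share of the gold-standard tokens. The function name `weighted_prf`, the tag labels, and the toy data are illustrative assumptions and are not taken from the study; the abstract does not state how the authors computed their figures.

```python
from collections import Counter


def weighted_prf(gold, pred):
    """Support-weighted precision, recall and F1 over token-level tags.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    assert len(gold) == len(pred), "one predicted tag per gold tag"
    support = Counter(gold)      # number of gold tokens per tag
    predicted = Counter(pred)    # number of predicted tokens per tag
    correct = Counter(g for g, p in zip(gold, pred) if g == p)
    total = len(gold)

    precision = recall = f1 = 0.0
    for tag, n in support.items():
        p = correct[tag] / predicted[tag] if predicted[tag] else 0.0
        r = correct[tag] / n
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = n / total       # the tag's share of all gold tokens
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1


# Toy example with made-up tags (assumption, not data from the corpus).
gold = ["N", "V", "N", "ADJ", "N", "V"]
pred = ["N", "V", "V", "ADJ", "N", "N"]
precision, recall, f1 = weighted_prf(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Weighting by support means frequent tags dominate the averages, which is a common choice when the tag distribution is imbalanced.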