English↔Twi Parallel-Aligned Bible corpus for Encoder-Decoder based machine translation

Accepted 18th September, 2020

Michael Adjeisah1, Guohua Liu1, Richard Nuetey Nortey2 and Jinling Song3

1School of Computer Science and Technology, Donghua University, Shanghai, China.
2School of Information Science and Technology, Donghua University, Shanghai, China.
3School of Mathematics and Information Technology, Hebei Normal University of Science & Technology, Qinhuangdao, Hebei, China.
This research presents the first work on a deep natural language processing (DNLP) approach to translation between English and Twi, a widely spoken Ghanaian language. The study reports machine translation (MT) experiments on an English-Twi parallel Bible corpus using the state-of-the-art neural Transformer, a self-attention encoder-decoder. This first attempt at MT on English-Twi corpora used about 124K parallel-aligned sentence pairs. Through the experiments, we investigated how the number of layers and model configurations should be varied to fit this language pair's demands. Furthermore, we leveraged maximum posterior-based decoding for better performance and performed an n-best beam search on the test data. Because no baseline translation system exists for this pair, we trained the Moses toolkit, a statistical MT (SMT) system, and the well-known sequence-to-sequence recurrent neural network (RNN) to serve as baselines for comparison against the neural Transformer. Results were recorded in both BLEU and TER. The study's findings point to one of the most promising directions for the future of MT for this language pair.
This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Cite this article as:
Adjeisah M, Liu G, Nortey RN, Song J (2020). English↔Twi Parallel-Aligned Bible corpus for Encoder-Decoder based machine translation. Acad. J. Sci. Res. 8(12): 371-382.