African languages have received little attention from computer scientists, so few NLP capabilities have been available to large swaths of the continent. But a novel language model, developed by researchers at the University of Waterloo in Canada, fills that gap by enabling computers to analyse text in African languages for many useful tasks.
The new neural-network model, which the researchers have dubbed AfriBERTa, uses deep-learning techniques to achieve "state-of-the-art" results for low-resource languages, according to the team.
It works specifically with 11 African languages, including Amharic, Hausa and Swahili, which are spoken collectively by over 400 million people. Despite learning from just one gigabyte of text, it achieves output quality comparable to the best existing models, which require thousands of times more data, the researchers said.
“Pre-trained language models have transformed the way computers process and analyse...