2019:Languages/The difficulties of Wikipedias in languages that are not taught in school/content

From Wikimania


Presentation by Gereon K. and Celestinesucess.

This year, 2019, is the International Year of Indigenous Languages.

The commemorative year was proclaimed by the United Nations. One of its aims is the following:

“Supporting the revitalisation and maintenance of indigenous languages through: creation of more materials and content and a wider range of services, using language, information and communications technologies.”

Now this is part of what we do with languages in Wikipedia (and especially Wiktionary), for every language.

In many countries the language taught in school is the same as the language spoken at home and in the neighborhood.

And some languages have simple letters and are written just the way they are spoken.

To edit Wikipedia you need to know the language and you need to know how to write it correctly. This becomes a major problem when nobody learns to write the language at school and when the language relies on diacritics.

A diacritic (also called diacritical mark, diacritical point, diacritical sign or accent) is a glyph added to a letter. It mainly changes the sound of a letter. And when the sound of a word changes, the meaning usually changes as well.

Diacritical languages rely heavily on diacritical marks to convey meaning. We have some Wikipedias in languages that have millions of speakers, but very few of those speakers can write the language correctly.

But why do we need correct spelling on Wikipedia?

  • The content has to be understandable and correct.
  • In countries where Wikipedia is not well established, Wikipedia has to build up a good reputation to gain wider acceptance.

We would like to tell you about two Wikipedia language versions that face this challenge: tw.wikipedia, which is the Wikipedia in the Twi language, and yo.wikipedia, which is the Wikipedia in the Yoruba (local spelling: Yorùbá) language.

YORUBA

Yoruba is a language spoken in Nigeria and Benin, with communities in other countries of West Africa. It belongs to the Niger-Congo language family and has 56 million native speakers.

It is written in a Latin alphabet, but the letters c, q, v, x and z are not used.

Pronunciation and meaning rely heavily on digraphs (pairs of characters used to write a single sound, like “gb”) and on diacritics that look like this:

Á À Ā É È Ē Ẹ / E̩ Ẹ́ / É̩ Ẹ̀ / È̩ Ẹ̄ / Ē̩ Í Ì Ī Ó Ò Ō Ọ / O̩ Ọ́ / Ó̩ Ọ̀ / Ò̩ Ọ̄ / Ō̩ Ú Ù Ū Ṣ / S̩
á à ā é è ē ẹ / e̩ ẹ́ / é̩ ẹ̀ / è̩ ẹ̄ / ē̩ í ì ī ó ò ō ọ / o̩ ọ́ / ó̩ ọ̀ / ò̩ ọ̄ / ō̩ ú ù ū ṣ / s̩
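
Several of the letters above combine two marks (one below the letter and a tone mark above it), and a computer can store such a letter as more than one sequence of Unicode code points. The following is a minimal sketch, not a description of how Wikipedia handles this: it uses Python's standard unicodedata module to bring equivalent encodings into one form (NFC) so that they compare equal. The function name normalize_yoruba is hypothetical. The sketch also assumes that the second form in each pair above (for example e̩) uses the combining vertical line below (U+0329); Unicode treats that as a different mark from the dot below, so normalization alone does not unify the two spellings.

```python
import unicodedata


def normalize_yoruba(text: str) -> str:
    """Return text in Unicode NFC form so equivalent encodings compare equal.

    Hypothetical helper name; it only wraps unicodedata.normalize.
    """
    return unicodedata.normalize("NFC", text)


# Three encodings of the same letter: e with dot below and acute accent.
precomposed_base = "\u1eb9\u0301"   # precomposed e-with-dot-below + combining acute
fully_decomposed = "e\u0323\u0301"  # e + combining dot below + combining acute
reordered_marks = "e\u0301\u0323"   # e + combining acute + combining dot below

variants = [precomposed_base, fully_decomposed, reordered_marks]

# As raw strings the three encodings are all different ...
distinct_before = len(set(variants))
# ... but after normalization they are one and the same string.
distinct_after = len({normalize_yoruba(v) for v in variants})

# Assumption: the alternative spelling uses the vertical line below (U+0329).
# It is not canonically equivalent to the dot below, so it stays distinct.
line_below = normalize_yoruba("e\u0329")
dot_below = normalize_yoruba("e\u0323")
same_after_normalizing = line_below == dot_below
```

A tool that compares or searches Yoruba text would typically normalize both sides first; whether the dot-below and line-below spellings should additionally be treated as the same letter is an editorial decision that this sketch leaves open.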

Yoruba speakers speak Yoruba at home and in their everyday life. Yoruba is taught in schools as a second language only; other subjects are not taught in Yoruba. Consequently, most speakers do not know how to write Yoruba correctly. Furthermore, when writing they tend to replace common Yoruba words, for example the names of months, with their English equivalents.

Thousands of articles on Yoruba Wikipedia were created by a bot written by the user Demmy, who was the most active contributor on yo.wikipedia from 2005 to 2016. In 2012 Jimmy Wales named Demmy Wikipedian of the Year for his contributions. The bot articles did little to promote yo.wikipedia: they neither covered subjects that are meaningful in the Yoruba context nor used correct spelling. Currently (August 2019) there are only 2 administrators and fewer than 40 active editors to take care of the 31,900 articles.

There have been efforts by the local community to boost contribution to Yoruba Wikipedia. At least 2 edit-a-thons that focus on Yoruba Wikipedia have been held in universities. Wikimedia hubs and fan clubs were opened at universities. There have been radio shows about Wikipedia in Nigeria. But maintaining the quality of language at Yoruba Wikipedia remains very difficult.

To improve the quality of Yoruba Wikipedia, the following should be taken into consideration:

  • There is a general lack of awareness of yo.wikipedia. This applies to many African language Wikipedias: many native speakers are not aware that the Yoruba language version of Wikipedia exists. This includes students, teachers, scholars and authors. They are only aware of the English language Wikipedia. To maintain and improve quality, a serious awareness campaign is needed to get native speakers on board. No Wikipedia will thrive without the involvement of native speakers of that language.
  • There is a serious lack of good understanding of the language. Until recently, many articles on Yoruba Wikipedia did not have diacritical marks on every word. Diacritical marks give words with identical letters different meanings. For example:
    ìgbà (2 graves) = time
    igbá (1 acute) = calabash (a fruit)
    igba (no diacritics) = 200
    It takes people who understand the language to assign diacritics. In fact, not all native speakers can assign diacritics to Yoruba words, but articles written in or translated into Yoruba without diacritics would mostly make no sense to readers.
  • There is poor funding and support. There has been little or no support from government and nongovernmental organizations to promote local languages including Yoruba. So lobbying has to be done.
  • The cost of internet services is too high in many African countries. So people need incentives.
  • Some Yoruba native speakers believe that the Yoruba language is inferior, because of the dominating power of foreign languages.
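
The ìgbà / igbá / igba example in the list above can also be shown mechanically. The following is a minimal sketch using Python's standard unicodedata module; the helper name strip_diacritics is hypothetical and only illustrates what happens when text is typed or stored without diacritics. All three words collapse into the same letter sequence, so a reader (or a search engine) can no longer tell them apart.

```python
import unicodedata


def strip_diacritics(word: str) -> str:
    """Remove all combining marks from a word (hypothetical helper).

    The word is first decomposed (NFD) so that every diacritic becomes a
    separate combining character, then those characters are dropped.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The three words from the example, written with escapes to avoid
# any ambiguity about how the accented letters are encoded.
meanings = {
    "\u00ecgb\u00e0": "time",      # i-grave g b a-grave
    "igb\u00e1": "calabash",       # i g b a-acute
    "igba": "200",                 # no diacritics
}

# Without diacritics, three different words become one spelling.
collapsed = {strip_diacritics(word) for word in meanings}
```

Here collapsed contains the single string "igba", although the dictionary started with three distinct words. This is the information that is lost when an article is written without diacritics.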

There are many other factors, and nobody will overcome all of these obstacles. It would help if some of these inhibitors of growth could be made a little easier to overcome. Local user groups and chapters are better placed to help with that than lone Wikipedians.

TWI

Twi is a dialect of the Akan language, which belongs to the Niger-Congo language family. It is a first and second language of 29% of the population of Ghana and has 9 million native speakers.

Akan is not a diacritical language. But it has two non-standard letters in its Latin alphabet:

ɛ and ɔ

On the other hand, the letters C, J, V and Z are used only in words that are imported from other languages.

Most Twi speakers mainly learn English in school. They speak Twi at home and in their neighborhood. In everyday life they do not need to write Twi since almost everything they encounter in written form is in English.

There have been many edit-a-thons and we just saw the start of a series of translate-a-thons for the Twi Wikipedia. To make sure that the outcome of the edit-a-thons is spelled correctly, the following measures had to be taken:

  • Having an expert on site
  • The use of online and offline dictionaries to help with the spelling
  • Spell checking of every single article created
  • Stressing that only Twi words should be used, because in everyday language these are often replaced by English words
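
Part of the checks listed above can be supported by a simple script. The following is a minimal sketch, not the procedure used at the edit-a-thons: it looks up each word in a word list and, separately, flags words that contain C, J, V or Z, since those letters appear only in imported words. The names KNOWN_WORDS, NON_NATIVE_LETTERS and review_text are hypothetical, and the five-word list is only a placeholder for a real Twi dictionary. A script like this can only point a human reviewer at suspicious words; it does not replace the expert on site.

```python
import re

# Letters that occur only in imported words, as stated in the text above.
NON_NATIVE_LETTERS = set("cjvz")

# Placeholder word list (assumption); a real check would load a full dictionary.
KNOWN_WORDS = {"akwaaba", "medaase", "nsuo", "\u0254d\u0254", "s\u025bn"}


def review_text(text, known_words=KNOWN_WORDS):
    """Return (unknown_words, possible_loanwords) for a piece of text.

    unknown_words: words that are not in the word list.
    possible_loanwords: words containing c, j, v or z.
    """
    # Runs of letters only; the Twi letters U+025B and U+0254 count as letters.
    words = re.findall(r"[^\W\d_]+", text.lower())
    unknown_words = [w for w in words if w not in known_words]
    possible_loanwords = [w for w in words if NON_NATIVE_LETTERS & set(w)]
    return unknown_words, possible_loanwords


# Usage: one English word mixed into otherwise known words.
unknown, loanwords = review_text("Akwaaba! \u0186d\u0254, nsuo juice")
```

In this usage example the word "juice" ends up in both lists: it is missing from the word list and it contains the letter j. A misspelled Twi word would appear only in the first list.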