2019:Languages/Wikispeech - making Wikipedia accessible through speech technology



Menchú · Sunday 12:00 – 12:30

SDGs

Description

In this presentation we will explain why Wikimedia's platforms and the material on them must be available not only in writing, but also in forms accessible to everyone who, for whatever reason, cannot read. We will discuss and problematize the lack of functioning accessibility tools for smaller languages, which stems from a lack of language data, and the unique opportunity the Wikimedia movement has to change this by providing robust technical infrastructure and by engaging people across the world to share language data through crowdsourcing. We believe such engagement is possible because people want to contribute to and support Wikipedia in different ways, and this provides an easy way to do so.

The Wikispeech extension will make Wikipedia accessible to anyone who faces reading difficulties, whether due to vision impairment, dyslexia, never having had the opportunity to learn to read, or any number of other reasons. Hundreds of millions of people fit those profiles.

In the first Wikispeech project, which ran from 2016 to 2018, we created a text-to-speech solution for MediaWiki in the form of an extension with an accompanying service. At this time, the solution supports English, Arabic and Swedish. With this extension, a reader can have the content text of an article read out loud, navigate the text, and configure the experience, e.g. by setting the playback speed. The extension is designed to make the server do as much of the heavy lifting as possible so that as many user devices as possible are supported; if you have a device that can run a modern web browser, you should be able to use Wikispeech.

As the project picks up again in the fall of 2019, we will work to finalize the Wikispeech reader, making it ready to run on Wikipedia. Over time we will add more languages to it. We will also start a new, but related, effort in which we will develop tools for collecting speech data through crowdsourcing. With these tools, volunteers will be able to easily contribute their voice or knowledge to create open speech resources. These can be used to develop speech technology solutions, such as text-to-speech and speech recognition, including new languages and voices for the Wikispeech reader. This freely licensed (CC0) resource will also be promoted to other open source initiatives, researchers and other actors, with the intention of speeding up the addition of new languages to accessibility tools. Targeted efforts can be made to collect data on specific topics, e.g. health, so that the service works better in a given thematic area. The material will also be connected to e.g. Wikidata, Wiktionary and Wikipedia in different ways. Finally, an interesting use of such tools could be the collection of oral citations. This is still to be discussed and explored with the community.

Relationship to the theme

This session will address the conference theme — Wikimedia, Free Knowledge and the Sustainable Development Goals — in the following manner:

Wikispeech will make Wikipedia and its sister projects more accessible to people who have trouble reading (SDG10), allowing more people to learn from our content (SDG4) and, over time, to contribute their knowledge through the tools, either by speech recognition or with oral citations through innovative solutions (SDG9). This is especially true for languages where there are no viable alternatives in terms of commercial products. If targeted work is done to collect speech data on e.g. health terminology, such information can reach more people than before (SDG3, SDG5).

Session outcomes

At the end of the session, the following will have been achieved:

The participants will have gained information about what the Wikispeech project is, how it will make Wikimedia projects more accessible, and how they can contribute to collecting speech data in the future.

Session leader(s)

Sebastian Berlin (WMSE)

André Costa (WMSE)

Session type

Each Space at Wikimania 2019 will have specific format requests. The program design prioritises submissions which are future-oriented and directly engage the audience. The format of this submission is a:

Lecture

Requirements

The session will work best with these conditions:

Room: equipment for showing presentation and demo, including speakers.

Audience: no prior knowledge required.

Recording: suitable for recording.

Interest

If you would like to attend this session, please express your interest by signing ~~~~ below

… (you?)

Notes

Notes from the session can be found here.