2022:Submissions/Ninai-Udiron: Using Wikidata Items and Lexemes for Abstract Wikipedia-Like Text Generation
- Language: English
- Status: Pre-recorded
Mahir Morshed (User:Mahir256)
This tutorial goes through the steps needed to set up support for a new language using Ninai/Udiron, a text generation system developed with the goal of being used in the Abstract Wikipedia project. These steps include decisions about which Wikidata lexemes to create, what meanings to add as senses, what links to other Wikidata items and lexemes to make, and how different meaning elements should best be turned into words, parts of words, or phrases, all culminating in the generation of coherent sentences. The language used for the worked examples may help highlight how different decisions at each step can be handled smoothly within the system.
Learning Outcomes
Listeners will gain better insight into how the lexemes they create can be made more usable once Abstract Wikipedia launches. They will also see ways to plan for linguistic issues particular to their own languages, and may consider looking for Wikidata items for concepts specific to their languages where it is not clear that such items already exist.
Mahir Morshed is a doctoral student at the University of Illinois at Urbana-Champaign. He has been an administrator of Wikidata and the Bengali Wikisource since 2018. Since the announcement of Abstract Wikipedia, he has taken an interest in powering natural language generation with Wikidata.