2019:Technology outreach & innovation/Wikidata & ETL
This is an accepted submission for the Technology space at Wikimania 2019.
Description
Currently, Wikidata and other Wikibase instances are populated from external data sources mostly by hand, using ad-hoc data transformation scripts. These scripts are usually run once and then abandoned. Given the heterogeneity of the source data and of the languages used to transform it, the scripts are hard or impossible to maintain and cannot be run periodically in an automated fashion to keep Wikidata up to date.
In this session, we would like to demonstrate our work in progress on using LinkedPipes ETL, a tool for building data transformation pipelines, to load data into Wikibases and Wikidata.
Slides
Relationship to the theme
This session will address the conference theme (Wikimedia, Free Knowledge and the Sustainable Development Goals) in the following manner:
- Industry, innovation, and infrastructure: Enabling volunteers to better automate bulk loading of data into Wikidata using sharable data transformation pipelines helps to make these processes more unified and therefore sustainable.
Session outcomes
At the end of the session, the following will have been achieved:
- Our approach to bulk loading data into Wikidata using LinkedPipes ETL will have been presented
- Feedback on the method will have been gathered
- Interested attendees will have tried loading some data into a demo Wikibase instance
Session leader(s)
Contacts
- jakub@jakubklimek.com
Session type
Each Space at Wikimania 2019 will have specific format requests. The program design prioritises submissions which are future-oriented and directly engage the audience. The format of this submission is one of these options:
- Option 1: Presentation - 20 minutes
- Option 2: Roundtable workshop - 10 minutes of presenting, 30 to 60 minutes of hands-on for interested attendees
Requirements
The session will work best with these conditions:
- Room:
  - Classroom with a projector and screen
- Audience:
  - Technically savvy (RDF, SPARQL) contributors to Wikidata or other Wikibases
- Recording:
  - The presentation part can be recorded; the hands-on part probably cannot