2019:GLAM/From a messy spreadsheet to Wikidata: start mass data uploads with OpenRefine
This is an accepted submission for the GLAM space at Wikimania 2019.
Important to do before the workshop
Please install the latest release of OpenRefine on your computer and make sure you know how to run it.
Title
From a messy spreadsheet to Wikidata: start mass data uploads with OpenRefine
Description
Awareness of Wikidata outside the Wikimedia community is growing, and more and more GLAMs want to get their data onto it. For us, this means we need the skills to work with very diverse datasets: cleaning them up, matching them with Wikidata and uploading the data. This is exactly what we are doing in the FindingGLAMs project, where the data comes from different sources and in different formats. There are tools that make all of this possible without having to write code or run a bot.
In this workshop, we will get to know OpenRefine and QuickStatements as tools for working with GLAM data (among other kinds). Using a real-world example, we will go through all the steps involved in a data upload:
- Getting to know the data: assessing its potential and preparing for problems.
- Loading and handling the data in OpenRefine.
- Data clean-up using transformations.
- Reconciliation: how to align what we have with what's already on Wikidata.
- Wikidata schema: shaping all sorts of data into familiar forms.
- Uploading: working with QuickStatements.
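To give a feel for what the last step consumes, here is a minimal Python sketch, not part of the workshop materials, of the kind of QuickStatements (V1, tab-separated) command text a reconciled table ends up as. It assumes each row has already been reconciled, with a hypothetical `qid` field holding the matched item or `None` when a new item must be created; the function name, row layout and example rows are illustrative assumptions. In practice, OpenRefine's Wikidata extension can export the data shaped by a Wikidata schema in QuickStatements format directly, so no code is needed.

```python
from typing import Optional

# Hypothetical row layout (an assumption for this sketch): after reconciliation
# each row carries a matched Wikidata item ID in "qid", or None when no match
# was found and a new item has to be created.
Row = dict


def to_quickstatements(rows: list[Row]) -> str:
    """Turn reconciled rows into QuickStatements V1 commands (tab-separated)."""
    lines: list[str] = []
    for row in rows:
        qid: Optional[str] = row.get("qid")
        if qid is None:
            # Unmatched row: create a new item and refer to it as LAST.
            lines.append("CREATE")
            subject = "LAST"
            # Naive quoting: labels containing double quotes are not handled.
            lines.append(f'{subject}\tLen\t"{row["label"]}"')
        else:
            # Matched row: add statements to the existing item.
            subject = qid
        for prop, value in row.get("statements", []):
            lines.append(f"{subject}\t{prop}\t{value}")
    return "\n".join(lines)


rows = [
    # Illustrative new item: P31 = instance of, Q33506 = museum;
    # P17 = country, Q34 = Sweden.
    {"qid": None, "label": "Example Museum",
     "statements": [("P31", "Q33506"), ("P17", "Q34")]},
    # Illustrative match: the Wikidata Sandbox item (Q4115189) stands in
    # for an existing item found during reconciliation.
    {"qid": "Q4115189", "statements": [("P17", "Q34")]},
]

print(to_quickstatements(rows))
```

The point of the sketch is the split that reconciliation creates: matched rows become statements on existing items, while unmatched rows become `CREATE` commands, which is why careful reconciliation matters before anything is uploaded.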
OpenRefine is a powerful and flexible tool, and the workflow can be adapted to many different types of data, making it a great addition to a Wikimedian's skill set.
See the presentation. The session had technical problems, and the recording appears not to have been saved.
Relationship to the theme
This session will address the conference theme, "Wikimedia, Free Knowledge and the Sustainable Development Goals", as follows:
This work will help form new and innovative partnerships with GLAM institutions around the world, in which freely licensed data is shared so that it can be reused in numerous products (SDG 9, 17). These partnerships can help reduce inequality and improve access to knowledge (SDG 4, 5, 10).
Session outcomes
At the end of the session, the following will have been achieved:
The participants will have become familiar with some of the most useful and popular tools for working with large Wikidata uploads. They will be able to clean up messy data and match it to existing Wikidata items. They will have acquired skills that are sought by data owners interested in donating their data to Wikidata, which will empower them in their work with partners.
Session leader(s)
- Full name 1: Alicia Fagerving
Usernames
- Wikimedia username 1: Alicia_Fagerving_(WMSE)
Affiliation/country
- Affiliation 1: Wikimedia Sverige
E-mail contact
- Email 1: alicia.fagerving@wikimedia.se
Session type
Each Space at Wikimania 2019 will have specific format requests. The program design prioritises submissions which are future-oriented and directly engage the audience. The format of this submission is a:
- Computer-based training
Length of session
90 minutes
Requirements
The session will work best with these conditions:
- Room:
A classroom with a projector where people can work comfortably on their computers
- Audience:
Max 30 people.
Requirements: some experience with editing Wikidata and an interest in larger data uploads. No experience of OpenRefine or QuickStatements is required, but please install OpenRefine beforehand.
- Recording:
The presentation part will be appropriate for filming and sharing, but the participants will also be asking questions and sharing knowledge with each other, so that will be harder to record.