2019:Quality/Check your data and find the gaps with OpenRefine
This is a Closed submission for Wikimania 2019. It has been reviewed and was not accepted.
Description
Data quality depends on both domain expertise (understanding the data and how it relates) and data expertise (e.g. technology, data management). Most domain experts are not data experts.
OpenRefine could be a bridge between data expertise and domain expertise: it is a powerful data tool for domain experts.
With it, we can easily check and improve the quality of large datasets, add and compare data from other sources, and find and close data gaps in Wikidata. This requires no coding knowledge, scripting or SPARQL, yet it offers many of the techniques that coding experts use, all through a graphical user interface.
Relationship to the theme
This session will address the conference theme — Wikimedia, Free Knowledge and the Sustainable Development Goals — in the following manner:
This session aims to encourage domain experts to contribute data and to check the quality of data in their particular fields. It relates to:
- goal 4 (quality education): improving the quality of our data, improving the diversity of contributors
- goal 5 (gender equality): encouraging non-techies to contribute their data too (bridging the gender gap in computer science)
- goal 16 (peace, justice and strong institutions): encouraging non-techies to work with open data
- goal 17 (partnerships for the goals): encouraging people to link and use open data
Session outcomes
- how to load a dataset in OpenRefine (different ways)
- how to find error patterns and correct them
- how to modify data to link to other data sources (Wikidata and others)
- how to reconcile data against Wikidata
- how to get values from Wikidata
- how to compare data
- how to enrich data
Participants will get suggestions on how to use OpenRefine and where to find tutorials and examples, and they will be invited to contribute.
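To make the "find error patterns" outcome more concrete: OpenRefine's clustering feature groups spelling variants of the same value, and one of its methods builds a normalised "fingerprint" key for each value. The Python sketch below only approximates that idea for illustration. It is not OpenRefine's own code, and the function names `fingerprint` and `cluster_variants` as well as the sample values are assumptions made for this example.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate a fingerprint key: normalise case, accents, punctuation and word order."""
    # Trim surrounding whitespace and lowercase the value.
    text = value.strip().lower()
    # Decompose accented characters and drop the combining marks (e.g. "ö" becomes "o").
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split into tokens, drop duplicates, sort them, and join them back together.
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster_variants(values):
    """Group values by fingerprint; keep only groups with more than one distinct spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    # Hypothetical sample column with inconsistent spellings.
    sample = [
        "Berlin", "berlin ", "Köln", "Koln",
        "Frankfurt am Main", "Main, Frankfurt am", "Hamburg",
    ]
    for key, variants in cluster_variants(sample).items():
        print(key, "->", variants)
```

In OpenRefine itself no code is needed for this step: clustering is offered through the graphical "Cluster and edit" dialog, where the suggested groups can be reviewed and merged by hand.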
Session leader(s)
 Andrea Knabe-Schönemann
Session type
Each Space at Wikimania 2019 will have specific format requests. The program design prioritises submissions which are future-oriented and directly engage the audience. The format of this submission is a:
- Tool or project demonstration: short talks focused on tools
Requirements
The session will work best with these conditions:
- Room: a small classroom, a projector + screen
- Audience: size depends on the room; no prior knowledge or skills needed, but participants should be interested in data
- Recording: possible (talk/lecture; slides will be available on Wikimedia Commons)