2023:Program/Open Data/L9ERZY-Reusing Open Data from various sources to assess gender inequality: from street and school eponyms, to researchers and research.

From Wikimania

Title: Reusing Open Data from various sources to assess gender inequality: from street and school eponyms, to researchers and research.

Speakers:

quelet

Member of Amical Wikimedia chapter. Working mainly around Wikidata. Scientist and Science Communicator. Interests range from Quantum Chemistry to Magic and Science or Python as central coding tool (as perceived from a scientific FORTRAN coder).

Pretalx link

Etherpad link

Room: Room 311

Start time: Sat, 19 Aug 2023 14:15:00 +0800

End time: Sat, 19 Aug 2023 14:25:00 +0800

Type: Lightning talk

Track: Open Data

Submission state: confirmed

Duration: 10 minutes

Do not record: false

Presentation language: en


Abstract & description

Abstract

Combining Wikidata with various sources of Open Data is not always straightforward. We have used this combination to analyze human eponyms in street and school names. Furthermore, we have built a reasonably sound procedure for obtaining clean, quality data about researchers. All in all, we will try to pinpoint the drawbacks we encountered and the paths that lead to gender bias analysis and to proposals for reducing that bias.

Description

We have assessed the gender bias in school and street eponyms: female eponyms account for just 15% of total human eponyms. Furthermore, those female eponyms have fewer items in Wikidata, and far fewer Wikipedia pages.
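As an illustration of how such a share could be computed, here is a minimal sketch, not the procedure actually used in the study. It assumes streets are fetched from Wikidata with a query along the lines of `EPONYM_QUERY` (P138 "named after", P21 "sex or gender", P31 "instance of", Q5 human, Q79007 street) and that each result row is reduced to a `(place, person, gender)` tuple of item IDs. The function name `gender_shares` and the choice to count named places rather than distinct people are assumptions made for the example.

```python
from collections import Counter

# Sketch of a Wikidata SPARQL query: streets named after a human, with the
# person's gender when it is recorded. Illustrative only; a real study would
# also restrict the query to a region and include schools.
EPONYM_QUERY = """
SELECT ?place ?person ?gender WHERE {
  ?place wdt:P31 wd:Q79007 ;
         wdt:P138 ?person .
  ?person wdt:P31 wd:Q5 .
  OPTIONAL { ?person wdt:P21 ?gender . }
}
"""

FEMALE = "Q6581072"  # Wikidata item for "female"
MALE = "Q6581097"    # Wikidata item for "male"


def gender_shares(rows):
    """Return the fraction of eponyms per gender bucket.

    rows: iterable of (place_id, person_id, gender_id_or_None) tuples.
    Assumption: every row (named place) counts once, so a person with
    several streets is counted several times.
    """
    counts = Counter()
    for _place, _person, gender in rows:
        if gender == FEMALE:
            counts["female"] += 1
        elif gender == MALE:
            counts["male"] += 1
        else:
            # Missing gender statements and other values are kept visible
            # instead of being silently dropped.
            counts["other_or_unknown"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bucket: n / total for bucket, n in counts.items()}
```

Keeping an explicit `other_or_unknown` bucket makes gaps in the data visible, which matters when the underlying items are sparse.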

Building successful Editathons and Datathons requires reliable sources and sound practices that lead to very good data in Wikidata, so combining public open data, geographic open data (e.g., OpenStreetMap) and other sources requires some discipline and knowledge of the proper tools. For researchers, this is even more complex due to redundant imports, e.g., from the ORCID database, and to editors who are not expert enough in the intricacies of Wikidata.

This communication will focus on the difficulties we have found in building up quality data for our gender bias studies and the opportunities we have discovered in the realm of (un)linked open data. Hopefully other Wikimedians may benefit from our experience, e.g., in designing a procedure to obtain a list of researchers starting from various open data sources or in building a meaningful, valuable set of schools. We will also address the long-tail issue of local, noteworthy people in school and street eponyms.
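One way to picture a procedure for listing researchers from several sources is to merge records on their ORCID iD, so that redundant imports collapse into one entry. The sketch below is an assumption-laden illustration and not the authors' actual workflow: the record layout (dictionaries with an optional `"orcid"` key) and the function names are hypothetical. The check digit follows the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers.

```python
import re

# Matches a trailing ORCID iD, with or without hyphens or a URL prefix.
_ORCID_RE = re.compile(r"(?<!\d)(\d{4})-?(\d{4})-?(\d{4})-?(\d{3}[\dX])$")


def orcid_check_digit(base15):
    """Compute the ISO 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for char in base15:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def normalize_orcid(raw):
    """Return the iD as 'dddd-dddd-dddd-dddd', or None if absent or invalid."""
    if not raw:
        return None
    match = _ORCID_RE.search(raw.strip().upper())
    if not match:
        return None
    digits = "".join(match.groups())
    if orcid_check_digit(digits[:15]) != digits[15]:
        return None
    return "-".join(match.groups())


def merge_by_orcid(*sources):
    """Merge researcher records from several sources, keyed by ORCID iD.

    Each source is an iterable of dicts (hypothetical layout). Earlier
    sources win when two records disagree on a field. Records without a
    valid iD are returned separately for manual review rather than guessed.
    """
    merged = {}
    unmatched = []
    for source in sources:
        for record in source:
            orcid = normalize_orcid(record.get("orcid"))
            if orcid is None:
                unmatched.append(record)
                continue
            target = merged.setdefault(orcid, {"orcid": orcid})
            for key, value in record.items():
                if key != "orcid" and value not in (None, ""):
                    target.setdefault(key, value)
    return merged, unmatched
```

Records that cannot be matched automatically are set aside on purpose: name-based matching is where duplicate or wrongly merged items tend to arise, so a human check is the safer default.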

This proposal is related to our #WikiSciW project, which aims to increase the visibility of women scientists: they are less visible than women outside scientific fields, and far less visible than male scientists.

Further details

Qn. How does your session relate to the event themes: Diversity, Collaboration, Future?

Our focus has been regional (i.e., Catalonia), but these questions are global and need collaboration between editors. Moreover, data quality has suddenly become of utmost importance due to the rapid progress of AI tools. And of course, assessing gender biases remains very relevant for planning corrections that lead to a more equal presence of genders in the public (name)space.

Qn. What is the experience level needed for the audience for your session?

Average knowledge about Wikimedia projects or activities

Qn. What is the most appropriate format for this session?

  • Onsite in Singapore (not selected)
  • Remote online participation, livestreamed (selected)
  • Remote from a satellite event (not selected)
  • Hybrid with some participants in Singapore and others dialing in remotely (not selected)
  • Pre-recorded and available on demand (not selected)