2023:Program/Submissions/How does Wikidata model our gender? Findings from the Wikidata Gender Diversity project - TRKGAQ



Title: How does Wikidata model our gender? Findings from the Wikidata Gender Diversity project

Speakers:

Daniele Metilli

Chiara Paolini

Chiara Paolini (she/her) is a PhD student in Linguistics at KU Leuven (Belgium). She is mainly interested in usage-based theories of language, variationist sociolinguistics, and corpus-based approaches. Alongside her core research on English and Italian grammatical alternations, she has developed a keen interest in queer linguistics, researching non-binary gender representation in English and Italian.

Pretalx link

Etherpad link

Room:

Start time:

End time:

Type: Lecture

Track: Equity, Inclusion, and Community Health

Submission state: submitted

Duration: 30 minutes

Do not record: false

Presentation language: en


Abstract & description

Abstract

This lecture will present results from the Wikidata Gender Diversity (WiGeDi) project, funded through the Wikimedia Research Fund program. The project is studying gender diversity in Wikidata, focusing in particular on marginalized gender identities such as trans and non-binary people.

Description

The Wikidata Gender Diversity (WiGeDi) project, funded through the Wikimedia Research Fund program, is studying gender diversity in Wikidata, focusing in particular on marginalized gender identities such as trans and non-binary people. The project approaches the topic from three complementary perspectives: *model*, *data*, and *community*.

First, we are investigating how the current Wikidata ontology models gender, how this representation has evolved over time, and to what extent it is fair and inclusive. Second, we are analyzing the data stored in the knowledge base quantitatively, performing statistical analyses on biographical data about people with diverse and marginalized gender identities to gather insights and identify possible gaps. Third, we are looking at how the Wikidata community has moved from a very narrow interpretation of gender as a binary towards the inclusion of a wider spectrum of gender identities. Gender representation is often intrinsically connected to language, and this is especially relevant in a multilingual project such as Wikidata. We are therefore analyzing user discussions about gender identities through computational linguistics methods such as critical discourse analysis and topic modeling.

By combining the three perspectives, we obtain a comprehensive overview of gender diversity in Wikidata and its history, and we make our findings explorable through the web-based Wikidata Gender Dashboard and Wikidata Gender Timeline. The results of the project have an impact beyond Wikidata, opening up further studies on Wikimedia sister projects and on third-party projects that reuse data from Wikidata.
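As a rough illustration of the quantitative perspective described above, the sketch below shows one way that counts of people per gender value could be turned into shares. It is not the project's actual pipeline. It assumes a query over the Wikidata properties P31 ("instance of", with Q5 "human") and P21 ("sex or gender"), and it assumes the results arrive in the standard SPARQL JSON bindings format. The query text, the function name `gender_shares`, and the sample rows (placeholder IRIs, made-up numbers) are all hypothetical.

```python
from collections import Counter

# Hypothetical query (not taken from the project): number of items that are
# instances of human (P31 = Q5) for each value of "sex or gender" (P21).
# Running it against a live endpoint is left out of this sketch.
QUERY = """
SELECT ?gender (COUNT(?person) AS ?count) WHERE {
  ?person wdt:P31 wd:Q5 ; wdt:P21 ?gender .
}
GROUP BY ?gender
"""


def gender_shares(bindings):
    """Turn SPARQL JSON result bindings into {gender IRI: share of total}."""
    counts = Counter()
    for row in bindings:
        # Each binding maps a variable name to a dict with a "value" string.
        counts[row["gender"]["value"]] += int(row["count"]["value"])
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gender: n / total for gender, n in counts.items()}


# Toy bindings: placeholder IRIs and made-up numbers, for illustration only.
sample = [
    {"gender": {"value": "http://example.org/gender/A"}, "count": {"value": "6"}},
    {"gender": {"value": "http://example.org/gender/B"}, "count": {"value": "3"}},
    {"gender": {"value": "http://example.org/gender/C"}, "count": {"value": "1"}},
]
shares = gender_shares(sample)
```

Working with shares rather than raw counts makes it easier to compare snapshots of the knowledge base taken at different times, although any real analysis would also need to account for items that have no P21 statement at all.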

Further details

Qn. How does your session relate to the event themes: Diversity, Collaboration, Future?

Our project is centred on gender diversity, and specifically on marginalized gender identities that have been overlooked in previous research about the gender gap. We are gathering insights on the current Wikidata model of gender, trying to understand how it can be made fairer and more inclusive. We are also attempting to understand how the Wikidata community has collaborated to develop a shared understanding of gender. We believe that our research can provide insights that will be useful both for Wikidata itself and for other Wikimedia projects.

Qn. What is the experience level needed for the audience for your session?

Everyone can participate in this session

Qn. What is the most appropriate format for this session?

  • [ ] Onsite in Singapore
  • [x] Remote online participation, livestreamed
  • [ ] Remote from a satellite event
  • [ ] Hybrid with some participants in Singapore and others dialing in remotely
  • [ ] Pre-recorded and available on demand