2025:Program/“Wait… women from that half of the world are really only 0.83% of Wikidata humans? I can do something about that!”

From Wikimania

Session title: “Wait… women from that half of the world are really only 0.83% of Wikidata humans? I can do something about that!”

Session type: Lecture
Track: Diversity & Inclusion
Language: en

🎥 Session recording: https://w.wiki/FQrq 🎥

How can we best address global unevenness in Wikidata representation? First, let’s measure it. Let’s divide the world twice, by gender and by geography, to yield four approximately equal population quarters. Now, here’s the bad news: in December 2024, women from under-represented countries - 25% of world population - were only 0.83% of the people on Wikidata with assigned gender and country of citizenship. But here’s the good news: the relatively small numbers involved make it feasible to change that situation!

Description

Gender bias and geographical bias intersect on Wikimedia projects in complicated ways. This talk proposes a simple metric to summarize the Wikidata situation, reports substantial progress in improving this metric, and sketches its wider impact. I share opportunities and challenges encountered along the way, before calling on the community to join in and help.

1. Proposing a metric. It’s relatively easy to give a quantitative summary of the gender gap on Wikimedia projects. It’s much harder to find intuitively interpretable metrics to summarize its uneven geographical distribution. I argue we need such metrics to understand our systemic bias. I propose a measure constructed as follows:

(1) For each country in the world, find the number of Wikidata items with that country as country of citizenship, and divide by the country’s population.

(2) Ranking by this ratio, identify the under-represented countries collectively amounting to half the world population. In December 2024, they were the following: Sudan, Niger, Ethiopia, South Sudan, Pakistan, Yemen, Chad, Bangladesh, China, Laos, Madagascar, Democratic Republic of the Congo, Somalia, Cambodia, India, Malawi, Tanzania, Burundi, Myanmar, Central African Republic, Turkmenistan, Vietnam, Burkina Faso, Afghanistan, Mauritania, Mozambique, Libya, Tajikistan, Angola, and Sierra Leone.

(3) With the world divided geographically into two halves of equal population, now divide it again by gender to split the world’s population into approximate quarters.

Women, and other non-cis-male genders, from the under-represented half of the world amount to ~25% of global population. But in December 2024 they were only 0.83% of people in Wikidata with gender and country of citizenship. (Men from these countries were 2.95%; women from the world’s better-represented half were 23.56%; and men from the world’s better-represented half were 72.69% of Wikidata humans.) Using data from humaniki, I show the long-run stability of these percentages.

3. Improving the situation. Between mid-December 2024 and late March 2025, dedicated effort increased the number of women from under-represented countries from 34,500 to around 42,500. This work increased the metric from 0.83% to 0.98%.

4. What’s the wider impact? First, this work increased other Wikidata metrics. For example, the number of men from under-represented countries increased from 123,000 to 134,500, and their percentage from 2.95% to 3.12%. Second, moving this Wikidata needle helps other Wikimedia projects. Adding missing Wikidata information about items that already exist in other Wikimedia projects helps organize those projects. Adding new items also expands ‘red lists’ for project editors to work on. Finally, Wikimedia indirectly affects all kinds of downstream digital technologies, including search and large language models. Wikimedia can’t eliminate bias in the new digital information order, but can avoid exacerbating it.

5. Where next? I share some intersectional data lurking beneath these totals, some tricks and some lessons. I end by identifying future challenges and opportunities, and call on others in the community to join in and help.
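
Steps (1) to (3) of the metric construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the talk. The `Country` class, the function names and the four toy countries "A" to "D" are hypothetical, and the toy figures are invented purely to exercise the code. Two simplifying assumptions are made: a country's item count is taken as the sum of its gendered items, and the country that crosses the half-of-world-population boundary is counted in the under-represented half.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Country:
    """Hypothetical per-country record; all field names are illustrative."""
    name: str
    population: int
    women_items: int  # Wikidata humans with this citizenship, women and other non-cis-male genders
    men_items: int    # Wikidata humans with this citizenship, men

    @property
    def items(self) -> int:
        # Simplification: only items with an assigned gender are counted.
        return self.women_items + self.men_items

    @property
    def items_per_capita(self) -> float:
        # Step (1): items with this country of citizenship, divided by population.
        return self.items / self.population


def under_represented_half(countries):
    """Step (2): lowest-ranked countries that together reach half the total population."""
    world = sum(c.population for c in countries)
    selected, running = [], 0
    for c in sorted(countries, key=lambda c: c.items_per_capita):
        if running >= world / 2:
            break
        # Assumption: the country that crosses the boundary is included.
        selected.append(c)
        running += c.population
    return selected


def quadrant_shares(countries):
    """Step (3): percentage of all counted items in each gender-by-geography quarter."""
    under = {c.name for c in under_represented_half(countries)}
    total = sum(c.items for c in countries)
    counts = {"women_under": 0, "men_under": 0, "women_better": 0, "men_better": 0}
    for c in countries:
        half = "under" if c.name in under else "better"
        counts[f"women_{half}"] += c.women_items
        counts[f"men_{half}"] += c.men_items
    return {key: 100 * value / total for key, value in counts.items()}


# Invented toy data, NOT real Wikidata or population figures.
TOY = [
    Country("A", population=600, women_items=10, men_items=40),
    Country("B", population=400, women_items=30, men_items=120),
    Country("C", population=500, women_items=200, men_items=600),
    Country("D", population=500, women_items=250, men_items=750),
]

if __name__ == "__main__":
    print([c.name for c in under_represented_half(TOY)])
    for quarter, pct in quadrant_shares(TOY).items():
        print(f"{quarter}: {pct:.2f}%")
```

In this toy world, countries A and B form the under-represented half, and the `women_under` share plays the role of the 0.83% headline figure. With real data, the per-country counts would have to come from Wikidata (or a derived dataset such as humaniki) and the populations from an external source.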

How does your session relate to the event theme, Wikimania@20 – Inclusivity. Impact. Sustainability?

Inclusivity: I directly address an inclusivity gap in Wikidata’s global representation of people. To help measure this aspect of systemic bias, I present a simple metric: the proportion of Wikidata people who are (i) women and (ii) from countries in the under-represented half of world population. I show practical techniques to raise this metric.

Impact: I report progress raising this metric, in a few months, from an abysmal 0.83% to over 1%. I show positive spillover on other Wikimedia-internal measures. I also argue that moving the Wikidata needle will ultimately mitigate bias in digital technologies downstream of Wikimedia, such as search and LLMs.

Sustainability: Two different senses of sustainability are relevant here. First, most generally and existentially, I firmly believe it is an environmental priority to ensure that technology does not simply exacerbate global inequality. The work here is a tiny part of that. Second, as a practical matter of sustaining the progress started here, I make a community appeal and some concrete recommendations informed by the work so far.

What is the experience level needed for the audience for your session?

Everyone can participate in this session

Resources

Speakers

  • David Palfrey (User:Dsp13)
I started editing Wikipedia in 2006. My experience has been mostly on en.wikipedia (100K edits, 2K new en.wp pages) and Wikidata (425K edits, 17K new items). On en.wikipedia I've particularly benefitted from the support of participants in the fantastic Women In Red wikiproject. I’ve helped organize local meetups in Cambridge, UK, but have never before been to Wikimania!
A former history lecturer, I'm currently working as Chief Scientist for mindmage.ai, an online RPG technology platform. I've been involved in some research into Wikipedia in the past: I was a research assistant on a project at the Oxford Internet Institute which examined Wikipedia's representation of MENA countries, and I have coauthored work on knowledge graph bias. I regard global inequality, and its relation with technology, as a crucial matter of human and environmental concern.