2023:Program/Submissions/Collaboratively fixing automatic citations in Wikipedia with Web2Cit - L7B7LL

From Wikimania

Title: Collaboratively fixing automatic citations in Wikipedia with Web2Cit


Diego de la Hera (User:Diegodlh)

I am a scientist and Wikimedian from Argentina. I contribute to technical projects, including the development of tools such as Cita, a Wikidata add-on for Zotero, and Web2Cit, a tool for collaboratively improving automatic citations in Wikipedia. I am also a co-founder and member of Wikimedistas Calamuchita, a non-recognized user group in the Calamuchita Valley in Córdoba, Argentina, and of Wikitécnica, a community of Spanish-speaking technical Wikimedians. Profile picture available from Wikimedia Commons (https://commons.wikimedia.org/wiki/File:Diego_de_la_Hera_en_enero_de_2023.jpg) under the CC0 1.0 Public Domain Dedication.



Start time:

End time:

Type: Workshop

Track: Technology

Submission state: submitted

Duration: 60 minutes

Do not record: false

Presentation language: en

Abstract & description

Abstract

Automatic citations in Wikipedia sometimes fail to retrieve accurate metadata, a problem that can particularly affect non-mainstream sources. Web2Cit is a tool that allows technical and non-technical volunteers alike to help improve automatic citations. In this hands-on workshop, you will learn how to engage with the Web2Cit community to collaboratively fix automatic citations for sources relevant to your communities.

Description

Wikipedia’s automatic citation generator relies on source webpages correctly embedding citation metadata. When they do not, which happens often, algorithms tailored to individual websites must be programmed. However, some websites wait longer than others to be addressed, and these algorithms tend to break when websites change. As a result, incorrect citation metadata is retrieved from some websites, particularly non-mainstream ones.

Web2Cit is a recently released tool that helps fix automatic citations and requires far less technical skill than was needed before. It is also collaborative: community members can contribute according to their skills, which enables a wider diversity of contributors to participate.

Web2Cit documentation and training material are available online. However, as with any tool, getting started can be challenging. For this reason, a few demonstrations have been organized in the past, but most of them were not truly hands-on, and all of them took place in the middle of the night for the ESEAP (East, Southeast Asia, and the Pacific) region.

In this hands-on workshop, we will focus on one specific and particularly annoying automatic citation error: misrecognition of the source type (e.g., “webpage” instead of “newspaper article”). Specifically, we will learn how to: (1) use the Web2Cit user script to get collaborative automatic citations alongside regular automatic citations; (2) configure extraction tests that specify the expected output for a given webpage; (3) configure a simple extraction procedure that passes the extraction test; and (4) keep track of test results to fix procedures if they break.
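The relationship between extraction tests and extraction procedures in points (2) to (4) can be illustrated with a conceptual Python sketch. It does not reproduce Web2Cit's actual template or test format: the `ExtractionTest` class, the `run_procedure` and `check` helpers, the example URL, and the metadata keys are all hypothetical. A test records the expected output for one webpage, a procedure computes citation fields from the page, and the check reports which fields still mismatch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical model for illustration only; NOT Web2Cit's real format or API.
Page = Dict[str, str]                         # metadata scraped from a webpage
Procedure = Dict[str, Callable[[Page], str]]  # citation field -> extraction step


@dataclass
class ExtractionTest:
    """Expected citation output for one webpage (hypothetical structure)."""
    url: str
    expected: Dict[str, str]


def run_procedure(procedure: Procedure, page: Page) -> Dict[str, str]:
    """Apply each extraction step to the page to build citation fields."""
    return {name: step(page) for name, step in procedure.items()}


def check(test: ExtractionTest, output: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {field: (expected, actual)} for every field that does not match."""
    return {
        name: (want, output.get(name, ""))
        for name, want in test.expected.items()
        if output.get(name, "") != want
    }


# A news article whose page only declares a generic type (hypothetical data).
page = {"og:type": "website", "og:title": "River floods the valley"}

test = ExtractionTest(
    url="https://news.example.org/2023/river-floods",  # hypothetical URL
    expected={"itemType": "newspaperArticle", "title": "River floods the valley"},
)

# Output a generic extractor might produce: the source type is misrecognized.
generic_output = {"itemType": "webpage", "title": page["og:title"]}

# A simple procedure for this website that matches the test.
procedure: Procedure = {
    "itemType": lambda p: "newspaperArticle",
    "title": lambda p: p.get("og:title", ""),
}

print(check(test, generic_output))                 # {'itemType': ('newspaperArticle', 'webpage')}
print(check(test, run_procedure(procedure, page)))  # {}

# If the website changes its markup, re-running the test reveals the breakage.
changed_page = {"og:type": "website", "twitter:title": "River floods the valley"}
print(check(test, run_procedure(procedure, changed_page)))  # {'title': ('River floods the valley', '')}
```

In this sketch the test is written independently of any procedure, so a contributor who only knows what the correct citation should look like can still contribute, and re-running the stored tests later flags procedures that a website change has broken.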

If possible, come with the Web2Cit user script already installed: https://meta.wikimedia.org/wiki/Web2Cit#Install . You may also bring a URL you would like to work with; otherwise, you can choose one from a list provided during the session.

Further details

Qn. How does your session relate to the event themes: Diversity, Collaboration, Future?

Web2Cit is a tool that enables technical and non-technical volunteers alike to collaborate on fixing problems with automatic citations for sources relevant to their communities. Regarding diversity, this increases the diversity of people involved in solving these problems, and fixing automatic citations may also encourage a wider diversity of sources to be cited in Wikipedia articles. Regarding collaboration, Web2Cit was designed with collaboration in mind, encouraging volunteers to contribute what they can according to their skills (whether a simple expected extraction output or a full extraction procedure) to collectively improve automatic citations as a community.

Qn. What is the experience level needed for the audience for your session?

Some experience will be needed

Qn. What is the most appropriate format for this session?

  • ☑ Onsite in Singapore
  • ☐ Remote online participation, livestreamed
  • ☐ Remote from a satellite event
  • ☑ Hybrid with some participants in Singapore and others dialing in remotely
  • ☐ Pre-recorded and available on demand