Title: Helping Wikisource recognize handwritten documents
Speakers:
Satdeep Gill (WMF)
Senior Program Officer, Culture and Heritage, Wikimedia Foundation
Kinneret Gordon
My name is Kinneret and I am based in Tel Aviv, Israel. I work on the Strategic Partnerships team at the Wikimedia Foundation, and I help lead the Transkribus partnership, which aims to support Wikisource by providing handwriting recognition models.
Sara Mansutti
I work as Education Manager for READ-COOP, a European Cooperative Society responsible for maintaining and advancing the Transkribus platform. Our goal is to unlock our written past and make historical documents more accessible thanks to AI. Additionally, I am pursuing my PhD in Digital Humanities at University College Cork.
Room: Room 311
Start time: Wed, 16 Aug 2023 15:20:00 +0800
End time: Wed, 16 Aug 2023 15:50:00 +0800
Track: GLAM, Heritage, and Culture
Submission state: confirmed
Duration: 30 minutes
Do not record: false
Presentation language: en
Abstract & description
Abstract
The Wikimedia Foundation has partnered with READ-COOP to bring Transkribus, an AI-driven handwriting recognition tool, to Wikisource. This session will introduce Transkribus to the Wikimedia community and share resources for creating new handwriting recognition models to support Wikisource.
Description
The Wikimedia Foundation has partnered with READ-COOP to bring Transkribus to Wikisource in support of the Wikisource Loves Manuscripts project. The project aims to digitize and transcribe more than 20,000 pages of Indonesian manuscripts to foster their preservation and accessibility.
Because the first phase of the project focuses on Indonesia, handwriting recognition models for Indonesian languages are being created with the help of our partner IIIT Hyderabad. Google OCR and Tesseract, the two OCR engines already integrated into Wikisource, do not support Balinese or Javanese, which is why Transkribus was brought in. Transkribus is an AI-powered platform for the text recognition, transcription and searching of historical documents. With adequate training material, models can be trained for any language or script and produce automatic transcriptions with a typical accuracy of around 95%. In this session, members of READ-COOP and WMF will demonstrate how Transkribus works and share resources that will help other Wikimedia communities create new OCR models. We hope this collaboration will inspire other Wikimedia communities to engage contributors in preserving and transcribing manuscripts and in unlocking their written past.
*Other relevant tracks: GLAM, Heritage, and Culture
Further details
Qn. How does your session relate to the event themes: Diversity, Collaboration, Future?
This partnership is helping to improve the diversity of languages and document-types on Wikimedia projects so that the future of free knowledge is rich with historical texts and documents.
Qn. What is the experience level needed for the audience for your session?
Everyone can participate in this session.
Qn. What is the most appropriate format for this session?