2019:Transcription/A general annotation service
This is an Accepted submission for the Transcription space at Wikimania 2019.
Description
Currently there are a wide variety of "annotations" performed on our content with a plethora of different ad-hoc tools, using either inline markup or a separate linked database. Inline markup can be an extension tag, template, specialized wikilink syntax, or a combination, such as citations, Semantic MediaWiki, and Proofread Page. Annotations which have successfully been moved out of wikitext include interlanguage links and Wikidata. Annotations performed on media resources include Structured Data on Commons and the FileAnnotations extension (and related projects).
This session discusses a general annotation service, for example Amazing Article Annotations, as it could be applied to transcriptions. In particular, we would like to go through the specialized transcription tools one by one (OCR, Proofread Page, Timed Text, the Inscription template, Content Translation, etc.) and determine to what degree they could be simplified and improved by a standard annotation service built on the W3C Web Annotation standard. Moving annotations out of wikitext syntax could also make possible certain new features that were previously held back by the desire not to bloat the original resource with transcription-specific markup. Further, a general annotation service can provide better editing support, offering common editing tools and allowing annotations to remain anchored even when the resource is edited.
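As a rough illustration of what moving an annotation out of wikitext looks like under the W3C Web Annotation Data Model, the sketch below builds one annotation as a plain Python dictionary that serializes to JSON-LD. The annotated page is referenced only by URL, and the annotated span is located with a `TextQuoteSelector` (the exact text plus a little prefix and suffix context), so the page's wikitext is never touched. The page URL, the French body text, and the helper name `make_text_annotation` are hypothetical; this is not the format of the prototype mentioned later in the session notes.

```python
import json

# JSON-LD context defined by the W3C Web Annotation Data Model.
ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def make_text_annotation(source_url: str, exact: str, prefix: str, suffix: str,
                         body_text: str, language: str) -> dict:
    """Build an annotation whose target is a quoted span of a page.

    The page itself is not modified: the annotation points at it by URL
    and finds the span with a TextQuoteSelector (hypothetical helper).
    """
    return {
        "@context": ANNO_CONTEXT,
        "type": "Annotation",
        "body": {
            "type": "TextualBody",
            "value": body_text,
            "language": language,
            "format": "text/plain",
        },
        "target": {
            "source": source_url,
            "selector": {
                "type": "TextQuoteSelector",
                "exact": exact,
                "prefix": prefix,
                "suffix": suffix,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical page and translation, for illustration only.
    anno = make_text_annotation(
        source_url="https://example.org/wiki/Example_page",
        exact="free knowledge",
        prefix="the sum of all ",
        suffix=".",
        body_text="connaissance libre",
        language="fr",
    )
    print(json.dumps(anno, indent=2))
```

Because the selector stores text and context rather than a wikitext position, the same annotation shape could in principle serve citations, translations, or proofreading notes alike.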
Relationship to the theme
This session will address the conference theme (Wikimedia, Free Knowledge and the Sustainable Development Goals) in the following manner:
SDG #4: "Ensure inclusive and equitable quality education".
Annotations can also link resources, supporting SDG #17, "Partnerships for the goals". For example, a transcription can link a resource created in one project (Wikimedia Commons, for example) to a related text in a different project, or in a different language.
SDG #10, "Reduced inequalities", can be addressed when transcriptions, subtitles, or captions allow a resource created in one area to be used or read in another. For example, majority-language resources can be captioned for minority languages, or content created by a minority group can be transcribed to fill a knowledge gap in the majority-language project.
Session outcomes
At the end of the session, the following will have been achieved:
- An understanding of how a general purpose annotation facility could be used to advance transcription tasks.
Session leader(s)
- C. Scott Ananian, Wikimedia Foundation
Session type
Each Space at Wikimania 2019 will have specific format requests. The program design prioritises submissions which are future-oriented and directly engage the audience. The format of this submission is a:
- Lightning talk
- Workshop to identify and try to solve problem
- Roundtable discussion forum
Requirements
The session will work best with these conditions:
- Room:
Small classroom or round-table seating.
- Audience:
10-20 people with deep knowledge of existing transcription tools.
- Recording:
Yes
SESSION SUMMARY
ENABLE INDEPENDENT CONTENT CREATION ON TOP OF AN EVER-CHANGING BASE.
Bazaar-style collaboration: there is no ownership, as anyone can edit; it's even a bazaar of bazaars.
Annotation is therefore interesting for transcription: the promise is that it can be independent of the content.
Case study: the Translate extension (1, 2, 3). Right now it uses markup; translations are important, but the system is not liked (to say the least).
(4) an interface to make it nicer and more usable
(6) and there is *another* UI to see the changes to the original text that need updating
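In the Translate case, finding the parts of the source that changed needs yet another UI. An annotation service could instead try to re-locate each annotated span in every new revision, which is what lets annotations stay anchored while the base text keeps changing. The sketch below is one simple way to do that, assuming a `TextQuoteSelector`-style anchor (quoted text plus stored prefix and suffix context): among all occurrences of the quote, it picks the one whose surroundings best match the stored context. The function name `reanchor` and its scoring rule are illustrative assumptions, not how the Translate extension or the prototype works; a real client would likely also need fuzzy matching for quotes that were themselves edited.

```python
from typing import Optional, Tuple


def _shared_tail(a: str, b: str) -> int:
    """Count how many trailing characters a and b have in common."""
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def _shared_head(a: str, b: str) -> int:
    """Count how many leading characters a and b have in common."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def reanchor(text: str, exact: str, prefix: str = "",
             suffix: str = "") -> Optional[Tuple[int, int]]:
    """Find (start, end) offsets of a quoted span in a new revision.

    Returns None if the quoted text no longer occurs. When it occurs
    several times, the occurrence whose preceding characters best match
    `prefix` and whose following characters best match `suffix` wins.
    """
    if not exact:
        return None
    best: Optional[Tuple[int, int]] = None
    best_score = -1
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        before = text[max(0, start - len(prefix)):start]
        after = text[end:end + len(suffix)]
        score = _shared_tail(prefix, before) + _shared_head(suffix, after)
        if score > best_score:  # ties keep the earliest occurrence
            best, best_score = (start, end), score
        start = text.find(exact, start + 1)
    return best


if __name__ == "__main__":
    # A sentence was added before the annotated text in the new revision.
    new_revision = "A dog barked. The cat sat. The cat ran away."
    print(reanchor(new_revision, "cat", prefix="sat. The ", suffix=" ran."))
```

Here the second "cat" is selected even though its character offset moved, because its context matches the stored prefix and suffix most closely.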
Annotations of wiki content
Transformative information is itself a transformation of the content, so it needs to be decoupled from changes to the base content and from association with a single specific revision (otherwise it goes away as the history changes).
"We should not stand in each other's way", to avoid edit conflicts.
Inline markup
- Conversion
- citations
- discussions
- proof-reading
- {{citation needed}} template
New use cases
C. Scott makes the "junk in the wikitext" gesture a lot (on almost every slide ;) )
Transcription cases
- A prototype following the W3C standard is working (see phab:T146397); in theory it could be used for:
  - OCR
  - Proofread Page
  - timed text (subtitles)
  - inscription
  - content translation
  - etc.
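Several of the transcription cases above attach text to part of a media file rather than to part of a text page: OCR and Proofread Page work on regions of a scanned page, and timed text on time intervals of a video. In the W3C model both can be expressed with a `FragmentSelector` that conforms to the Media Fragments URI specification (`xywh=` for a rectangle, `t=` for a time range in seconds). The sketch below assumes that mapping; the URLs and helper names are hypothetical and are not taken from the prototype.

```python
import json

ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"
MEDIA_FRAGMENTS = "http://www.w3.org/TR/media-frags/"


def fragment_target(source_url: str, fragment: str) -> dict:
    """Target part of a media file using a Media Fragments expression."""
    return {
        "source": source_url,
        "selector": {
            "type": "FragmentSelector",
            "conformsTo": MEDIA_FRAGMENTS,
            "value": fragment,
        },
    }


def page_region_transcription(scan_url: str, x: int, y: int, w: int, h: int,
                              text: str) -> dict:
    """OCR / Proofread Page case: transcribed text for a rectangle of a scan."""
    return {
        "@context": ANNO_CONTEXT,
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
        "target": fragment_target(scan_url, f"xywh={x},{y},{w},{h}"),
    }


def subtitle_cue(video_url: str, start_s: float, end_s: float, text: str,
                 language: str) -> dict:
    """Timed text case: a caption shown between two times, in seconds."""
    return {
        "@context": ANNO_CONTEXT,
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": text,
                 "language": language, "format": "text/plain"},
        "target": fragment_target(video_url, f"t={start_s},{end_s}"),
    }


if __name__ == "__main__":
    # Hypothetical resources; in practice these would be media file URLs.
    print(json.dumps(page_region_transcription(
        "https://example.org/scans/page-3.png", 120, 340, 900, 60,
        "CHAPTER I."), indent=2))
    print(json.dumps(subtitle_cue(
        "https://example.org/video.webm", 12.5, 15.0,
        "Welcome to Wikimania.", "en"), indent=2))
```

Both functions share one target shape, which is the point of the exercise: OCR regions and subtitle cues differ only in the fragment string.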
Issues
- name of document
- need a name for the current version
- naming images
- URL for most recent version
- hypothes.is
- multi-content revisions
- Wikidata
- Anchor
  - wikitext
  - rendered HTML
  - editable HTML
- Image anchors
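For image anchors, one design question is which coordinate system to store. The Media Fragments spatial syntax allows either pixel coordinates (the default) or percentages (`xywh=percent:...`); percentages keep pointing at the same region when an image is rendered at a different size, for example as a thumbnail. The sketch below parses both forms and resolves them to pixels for a given rendering. The helper names are hypothetical, and only the `xywh=` dimension is handled.

```python
from typing import Tuple

UNITS = ("pixel", "percent")


def parse_xywh(value: str) -> Tuple[str, float, float, float, float]:
    """Parse 'xywh=160,120,320,240' or 'xywh=percent:25,25,50,50'.

    Returns (unit, x, y, w, h); the unit defaults to 'pixel' as in the
    Media Fragments URI syntax.
    """
    if not value.startswith("xywh="):
        raise ValueError(f"not a spatial fragment: {value!r}")
    params = value[len("xywh="):]
    unit = "pixel"
    if ":" in params:
        unit, params = params.split(":", 1)
        if unit not in UNITS:
            raise ValueError(f"unknown unit: {unit!r}")
    parts = params.split(",")
    if len(parts) != 4:
        raise ValueError("expected four comma-separated numbers")
    x, y, w, h = (float(p) for p in parts)
    if min(x, y, w, h) < 0:
        raise ValueError("coordinates must be non-negative")
    return unit, x, y, w, h


def to_pixels(value: str, width: int, height: int) -> Tuple[int, int, int, int]:
    """Resolve a spatial fragment against an image rendered at width x height."""
    unit, x, y, w, h = parse_xywh(value)
    if unit == "percent":
        # Percent coordinates scale with whatever size the image is shown at.
        x, w = x * width / 100, w * width / 100
        y, h = y * height / 100, h * height / 100
    return round(x), round(y), round(w), round(h)
```

For instance, `to_pixels("xywh=percent:25,25,50,50", 800, 600)` gives `(200, 150, 400, 300)`, and the same fragment on a 400 x 300 thumbnail gives `(100, 75, 200, 150)`, whereas a pixel fragment would need rescaling by the caller.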
Question by UJung: there are Semantic MediaWiki things that you might want to look at.
That's something I'd like to look at.
Question by LA2: what about illustration blocks on a page in the DjVu/PDF?
That's something I'd like to look at.