2023:Program/Wild Ideas/NTSVND-New'pedias: Rapid, iterative, automagical construction of large reference works

From Wikimania

Title: New'pedias: Rapid, iterative, automagical construction of large reference works

Speakers:

SJ Klein

I have been editing Wikipedia since 2003, and have worked on a range of projects to generate snapshots suitable for school servers and offline reading. My team at OLPC shipped semi-autogenerated Wikipedia snapshots in many languages to roughly 2 million children. Profile image generated with Midjourney 3 + GIMP.

Pretalx link

Etherpad link

Room: Room 311

Start time: Sat, 19 Aug 2023 15:30:00 +0800

End time: Sat, 19 Aug 2023 15:50:00 +0800

Type: Lecture

Track: Wild Ideas

Submission state: confirmed

Duration: 20 minutes

Do not record: false

Presentation language: en


Abstract & description

Abstract

Modern tools make it possible to automatically construct dictionaries, encyclopedias, and other reference works, with limited supervision. Large language models are rapidly improving the quality and consistency of that work. We will discuss the implications for future reference works that anyone can edit, with examples and an interactive chat with the audience.

Description

From the vast expanse of recorded knowledge, modern tools have made it possible to autogenerate simple dictionaries, encyclopedias, and other reference works with limited supervision. Early efforts to generate directories of sites, companies, and topics led to projects relying on indexes of the entire web, such as Common Crawl. In the last decade, a wide range of general and specialized knowledge bases have been generated automatically from large source corpora. (There is even a conference series dedicated to this: AKBC.ws!)

Today, AI models are improving the generality, depth, and consistency of such works. Translation models such as No Language Left Behind dominate the world of translation dictionaries. Large language models are increasing the quality of unsupervised summaries, the granularity and quality of inline source citations, and the efficacy and scalability of supervision, so illustrated encyclopedias and travel guides may not be far behind.

Join our interactive panel as we discuss the latest methods for automating reference work construction, showcase real-world examples for a range of reference types, and engage in a lively fireside chat about the future, with your input.

Topics will include:

  • How are these advancements reshaping search and discovery online?
  • How are they changing collaborative knowledge production?
  • What does this mean for Wikimedia projects, present and future?
      • How might this help us to fill gaps in coverage and counter systemic bias?
      • What might an automagically-updated wiki look like, edited primarily by different models?
      • Could such a wiki make a good draft space or sister project?

Further details

Qn. How does your session relate to the event themes: Diversity, Collaboration, Future?

Diversity: automagical seeding and updating of drafts can help increase coverage in underrepresented areas, and improve freshness of related articles by reducing the barrier to entry for curators and editors.

Collaboration, Future: this session reviews an aspect of the future and explores ways we can actively shape it to our needs, rather than treating it as an external challenge to cope with.

Qn. What is the experience level needed for the audience for your session?

Everyone can participate in this session.

Qn. What is the most appropriate format for this session?

  • [x] Onsite in Singapore
  • [ ] Remote online participation, livestreamed
  • [ ] Remote from a satellite event
  • [x] Hybrid with some participants in Singapore and others dialing in remotely
  • [ ] Pre-recorded and available on demand