2022:Submissions/LinguaLibre: pivoting to diversity

From Wikimania
Session Notes (etherpad)


  • Language: English
  • Status: Live

Speaker(s)

Hugo Lopez

Abstract

[Photo: LinguaLibre Surui training, Paris, 2022]

About 7,000 distinct human languages have been identified, and half of them are under threat.
LinguaLibre is a Wikimedia subproject: a web application for rapidly mass-recording words in those languages.
In past years, we have solidly recorded 30 languages and started efforts in 160 languages.
As larger languages are making good progress, we now have to scale up much, much more.
- We developed a low-cost, crowdsourcing methodology to create and record the vocabulary of those thousands of languages.
- We want to develop more added-value services to serve, preserve, and revitalize minority languages.
But resources are lacking.
Here is what we are doing to solve this, and here is how you can help.

Learning Outcomes

1) Awareness of LinguaLibre's existence and its usage (what tool)
2) Awareness of outcomes so far (what achievements)
3) Awareness of experiments and methodology (how)
4) Awareness of ambitions (opportunities)
5) Awareness of obstacles (risks)

Biography

Hugo was raised in a rural French village whose native language was forcibly made to disappear.
For two decades, Hugo has led several Wikimedia projects alongside his professional work in the Chinese language and in Educational Technology and Innovation. The LinguaLibre rapid recording service is a powerful tool to support lesser-known and marginalized languages.
A strong supporter of diversity and the co-founder and most active coordinator of LinguaLibre, Hugo brings a multi-sided view that may offer enlightening insights into the project's current state, its role, and what should come next.


LinguaLibre and diversity

Interactive map (quantitative)

Interactive map of Lingua Libre speakers. (map query)

Community types (qualitative)

Type | Population | Status | Technological access | Conservation story
National languages | Millions | State-supported | Good |
Catalan | A few million | Local institutional support | Good |
Occitan | 4M in 1950; 20,000 in 2020 | Late support | Partial | From repression to "dying or dead, so now it's protected as a museum asset".
Taiwanese Aboriginal | 100,000 | Late support | Partial | From repression to folklorisation to good-faith late conservation.
Surui | 2,000 | Precarious relationship | Poor | Precarious local efforts; LinguaLibre training workshop for Surui (May 20, 2022).

Technological


Presentation

Authors: Yug

1) Why? Who are we?

  • Defending our heritage
  • Fighting back against monolingualism

2) How?

  • Mission: document language and voice diversity
    • by accent (areas), gender, age, languages
    • into audio dictionaries
    • from larger languages to lesser-documented ones

3) LinguaLibre rapid recording tool (what)

  • Lingua Libre Record Studio
  • Rapid !
  • Solid beginner dictionary: 3 days + 3 hours

4) Results so far

  • 160 languages started / 25+ correctly recorded
  • 800+ contributors

5) Current state: intermediate crisis

  • Observable bias toward larger languages (map) -> new strategies
  • Diverse usages -> new bugs
  • Diverse community typologies -> new methodologies

6) The plan 2018-2022

  • Larger, richer languages as demonstrators
  • Exploratory expansion
  • Expansion

7) New plan

  • Larger, richer languages as demonstrators
  • Exploratory expansion -> unbalanced
  • Bugs/different needs: Technological fixes, New methodologies
  • Connect to marginalized languages

See also

Key entry points

Tutorials

These short videos give a good sense of how Lingua Libre is used.