2023:Program/Submissions/#semanticClimate: Making climate knowledge accessible using Wikidata - SKML87

From Wikimania


Title: #semanticClimate: Making climate knowledge accessible using Wikidata

Speakers:

Shweata N. Hegde

Pretalx link

Etherpad link

Room:

Start time:

End time:

Type: Other

Track: Research, Science, and Medicine

Submission state: submitted

Duration: 30 minutes

Do not record: false

Presentation language: en


Abstract & description

Abstract

The Intergovernmental Panel on Climate Change (IPCC) published the Synthesis Report of its Sixth Assessment cycle in March 2023. The UN Secretary-General called it “a survival guide for humanity”. But this guide is complex and jargon-rich, and it comes as a “dumb” PDF with no machine-readable structure. Our diverse team (https://semanticclimate.github.io/) from India has been collaborating virtually for over two years to semantify IPCC reports, that is, to add machine-readable meaning to them so that knowledge can be extracted. In our demonstration session, we will showcase our tools and the central role that Wikidata-derived ontologies play in them.

Description

This interactive technical session is a live demonstration of #semanticClimate’s open text and image mining tools and of how they use Wikidata and other Wikimedia resources. Anyone can use these tools through our Jupyter Notebooks, for example on Google Colab (see [an example](https://semanticclimate.org/p/en/posts/climate-knowledge-hunt/)). We start by exploring the PDF.

A typical sentence from the Summary for Policymakers of the March 2023 IPCC Synthesis Report: “Least developed countries (LDCs) and Small Island Developing States (SIDS) have much lower per capita emissions (1.7 tCO2-eq and 4.6 tCO2-eq, respectively) than the global average (6.9 tCO2-eq), excluding CO2-LULUCF. {WGIII SPM B.3, WGIII SPM B.3.1, WGIII SPM B.3.2, WGIII SPM B.3.3}”

Some questions:

1. What do these terms (e.g. tCO2-eq) mean?
2. What other terms occur frequently in the IPCC literature?
3. Which other IPCC reports (e.g. WGIII SPM B.3.2) does this paragraph refer to?
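
Part of this structure can be recovered with simple pattern matching before any ontology is involved. The sketch below is a minimal illustration, assuming the quoted sentence is available as plain text; it is not our actual pipeline, and the helper `parse_sentence` is hypothetical. It pulls out abbreviations defined as “Long Form (ABBR)”, the quantities given in tCO2-eq, and the curly-brace cross-references to other IPCC reports.

```python
import re

# The sentence quoted above, as plain text.
SENTENCE = (
    "Least developed countries (LDCs) and Small Island Developing States (SIDS) "
    "have much lower per capita emissions (1.7 tCO2-eq and 4.6 tCO2-eq, respectively) "
    "than the global average (6.9 tCO2-eq), excluding CO2-LULUCF. "
    "{WGIII SPM B.3, WGIII SPM B.3.1, WGIII SPM B.3.2, WGIII SPM B.3.3}"
)


def parse_sentence(text):
    """Hypothetical helper: pull abbreviations, tCO2-eq values and cross-references."""
    abbreviations = {}
    # An all-caps abbreviation in brackets, optionally plural: "(LDCs)", "(SIDS)".
    for m in re.finditer(r"\(([A-Z]{2,})(s?)\)", text):
        letters = m.group(1)
        # Take as many preceding words as the abbreviation has letters ...
        words = text[: m.start()].split()[-len(letters):]
        # ... and keep the pair only if their initials spell the abbreviation.
        if "".join(w[0].upper() for w in words) == letters:
            abbreviations[letters + m.group(2)] = " ".join(words)

    # Numbers directly followed by the unit tCO2-eq.
    quantities = [float(x) for x in re.findall(r"(\d+(?:\.\d+)?)\s*tCO2-eq", text)]

    # Cross-references to other IPCC reports sit in a trailing {...} block.
    block = re.search(r"\{([^}]*)\}", text)
    references = [r.strip() for r in block.group(1).split(",")] if block else []

    return {
        "abbreviations": abbreviations,
        "quantities_tco2eq": quantities,
        "cross_references": references,
    }


print(parse_sentence(SENTENCE))
```

The initials check is deliberately crude: it cannot explain terms such as tCO2-eq or LULUCF, whose long forms are not spelled out in the sentence, which is where a Wikidata-backed dictionary becomes necessary.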

We’ll explore these questions through our tools, by:

1. Converting the PDFs to semantic HTML ([`pygetpapers`](https://github.com/petermr/pygetpapers), [`py4ami`](https://github.com/petermr/pyami)) and building and displaying a knowledge-graph framework.

2. Discovering frequently used climate terms and turning them into Wikidata-enhanced ontologies ([`docanalysis`](https://github.com/petermr/docanalysis)). Example terms:
   - tCO2-eq [tonne of carbon dioxide equivalent, Q57084755]
   - Least developed countries (LDCs) [Q752401]
   - Small Island Developing States (SIDS) [Q1434887]
   - LULUCF [Land use, land-use change and forestry, Q3348639]
   - WGIII SPM (IPCC Working Group III Summary for Policymakers)

   Wikidata tells us what these terms mean, through labels and descriptions not only in English but in many other languages.

3. Annotating, indexing and searching the documents using both the newly created and pre-made ontologies ([`docanalysis`](https://github.com/petermr/docanalysis), [`py4ami`](https://github.com/petermr/pyami)).
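
Steps 2 and 3 can be illustrated with a toy example. The sketch below assumes a tiny hand-made dictionary mapping surface terms to the Wikidata item IDs listed in step 2; it counts how often each term occurs, tags every occurrence with its Q-ID, and builds a request URL for the Wikidata `wbgetentities` API to fetch labels in several languages (the language choice is only an example). It is not the `docanalysis` implementation, and the names `CLIMATE_DICTIONARY`, `count_terms`, `annotate`, `labels_url` and `labels_from_response` are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlencode

# Hypothetical mini-ontology: surface form -> Wikidata item ID.
# The IDs are the ones listed in step 2; a real ontology would be much larger.
CLIMATE_DICTIONARY = {
    "tCO2-eq": "Q57084755",
    "LDCs": "Q752401",
    "SIDS": "Q1434887",
    "LULUCF": "Q3348639",
}

TEXT = (
    "Least developed countries (LDCs) and Small Island Developing States (SIDS) "
    "have much lower per capita emissions (1.7 tCO2-eq and 4.6 tCO2-eq, respectively) "
    "than the global average (6.9 tCO2-eq), excluding CO2-LULUCF."
)


def term_pattern(term):
    # Match the term only when it is not glued to other letters or digits,
    # so "SIDS" would not match inside a longer word.
    return re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)")


def count_terms(text, dictionary=CLIMATE_DICTIONARY):
    """Count how often each dictionary term occurs in the text (step 2)."""
    counts = Counter()
    for term in dictionary:
        n = len(term_pattern(term).findall(text))
        if n:
            counts[term] = n
    return counts


def annotate(text, dictionary=CLIMATE_DICTIONARY):
    """Return (start, end, term, qid) for every match, in document order (step 3)."""
    hits = [
        (m.start(), m.end(), term, qid)
        for term, qid in dictionary.items()
        for m in term_pattern(term).finditer(text)
    ]
    return sorted(hits)


def labels_url(qids, languages=("en", "hi", "de")):
    """Build a Wikidata wbgetentities request for labels and descriptions."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(qids),
        "props": "labels|descriptions",
        "languages": "|".join(languages),
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)


def labels_from_response(data, language):
    """Pick one language's label per item out of a wbgetentities JSON response."""
    return {
        qid: entity.get("labels", {}).get(language, {}).get("value")
        for qid, entity in data.get("entities", {}).items()
    }


print(count_terms(TEXT).most_common())
# [('tCO2-eq', 3), ('LDCs', 1), ('SIDS', 1), ('LULUCF', 1)]
print(labels_url(list(CLIMATE_DICTIONARY.values())))
```

The network request itself is left out so the sketch runs offline; fetching the URL and passing the decoded JSON to `labels_from_response` would give one label per item in the chosen language.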

We’ll show how our NLP tools in Python can summarize documents and extract data from images such as charts and graphs.
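
As a generic illustration of one common summarization technique (not necessarily the one our tools use), a frequency-based extractive summary scores each sentence by how often its content words occur in the whole document and keeps the top-scoring sentences in their original order. The stopword list, the naive sentence splitter and the function names are assumptions made for this sketch.

```python
import re
from collections import Counter
from typing import List

# Illustrative stopword list (an assumption; real tools use much larger lists).
STOPWORDS = {
    "a", "an", "and", "are", "as", "by", "for", "from", "has", "have",
    "in", "is", "it", "its", "of", "on", "than", "that", "the", "to", "with",
}


def content_words(text: str) -> List[str]:
    # Lower-case word tokens (letters, digits, hyphens) minus stopwords.
    return [w for w in re.findall(r"[a-z0-9-]+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, n_sentences: int = 2) -> List[str]:
    """Frequency-based extractive summary: keep the n highest-scoring sentences."""
    # Naive split after ., ! or ? followed by whitespace (decimals like 1.7 survive).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        # Average rather than sum, so long sentences are not automatically favoured.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Restore document order for the selected sentences.
    return [sentences[i] for i in sorted(ranked[:n_sentences])]


doc = "Emissions rose. Emissions from energy rose again. Forests store carbon."
print(summarize(doc, n_sentences=1))  # ['Emissions rose.']
```

The splitter would break on abbreviations such as “e.g.”, so real report text needs a proper sentence tokenizer.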

In the second part (10 min.), we will gather feedback on both climate and non-climate aspects of the work, including one open question: how can the large amount of extracted knowledge be fed back into Wikidata?

Further details

Qn. How does your session relate to the event themes: Diversity, Collaboration, Future?

Our team members are undergraduate and master's students from India, from both rural and urban areas. We also have volunteers from across the world: the UK, Brazil, and Germany. We follow the Open Notebook philosophy: everything we do is shared openly on multiple GitHub repositories. The team continues to collaborate virtually, as it has for the past two years. We believe that young people working on global challenges such as the climate crisis are the future.

Qn. What is the experience level needed for the audience for your session?

Everyone can participate in this session; no prior experience is required.

Qn. What is the most appropriate format for this session?

  • ☐ Onsite in Singapore
  • ☑ Remote online participation, livestreamed
  • ☐ Remote from a satellite event
  • ☐ Hybrid with some participants in Singapore and others dialing in remotely
  • ☐ Pre-recorded and available on demand