
2025:Program/zelph: Enhancing Wikidata's Consistency Through Semantic Network Analysis

From Wikimania

Session title: zelph: Enhancing Wikidata's Consistency Through Semantic Network Analysis

Session type: Poster
Track: Open Data
Language: en

This session introduces zelph, an innovative semantic network system that analyzes Wikidata's knowledge graph to identify logical contradictions and derive new facts through inference. By representing both facts and inference rules within the same network structure, zelph offers a unique approach to knowledge representation that aligns perfectly with Wikidata's property model. The system has successfully processed the entire 1.4TB Wikidata dataset using only 9.2GB of memory, identifying numerous logical connections and inconsistencies. During this demonstration, attendees will see how zelph's custom scripting language allows for flexible definition of inference rules based on Wikidata's usage guidelines, how contradiction detection works in practice, and how the system's architecture enables efficient processing of massive knowledge bases.

Description

Wikidata represents one of humanity's most ambitious knowledge organization projects, with over 110 million items interconnected through properties and statements. However, this vast scale introduces challenges in maintaining logical consistency across the knowledge graph. This session demonstrates how zelph, a specialized semantic network system, addresses these challenges through innovative representation and inference techniques.

zelph distinguishes itself through several key innovations relevant to Wikidata:

First, zelph treats relations as first-class nodes rather than edge labels - an approach that perfectly mirrors Wikidata's property model. While traditional semantic networks represent relations as simple connections between nodes, zelph elevates them to equal status with entities. This enables sophisticated meta-reasoning about properties themselves, such as transitivity, symmetry, and inverse relationships.
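To make the first-class-relation idea concrete, here is a minimal sketch (in Python, not zelph's actual code or API): because a relation is just another node, a meta-fact about a relation is stored exactly like a fact about an entity. All names in the example are illustrative.

```python
# Minimal sketch of a semantic network in which relations are ordinary
# nodes, so meta-facts about relations use the same triple storage as
# facts about entities. Not zelph's implementation.

class Network:
    def __init__(self):
        self.nodes = {}     # name -> dense integer node id
        self.facts = set()  # (subject id, relation id, object id)

    def node(self, name):
        """Return the id for `name`, creating the node if needed."""
        return self.nodes.setdefault(name, len(self.nodes))

    def assert_fact(self, subj, rel, obj):
        self.facts.add((self.node(subj), self.node(rel), self.node(obj)))

    def holds(self, subj, rel, obj):
        return (self.node(subj), self.node(rel), self.node(obj)) in self.facts

net = Network()
# An ordinary fact about entities:
net.assert_fact("Bavaria", "part of", "Germany")
# A meta-fact about the relation itself -- "part of" is a node too,
# which is what enables reasoning about transitivity, symmetry, etc.:
net.assert_fact("part of", "has property", "transitive")
```

Because the meta-fact lives in the same triple store, rules that inspect relation properties need no separate schema layer.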

Second, zelph encodes not only facts but also inference rules within the same semantic network. This means rules can reference other rules, creating a deeply integrated knowledge representation system. For example, rules can establish that if relation R is transitive, and X relates to Y through R, and Y relates to Z through R, then X must also relate to Z through R.
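The transitivity rule described above can be sketched as a fixpoint computation; the function and data layout below are assumptions for illustration, not zelph's scripting language or engine.

```python
# Sketch of the transitivity rule: if R is marked transitive and both
# (X, R, Y) and (Y, R, Z) hold, derive (X, R, Z). Repeat until no new
# facts appear (a fixpoint). Illustrative only.

def close_transitive(facts, transitive_relations):
    """Return `facts` extended with all transitively derivable triples."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (x, r, y) in list(derived):
            if r not in transitive_relations:
                continue
            for (y2, r2, z) in list(derived):
                if r2 == r and y2 == y and (x, r, z) not in derived:
                    derived.add((x, r, z))
                    changed = True
    return derived

facts = {("A", "subclass of", "B"), ("B", "subclass of", "C")}
closed = close_transitive(facts, {"subclass of"})
# ("A", "subclass of", "C") is now in `closed`
```

A production system would index facts by relation and subject rather than scanning all pairs, but the fixpoint structure is the same.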

Third, zelph employs highly optimized data structures that process the entire 1.4TB Wikidata JSON dataset into merely 9.2GB of memory - requiring only about 80 bytes per item while preserving critical relationship information.
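As a back-of-envelope illustration of the compaction idea, the sketch below replaces verbose JSON identifiers with dense integer ids and packs statements into flat arrays; the field layout is an assumption for illustration and is not zelph's documented format.

```python
# Illustrative compaction: intern Q/P identifiers into dense integers and
# pack statements as flat 32-bit triples (12 bytes per statement), instead
# of keeping the verbose JSON representation in memory.
import array

ids = {}  # Q/P identifier string -> dense integer id

def intern(qid):
    """Map an identifier like 'Q42' or 'P279' to a small integer."""
    return ids.setdefault(qid, len(ids))

# Statements packed as consecutive (subject, property, object) 32-bit ints:
triples = array.array("I")

def add_statement(subj, prop, obj):
    triples.extend((intern(subj), intern(prop), intern(obj)))

add_statement("Q980", "P279", "Q35657")  # hypothetical "subclass of" claim
print(len(triples) * triples.itemsize)   # 12 bytes for this one statement
```

Per-item metadata and indexes add overhead on top of the raw triples, which is consistent with a budget on the order of tens of bytes per item.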

The demonstration will walk through practical examples of zelph's capabilities:

1. **Contradiction Detection**: We'll examine how zelph identifies logical inconsistencies in Wikidata, such as when an item belongs to mutually exclusive categories or when properties with inverse relationships are applied inconsistently.

2. **Inference Demonstration**: Attendees will see how zelph derives new facts from existing ones using its rule system, for example through properties like "subclass of" and "part of."


3. **Rule Customization**: The session will showcase zelph's scripting language, which allows for flexible definition of inference rules. The current implementation follows Wikidata's usage guidelines but can be refined through community collaboration.

4. **Technical Architecture**: A brief technical overview will explain how zelph's memory-efficient data structures work and how the system processes large datasets.
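The two contradiction patterns named in item 1 can be sketched as simple checks over a set of triples. This is an illustrative sketch under assumed names ("instance of", disjoint class pairs, an inverse-property map), not zelph's detection code.

```python
# Sketch of two contradiction checks: (a) an item is an instance of two
# mutually exclusive classes; (b) a property with a declared inverse is
# applied in one direction only. Illustrative names and data layout.

def find_contradictions(facts, disjoint_pairs, inverse_of):
    issues = []
    # (a) Mutually exclusive classes.
    classes = {}
    for (s, r, o) in facts:
        if r == "instance of":
            classes.setdefault(s, set()).add(o)
    for item, cs in classes.items():
        for a, b in disjoint_pairs:
            if a in cs and b in cs:
                issues.append(f"{item}: '{a}' and '{b}' are disjoint")
    # (b) Inverse relation stated in only one direction.
    for (s, r, o) in facts:
        inv = inverse_of.get(r)
        if inv and (o, inv, s) not in facts:
            issues.append(f"missing inverse: ({o}, {inv}, {s})")
    return issues

facts = {("Q1", "instance of", "human"),
         ("Q1", "instance of", "fictional human"),
         ("Q2", "father", "Q3")}  # the matching (Q3, "child", Q2) is absent
issues = find_contradictions(facts,
                             [("human", "fictional human")],
                             {"father": "child"})
```

Whether a missing inverse is a true contradiction or merely an incomplete statement is exactly the kind of judgment the community-refined rules discussed in item 3 would encode.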

Interim results of zelph's analysis are available at zelph.org, which contains a navigable tree of 4580 pages representing entities and properties where meaningful deductions or contradictions were found.

Beyond technical demonstration, this session aims to foster discussion about:

- How tools like zelph could support Wikidata's data quality initiatives
- Possibilities for integration with existing Wikidata infrastructure
- Community collaboration to refine inference rules that reflect consensus about ontological relationships
- Potential applications beyond contradiction detection, such as knowledge completion and query enhancement

This presentation targets both technical contributors interested in knowledge representation systems and Wikidata editors concerned with data quality and consistency. Familiarity with the Wikidata property model and basic knowledge of semantic networks will enhance understanding of the zelph approach.

Attendees will understand how semantic network analysis can enhance Wikidata's consistency and how community collaboration could transform zelph from a promising technical demonstration into a practical tool for the Wikidata ecosystem.

How does your session relate to the event theme, Wikimania@20 – Inclusivity. Impact. Sustainability?

My session addresses all three aspects of the event theme:

Inclusivity: By improving Wikidata's logical consistency, zelph makes knowledge more accessible and reliable for all users, regardless of their technical background. A more consistent knowledge base reduces barriers to understanding and using Wikidata.

Impact: zelph has already processed the entire Wikidata dataset and identified numerous logical connections and contradictions. This systematic approach to data quality can significantly impact Wikidata's reliability as a global knowledge resource, improving all downstream applications that rely on its data.

Sustainability: Maintaining data consistency in a project of Wikidata's scale is increasingly challenging as it grows. zelph offers a sustainable approach by automating contradiction detection and inference, reducing the manual effort required from community members. Its memory-efficient design (processing 1.4TB of data in just 9.2GB of memory) also represents a sustainable technical approach to knowledge processing.

By bringing together technical innovation and community collaboration, zelph supports Wikidata's long-term sustainability while maximizing its inclusive impact.

What is the experience level needed for the audience for your session?

This session is for an experienced audience

Resources

Speakers

  • zipproth
I am a senior developer with over 25 years of experience in complex software systems, algorithm design, and data processing. I hold a diploma in Computer Science from the Technical University of Munich and am a certified Project Management Professional (IPMA® Level D).
For most of my career, I worked as a freelance developer for various companies in Germany and Switzerland, specializing in measurement technology, medical imaging, and industrial automation. My expertise spans embedded systems, image processing, and high-performance computing. Three years ago, I founded acrion innovations GmbH, a Swiss technology company focused on developing innovative open-source solutions.
My most recent project, zelph, represents my first major contribution to the Wikimedia ecosystem. This specialized semantic network system can process Wikidata's entire 1.4TB dataset while requiring only 9.2GB of memory, and uniquely treats relations as first-class nodes—mirroring Wikidata's property model.
Notable achievements in my career include the creation of sophisticated concurrency frameworks, the development of a wavelet-based compression library that outperformed JPEG2000, and the development of the first free chess engine to outperform the widely known "Crafty" engine. My work has been utilized across diverse sectors including medical technology, industrial automation, and astronomical imaging.
While I am new to the Wikimedia movement, I bring a deep understanding of data structures and semantic networks essential for knowledge representation systems. Recently, I've been connecting with the wider open source community through a number of events, presenting at the World Lua Workshop 2022 and at FSFE's "I love free software day" in Zurich. I approach the Wikimedia ecosystem with both technical rigor and a collaborative mindset, seeking to contribute tools that can strengthen Wikidata's knowledge infrastructure while integrating with community priorities and standards.