2019:Research/Sockpuppet detection in the English Wikipedia

Research/Sockpuppet detection in the English Wikipedia

Curie · Saturday 14:20 - 14:45

Abstract

Sockpuppetry is the use of more than one account by the same person on a social platform. On Wikipedia, sockpuppets are frequently used in large-scale coordinated efforts to create undesirable articles or to insert a point of view into articles. The motives range from financial to political. Current methods for identifying sockpuppets on English Wikipedia are largely manual and rely on experience. This is a very challenging task, as over 140,000 edits are made every day by over 20,000 editors. Making this weeding process fast, efficient, and accurate is key to making Wikipedia a safe avenue that offers trustworthy content and encourages collaborative discussion.

Here we demonstrate our ongoing work on English Wikipedia to help checkusers efficiently surface sockpuppet accounts using machine learning. We create two novel models for two prediction tasks: first, to determine whether an account belongs to a sockpuppet group, and second, to identify whether a pair of accounts is controlled by the same entity. Both models are trained on the public editing activity of all sockpuppet accounts that were identified and blocked up to December 2018. We use this set of ground-truth sockpuppets to train a feature-based model and a deep-learning-based model for the two prediction tasks. These models outperform strong baselines by a significant margin.
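To make the second task concrete, the following is a minimal sketch of how a pair of accounts could be turned into a feature vector for a feature-based classifier. It is not the authors' model: the abstract does not name its features, so the two used here (overlap of edited pages and similarity of editing hours), the `Edit` record layout, and all function names and toy data are assumptions chosen only for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical edit record: (page_title, hour_of_day_utc).
# The real system's inputs are not described in the abstract.
Edit = tuple[str, int]


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def hour_cosine(edits_a: list[Edit], edits_b: list[Edit]) -> float:
    """Cosine similarity between the two accounts' hour-of-day edit counts."""
    hours_a = Counter(hour for _, hour in edits_a)
    hours_b = Counter(hour for _, hour in edits_b)
    dot = sum(hours_a[h] * hours_b[h] for h in hours_a)
    norm_a = sqrt(sum(v * v for v in hours_a.values()))
    norm_b = sqrt(sum(v * v for v in hours_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_features(edits_a: list[Edit], edits_b: list[Edit]) -> list[float]:
    """Illustrative (assumed) features for one pair of accounts."""
    pages_a = {page for page, _ in edits_a}
    pages_b = {page for page, _ in edits_b}
    return [jaccard(pages_a, pages_b), hour_cosine(edits_a, edits_b)]


# Toy data, invented for this example only.
account_a = [("Acme Corp", 9), ("Acme Corp", 10), ("Widget", 9)]
account_b = [("Acme Corp", 9), ("Widget", 10)]
account_c = [("Moon", 22)]

print(pair_features(account_a, account_b))
print(pair_features(account_a, account_c))
```

In a setup like this, vectors computed for pairs of accounts already known to be linked or unlinked would serve as training examples for any standard binary classifier. The first task (does an account belong to a sockpuppet group?) could be framed the same way with per-account rather than per-pair features.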

Next, we will present the findings from our initial field experiment with checkusers using the system. We will present feedback from their experience, the system's strengths, and areas for improvement. We will illustrate how effectively the system improves the speed and accuracy of sockpuppet investigations, and present case studies of investigations carried out with it. Finally, we will examine both the cases the system got right and its mistakes.

Authors

Srijan Kumar (Stanford University and Georgia Tech), Jure Leskovec (Stanford University), Leila Zia (WMF)

Session type

22-min presentation.

Participants

Nicolas NALLET
Rich Farmbrough