2023:Program/Submissions/Improve Your Research with Natural Language Processing - VE8KM3

From Wikimania

Title: Improve Your Research with Natural Language Processing

Speakers:

Kevin Chang

Kevin is the founder and CEO of Kai Analytics and he has more than a decade of experience in market research. His work with survey and qualitative data research inspired him to share the science of natural language processing and make qualitative data analysis more accessible. Since founding Kai Analytics in 2018, Kevin and his team have led global surveys focused on climate adaptation, youth empowerment, and economic development.

Pretalx link

Etherpad link

Room:

Start time:

End time:

Type: Workshop

Track: Technology

Submission state: submitted

Duration: 60 minutes

Do not record: false

Presentation language: en


Abstract & description

Abstract

Natural Language Processing (NLP), or computational linguistics, is a field of data science that uncovers the nuance and context of everyday language. Through NLP, we can understand the underlying sentiments and personas of qualitative research participants. This workshop will help you make informed use of this technology.

Description

Analyzing open-ended comments from surveys can be daunting. Researchers often spend many hours manually grouping responses into themes before they can even start any qualitative analysis. Fortunately, natural language processing (NLP) techniques can help us gain deeper insights with significantly less effort. They also help researchers engage with the data more responsibly by extracting meaning from all parts of each response, not just what happens to catch our eye while skimming.

NLP is a popular and rapidly growing field of computational linguistics, which focuses on statistically uncovering themes within large bodies of text. Many of us are already familiar with word clouds, a common NLP technique for visualizing word frequency. Word clouds are popular, but understanding language requires context, so single-word representations are difficult to interpret.
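To make the point about context concrete, here is a minimal sketch using only the Python standard library. The three reviews are invented for illustration and are not taken from the workshop data. It counts single words, as a word cloud would, and then counts adjacent word pairs (bigrams) from the same text.

```python
import re
from collections import Counter

# Hypothetical sample reviews, invented for this illustration.
reviews = [
    "The course was not good",
    "Good pacing and a good instructor",
    "Not bad at all",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

# Single-word frequencies: what a word cloud visualizes.
word_counts = Counter(tok for r in reviews for tok in tokenize(r))

# Adjacent word pairs keep some of the context that single words lose.
bigram_counts = Counter(
    pair
    for r in reviews
    for pair in zip(tokenize(r), tokenize(r)[1:])
)

print(word_counts["good"], word_counts["not"])
print(bigram_counts[("not", "good")], bigram_counts[("not", "bad")])
```

In the single-word counts, "good" looks like the dominant sentiment, yet one of its occurrences is negated and one "not" actually belongs to a positive remark ("not bad"). The bigram counts keep those two cases apart.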

The aim of this workshop is to explore essential text analytics concepts in NLP and linguistics through detailed, hands-on examples. Specifically, our objective is to illustrate how to analyze and visualize over 100,000 Coursera course reviews in Google Colab using NLP functions written in Python with the popular Natural Language Toolkit (NLTK) package.

This workshop will cover various NLP concepts including pre-processing (tokenization, stop-word removal, and lemmatization), automatic part-of-speech tagging, n-gram analysis, and visualization as a network graph. We will also explore how researchers can address problematic issues with real-life textual data, including domain-specific terminology, lexical diversity, and unreliable spelling/grammar.
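The following is a rough sketch of how several of these steps fit together, not the workshop's actual notebook. To stay self-contained it uses only the Python standard library. The sample reviews, the `STOP_WORDS` set, and the `LEMMAS` lookup table are all hypothetical stand-ins. In practice NLTK provides fuller equivalents, such as `nltk.word_tokenize`, the `nltk.corpus.stopwords` lists, `nltk.stem.WordNetLemmatizer`, and `nltk.bigrams`. Part-of-speech tagging is left out because it needs a trained model.

```python
import re
from collections import Counter

# Hypothetical sample reviews; the Coursera data set is not reproduced here.
REVIEWS = [
    "Great lectures and great assignments!",
    "The assignments were hard but the lectures were great.",
    "Great lectures, hard quizzes.",
]

# Tiny illustrative stop-word list (an assumption, far from complete).
STOP_WORDS = {"the", "and", "but", "were", "a", "an", "of", "to"}

# Toy lemma lookup table standing in for a real lemmatizer.
LEMMAS = {"lectures": "lecture", "assignments": "assignment", "quizzes": "quiz"}

def preprocess(text):
    """Tokenize, drop stop words, and map each token to its lemma."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in tokens]

def bigrams(tokens):
    """Return adjacent token pairs (n-grams with n = 2)."""
    return list(zip(tokens, tokens[1:]))

docs = [preprocess(r) for r in REVIEWS]

# Weighted edge list for a network graph: nodes are words, and each
# edge weight is how often the two words appear next to each other.
edges = Counter(pair for doc in docs for pair in bigrams(doc))

# Lexical diversity as a type-token ratio: unique words / total words.
all_tokens = [t for doc in docs for t in doc]
lexical_diversity = len(set(all_tokens)) / len(all_tokens)

for (source, target), weight in edges.items():
    print(source, "->", target, weight)
print(round(lexical_diversity, 2))
```

The `edges` counter can be handed to a graph library of your choice for drawing. Note that this sketch removes stop words before pairing tokens and does not split sentences, so some pairs join words that were not adjacent in the original text. A fuller pipeline would handle both cases.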

During Q&A, we’ll help attendees work through pain points in adapting these techniques to their own analyses.

Further details

Qn. How does your session relate to the event themes: Diversity, Collaboration, Future?

Bias in machine learning (or AI) technologies is a major obstacle to ensuring that diverse groups are represented in data. Knowing both the benefits and the limitations of these technologies, and addressing issues of social bias in our data, is crucial for research involving marginalized groups. We will use Google Colab during the workshop, which supports open-source and collaborative programming and learning.

Qn. What is the experience level needed for the audience for your session?

Everyone can participate in this session.

Qn. What is the most appropriate format for this session?

  • Empty Onsite in Singapore
  • Empty Remote online participation, livestreamed
  • Empty Remote from a satellite event
  • Tick Hybrid with some participants in Singapore and others dialing in remotely
  • Empty Pre-recorded and available on demand