Topic Modeling

(semi-)automated content analysis

Workshop Overview:

Researchers often collect open-ended text or narrative data. Although these data are rich and meaningful, practical methods for analyzing them are often out of reach. Traditional approaches such as manual coding require a great deal of resources (e.g., time, money, training, personnel), can be difficult to replicate or reproduce, and are impractical at scale (e.g., with larger sample sizes). Topic modeling is an alternative class of methods that uses machine learning and natural language processing to predict and quantify what a given text or document is about. These computational methods enable large-scale content analyses in a fraction of the time typically needed, allow novel analyses that treat topics as continuous, non-mutually exclusive variables, so that a single document can be partly about several topics at once, and (ideally) keep humans in the driver’s seat.

Workshop Objectives:

In this session, we’ll cover (1) the theoretical bases of topic modeling; (2) two open-source topic modeling methods that run in a local environment, structural topic modeling (STM) and BERTopic; and (3) situations where topic modeling might be particularly appropriate (or inappropriate).

For code and any associated materials: https://github.com/ryancyeung/topic_model_workshop