Topic Modeling

(semi-)automated content analysis

Workshop Overview:

Researchers often collect open-text or narrative data. These data are rich and meaningful, but researchers often have limited access to methods for analyzing them. Traditional approaches such as manual coding require substantial resources (e.g., time, money, training, personnel), can be difficult to replicate or reproduce, and do not scale well to larger samples. Topic modeling is an alternative class of methods that uses machine learning and natural language processing to predict and quantify what a given text or document is about. These computational methods enable large-scale content analyses in a fraction of the time typically needed, allow novel analyses of topics as continuous, non-mutually exclusive variables, and (ideally) keep humans in the driver’s seat (Yeung et al., 2022; Yeung & Fernandes, 2023).
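To make "topics as continuous, non-mutually exclusive variables" concrete, here is a minimal sketch in Python. It is not STM or BERTopic. It is a toy latent Dirichlet allocation (LDA) model fit by collapsed Gibbs sampling, using only the standard library. The function name `fit_lda`, the example documents, and the hyperparameter values are all assumptions made for illustration, not part of the workshop materials.

```python
import random
from collections import Counter


def fit_lda(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative only).

    Returns (theta, top_words):
      theta[d][k]  = estimated proportion of document d devoted to topic k
      top_words[k] = the three most frequent words assigned to topic k
    """
    rng = random.Random(seed)
    tokens = [doc.lower().split() for doc in docs]
    vocab_size = len({w for doc in tokens for w in doc})

    # Count tables updated as words are (re)assigned to topics.
    doc_topic = [[0] * n_topics for _ in tokens]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every word.
    assign = []
    for d, doc in enumerate(tokens):
        z_d = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(tokens):
            for i, w in enumerate(doc):
                # Remove the word's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions; each row sums to 1.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, tokens)
    ]
    top_words = [[w for w, _ in tw.most_common(3)] for tw in topic_word]
    return theta, top_words


# Made-up example documents (assumption, for illustration only).
docs = [
    "exam stress exam deadline stress",
    "beach vacation family beach sun",
    "exam deadline family vacation stress sun",
]
theta, top_words = fit_lda(docs)
for doc, proportions in zip(docs, theta):
    print([round(p, 2) for p in proportions], doc)
```

Each document receives a proportion for every topic rather than a single code, so a text can be partly about one topic and partly about another. That property is what allows topics to be analyzed as continuous variables. Real analyses would rely on an established implementation and a far larger corpus.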

Workshop Objectives:

In this session, we’ll cover (1) the theoretical bases of topic modeling, (2) two open-source topic modeling methods that run in a local environment: structural topic modeling (STM) and BERTopic, and (3) situations where topic modeling might be particularly appropriate (or inappropriate).

For code and any associated materials: https://github.com/ryancyeung/topic_model_workshop

References

Yeung, R. C., & Fernandes, M. A. (2023). Specific topics, specific symptoms: Linking the content of recurrent involuntary memories to mental health using computational text analysis. npj Mental Health Research.

Yeung, R. C., Stastna, M., & Fernandes, M. A. (2022). Understanding autobiographical memory content using computational text analysis. Memory.