BERTopic: What Is So Special About v0.16? | by Maarten Grootendorst | Dec, 2023

Exploring Zero-Shot Topic Modeling, Model Merging, and LLMs

Maarten Grootendorst
Towards Data Science

My ambition for BERTopic is to make it the one-stop shop for topic modeling by allowing for significant flexibility and modularity.

That has been the goal for the last few years, and with the release of v0.16, I believe we are a BIG step closer to achieving it.

First, let’s take a small step back. What is BERTopic?

Well, BERTopic is a topic modeling framework that lets users essentially create their own version of a topic model. With many variations of topic modeling implemented, the idea is that it should support almost any use case.

The modular nature of BERTopic allows you to build your topic model however you want. Switching components allows BERTopic to grow with the latest developments in Language AI.

With v0.16, several features were implemented that I believe will take BERTopic to the next level, namely:

  • Zero-Shot Topic Modeling
  • Model Merging
  • More Large Language Model (LLM) Support

Just a few of BERTopic’s capabilities.

In this tutorial, we will go through what these features are and for which use cases they could be helpful.

To start with, you can install BERTopic (with HF datasets) as follows:

pip install bertopic datasets

You can also follow along with the Google Colab Notebook to make sure everything works as intended.

Zero-shot techniques generally refer to settings in which you have no labeled examples to train on. Although you know the target labels, they are not assigned to your data.

In BERTopic, we use Zero-shot Topic Modeling to find pre-defined topics in large collections of documents.

Imagine you have ArXiv abstracts about Machine Learning and you know that the topic “Large Language Models” is in there. With Zero-shot Topic Modeling, you can ask BERTopic to find all documents related to…
