The process should be similar, especially considering you can download the model and run it through any API that you want. There are many options to choose from, though, ranging from GPU-accelerated libraries (e.g., cuML) to CPU-focused ones (e.g., Model2Vec).
Maarten Grootendorst
MaartenGr
Recent Activity
Commented on their article "Introducing BERTopic Integration with Hugging Face Hub" 6 days ago
MaartenGr's activity

Commented on "Introducing BERTopic Integration with Hugging Face Hub" 6 days ago

Reacted to asoria's post with ❤️ 4 months ago
🚀 Exploring Topic Modeling with BERTopic 🤖
When you come across an interesting dataset, you often wonder:
Which topics frequently appear in these documents? 🤔
What is this data really about? 📊
Topic modeling helps answer these questions by identifying recurring themes within a collection of documents. This process enables quick and efficient exploratory data analysis.
I’ve been working on an app that leverages BERTopic, a flexible framework designed for topic modeling. Its modularity makes BERTopic powerful, allowing you to switch components with your preferred algorithms. It also supports handling large datasets efficiently by merging models using the BERTopic.merge_models approach. 🔗
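As a rough illustration of the merging idea, here is a minimal sketch that fits one model per chunk of documents and then combines them with `BERTopic.merge_models`. The helper names (`split_into_chunks`, `fit_and_merge`), the chunk size, and the `min_similarity` value are assumptions for illustration only and are not taken from the app.

```python
from typing import List, Sequence


def split_into_chunks(docs: Sequence[str], chunk_size: int) -> List[List[str]]:
    """Split documents into consecutive chunks of at most `chunk_size` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(docs[i:i + chunk_size]) for i in range(0, len(docs), chunk_size)]


def fit_and_merge(docs: Sequence[str], chunk_size: int = 10_000,
                  min_similarity: float = 0.7):
    """Fit one BERTopic model per chunk, then merge them into a single model.

    Hypothetical helper: chunk_size and min_similarity are illustrative values.
    """
    # Imported lazily so the chunking helper works without bertopic installed.
    from bertopic import BERTopic

    models = [BERTopic().fit(chunk) for chunk in split_into_chunks(docs, chunk_size)]
    if len(models) == 1:
        return models[0]  # nothing to merge
    # Topics from later models are added only if they are not similar enough
    # to a topic that already exists in the first model.
    return BERTopic.merge_models(models, min_similarity=min_similarity)
```

Each chunk needs enough documents for clustering to find topics, so in practice the chunk size would be tuned to the dataset and the available memory.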
🔍 How do we make this work?
Here’s the stack we’re using:
📂 Data Source ➡️ Hugging Face datasets with DuckDB for retrieval
🧠 Text Embeddings ➡️ Sentence Transformers (all-MiniLM-L6-v2)
⚡ Dimensionality Reduction ➡️ RAPIDS cuML UMAP for GPU-accelerated performance
🔍 Clustering ➡️ RAPIDS cuML HDBSCAN for fast clustering
✂️ Tokenization ➡️ CountVectorizer
🔧 Representation Tuning ➡️ KeyBERTInspired + Hugging Face Inference Client with Meta-Llama-3-8B-Instruct
🌍 Visualization ➡️ Datamapplot library
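For readers who want to see how these pieces plug together, the stack above could be wired into BERTopic roughly as follows. This is a sketch under assumptions: the hyperparameter values are illustrative defaults rather than the app's actual settings, `stack_settings` and `build_topic_model` are hypothetical helper names, and the Llama-based representation step and the Datamapplot visualization are left out.

```python
from typing import Any, Dict, Optional


def stack_settings() -> Dict[str, Any]:
    """Illustrative settings for each component (assumed, not the app's own)."""
    return {
        "embedding_model": "all-MiniLM-L6-v2",
        "umap": {"n_components": 5, "n_neighbors": 15, "min_dist": 0.0},
        "hdbscan": {"min_samples": 10, "gen_min_span_tree": True,
                    "prediction_data": True},
        "vectorizer": {"stop_words": "english"},
    }


def build_topic_model(settings: Optional[Dict[str, Any]] = None):
    """Assemble a BERTopic model from the components listed above."""
    # Heavy and GPU-only dependencies are imported lazily; cuML needs a CUDA GPU.
    from bertopic import BERTopic
    from bertopic.representation import KeyBERTInspired
    from cuml.cluster import HDBSCAN
    from cuml.manifold import UMAP
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import CountVectorizer

    cfg = settings or stack_settings()
    return BERTopic(
        embedding_model=SentenceTransformer(cfg["embedding_model"]),
        umap_model=UMAP(**cfg["umap"]),                 # dimensionality reduction
        hdbscan_model=HDBSCAN(**cfg["hdbscan"]),        # clustering
        vectorizer_model=CountVectorizer(**cfg["vectorizer"]),  # tokenization
        representation_model=KeyBERTInspired(),         # topic representation
    )
```

On a machine without a GPU, the `umap-learn` and `hdbscan` packages could stand in for the cuML classes, since BERTopic accepts either.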
Check out the space and see how you can quickly generate topics from your dataset: datasets-topics/topics-generator
Powered by @MaartenGr - BERTopic
Hi! Thank you for reaching out. I generally like to keep my posts either in my newsletter or on Medium, both of which have gained some followers.
Having said that, I would be open to a collaboration with HF to publish it. Given the time spent on this guide, it would need to be more than just publishing it as a community blog post.
What is the training benchmark for model `BERTopic_Wikipedia`
#3 opened 11 months ago by benjaminliupenrose, 1 comment
How to merge topics for model `BERTopic_Wikipedia`
#2 opened 11 months ago by benjaminliupenrose, 1 comment
Inference API err: HfApiJson Deserialize Error
#1 opened over 1 year ago by ongkn, 2 comments


Published an article over 1 year ago: Introducing BERTopic Integration with Hugging Face Hub