⭐ GLiClass: Generalist and Lightweight Model for Sequence Classification
This is an efficient zero-shot classifier inspired by the GLiNER work. It matches the performance of a cross-encoder while being more compute-efficient, because classification is done in a single forward pass.
It can be used for topic classification, sentiment analysis, and as a reranker in RAG pipelines.
The model was trained on synthetic and licensed data that allow commercial use, so it can be deployed in commercial applications.
The backbone model is mdeberta-v3-base, which supports multilingual understanding and makes the model well-suited for texts in different languages.
How to use:
First of all, you need to install the GLiClass library (note the quotes around the version specifier, so the shell does not interpret `>=` as a redirect):

```bash
pip install gliclass
pip install -U "transformers>=4.48.0"
```
Then you need to initialize a model and a pipeline:
English

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "One day I will see the world!"
labels = ["travel", "dreams", "sport", "science", "politics"]
results = pipeline(text, labels, threshold=0.5)[0]  # [0]: results for the single input text

for result in results:
    print(result["label"], "=>", result["score"])
```
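In `multi-label` mode each label is scored independently, and `threshold` simply filters out labels whose score falls below the cutoff. A minimal pure-Python sketch of that filtering step (the scores below are made-up illustration values, not real model output):

```python
# Hypothetical label scores, shaped like the pipeline's per-text output
# (each label scored independently in multi-label mode).
raw_results = [
    {"label": "travel", "score": 0.92},
    {"label": "dreams", "score": 0.61},
    {"label": "sport", "score": 0.07},
    {"label": "science", "score": 0.03},
    {"label": "politics", "score": 0.02},
]

def filter_by_threshold(results, threshold=0.5):
    """Keep only labels whose score clears the threshold."""
    return [r for r in results if r["score"] >= threshold]

kept = filter_by_threshold(raw_results, threshold=0.5)
print([r["label"] for r in kept])  # → ['travel', 'dreams']
```

Raising the threshold trades recall for precision: with `threshold=0.95` none of these labels would survive.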
Spanish

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "¡Un día veré el mundo!"
labels = ["viajes", "sueños", "deportes", "ciencia", "política"]
results = pipeline(text, labels, threshold=0.5)[0]

for result in results:
    print(result["label"], "=>", result["score"])
```
Italian

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "Un giorno vedrò il mondo!"
labels = ["viaggi", "sogni", "sport", "scienza", "politica"]
results = pipeline(text, labels, threshold=0.5)[0]

for result in results:
    print(result["label"], "=>", result["score"])
```
French

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "Un jour, je verrai le monde!"
labels = ["voyage", "rêves", "sport", "science", "politique"]
results = pipeline(text, labels, threshold=0.5)[0]

for result in results:
    print(result["label"], "=>", result["score"])
```
German

```python
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "Eines Tages werde ich die Welt sehen!"
labels = ["Reisen", "Träume", "Sport", "Wissenschaft", "Politik"]
results = pipeline(text, labels, threshold=0.5)[0]

for result in results:
    print(result["label"], "=>", result["score"])
```
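Because the pipeline returns an independent score per label, it can also serve as the reranking step in a RAG pipeline, as mentioned above: score each retrieved passage against the query and sort. A minimal sketch of that sorting step, with a hypothetical `score_fn` standing in for a call to the zero-shot pipeline (the helper name and toy scores are illustrative, not part of the GLiClass API):

```python
def rerank(passages, score_fn, top_k=3):
    """Sort retrieved passages by relevance score, highest first.

    score_fn maps a passage to a relevance score; with GLiClass it
    would wrap a call to the zero-shot pipeline, using the query
    (or a query-derived label) as the candidate label.
    """
    scored = [(score_fn(p), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]

# Stand-in scorer for illustration (a real one would call `pipeline`):
toy_scores = {"doc a": 0.2, "doc b": 0.9, "doc c": 0.5}
print(rerank(["doc a", "doc b", "doc c"], toy_scores.get, top_k=2))
# → ['doc b', 'doc c']
```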
Benchmarks:
Below, you can see the F1 scores on several text classification datasets. None of the tested models were fine-tuned on these datasets; all were evaluated in a zero-shot setting.
Multilingual benchmarks
| Dataset | gliclass-x-base | gliclass-base-v3.0 | gliclass-large-v3.0 |
|---|---|---|---|
| FredZhang7/toxi-text-3M | 0.5972 | 0.5072 | 0.6118 |
| SetFit/xglue_nc | 0.5014 | 0.5348 | 0.5378 |
| Davlan/sib200_14classes | 0.4663 | 0.2867 | 0.3173 |
| uhhlt/GermEval2017 | 0.3999 | 0.4010 | 0.4299 |
| dolfsai/toxic_es | 0.1250 | 0.1399 | 0.1412 |
| **Average** | 0.41796 | 0.37392 | 0.4076 |
General benchmarks
| Dataset | gliclass-x-base | gliclass-base-v3.0 | gliclass-large-v3.0 |
|---|---|---|---|
| SetFit/CR | 0.8630 | 0.9127 | 0.9398 |
| SetFit/sst2 | 0.8554 | 0.8959 | 0.9192 |
| SetFit/sst5 | 0.3287 | 0.3376 | 0.4606 |
| AmazonScience/massive | 0.2611 | 0.5040 | 0.5649 |
| stanfordnlp/imdb | 0.8840 | 0.9251 | 0.9366 |
| SetFit/20_newsgroups | 0.4116 | 0.4759 | 0.5958 |
| SetFit/enron_spam | 0.5929 | 0.6760 | 0.7584 |
| PolyAI/banking77 | 0.3098 | 0.4698 | 0.5574 |
| takala/financial_phrasebank | 0.7851 | 0.8971 | 0.9000 |
| ag_news | 0.6815 | 0.7279 | 0.7181 |
| dair-ai/emotion | 0.3667 | 0.4447 | 0.4506 |
| MoritzLaurer/cap_sotu | 0.3935 | 0.4614 | 0.4589 |
| cornell/rotten_tomatoes | 0.7252 | 0.7943 | 0.8411 |
| snips | 0.6307 | 0.9474 | 0.9692 |
| **Average** | 0.5778 | 0.6764 | 0.7193 |
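The Average rows are plain arithmetic means of each column. For instance, for gliclass-x-base:

```python
# Per-dataset F1 scores for gliclass-x-base, taken from the tables above.
multilingual_f1 = [0.5972, 0.5014, 0.4663, 0.3999, 0.1250]
general_f1 = [0.8630, 0.8554, 0.3287, 0.2611, 0.8840, 0.4116, 0.5929,
              0.3098, 0.7851, 0.6815, 0.3667, 0.3935, 0.7252, 0.6307]

avg_multi = sum(multilingual_f1) / len(multilingual_f1)
avg_general = sum(general_f1) / len(general_f1)
print(round(avg_multi, 5), round(avg_general, 4))  # → 0.41796 0.5778
```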
Citation
```bibtex
@misc{stepanov2025gliclassgeneralistlightweightmodel,
    title={GLiClass: Generalist Lightweight Model for Sequence Classification Tasks},
    author={Ihor Stepanov and Mykhailo Shtopko and Dmytro Vodianytskyi and Oleksandr Lukashov and Alexander Yavorskyi and Mykyta Yaroshenko},
    year={2025},
    eprint={2508.07662},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2508.07662},
}
```