
⭐ GLiClass: Generalist and Lightweight Model for Sequence Classification

This is an efficient zero-shot classifier inspired by the GLiNER work. It demonstrates performance comparable to a cross-encoder while being more compute-efficient, because classification is done in a single forward pass.

It can be used for topic classification, sentiment analysis, and as a reranker in RAG pipelines (a reranking sketch follows the usage examples below).

The model was trained on synthetic and licensed data that allow commercial use, so it can be used in commercial applications.

The backbone model is mdeberta-v3-base. It supports multilingual understanding, making it well-suited for tasks involving texts in different languages.

How to use:

First, install the GLiClass library:

pip install gliclass
pip install -U "transformers>=4.48.0"

Then initialize a model and a pipeline:

English
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "One day I will see the world!"
labels = ["travel", "dreams", "sport", "science", "politics"]
results = pipeline(text, labels, threshold=0.5)[0]  # [0] because a single text was passed
for result in results:
    print(result["label"], "=>", result["score"])
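
Each entry in results is a dictionary with a "label" and a "score"; with classification_type='multi-label', every label whose score passes the threshold is returned independently. If exactly one class should be selected per text, the pipeline can be built in single-label mode instead. A minimal sketch, assuming the library's 'single-label' classification_type; the name single_label_pipeline is just for illustration:

single_label_pipeline = ZeroShotClassificationPipeline(
    model,
    tokenizer,
    classification_type='single-label',  # assumption: selects the best-matching label per text
    device='cuda:0'
)
results = single_label_pipeline(text, labels, threshold=0.5)[0]
for result in results:
    print(result["label"], "=>", result["score"])
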
Spanish
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "¡Un día veré el mundo!"
labels = ["viajes", "sueños", "deportes", "ciencia", "política"]
results = pipeline(text, labels, threshold=0.5)[0]
for result in results:
    print(result["label"], "=>", result["score"])
Italian
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "Un giorno vedrò il mondo!"
labels = ["viaggi", "sogni", "sport", "scienza", "politica"]
results = pipeline(text, labels, threshold=0.5)[0]
for result in results:
    print(result["label"], "=>", result["score"])
French
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "Un jour, je verrai le monde!"
labels = ["voyage", "rêves", "sport", "science", "politique"]
results = pipeline(text, labels, threshold=0.5)[0]
for result in results:
    print(result["label"], "=>", result["score"])
German
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-x-base")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-x-base", add_prefix_space=True)
pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "Eines Tages werde ich die Welt sehen!"
labels = ["Reisen", "Träume", "Sport", "Wissenschaft", "Politik"]
results = pipeline(text, labels, threshold=0.5)[0]
for result in results:
    print(result["label"], "=>", result["score"])
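
As noted above, the same pipeline can serve as a reranker in RAG pipelines. A minimal sketch, assuming the query is scored as the only candidate label for each retrieved passage; the query, passages, and scoring loop below are illustrative and reuse the pipeline initialized in the examples above:

# Rerank retrieved passages by how strongly they match the query.
# Assumption: the query is treated as the only label, each passage as the text to classify.
query = "best places to travel around the world"   # hypothetical query
passages = [                                        # hypothetical retrieved passages
    "Top 10 destinations you must visit before you turn 30.",
    "A recipe for a quick weeknight pasta dinner.",
    "How to plan a round-the-world trip on a budget.",
]

scored = []
for passage in passages:
    # threshold=0.0 so the query's score is always returned (scores lie in (0, 1))
    result = pipeline(passage, [query], threshold=0.0)[0][0]
    scored.append((result["score"], passage))

# Highest-scoring passages first
for score, passage in sorted(scored, reverse=True):
    print(f"{score:.3f}  {passage}")
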

Benchmarks:

Below are F1 scores on several text classification datasets. None of the tested models were fine-tuned on these datasets; all were evaluated in a zero-shot setting.

Multilingual benchmarks

| Dataset | gliclass-x-base | gliclass-base-v3.0 | gliclass-large-v3.0 |
|---|---|---|---|
| FredZhang7/toxi-text-3M | 0.5972 | 0.5072 | 0.6118 |
| SetFit/xglue_nc | 0.5014 | 0.5348 | 0.5378 |
| Davlan/sib200_14classes | 0.4663 | 0.2867 | 0.3173 |
| uhhlt/GermEval2017 | 0.3999 | 0.4010 | 0.4299 |
| dolfsai/toxic_es | 0.1250 | 0.1399 | 0.1412 |
| Average | 0.41796 | 0.37392 | 0.4076 |

General benchmarks

| Dataset | gliclass-x-base | gliclass-base-v3.0 | gliclass-large-v3.0 |
|---|---|---|---|
| SetFit/CR | 0.8630 | 0.9127 | 0.9398 |
| SetFit/sst2 | 0.8554 | 0.8959 | 0.9192 |
| SetFit/sst5 | 0.3287 | 0.3376 | 0.4606 |
| AmazonScience/massive | 0.2611 | 0.5040 | 0.5649 |
| stanfordnlp/imdb | 0.8840 | 0.9251 | 0.9366 |
| SetFit/20_newsgroups | 0.4116 | 0.4759 | 0.5958 |
| SetFit/enron_spam | 0.5929 | 0.6760 | 0.7584 |
| PolyAI/banking77 | 0.3098 | 0.4698 | 0.5574 |
| takala/financial_phrasebank | 0.7851 | 0.8971 | 0.9000 |
| ag_news | 0.6815 | 0.7279 | 0.7181 |
| dair-ai/emotion | 0.3667 | 0.4447 | 0.4506 |
| MoritzLaurer/cap_sotu | 0.3935 | 0.4614 | 0.4589 |
| cornell/rotten_tomatoes | 0.7252 | 0.7943 | 0.8411 |
| snips | 0.6307 | 0.9474 | 0.9692 |
| Average | 0.5778 | 0.6764 | 0.7193 |
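
For reference, a minimal sketch of how a zero-shot F1 score like the ones above could be computed with the pipeline from the usage examples. The dataset, label names, and macro averaging are illustrative; the exact evaluation protocol behind the tables is not specified here, and the datasets and scikit-learn packages are assumed to be installed:

# Illustrative zero-shot evaluation loop (not the exact protocol used for the tables above).
from datasets import load_dataset
from sklearn.metrics import f1_score

dataset = load_dataset("SetFit/sst2", split="test")  # dataset name taken from the table above
labels = ["negative", "positive"]                     # assumption: class names used as zero-shot labels

predictions, references = [], []
for example in dataset:
    results = pipeline(example["text"], labels, threshold=0.0)[0]
    best = max(results, key=lambda r: r["score"])     # highest-scoring label as the prediction
    predictions.append(labels.index(best["label"]))
    references.append(example["label"])

print("macro F1:", f1_score(references, predictions, average="macro"))
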

Citation

@misc{stepanov2025gliclassgeneralistlightweightmodel,
      title={GLiClass: Generalist Lightweight Model for Sequence Classification Tasks}, 
      author={Ihor Stepanov and Mykhailo Shtopko and Dmytro Vodianytskyi and Oleksandr Lukashov and Alexander Yavorskyi and Mykyta Yaroshenko},
      year={2025},
      eprint={2508.07662},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.07662}, 
}