|
--- |
|
language: |
|
- multilingual |
|
- en |
|
- fr |
|
- es |
|
- de |
|
- el |
|
- bg |
|
- ru |
|
- tr |
|
- ar |
|
- vi |
|
- th |
|
- zh |
|
- hi |
|
- sw |
|
- ur |
|
tags: |
|
- text-classification |
|
- pytorch |
|
- tensorflow |
|
- zero-shot-classification |
|
- xlm-roberta |
|
- multilingual |
|
- nli |
|
- natural-language-inference |
|
datasets: |
|
- multi_nli |
|
- xnli |
|
license: mit |
|
pipeline_tag: zero-shot-classification |
|
library_name: transformers |
|
model-index: |
|
- name: xlm-roberta-large-xnli |
|
results: |
|
- task: |
|
type: zero-shot-classification |
|
name: Zero-Shot Classification |
|
dataset: |
|
name: XNLI |
|
type: xnli |
|
metrics: |
|
- type: accuracy |
|
value: 0.834 |
|
name: Accuracy |
|
- type: f1 |
|
value: 0.833 |
|
name: F1 Score |
|
widget: |
|
- text: "За кого вы голосуете в 2020 году?" |
|
candidate_labels: "politique étrangère, Europe, élections, affaires, politique" |
|
multi_class: true |
|
example_title: "Russian Political Classification" |
|
- text: "لمن تصوت في 2020؟" |
|
candidate_labels: "السياسة الخارجية, أوروبا, الانتخابات, الأعمال, السياسة" |
|
multi_class: true |
|
example_title: "Arabic Political Classification" |
|
- text: "2020'de kime oy vereceksiniz?" |
|
candidate_labels: "dış politika, Avrupa, seçimler, ticaret, siyaset" |
|
multi_class: true |
|
example_title: "Turkish Political Classification" |
|
- text: "I love this movie" |
|
candidate_labels: "positive, negative, neutral" |
|
multi_class: false |
|
example_title: "English Sentiment Analysis" |
|
--- |
|
|
|
# XLM-RoBERTa Large for Zero-Shot Classification (XNLI) |
|
|
|
## Model Description |
|
|
|
This model is based on the excellent work by [joeddav/xlm-roberta-large-xnli](https://huggingface.co/joeddav/xlm-roberta-large-xnli). It takes [xlm-roberta-large](https://huggingface.co/xlm-roberta-large) and fine-tunes it on a combination of NLI data in 15 languages. |
|
|
|
**Original Model Credit**: This model is a copy of [joeddav/xlm-roberta-large-xnli](https://huggingface.co/joeddav/xlm-roberta-large-xnli) by Joe Davison. All credit for the training and development goes to the original author. |
|
|
|
This model is intended to be used for zero-shot text classification, such as with the Hugging Face [ZeroShotClassificationPipeline](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.ZeroShotClassificationPipeline). |
|
|
|
## Quick Start |
|
|
|
```python |
|
from transformers import pipeline |
|
|
|
# Load the zero-shot classification pipeline |
|
classifier = pipeline("zero-shot-classification", |
|
model="YOUR_USERNAME/zero-shot-classification") |
|
|
|
# Example usage |
|
text = "I love this new smartphone, it's amazing!" |
|
candidate_labels = ["technology", "sports", "politics", "entertainment"] |
|
|
|
result = classifier(text, candidate_labels) |
|
print(result) |
|
``` |
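
If your labels are not mutually exclusive, you can score each one independently. On recent `transformers` releases the pipeline exposes this as `multi_label=True` (older releases used `multi_class=True`); a small example reusing the `classifier` loaded above:

```python
# Non-exclusive labels: each candidate is scored independently between 0 and 1.
text = "The new phone has a great camera but the battery drains quickly"
candidate_labels = ["camera", "battery life", "price", "shipping"]

result = classifier(text, candidate_labels, multi_label=True)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.3f}")
```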
|
|
|
## Intended Usage |
|
|
|
This model is intended to be used for zero-shot text classification, especially in languages other than English. It is fine-tuned on XNLI, which is a multilingual NLI dataset. The model can therefore be used with any of the languages in the XNLI corpus: |
|
|
|
- English |
|
- French |
|
- Spanish |
|
- German |
|
- Greek |
|
- Bulgarian |
|
- Russian |
|
- Turkish |
|
- Arabic |
|
- Vietnamese |
|
- Thai |
|
- Chinese |
|
- Hindi |
|
- Swahili |
|
- Urdu |
|
|
|
Since the base model was pre-trained on 100 different languages, the model has shown some effectiveness in languages beyond those listed above as well. See the full list of pre-training languages in appendix A of the [XLM-RoBERTa paper](https://arxiv.org/abs/1911.02116).
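
As an informal check in a language outside the XNLI fifteen (Italian is covered by the XLM-R pre-training data but not by the fine-tuning data), you can pass text and labels as usual with the `classifier` from the Quick Start; treat the scores as indicative rather than benchmarked:

```python
# Italian is not among the 15 XNLI fine-tuning languages, but it is one of the
# 100 XLM-R pre-training languages, so zero-shot transfer often still works.
sequence = "La nuova legge sul clima è stata approvata dal parlamento"  # "The new climate law was approved by parliament"
candidate_labels = ["politics", "sports", "technology"]
print(classifier(sequence, candidate_labels))
```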
|
|
|
For English-only classification, it is recommended to use |
|
[bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli) or |
|
[a distilled bart MNLI model](https://huggingface.co/models?filter=pipeline_tag%3Azero-shot-classification&search=valhalla). |
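
For example, an English-only setup could simply load the BART checkpoint instead (reusing the sentiment example from the widget metadata above):

```python
from transformers import pipeline

# English-only: a dedicated English NLI checkpoint is usually the better fit.
english_classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
english_classifier("I love this movie", ["positive", "negative", "neutral"])
```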
|
|
|
### Using the zero-shot classification pipeline |
|
|
|
The model can be loaded with the `zero-shot-classification` pipeline like so: |
|
|
|
```python |
|
from transformers import pipeline |
|
classifier = pipeline("zero-shot-classification", |
|
model="YOUR_USERNAME/zero-shot-classification") |
|
``` |
|
|
|
You can then classify in any of the above languages. You can even pass the labels in one language and the sequence to |
|
classify in another: |
|
|
|
```python |
|
# we will classify the Russian translation of "Who are you voting for in 2020?"
|
sequence_to_classify = "За кого вы голосуете в 2020 году?" |
|
# we can specify candidate labels in Russian or any other language above: |
|
candidate_labels = ["Europe", "public health", "politics"] |
|
classifier(sequence_to_classify, candidate_labels) |
|
# {'labels': ['politics', 'Europe', 'public health'], |
|
# 'scores': [0.9048484563827515, 0.05722189322113991, 0.03792969882488251], |
|
# 'sequence': 'За кого вы голосуете в 2020 году?'} |
|
``` |
|
|
|
The default hypothesis template is the English `This example is {}.` If you are working strictly within one language, it may be worthwhile to translate the template into that language:
|
|
|
```python |
|
sequence_to_classify = "¿A quién vas a votar en 2020?" |
|
candidate_labels = ["Europa", "salud pública", "política"] |
|
hypothesis_template = "Este ejemplo es {}." |
|
classifier(sequence_to_classify, candidate_labels, hypothesis_template=hypothesis_template) |
|
# {'labels': ['política', 'Europa', 'salud pública'], |
|
# 'scores': [0.9109585881233215, 0.05954807624220848, 0.029493311420083046], |
|
# 'sequence': '¿A quién vas a votar en 2020?'} |
|
``` |
|
|
|
### Using with manual PyTorch |
|
|
|
```python
# pose the sequence as an NLI premise and the label as a hypothesis
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
nli_model = AutoModelForSequenceClassification.from_pretrained('YOUR_USERNAME/zero-shot-classification').to(device)
tokenizer = AutoTokenizer.from_pretrained('YOUR_USERNAME/zero-shot-classification')

sequence = "За кого вы голосуете в 2020 году?"  # "Who are you voting for in 2020?"
label = "politics"
premise = sequence
hypothesis = f'This example is {label}.'

# run the premise/hypothesis pair through the NLI model
x = tokenizer.encode(premise, hypothesis, return_tensors='pt',
                     truncation='only_first')
logits = nli_model(x.to(device))[0]

# we throw away "neutral" (dim 1) and take the probability of
# "entailment" (2) as the probability of the label being true
entail_contradiction_logits = logits[:, [0, 2]]
probs = entail_contradiction_logits.softmax(dim=1)
prob_label_is_true = probs[:, 1]
```
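
To mirror what the pipeline does when the labels are mutually exclusive, you can score one hypothesis per candidate label and softmax the entailment logits across labels. A minimal sketch under the same assumptions as above (the `nli_model`, `tokenizer`, and `device` just defined; entailment at index 2):

```python
sequence = "За кого вы голосуете в 2020 году?"
candidate_labels = ["Europe", "public health", "politics"]

# One premise/hypothesis pair per candidate label, batch-encoded together.
hypotheses = [f"This example is {label}." for label in candidate_labels]
inputs = tokenizer([sequence] * len(candidate_labels), hypotheses,
                   return_tensors="pt", padding=True, truncation="only_first").to(device)

with torch.no_grad():
    logits = nli_model(**inputs).logits

# Softmax the entailment logits (index 2) across candidates to get a
# single-label distribution, as the default (non-multi-label) pipeline does.
label_probs = logits[:, 2].softmax(dim=0)
for label, prob in zip(candidate_labels, label_probs.tolist()):
    print(f"{label}: {prob:.3f}")
```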
|
|
|
## Training |
|
|
|
This model was pre-trained on a set of 100 languages, as described in
[the original paper](https://arxiv.org/abs/1911.02116). It was then fine-tuned on the NLI task using the concatenated
MNLI train set and the XNLI validation and test sets. Finally, it was trained for one additional epoch on XNLI
data only, with the premise and hypothesis translations shuffled so that the premise and hypothesis for each
example come from the same original English example but are in different languages.
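
The cross-lingual shuffling can be pictured with a toy sketch. This is not the original training script; it only illustrates the pairing, assuming each XNLI record exposes parallel translations keyed by language code:

```python
import random

# Illustrative record only: the same English-origin example in several languages.
example = {
    "premise": {
        "en": "The committee met on Tuesday.",
        "fr": "Le comité s'est réuni mardi.",
        "ru": "Комитет собрался во вторник.",
    },
    "hypothesis": {
        "en": "A meeting took place.",
        "fr": "Une réunion a eu lieu.",
        "ru": "Встреча состоялась.",
    },
}

# Sample two different languages: premise and hypothesis still belong to the
# same original example, but are now written in different languages.
premise_lang, hypothesis_lang = random.sample(sorted(example["premise"]), k=2)
training_pair = (example["premise"][premise_lang], example["hypothesis"][hypothesis_lang])
```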
|
|
|
## Model Performance |
|
|
|
On XNLI, the model-index metadata above reports 0.834 accuracy and 0.833 F1. For further performance details, please refer to the [original model](https://huggingface.co/joeddav/xlm-roberta-large-xnli).
|
|
|
## Limitations and Bias |
|
|
|
- The model may have biases inherited from the training data (MNLI and XNLI datasets) |
|
- Performance may vary across different languages and domains |
|
- The model works best with the 15 languages explicitly included in the XNLI training data |
|
- For English-only tasks, consider using specialized English models like `facebook/bart-large-mnli` |
|
|
|
## Citation |
|
|
|
If you use this model, please cite the original work: |
|
|
|
```bibtex |
|
@misc{davison2020zero, |
|
title={Zero-Shot Learning in Modern NLP}, |
|
author={Joe Davison}, |
|
year={2020}, |
|
howpublished={\url{https://joeddav.github.io/blog/2020/05/29/ZSL.html}}, |
|
} |
|
``` |
|
|
|
## License |
|
|
|
This model is released under the MIT License, following the original model's licensing. |
|
|
|
## Contact |
|
|
|
This is a copy of the original model by Joe Davison. For questions about the model architecture and training, please refer to the [original repository](https://huggingface.co/joeddav/xlm-roberta-large-xnli). |
|
|