---
language:
  - multilingual
  - en
  - fr
  - es
  - de
  - el
  - bg
  - ru
  - tr
  - ar
  - vi
  - th
  - zh
  - hi
  - sw
  - ur
tags:
  - text-classification
  - pytorch
  - tensorflow
  - zero-shot-classification
  - xlm-roberta
  - multilingual
  - nli
  - natural-language-inference
datasets:
  - multi_nli
  - xnli
license: mit
pipeline_tag: zero-shot-classification
library_name: transformers
model-index:
  - name: xlm-roberta-large-xnli
    results:
      - task:
          type: zero-shot-classification
          name: Zero-Shot Classification
        dataset:
          name: XNLI
          type: xnli
        metrics:
          - type: accuracy
            value: 0.834
            name: Accuracy
          - type: f1
            value: 0.833
            name: F1 Score
widget:
  - text: "За кого вы голосуете в 2020 году?"
    candidate_labels: "politique étrangère, Europe, élections, affaires, politique"
    multi_class: true
    example_title: "Russian Political Classification"
  - text: "لمن تصوت في 2020؟"
    candidate_labels: "السياسة الخارجية, أوروبا, الانتخابات, الأعمال, السياسة"
    multi_class: true
    example_title: "Arabic Political Classification"
  - text: "2020'de kime oy vereceksiniz?"
    candidate_labels: "dış politika, Avrupa, seçimler, ticaret, siyaset"
    multi_class: true
    example_title: "Turkish Political Classification"
  - text: "I love this movie"
    candidate_labels: "positive, negative, neutral"
    multi_class: false
    example_title: "English Sentiment Analysis"
---

# XLM-RoBERTa Large for Zero-Shot Classification (XNLI)

## Model Description

This model is based on the excellent work by [joeddav/xlm-roberta-large-xnli](https://huggingface.co/joeddav/xlm-roberta-large-xnli). It takes [xlm-roberta-large](https://huggingface.co/xlm-roberta-large) and fine-tunes it on a combination of NLI data in 15 languages.

**Original Model Credit**: This model is a copy of [joeddav/xlm-roberta-large-xnli](https://huggingface.co/joeddav/xlm-roberta-large-xnli) by Joe Davison. All credit for the training and development goes to the original author.

This model is intended to be used for zero-shot text classification, such as with the Hugging Face [ZeroShotClassificationPipeline](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.ZeroShotClassificationPipeline).

## Quick Start

```python
from transformers import pipeline

# Load the zero-shot classification pipeline
classifier = pipeline("zero-shot-classification",
                      model="YOUR_USERNAME/zero-shot-classification")

# Example usage
text = "I love this new smartphone, it's amazing!"
candidate_labels = ["technology", "sports", "politics", "entertainment"]

result = classifier(text, candidate_labels)
print(result)
```
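The pipeline returns a dictionary containing the input `sequence`, the candidate `labels` sorted from most to least likely, and the corresponding `scores`.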

## Intended Usage

This model is intended to be used for zero-shot text classification, especially in languages other than English. It is fine-tuned on XNLI, which is a multilingual NLI dataset. The model can therefore be used with any of the languages in the XNLI corpus:

- English
- French
- Spanish
- German
- Greek
- Bulgarian
- Russian
- Turkish
- Arabic
- Vietnamese
- Thai
- Chinese
- Hindi
- Swahili
- Urdu

Since the base model was pre-trained on 100 different languages, the model has
shown some effectiveness in languages beyond those listed above as well. See
the full list of pre-training languages in appendix A of the
[XLM-RoBERTa paper](https://arxiv.org/abs/1911.02116).

For English-only classification, it is recommended to use
[bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli) or
[a distilled bart MNLI model](https://huggingface.co/models?filter=pipeline_tag%3Azero-shot-classification&search=valhalla).
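Switching to an English-only checkpoint only requires changing the model id; a minimal sketch:

```python
from transformers import pipeline

# same pipeline, English-focused checkpoint
english_classifier = pipeline("zero-shot-classification",
                              model="facebook/bart-large-mnli")
```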

### Using the zero-shot classification pipeline

The model can be loaded with the `zero-shot-classification` pipeline like so:

```python
from transformers import pipeline
classifier = pipeline("zero-shot-classification",
                      model="YOUR_USERNAME/zero-shot-classification")
```

You can then classify in any of the above languages. You can even pass the labels in one language and the sequence to
classify in another:

```python
# we will classify the Russian translation of, "Who are you voting for in 2020?"
sequence_to_classify = "За кого вы голосуете в 2020 году?"
# we can specify candidate labels in Russian or any other language above:
candidate_labels = ["Europe", "public health", "politics"]
classifier(sequence_to_classify, candidate_labels)
# {'labels': ['politics', 'Europe', 'public health'],
#  'scores': [0.9048484563827515, 0.05722189322113991, 0.03792969882488251],
#  'sequence': 'За кого вы голосуете в 2020 году?'}
```

The default hypothesis template is the English `This example is {}.` If you are working strictly within one language, it
may be worthwhile to translate this into the language you are working with:

```python
sequence_to_classify = "¿A quién vas a votar en 2020?"
candidate_labels = ["Europa", "salud pública", "política"]
hypothesis_template = "Este ejemplo es {}."
classifier(sequence_to_classify, candidate_labels, hypothesis_template=hypothesis_template)
# {'labels': ['política', 'Europa', 'salud pública'],
#  'scores': [0.9109585881233215, 0.05954807624220848, 0.029493311420083046],
#  'sequence': '¿A quién vas a votar en 2020?'}
```
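If more than one candidate label can be correct, you can score each label independently by passing `multi_label=True` (named `multi_class` in older transformers versions):

```python
sequence_to_classify = "За кого вы голосуете в 2020 году?"
candidate_labels = ["Europe", "public health", "politics", "elections"]

# each label is scored independently, so the scores no longer sum to 1
classifier(sequence_to_classify, candidate_labels, multi_label=True)
```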

### Using with manual PyTorch

```python
# pose the sequence as an NLI premise and the label as a hypothesis
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
nli_model = AutoModelForSequenceClassification.from_pretrained('YOUR_USERNAME/zero-shot-classification').to(device)
tokenizer = AutoTokenizer.from_pretrained('YOUR_USERNAME/zero-shot-classification')

sequence = "За кого вы голосуете в 2020 году?"
label = "politics"

premise = sequence
hypothesis = f'This example is {label}.'

# run through the model fine-tuned on MNLI/XNLI
x = tokenizer.encode(premise, hypothesis, return_tensors='pt',
                     truncation='only_first')
logits = nli_model(x.to(device))[0]

# we throw away "neutral" (index 1) and take the probability of
# "entailment" (index 2) as the probability of the label being true
entail_contradiction_logits = logits[:, [0, 2]]
probs = entail_contradiction_logits.softmax(dim=1)
prob_label_is_true = probs[:, 1]
```
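To mirror what the pipeline does with `multi_label=True`, you can repeat the entailment computation once per candidate label; a minimal sketch reusing the objects defined above:

```python
# score several candidate labels independently (multi-label style)
labels = ["politics", "Europe", "public health"]
probs_per_label = {}
for label in labels:
    hypothesis = f'This example is {label}.'
    x = tokenizer.encode(sequence, hypothesis, return_tensors='pt',
                         truncation='only_first')
    logits = nli_model(x.to(device))[0]
    # entailment probability vs. contradiction, with neutral discarded
    entail_contradiction = logits[:, [0, 2]].softmax(dim=1)
    probs_per_label[label] = entail_contradiction[:, 1].item()
```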

## Training

This model was pre-trained on a set of 100 languages, as described in
[the original paper](https://arxiv.org/abs/1911.02116). It was then fine-tuned on the NLI task using the concatenation
of the MNLI train set and the XNLI validation and test sets. Finally, it was trained for one additional epoch on XNLI
data alone, with the translations shuffled so that the premise and hypothesis of each example come from the same
original English example but are in different languages.
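
The model card does not include the fine-tuning code, but here is a hypothetical sketch of how such cross-lingual pairs could be assembled, assuming each XNLI example provides aligned translations keyed by language code:

```python
import random

# hypothetical helper: `translations` maps language code -> (premise, hypothesis),
# all translations of the same original English example
def make_cross_lingual_pair(translations):
    prem_lang, hyp_lang = random.sample(list(translations), 2)
    premise = translations[prem_lang][0]
    hypothesis = translations[hyp_lang][1]
    return premise, hypothesis
```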

## Model Performance

This model achieves strong performance on multilingual zero-shot classification tasks (see the XNLI accuracy reported in the metadata above). For detailed performance metrics, please refer to the [original model](https://huggingface.co/joeddav/xlm-roberta-large-xnli).

## Limitations and Bias

- The model may have biases inherited from the training data (MNLI and XNLI datasets)
- Performance may vary across different languages and domains
- The model works best with the 15 languages explicitly included in the XNLI training data
- For English-only tasks, consider using specialized English models like `facebook/bart-large-mnli`

## Citation

If you use this model, please cite the original work:

```bibtex
@misc{davison2020zero,
    title={Zero-Shot Learning in Modern NLP},
    author={Joe Davison},
    year={2020},
    howpublished={\url{https://joeddav.github.io/blog/2020/05/29/ZSL.html}},
}
```

## License

This model is released under the MIT License, following the original model's licensing.

## Contact

This is a copy of the original model by Joe Davison. For questions about the model architecture and training, please refer to the [original repository](https://huggingface.co/joeddav/xlm-roberta-large-xnli).