---
license: apache-2.0
datasets:
- uonlp/CulturaX
language:
- de
tags:
- german
- electra
- teams
- culturax
- gerturax-3
---

# 🇩🇪 GERTuraX-3

This repository hosts the GERTuraX-3 model:

* GERTuraX-3 is a pretrained German encoder-only model, based on ELECTRA and pretrained with the [TEAMS](https://aclanthology.org/2021.findings-acl.219/) approach.
* It was trained on 1.1TB of plain text from the [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX) corpus.

# Pretraining

The [TensorFlow Model Garden LMs](https://github.com/stefan-it/model-garden-lms) repository was used to train an ELECTRA model using the very efficient [TEAMS](https://aclanthology.org/2021.findings-acl.219/) approach.

As pretraining corpus, 1.1TB of plain text was extracted from the [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX) corpus.

GERTuraX-3 uses a cased 64k vocabulary and was trained for 1M steps with a batch size of 1024 and a sequence length of 512 on a v3-32 TPU Pod. The pretraining took 5.4 days and the TensorBoard can be found [here](../../tensorboard).
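# Usage

Since GERTuraX-3 is an ELECTRA-style encoder, it is typically used as a feature extractor or fine-tuned on downstream tasks rather than for masked-word prediction. Below is a minimal sketch with the Hugging Face `transformers` library, assuming the checkpoint on the Hub is available in `transformers` format:

```python
# Minimal sketch: extracting contextual embeddings with GERTuraX-3.
# Assumes the gerturax/gerturax-3 checkpoint is available in transformers format.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gerturax/gerturax-3"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sentence = "München liegt im Süden von Deutschland."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One contextual embedding per subword token:
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```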
# Evaluation

GERTuraX-3 was tested on GermEval 2014 (NER), GermEval 2018 (offensive language detection), CoNLL-2003 (NER) and on the ScandEval benchmark.

We use the same hyper-parameters for GermEval 2014, GermEval 2018 and CoNLL-2003 as used in the [GeBERTa](https://arxiv.org/abs/2310.07321) paper (cf. Table 5), performing five runs with different seeds and reporting the averaged score. All experiments were conducted with the awesome Flair library.

The fine-tuning code repository can be found [here](https://github.com/stefan-it/gerturax-fine-tuner).
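As a minimal illustration of this Flair-based setup, the sketch below fine-tunes GERTuraX-3 on GermEval 2014. Note that the hyper-parameter values are illustrative placeholders, not the exact GeBERTa Table 5 settings, and the `NER_GERMAN_GERMEVAL` loader may ask you to download the GermEval 2014 data manually:

```python
# Minimal Flair fine-tuning sketch for GermEval 2014 (NER).
# Hyper-parameter values are illustrative, not the exact GeBERTa Table 5 settings.
from flair.datasets import NER_GERMAN_GERMEVAL
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# May require the GermEval 2014 data to be downloaded manually first.
corpus = NER_GERMAN_GERMEVAL()
label_dict = corpus.make_label_dictionary(label_type="ner")

# Use GERTuraX-3 as fine-tunable transformer embeddings.
embeddings = TransformerWordEmbeddings(
    model="gerturax/gerturax-3",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# Plain linear tagging head on top of the transformer (no CRF, no RNN).
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/germeval-2014",
    learning_rate=5e-5,
    mini_batch_size=16,
    max_epochs=10,
)
```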
## GermEval 2014

### GermEval 2014 - Original version

| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
|------------|---------------------------|--------------------|
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 87.53 ± 0.22 | 86.81 ± 0.16 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 88.32 ± 0.21 | 87.18 ± 0.12 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 88.58 ± 0.32 | 87.58 ± 0.15 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 88.90 ± 0.06 | 87.84 ± 0.18 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 88.79 ± 0.16 | 88.03 ± 0.16 |

### GermEval 2014 - [Without Wikipedia](https://huggingface.co/datasets/stefan-it/germeval14_no_wikipedia)

| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
|------------|---------------------------|--------------------|
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 90.48 ± 0.34 | 89.05 ± 0.21 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 91.27 ± 0.11 | 89.73 ± 0.27 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 91.70 ± 0.28 | 89.98 ± 0.22 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 91.75 ± 0.17 | 90.24 ± 0.27 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 91.74 ± 0.23 | 90.28 ± 0.21 |

## GermEval 2018

### GermEval 2018 - Fine Grained

| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
|------------|---------------------------|--------------------|
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 63.66 ± 4.08 | 51.86 ± 1.31 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 62.87 ± 1.95 | 50.61 ± 0.36 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 64.37 ± 1.31 | 51.02 ± 0.90 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 66.39 ± 0.85 | 49.94 ± 2.06 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 65.81 ± 3.29 | 52.45 ± 0.57 |

### GermEval 2018 - Coarse Grained

| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
|------------|---------------------------|--------------------|
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 83.15 ± 1.83 | 76.39 ± 0.64 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 83.72 ± 0.68 | 77.11 ± 0.59 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 84.51 ± 0.88 | 78.07 ± 0.91 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 84.33 ± 1.48 | 78.44 ± 0.74 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 83.54 ± 1.27 | 78.36 ± 0.79 |

## CoNLL-2003 - German, Revised

| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
|------------|---------------------------|--------------------|
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 92.15 ± 0.10 | 88.73 ± 0.21 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 92.32 ± 0.14 | 90.09 ± 0.12 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 92.75 ± 0.20 | 90.15 ± 0.14 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 92.77 ± 0.28 | 90.83 ± 0.16 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 92.87 ± 0.21 | 90.94 ± 0.24 |

## ScandEval

We use v12.10.5 of [ScandEval](https://github.com/ScandEval/ScandEval) to evaluate on the following tasks:

* SB10k (sentiment classification)
* ScaLA-De (linguistic acceptability)
* GermanQuAD (question answering)

The package can be installed via:

```bash
$ pip3 install "scandeval[all]==12.10.5"
```

### Results

#### SB10k

Evaluations on the SB10k dataset can be started like:

```bash
$ scandeval --model "deepset/gbert-base" --task sentiment-classification --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-1" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-2" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-3" --task sentiment-classification --language de
```

| Model Name | Matthews CC | Macro F1-Score |
|------------|-------------|----------------|
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 59.58 ± 1.80 | 72.98 ± 1.20 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 61.56 ± 2.58 | 74.18 ± 1.77 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 65.24 ± 1.77 | 76.55 ± 1.22 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 64.33 ± 2.17 | 75.99 ± 1.40 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 59.52 ± 2.14 | 72.76 ± 1.50 |

#### ScaLA-De

Evaluations on the ScaLA-De dataset can be started like:

```bash
$ scandeval --model "deepset/gbert-base" --task linguistic-acceptability --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-1" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-2" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-3" --task linguistic-acceptability --language de
```

| Model Name | Matthews CC | Macro F1-Score |
|------------|-------------|----------------|
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 52.23 ± 4.34 | 73.90 ± 2.68 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 74.55 ± 1.28 | 86.88 ± 0.75 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 75.83 ± 2.85 | 87.59 ± 1.57 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 78.24 ± 1.25 | 88.83 ± 0.63 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 59.70 ± 11.64 | 78.44 ± 6.12 |

#### GermanQuAD

Evaluations on the GermanQuAD dataset can be started like:

```bash
$ scandeval --model "deepset/gbert-base" --task question-answering --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-1" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-2" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-3" --task question-answering --language de
```

| Model Name | EM | F1-Score |
|------------|----|----------|
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 12.62 ± 2.20 | 29.62 ± 3.86 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 27.24 ± 1.05 | 52.01 ± 1.10 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 29.54 ± 1.05 | 55.12 ± 0.92 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 28.49 ± 1.21 | 54.83 ± 1.26 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 28.81 ± 1.77 | 53.27 ± 1.92 |

# ❤️ Acknowledgements

GERTuraX is the outcome of the last 12 months of working with TPUs from the awesome [TRC program](https://sites.research.google/trc/about/) and the [TensorFlow Model Garden](https://github.com/tensorflow/models) library. Many thanks for providing TPUs!

Made in the Bavarian Oberland with ❤️ and 🥨.