---
license: apache-2.0
datasets:
- uonlp/CulturaX
language:
- de
tags:
- german
- electra
- teams
- culturax
- gerturax-3
---

# 🇩🇪 GERTuraX-3

This repository hosts the GERTuraX-3 model:

* GERTuraX-3 is a pretrained German encoder-only model, based on ELECTRA and pretrained with the [TEAMS](https://aclanthology.org/2021.findings-acl.219/) approach.
* It was trained on 1.1TB of plain text from the [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX) corpus.

# Pretraining

The [TensorFlow Model Garden LMs](https://github.com/stefan-it/model-garden-lms) repository was used to train an ELECTRA model using the very efficient [TEAMS](https://aclanthology.org/2021.findings-acl.219/) approach.

As pretraining corpus, 1.1TB of plain text was extracted from the [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX) corpus.

GERTuraX-3 uses a cased 64k vocabulary and was trained for 1M steps with a batch size of 1024 and a sequence length of 512 on a v3-32 TPU Pod. The pretraining took 5.4 days and the TensorBoard can be found [here](../../tensorboard).
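# Usage

Since GERTuraX-3 is an ELECTRA-style encoder, it is typically used as a feature extractor or fine-tuned on downstream tasks rather than for masked-word prediction. Below is a minimal sketch with the Hugging Face `transformers` library, assuming the checkpoint on the Hub is available in `transformers` format:

```python
# Minimal sketch: extracting contextual embeddings with GERTuraX-3.
# Assumes the gerturax/gerturax-3 checkpoint is available in transformers format.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gerturax/gerturax-3"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sentence = "München liegt im Süden von Deutschland."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One contextual embedding per subword token:
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```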
# Evaluation

GERTuraX-3 was tested on GermEval 2014 (NER), GermEval 2018 (offensive language detection), CoNLL-2003 (NER) and on the ScandEval benchmark.

We use the same hyper-parameters for GermEval 2014, GermEval 2018 and CoNLL-2003 as used in the [GeBERTa](https://arxiv.org/abs/2310.07321) paper (cf. Table 5), performing five runs with different seeds and reporting the averaged score. All experiments were conducted with the awesome Flair library.

The fine-tuning code repository can be found [here](https://github.com/stefan-it/gerturax-fine-tuner).
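As a minimal illustration of this Flair-based setup, the sketch below fine-tunes GERTuraX-3 on GermEval 2014. Note that the hyper-parameter values are illustrative placeholders, not the exact GeBERTa Table 5 settings, and the `NER_GERMAN_GERMEVAL` loader may ask you to download the GermEval 2014 data manually:

```python
# Minimal Flair fine-tuning sketch for GermEval 2014 (NER).
# Hyper-parameter values are illustrative, not the exact GeBERTa Table 5 settings.
from flair.datasets import NER_GERMAN_GERMEVAL
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# May require the GermEval 2014 data to be downloaded manually first.
corpus = NER_GERMAN_GERMEVAL()
label_dict = corpus.make_label_dictionary(label_type="ner")

# Use GERTuraX-3 as fine-tunable transformer embeddings.
embeddings = TransformerWordEmbeddings(
    model="gerturax/gerturax-3",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# Plain linear tagging head on top of the transformer (no CRF, no RNN).
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/germeval-2014",
    learning_rate=5e-5,
    mini_batch_size=16,
    max_epochs=10,
)
```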
## GermEval 2014

### GermEval 2014 - Original version

| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
|------------|---------------------------|--------------------|
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 87.53 ± 0.22 | 86.81 ± 0.16 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 88.32 ± 0.21 | 87.18 ± 0.12 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 88.58 ± 0.32 | 87.58 ± 0.15 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 88.90 ± 0.06 | 87.84 ± 0.18 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 88.79 ± 0.16 | 88.03 ± 0.16 |

### GermEval 2014 - [Without Wikipedia](https://huggingface.co/datasets/stefan-it/germeval14_no_wikipedia)

| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
|------------|---------------------------|--------------------|
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 90.48 ± 0.34 | 89.05 ± 0.21 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 91.27 ± 0.11 | 89.73 ± 0.27 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 91.70 ± 0.28 | 89.98 ± 0.22 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 91.75 ± 0.17 | 90.24 ± 0.27 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 91.74 ± 0.23 | 90.28 ± 0.21 |

## GermEval 2018

### GermEval 2018 - Fine Grained

| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
|------------|---------------------------|--------------------|
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 63.66 ± 4.08 | 51.86 ± 1.31 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 62.87 ± 1.95 | 50.61 ± 0.36 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 64.37 ± 1.31 | 51.02 ± 0.90 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 66.39 ± 0.85 | 49.94 ± 2.06 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 65.81 ± 3.29 | 52.45 ± 0.57 |

### GermEval 2018 - Coarse Grained

| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
|------------|---------------------------|--------------------|
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 83.15 ± 1.83 | 76.39 ± 0.64 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 83.72 ± 0.68 | 77.11 ± 0.59 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 84.51 ± 0.88 | 78.07 ± 0.91 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 84.33 ± 1.48 | 78.44 ± 0.74 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 83.54 ± 1.27 | 78.36 ± 0.79 |

## CoNLL-2003 - German, Revised

| Model Name | Avg. Development F1-Score | Avg. Test F1-Score |
|------------|---------------------------|--------------------|
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 92.15 ± 0.10 | 88.73 ± 0.21 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 92.32 ± 0.14 | 90.09 ± 0.12 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 92.75 ± 0.20 | 90.15 ± 0.14 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 92.77 ± 0.28 | 90.83 ± 0.16 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 92.87 ± 0.21 | 90.94 ± 0.24 |

## ScandEval

We use v12.10.5 of [ScandEval](https://github.com/ScandEval/ScandEval) to evaluate on the following tasks:

* SB10k (sentiment classification)
* ScaLA-De (linguistic acceptability)
* GermanQuAD (question answering)

The package can be installed via:

```bash
$ pip3 install "scandeval[all]==12.10.5"
```

### Results

#### SB10k

Evaluations on the SB10k dataset can be started like:

```bash
$ scandeval --model "deepset/gbert-base" --task sentiment-classification --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-1" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-2" --task sentiment-classification --language de
$ scandeval --model "gerturax/gerturax-3" --task sentiment-classification --language de
```

| Model Name | Matthews CC | Macro F1-Score |
|------------|-------------|----------------|
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 59.58 ± 1.80 | 72.98 ± 1.20 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 61.56 ± 2.58 | 74.18 ± 1.77 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 65.24 ± 1.77 | 76.55 ± 1.22 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 64.33 ± 2.17 | 75.99 ± 1.40 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 59.52 ± 2.14 | 72.76 ± 1.50 |

#### ScaLA-De

Evaluations on the ScaLA-De dataset can be started like:

```bash
$ scandeval --model "deepset/gbert-base" --task linguistic-acceptability --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-1" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-2" --task linguistic-acceptability --language de
$ scandeval --model "gerturax/gerturax-3" --task linguistic-acceptability --language de
```

| Model Name | Matthews CC | Macro F1-Score |
|------------|-------------|----------------|
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 52.23 ± 4.34 | 73.90 ± 2.68 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 74.55 ± 1.28 | 86.88 ± 0.75 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 75.83 ± 2.85 | 87.59 ± 1.57 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 78.24 ± 1.25 | 88.83 ± 0.63 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 59.70 ± 11.64 | 78.44 ± 6.12 |

#### GermanQuAD

Evaluations on the GermanQuAD dataset can be started like:

```bash
$ scandeval --model "deepset/gbert-base" --task question-answering --language de
$ scandeval --model "ikim-uk-essen/geberta-base" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-1" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-2" --task question-answering --language de
$ scandeval --model "gerturax/gerturax-3" --task question-answering --language de
```

| Model Name | EM | F1-Score |
|------------|----|----------|
| [GBERT Base](https://huggingface.co/deepset/gbert-base) | 12.62 ± 2.20 | 29.62 ± 3.86 |
| [GERTuraX-1](https://huggingface.co/gerturax/gerturax-1) (147GB) | 27.24 ± 1.05 | 52.01 ± 1.10 |
| [GERTuraX-2](https://huggingface.co/gerturax/gerturax-2) (486GB) | 29.54 ± 1.05 | 55.12 ± 0.92 |
| [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) (1.1TB) | 28.49 ± 1.21 | 54.83 ± 1.26 |
| [GeBERTa Base](https://huggingface.co/ikim-uk-essen/geberta-base) | 28.81 ± 1.77 | 53.27 ± 1.92 |

# ❤️ Acknowledgements

GERTuraX is the outcome of the last 12 months of working with TPUs from the awesome [TRC program](https://sites.research.google/trc/about/) and the [TensorFlow Model Garden](https://github.com/tensorflow/models) library. Many thanks for providing TPUs!

Made in the Bavarian Oberland with ❤️ and 🥨.