Description
This tokenizer is designed for Moroccan Darija, a dialectal variety of Arabic (ISO code: ary).
It was trained with the Byte Pair Encoding (BPE) algorithm on the atlasia/AL-Atlas-Moroccan-Darija-Pretraining-Dataset dataset.
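For readers unfamiliar with BPE, the following is a minimal, character-level sketch of the general technique: count adjacent symbol pairs, merge the most frequent pair, and repeat. It is an illustration only, not the training code used for this tokenizer; the function names (`train_bpe`, `encode_word`, and the helpers) and the tie-breaking rule are assumptions made for the example.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    counts = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    # Each word starts as a tuple of single characters.
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(words)
        if not counts:
            break
        # Assumption: ties are broken by comparing the pairs themselves,
        # so that the result is deterministic.
        best = max(counts, key=lambda p: (counts[p], p))
        merges.append(best)
        words = merge_pair(words, best)
    return merges


def encode_word(word, merges):
    """Tokenize one word by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = next(iter(merge_pair({symbols: 1}, pair)))
    return list(symbols)
```

A production tokenizer would normally operate on bytes or pre-tokenized text and add special tokens; this sketch omits those details to keep the merge loop visible.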
Features
- Tokenizes Moroccan Darija text efficiently (see the Moroccan Darija leaderboard).
- Provides robust handling of dialectal variations and specific features of Moroccan Darija.
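This card does not state which metric backs the efficiency claim. One commonly used measure is fertility, the average number of tokens produced per whitespace-separated word (lower means fewer tokens for the same text). The sketch below assumes that measure; `fertility` and `char_tokenize` are hypothetical helpers written for this example, and `tokenize` can be any callable that maps a string to a list of tokens.

```python
def fertility(texts, tokenize):
    """Average number of tokens per whitespace-separated word."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_words += len(text.split())
        total_tokens += len(tokenize(text))
    if total_words == 0:
        raise ValueError("texts contain no words")
    return total_tokens / total_words


def char_tokenize(text):
    """Baseline for comparison: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]
```

To compare tokenizers, pass the same sample of Darija text to `fertility` with each tokenizer's encode function and compare the resulting ratios.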