Moroccan Darija Embeddings
This repository contains word embedding models trained for Moroccan Darija, the Arabic dialect widely spoken in Morocco. Currently, it includes FastText-based embeddings trained on Al Atlas, a curated dataset of Moroccan Darija text.
Clone the GitHub repository and install the required dependencies:
git clone https://github.com/BounharAbdelaziz/Moroccan-Darija-Embedding.git
cd Moroccan-Darija-Embedding
pip install -r requirements.txt
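The trained model files are hosted on the Hugging Face Hub rather than in the Git repository. As a rough sketch, they could be fetched programmatically with the `huggingface_hub` package. The helper name `download_model` is hypothetical, and the sketch assumes that `huggingface_hub` is installed and that `fasttext_cbow_v0.bin` sits at the root of the Hub repository; check the file listing on the Hub if the download fails.

```python
# Hub repository and model file named in this README.
REPO_ID = "atlasia/Moroccan-Darija-Embedding"
MODEL_FILENAME = "fasttext_cbow_v0.bin"  # assumed to be at the repository root


def download_model(filename: str = MODEL_FILENAME, repo_id: str = REPO_ID) -> str:
    """Download one model file from the Hub and return its local cache path.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    # Imported here so this module can be loaded without the dependency.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename)


# Example (requires network access):
# model_path = download_model()
```

The returned path can then be passed to `fasttext.load_model` in place of a manually downloaded file.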
You can load the trained FastText model using the fasttext library:
import fasttext
# Download the model files from the Hub first:
# https://huggingface.co/atlasia/Moroccan-Darija-Embedding
model = fasttext.load_model("fasttext_cbow_v0.bin")
# Look up the embedding vector for a word
word_vector = model.get_word_vector("كلمة")
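Word vectors are typically compared with cosine similarity. The following is a minimal, dependency-free sketch of that comparison. The function `cosine_similarity` is a hypothetical helper, not part of this repository or of the fasttext library, and returning 0.0 for an all-zero vector is a convention chosen here for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: similarity with a zero vector is 0.
        return 0.0
    return dot / (norm_u * norm_v)


# Example with a loaded model (the second word is an arbitrary illustration):
# v1 = model.get_word_vector("كلمة")
# v2 = model.get_word_vector("كتاب")
# similarity = cosine_similarity(v1, v2)
```

Values close to 1 indicate that the two words occur in similar contexts in the training data; values near 0 indicate little relation.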
Contributions are welcome! Feel free to open issues or submit pull requests to improve the models and codebase.