	---
	license: apache-2.0
	datasets:
	- papluca/language-identification
	language:
	- en
	- de
	- fr
	- es
	metrics:
	- precision
	- recall
	- f1
	- accuracy
	pipeline_tag: text-classification
	---
	# German, English, French and Spanish Language Detector

	The GEFS-language-detector model identifies German, English, French, and Spanish text. On its test data it reaches precision, recall, and F1 scores above 0.999 for every supported language (see Testing Results).
	It is fine-tuned from the base model [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) on papluca's [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset.


	## Predicted output

	The model returns the detected language as one of the following language codes:
	```
	- de as German
	- en as English
	- fr as French
	- es as Spanish
	```

	## Supported languages
	The model currently supports four languages; more will be added in the future.

	The following languages are supported:
	- German (de)
	- English (en)
	- French (fr)
	- Spanish (es)
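
Because the classifier has only four output classes, any input is assigned one of the four codes, including text written in a language the model does not support. A minimal sketch of one way to guard against that, assuming predictions arrive as dictionaries with `label` and `score` keys; the helper name `accept_prediction` and the `0.9` cut-off are hypothetical choices, not part of the model, and a low score is only a rough signal:

```python
# Hypothetical helper: not part of the model or of transformers.
SUPPORTED = {"de": "German", "en": "English", "fr": "French", "es": "Spanish"}

def accept_prediction(prediction, min_score=0.9):
    """Return the language code, or None if the prediction looks unreliable.

    `prediction` is a dict such as {"label": "de", "score": 0.98}.
    `min_score` is an assumed cut-off; tune it on your own data.
    """
    label = prediction["label"]
    if label not in SUPPORTED or prediction["score"] < min_score:
        return None
    return label

# Illustrative inputs with made-up scores, not real model output.
print(accept_prediction({"label": "fr", "score": 0.99}))
print(accept_prediction({"label": "es", "score": 0.41}))
```

Returning `None` instead of raising keeps the helper easy to use in batch loops, where rejected texts can be collected for manual review.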

	## Use a pipeline as a high-level helper

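	```python
	from transformers import pipeline

	# One sample sentence each in German, English, Spanish, and French.
	texts = [
	    "Mir gefällt die Art und Weise, Sprachen zu erkennen",
	    "I like the way to detect languages",
	    "Me gusta la forma de detectar idiomas",
	    "J'aime la façon de détecter les langues",
	]

	# Load the fine-tuned classifier from the Hugging Face Hub.
	pipe = pipeline("text-classification", model="ImranzamanML/GEFS-language-detector")

	# top_k=1 keeps only the highest-scoring language for each text.
	lang_detect = pipe(texts, top_k=1)
	print("The detected languages are", lang_detect)
	```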
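
With `top_k=1` and a list of inputs, the pipeline is expected to return one single-element list per input text, each holding a dictionary with `label` and `score` keys. A minimal sketch of turning that structure into readable language names follows; `sample_output` is a hand-written placeholder that mimics this shape (its scores are not real model output), and `describe` is a hypothetical helper:

```python
CODE_TO_NAME = {"de": "German", "en": "English", "fr": "French", "es": "Spanish"}

def describe(texts, predictions):
    """Pair each input text with the name of its top predicted language."""
    results = []
    for text, candidates in zip(texts, predictions):
        top = candidates[0]  # top_k=1 leaves one candidate per text
        # Fall back to the raw code if it is not in the lookup table.
        name = CODE_TO_NAME.get(top["label"], top["label"])
        results.append((text, name, top["score"]))
    return results

texts = ["I like the way to detect languages",
         "Me gusta la forma de detectar idiomas"]
# Placeholder structure; in practice this would be pipe(texts, top_k=1).
sample_output = [[{"label": "en", "score": 0.99}],
                 [{"label": "es", "score": 0.99}]]

for text, name, score in describe(texts, sample_output):
    print(f"{name} ({score:.2f}): {text}")
```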

	## Load model directly

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# Download (or load from the local cache) the tokenizer and the fine-tuned classifier.
	tokenizer = AutoTokenizer.from_pretrained("ImranzamanML/GEFS-language-detector")
	model = AutoModelForSequenceClassification.from_pretrained("ImranzamanML/GEFS-language-detector")
	```
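
The snippet above only loads the tokenizer and the model. To get a prediction, the text is tokenized, passed through the model, and the resulting logits are converted into a label. That last step can be sketched in plain Python as below; the logits and the `id2label` ordering are hypothetical (the real mapping is stored in `model.config.id2label`), and `logits_to_label` is an illustrative helper rather than part of `transformers`:

```python
import math

def logits_to_label(logits, id2label):
    """Apply a softmax to one row of logits and return (label, probability)."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# With the loaded model, the logits would come from something like:
#   inputs = tokenizer(text, return_tensors="pt")
#   with torch.no_grad():
#       logits = model(**inputs).logits[0].tolist()
# The values and the label order below are made up for illustration.
hypothetical_id2label = {0: "de", 1: "en", 2: "es", 3: "fr"}
hypothetical_logits = [-2.0, 6.5, -1.5, -3.0]

label, prob = logits_to_label(hypothetical_logits, hypothetical_id2label)
print(label, round(prob, 4))
```

Reading the mapping from `model.config.id2label` rather than hard-coding it avoids silently mislabeling predictions if the class order differs from what is assumed here.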

	## Model Training

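	| Epoch | Training Loss | Validation Loss |
	|-------|---------------|-----------------|
	| 1 | 0.002600 | 0.000148 |
	| 2 | 0.001000 | 0.000015 |
	| 3 | 0.000000 | 0.000011 |
	| 4 | 0.001800 | 0.000009 |
	| 5 | 0.002700 | 0.000016 |
	| 6 | 0.001600 | 0.000012 |
	| 7 | 0.001300 | 0.000009 |
	| 8 | 0.001200 | 0.000008 |
	| 9 | 0.000900 | 0.000007 |
	| 10 | 0.000900 | 0.000007 |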


	## Testing Results
	```
	Language  Precision  Recall  F1      Accuracy
	de        0.9997     0.9998  0.9998  0.9999
	en        1.0000     1.0000  1.0000  1.0000
	fr        0.9995     0.9996  0.9996  0.9996
	es        0.9994     0.9996  0.9995  0.9996
	```



	## About Author

	- Name: Muhammad Imran Zaman
	- Company: [Theum AG](https://theum.com/en/index.htm?t=)
	- Role: Lead Machine Learning Engineer

	Professional Links:
	- Kaggle: [Profile](https://www.kaggle.com/muhammadimran112233)
	- LinkedIn: [Profile](https://linkedin.com/in/muhammad-imran-zaman)
	- Google Scholar: [Profile](https://scholar.google.com/citations?user=ulVFpy8AAAAJ&hl=en)
	- YouTube: [Channel](https://www.youtube.com/@consolioo)
	- GitHub: [Profile](https://github.com/Imran-ml)