# 🧠 MarianMT-Text-Translation-AI-Model (en-de)

A sequence-to-sequence translation model fine-tuned on English–German sentence pairs. Built on Hugging Face's MarianMTModel, it translates English text into German and is suitable for general-purpose translation, language learning, and formal or semi-formal communication.

---

## ✨ Model Highlights

- 📌 Base Model: `Helsinki-NLP/opus-mt-en-de`
- 📚 Fine-tuned on a cleaned and tokenized parallel English–German dataset
- 🌍 Direction: English → German
- 🔧 Framework: Hugging Face Transformers + PyTorch

---

## 🧠 Intended Uses

- ✅ Translating English content (emails, documentation, support text) into German
- ✅ Educational platforms for learning German
- ✅ Cross-lingual customer service, product documentation, and semi-formal communication

---

## 🚫 Limitations

- ❌ Not optimized for informal, idiomatic, or slang expressions
- ❌ Not intended for legal, medical, or other sensitive content
- 📏 Inputs longer than 128 tokens are truncated
- ⚠️ Accuracy may vary in specialized domains (e.g., legal, technical)

---

## 🏋️‍♂️ Training Details

| Attribute        | Value                        |
|------------------|------------------------------|
| Base Model       | `Helsinki-NLP/opus-mt-en-de` |
| Dataset          | WMT14 English–German         |
| Task Type        | Translation                  |
| Max Token Length | 128                          |
| Epochs           | 3                            |
| Batch Size       | 16                           |
| Optimizer        | AdamW                        |
| Loss Function    | CrossEntropyLoss             |
| Framework        | PyTorch + Transformers       |
| Hardware         | CUDA-enabled GPU             |

---
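The setup in the table could be reproduced with the `Seq2SeqTrainer` API along these lines. This is a hedged sketch rather than the actual training script: the `finetune` helper, the 1% dataset slice, and the preprocessing details are illustrative assumptions.

```python
# Hyperparameters from the table above
HYPERPARAMS = {
    "base_model": "Helsinki-NLP/opus-mt-en-de",
    "epochs": 3,
    "batch_size": 16,
    "max_length": 128,
}

def finetune(output_dir: str = "finetuned-model"):
    # Heavy imports kept local so the constants above are usable without them
    from datasets import load_dataset
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(HYPERPARAMS["base_model"])
    model = AutoModelForSeq2SeqLM.from_pretrained(HYPERPARAMS["base_model"])

    # Illustrative 1% slice; the real run would use the full WMT14 train split
    raw = load_dataset("wmt14", "de-en", split="train[:1%]")

    def preprocess(batch):
        src = [pair["en"] for pair in batch["translation"]]
        tgt = [pair["de"] for pair in batch["translation"]]
        return tokenizer(
            src, text_target=tgt,
            max_length=HYPERPARAMS["max_length"], truncation=True,
        )

    tokenized = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        num_train_epochs=HYPERPARAMS["epochs"],
        per_device_train_batch_size=HYPERPARAMS["batch_size"],
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=tokenized,
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()  # AdamW and cross-entropy loss are the Trainer defaults
    trainer.save_model(output_dir)
```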

## 📊 Evaluation Metrics

| Metric     | Score |
|------------|-------|
| BLEU Score | 30.42 |

---

## 🔎 Output Details

- Input: English text string
- Output: Translated German text string

---

## 🚀 Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

model_name = "AventIQ-AI/Ai-Translate-Model-Eng-German"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model.eval()

# Move the model to the GPU once, rather than on every call
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def translate(text: str) -> str:
    """Translate an English string into German."""
    inputs = tokenizer(
        text, return_tensors="pt", padding=True, truncation=True, max_length=128
    ).to(device)
    with torch.no_grad():  # inference only; no gradients needed
        outputs = model.generate(**inputs)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example
print(translate("How are you doing today?"))
```

---

## 📁 Repository Structure

```
finetuned-model/
├── config.json              ✅ Model architecture & config
├── pytorch_model.bin        ✅ Model weights
├── tokenizer_config.json    ✅ Tokenizer settings
├── tokenizer.json           ✅ Tokenizer vocabulary (JSON format)
├── source.spm               ✅ SentencePiece model for source language
├── target.spm               ✅ SentencePiece model for target language
├── special_tokens_map.json  ✅ Special tokens mapping
├── generation_config.json   ✅ (Optional) Generation defaults
└── README.md                ✅ Model card
```

## 🤝 Contributing

Contributions are welcome! Feel free to open an issue or pull request to improve the model, training scripts, or documentation.