AventIQ-AI/MarianMT-Text-Translation-AI-Model-en-fr

Safetensors

marian

vishal1364 committed on May 27

Commit 0a3e976 (verified) · 1 parent: 49aa0d8

Create README.md

README.md +102 -0

	@@ -0,0 +1,102 @@

+# 🧠 MarianMT-Text-Translation-AI-Model-"en-fr"
+A **sequence-to-sequence translation model** fine-tuned on English–French sentence pairs. It translates English text into French and is built on the Hugging Face `MarianMTModel` architecture. It is suited to general-purpose translation, educational use, and routine formal correspondence; see the limitations below before using it for regulatory or other high-stakes text.
+---
+## ✨ Model Highlights
+- 📌 Based on [`Helsinki-NLP/opus-mt-en-fr`](https://huggingface.co/Helsinki-NLP/opus-mt-en-fr)
+- 🔍 Fine-tuned on a cleaned parallel corpus of English-French sentence pairs
+- ⚡ Translates from **English → French**
+- 🧠 Built using **Hugging Face Transformers** and **PyTorch**
+---
+## 🧠 Intended Uses
+- ✅ Translating English feedback, emails, or documents into French
+- ✅ Cross-lingual support for customer service or regulatory communication
+- ✅ Educational platforms and language learning
+---
+## 🚫 Limitations
+- ❌ Not suitable for informal slang or code-mixed inputs
+- 📏 Inputs longer than 128 tokens will be truncated
+- 🤔 May produce less accurate translations for highly specialized or domain-specific language
+- ⚠️ Not intended for legal, medical, or safety-critical translations without expert review
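
Because inputs beyond 128 tokens are truncated, longer documents need to be split before translation. The following is a minimal sketch of one way to do that, not part of the original repository: `split_for_translation` and its `count_tokens` parameter are hypothetical names, and the sentence splitting is a naive regular expression that will mis-split abbreviations such as "e.g.".

```python
import re


def split_for_translation(text, count_tokens, max_tokens=128):
    """Group sentences into chunks that stay within max_tokens.

    count_tokens is any callable mapping a string to its token count
    (hypothetical parameter; pass one built on the model's tokenizer).
    """
    # Naive split after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Note: a single sentence longer than max_tokens is kept as its own
    # chunk and would still be truncated by the tokenizer.
    return chunks


if __name__ == "__main__":
    # Whitespace word count stands in for the real tokenizer here.
    demo = "One two three. Four five six. Seven eight nine."
    print(split_for_translation(demo, lambda s: len(s.split()), max_tokens=6))
```

With the real tokenizer, `count_tokens` could be `lambda s: len(tokenizer(s)["input_ids"])`; each returned chunk can then be translated separately and the results joined.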
+---
+## 🏋️‍♂️ Training Details
+| Attribute          | Value                              |
+|--------------------|----------------------------------|
+| Base Model         | `Helsinki-NLP/opus-mt-en-fr`     |
+| Dataset            | Parallel English-French corpus   |
+| Task Type          | Translation                      |
+| Max Token Length   | 128                              |
+| Epochs             | 3                                |
+| Batch Size         | 16                               |
+| Optimizer          | AdamW                            |
+| Loss Function      | CrossEntropyLoss                 |
+| Framework          | PyTorch + Transformers           |
+| Hardware           | CUDA-enabled GPU                 |
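
The table does not include the training script, so the following is only a sketch of how these values might map onto a Transformers configuration. The names `HPARAMS`, `total_optimizer_steps`, and `build_training_args` are hypothetical; the sketch assumes a single GPU, no gradient accumulation, and that the batch size of 16 is per device. The learning rate is not stated in the table and is left at the library default.

```python
import math

# Values copied from the Training Details table.
HPARAMS = {
    "base_model": "Helsinki-NLP/opus-mt-en-fr",
    "max_token_length": 128,
    "epochs": 3,
    "batch_size": 16,
}


def total_optimizer_steps(num_examples):
    """Optimizer steps for the full run (single device, no accumulation)."""
    steps_per_epoch = math.ceil(num_examples / HPARAMS["batch_size"])
    return steps_per_epoch * HPARAMS["epochs"]


def build_training_args(output_dir="finetuned-model"):
    """Build Seq2SeqTrainingArguments from HPARAMS (assumed mapping)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(
        output_dir=output_dir,
        num_train_epochs=HPARAMS["epochs"],
        per_device_train_batch_size=HPARAMS["batch_size"],
        per_device_eval_batch_size=HPARAMS["batch_size"],
        predict_with_generate=True,  # decode with generate() during evaluation
        # The Trainer's default optimizer is an AdamW variant, matching the table.
    )


if __name__ == "__main__":
    # Example: a hypothetical corpus of 1,000 sentence pairs.
    print(total_optimizer_steps(1000))
```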
+---
+## 📊 Evaluation Metrics
+| Metric     | Score   |
+|------------|---------|
+| BLEU Score | 27.82   |
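
The card does not say which test set, tokenization, or tool produced this score. To illustrate what the metric measures, here is a simplified corpus-level BLEU sketch: whitespace tokens, one reference per hypothesis, no smoothing. The function names are hypothetical, and scores from this sketch are not comparable with the figure above or with standard tools such as sacreBLEU.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on a 0-100 scale."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clipped matches: each n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["le chat est sur le tapis"]
    refs = ["le chat est sur le tapis rouge"]
    print(round(simple_corpus_bleu(hyps, refs), 2))
```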
+---
+## 🔎 Output Details
+- Input: English text string
+- Output: Translated French text string
+---
+## 🚀 Usage
+```python
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+import torch
+model_name = "AventIQ-AI/MarianMT-Text-Translation-AI-Model-en-fr"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
+model.eval()
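+# Select the device once and move the model to it
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model.to(device)
+def translate(text):
+    # Truncate to 128 tokens, the maximum length used during fine-tuning
+    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128).to(device)
+    outputs = model.generate(**inputs)
+    # Decode the first (and only) sequence, dropping special tokens
+    return tokenizer.decode(outputs[0], skip_special_tokens=True)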
+# Example
+print(translate("Hello, how are you?"))
+```
+---
+## 📁 Repository Structure
+```
+finetuned-model/
+├── config.json               ✅ Model architecture & config
+├── pytorch_model.bin         ✅ Model weights
+├── tokenizer_config.json     ✅ Tokenizer settings
+├── tokenizer.json            ✅ Tokenizer vocabulary (JSON format)
+├── source.spm                ✅ SentencePiece model for source language
+├── target.spm                ✅ SentencePiece model for target language
+├── special_tokens_map.json   ✅ Special tokens mapping
+├── generation_config.json    ✅ (Optional) Generation defaults
+└── README.md                 ✅ Model card
+```
+---
+## 🤝 Contributing
+Contributions are welcome! Feel free to open an issue or pull request to improve the model, training scripts, or documentation.