# Bengali Colloquial Translator

Fine-tuned model for English-to-Bengali (colloquial) translation.

## Model Details

### Model Description

This model is fine-tuned on a Bengali colloquial language dataset to translate English text into Bengali written in phonetic English (Banglish). It helps in understanding informal Bengali conversations from social media, YouTube comments, and daily conversations.

- **Developed by:** Bugs Bunnies team
- **Shared by:** Akshayana26
- **Model type:** Seq2Seq translation model
- **Language(s) (NLP):** English → Bengali (colloquial)
- **Fine-tuned from model:** unsloth/TinyLlama/TinyLlama-1.1B-Chat-v1.0

## Uses

### Direct Use

This model is useful for machine translation of informal Bengali text in digital communication spaces, such as:

- ✅ Social media comment translation
- ✅ Chatbot and virtual assistant development
- ✅ Informal speech-to-text processing

### Out-of-Scope Use

- ❌ Not suitable for formal Bengali translation
- ❌ Not designed for literary or technical Bengali translations
- ❌ May not handle complex sentence structures or dialect variations

## Bias, Risks, and Limitations

While the model is trained on informal language data, it may still have biases due to the following:

- Bias from social media text (slang and informal speech may not always be representative)
- Limitations in formal Bengali translation
- Spelling inconsistencies in phonetic Banglish

### Recommendations

- Use in informal settings rather than for formal translation
- Expand the dataset to include more diverse sentence structures
- Monitor model outputs for potential biases in colloquial expressions

## How to Get Started with the Model
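A minimal sketch of loading the model with Hugging Face Transformers. The repository id is taken from the Citation section below; the TinyLlama-Chat prompt template is an assumption based on the base model, and the generation settings are illustrative, not the authors' recommended configuration.

```python
MODEL_ID = "Akshayana26/bengali-colloquial-translator"  # repo id from the Citation section


def build_prompt(english_text: str) -> str:
    # Assumed TinyLlama-Chat template; adjust if the fine-tune used a different format.
    return (
        "<|system|>\nTranslate English into colloquial Bengali (Banglish).</s>\n"
        f"<|user|>\n{english_text}</s>\n"
        "<|assistant|>\n"
    )


def translate(english_text: str, max_new_tokens: int = 64) -> str:
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt(english_text), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


# Usage: translate("How are you?")
```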

## Training Details

### Training Data

- **Dataset name:** Bengali_bhasinifinal_Dataset
- **Size:** 8,297 sentences
- **Sources:** YouTube comments, social media posts, conversational datasets, Wikipedia (extracted informal content)

### Training Procedure

#### Preprocessing

- Normalized text (removed extra spaces, handled special characters)
- Removed non-relevant characters such as `\x96` and `\x92`
- Ensured a balanced dataset across different sentence types
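The preprocessing steps above can be sketched as a small normalization function. The exact cleaning rules used in training are not published, so this is an illustrative approximation covering the two characters named in the card plus whitespace collapsing.

```python
import re

# Legacy Windows-1252 control bytes named in the preprocessing notes above.
NON_RELEVANT_CHARS = ("\x96", "\x92")


def normalize(text: str) -> str:
    """Approximate the card's preprocessing: strip non-relevant chars, collapse spaces."""
    for ch in NON_RELEVANT_CHARS:
        text = text.replace(ch, "")
    text = re.sub(r"\s+", " ", text)  # collapse runs of whitespace to a single space
    return text.strip()
```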

### Training Hyperparameters

- **Batch size:** 4
- **Epochs:** 15
- **Learning rate:** 5e-5
- **Optimizer:** AdamW (torch)
- **LoRA rank (r):** 32
- **Gradient accumulation steps:** 8
- **Warmup steps:** 200
- **Evaluation steps:** 250
- **Mixed precision:** FP16

#### Speeds, Sizes, Times

- **Training steps:** 2,715
- **Checkpoint size:** approx. 2 GB
- **Total training time:** varies with hardware (approx. 2 hrs 5 mins on Google Colab)
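With gradient accumulation, the effective batch size is the per-device batch size times the accumulation steps. A quick sanity check of the numbers above (the 8,297-sentence figure is from the Training Data section; the implied train-split size is an inference, not stated in the card):

```python
batch_size = 4
grad_accum = 8
epochs = 15
total_steps = 2715

effective_batch = batch_size * grad_accum          # samples consumed per optimizer step
steps_per_epoch = total_steps / epochs             # 181 optimizer steps per epoch
train_samples = steps_per_epoch * effective_batch  # ~5,792, suggesting a train/val split
```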

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- A subset of colloquial Bengali-English sentences was used
- Evaluation was done on 1,000+ informal sentences

#### Factors

### Results

- ✅ Training loss: 4.558550 (final)
- ✅ Validation loss: 15.905164 (saturated)
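For a causal language model trained with cross-entropy loss (in nats), loss converts to perplexity via `exp(loss)`. An illustrative conversion of the reported values, assuming that loss definition applies here:

```python
import math

train_loss = 4.558550
val_loss = 15.905164

train_ppl = math.exp(train_loss)  # ~95 perplexity on training data
val_ppl = math.exp(val_loss)      # ~8.1e6; the large gap is consistent with overfitting
```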

## Model Examination

This model was evaluated for colloquial translation accuracy; future improvements could include handling more dialect variations.

- **Hardware type:** T4 GPU (Google Colab)
- **Training hours:** approx. 2 hrs

## Model Architecture and Objective

- **Base model:** unsloth/TinyLlama/TinyLlama-1.1B-Chat-v1.0
- **Objective:** Seq2Seq translation (English → Bengali colloquial)

### Compute Infrastructure

- **Hardware:** T4 GPU
- **Software:** Python, Hugging Face Transformers, Unsloth

## Citation

If you use this model, please cite it:

    @misc{bengali_colloquial_translator,
      author = {Akshayana26},
      title = {Bengali Colloquial Translator},
      year = {2025},
      howpublished = {\url{https://huggingface.co/Akshayana26/bengali-colloquial-translator}},
    }

## Glossary

- **Colloquial Bengali:** informal, spoken Bengali often mixed with English
- **Phonetic English (Banglish):** writing Bengali words using English letters
