# Bengali Colloquial Translator

Fine-tuned model for English-to-Bengali (colloquial) translation.
## Model Details

### Model Description

This model is fine-tuned on a Bengali colloquial-language dataset to translate English text into Bengali written in phonetic English (Banglish). It helps in understanding informal Bengali conversations from social media, YouTube comments, and daily conversation.
- **Developed by:** Bugs Bunnies team
- **Shared by:** Akshayana26
- **Model type:** Seq2Seq translation model
- **Language(s) (NLP):** English → Bengali (colloquial)
- **Fine-tuned from model:** unsloth/tinyllama (TinyLlama-1.1B-Chat-v1.0)
## Uses

### Direct Use

This model is useful for machine translation of informal Bengali text in digital communication spaces, such as:

- Social media comment translation
- Chatbot and virtual assistant development
- Informal speech-to-text processing
### Out-of-Scope Use

- Not suitable for formal Bengali translation
- Not designed for literary or technical Bengali translation
- May not handle complex sentence structures or dialect variations
## Bias, Risks, and Limitations

Although the model is trained on informal language data, it may still exhibit biases for the following reasons:
- Bias from social media text (slang and informal speech may not always be representative)
- Limitations in formal Bengali translation
- Spelling inconsistencies in phonetic Banglish

### Recommendations

- Use in informal settings rather than for formal translation
- Expand the dataset to include more diverse sentence structures
- Monitor model outputs for potential biases in colloquial expressions

## How to Get Started with the Model
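A minimal inference sketch, with assumptions: the repository id is taken from the citation URL below, and the prompt uses TinyLlama-1.1B-Chat's standard `<|system|>`/`<|user|>`/`<|assistant|>` chat template; the exact instruction wording used during fine-tuning is not documented in this card, so treat it as illustrative.

```python
def build_prompt(english_text: str) -> str:
    """Wrap an English sentence in TinyLlama's chat template (template assumed)."""
    system = ("Translate the following English text into colloquial Bengali "
              "written in phonetic English (Banglish).")
    return (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{english_text}</s>\n"
        f"<|assistant|>\n"
    )

if __name__ == "__main__":
    # Loading and generation (needs network and, ideally, a GPU).
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "Akshayana26/bengali-colloquial-translator"  # assumed from citation URL
    tok = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo)

    inputs = tok(build_prompt("How are you?"), return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64)
    print(tok.decode(out[0], skip_special_tokens=True))
```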
## Training Details

### Training Data

- **Dataset name:** Bengali_bhasinifinal_Dataset
- **Size:** 8,297 sentences
- **Sources:**
  - YouTube comments
  - Social media posts
  - Conversational datasets
  - Wikipedia (extracted informal content)
### Training Procedure

#### Preprocessing

- Normalized text (removed extra spaces, handled special characters)
- Removed non-relevant characters such as `\x96` and `\x92`
- Ensured a balanced dataset across different sentence types
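The preprocessing steps above can be sketched as a small normalization function. This is not the team's actual pipeline, just a plausible reconstruction of what the card describes: strip the stray Windows-1252 control bytes and collapse whitespace.

```python
import re

# Control bytes reported in the card as noise in the scraped data
NOISE_CHARS = ["\x96", "\x92"]

def normalize(text: str) -> str:
    """Clean a scraped sentence: drop noise bytes, collapse whitespace."""
    for ch in NOISE_CHARS:
        text = text.replace(ch, "")
    # Collapse runs of spaces/tabs/newlines to a single space and trim
    return re.sub(r"\s+", " ", text).strip()
```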
#### Training Hyperparameters

- **Batch size:** 4
- **Epochs:** 15
- **Learning rate:** 5e-5
- **Optimizer:** AdamW (torch)
- **LoRA rank (r):** 32
- **Gradient accumulation steps:** 8
- **Warmup steps:** 200
- **Evaluation steps:** 250
- **Mixed precision:** FP16

#### Speeds, Sizes, Times

- **Training steps:** 2,715
- **Checkpoint size:** approx. 2 GB
- **Total training time:** varies with hardware (2 hrs 5 mins on a Google Colab T4)
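For reference, the listed hyperparameters can be expressed as `peft`/`transformers` configs. Only the values stated in the card are real; `lora_alpha`, `target_modules`, and `output_dir` are assumptions (the card does not report them), so this is a sketch rather than the team's actual training script.

```python
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,                                  # LoRA rank (from the card)
    lora_alpha=32,                         # assumption: alpha not reported
    target_modules=["q_proj", "v_proj"],   # assumption: common choice for LLaMA-style models
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="outputs",                  # assumption
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,         # effective batch size 4 * 8 = 32
    num_train_epochs=15,
    learning_rate=5e-5,
    optim="adamw_torch",
    warmup_steps=200,
    eval_steps=250,
    fp16=True,                             # mixed precision
)
```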
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- A subset of colloquial Bengali-English sentences was used
- Evaluation was done on 1,000+ informal sentences
### Results

- **Training loss:** 4.558550 (final)
- **Validation loss:** 15.905164 (saturated)
## Model Examination

The model was evaluated for colloquial translation accuracy; future improvements could include handling more dialect variations.
## Environmental Impact

- **Hardware type:** T4 GPU (Google Colab)
- **Training hours:** approx. 2 hrs
## Technical Specifications

### Model Architecture and Objective

- **Base model:** unsloth/tinyllama (TinyLlama-1.1B-Chat-v1.0)
- **Objective:** Seq2Seq translation (English → Bengali colloquial)

### Compute Infrastructure

- **Hardware:** T4 GPU
- **Software:** Python, Hugging Face Transformers, Unsloth
## Citation

If you use this model, please cite it:

    @misc{bengali_colloquial_translator,
      author       = {Akshayana26},
      title        = {Bengali Colloquial Translator},
      year         = {2025},
      howpublished = {\url{https://huggingface.co/Akshayana26/bengali-colloquial-translator}},
    }
## Glossary

- **Colloquial Bengali:** informal, spoken Bengali, often mixed with English
- **Phonetic English (Banglish):** writing Bengali words using English letters