Model Card: banT5-Base

Model Details

The banT5-Base model is a Bangla adaptation of the T5 (Text-To-Text Transfer Transformer) model, originally introduced by researchers at Google. T5 is a unified language model designed to frame all natural language processing (NLP) tasks as text-to-text problems. This allows the model to handle a variety of tasks by simply altering the input and output formats.

banT5-Base is specifically trained on a curated Bangla text corpus to deliver state-of-the-art performance on tasks such as Named Entity Recognition (NER), Part-of-Speech (POS) tagging, Question Answering, and Paraphrase Identification.
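
To make the text-to-text framing concrete, the sketch below shows how such tasks could be posed as input/output string pairs. The task prefixes and target formats here are illustrative assumptions for the sake of example, not formats prescribed by the banT5-Base release.

# Illustrative only: these prefixes and target formats are assumptions,
# not the actual formats used to fine-tune banT5-Base. In the text-to-text
# setting, every task maps an input string to an output string.
ner_example = {
    "input":  "ner: রবীন্দ্রনাথ ঠাকুর কলকাতায় জন্মগ্রহণ করেন ।",  # "Rabindranath Tagore was born in Kolkata."
    "target": "রবীন্দ্রনাথ ঠাকুর = PER ; কলকাতায় = LOC",
}
paraphrase_example = {
    "input":  "paraphrase: <sentence 1> </s> <sentence 2>",
    "target": "yes",
}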

Training Data

The banT5-Base model was pre-trained on a large-scale Bangla text dataset amounting to 27 GB of raw data; after cleaning and normalization, the processed dataset grew to 36 GB. Below is an overview of the corpus statistics:

  • Total Words: 1,646,252,743 (1.65 billion)
  • Unique Words: 15,223,848 (15.23 million)
  • Total Sentences: 131,412,177 (131.4 million)
  • Total Documents: 7,670,661 (7.67 million)

Model Architecture and Training

The banT5 model was trained using the Hugging Face Transformers library, leveraging the T5ForConditionalGeneration class. The architecture uses a vocabulary of 50,100 tokens, 12 hidden layers in both the encoder and the decoder, a hidden size of 768, 12 attention heads, and an intermediate feed-forward size of 3,072. Relative attention with 32 buckets and a maximum distance of 128 is used to encode positional information for longer sequences.

The training setup is as follows:

  • Precision: 16-bit (fp16) for faster computation
  • Maximum sequence length: 256
  • Batch size: 108 per device, for both training and evaluation
  • Optimizer: AdamW with β1 = 0.9, β2 = 0.98, ε = 1e-6, and a weight decay of 0.01
  • Learning rate: 5e-5 with a warmup ratio of 10%
  • Gradient accumulation: 1 step
  • Dropout: 0.1 for regularization
  • Training steps: 1,000,000
  • Data loaders: memory pinning and last-batch dropping enabled for efficient data handling
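
For reference, the reported configuration can be approximated with the Transformers API as in the sketch below. This is a reconstruction from the hyperparameters listed above, not the original training script; the output directory is a placeholder.

# Approximate reconstruction of the reported configuration (not the original
# training script). Values are taken from the description above.
from transformers import T5Config, T5ForConditionalGeneration, TrainingArguments

config = T5Config(
    vocab_size=50100,                     # 50,100 tokens
    d_model=768,                          # hidden size
    num_layers=12,                        # encoder layers
    num_decoder_layers=12,                # decoder layers
    num_heads=12,                         # attention heads
    d_ff=3072,                            # feed-forward intermediate size
    dropout_rate=0.1,                     # dropout
    relative_attention_num_buckets=32,    # relative position buckets
    relative_attention_max_distance=128,  # max relative distance
)
model = T5ForConditionalGeneration(config)

# The reported maximum sequence length of 256 is applied at tokenization time.
training_args = TrainingArguments(
    output_dir="banT5-base-pretraining",  # placeholder path
    fp16=True,                            # 16-bit precision
    per_device_train_batch_size=108,
    per_device_eval_batch_size=108,
    learning_rate=5e-5,
    warmup_ratio=0.1,
    weight_decay=0.01,
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-6,
    gradient_accumulation_steps=1,
    max_steps=1_000_000,
    dataloader_pin_memory=True,
    dataloader_drop_last=True,
)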

Using this model in transformers

from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the tokenizer and model from the Hugging Face Hub
model_name = "banglagov/banT5-Base"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Example input text (roughly: "As a result, the European Union fears rising
# unemployment and an economic recession next year.")
input_text = "এর ফলে আগামী বছর বেকারত্বের হার বৃদ্ধি এবং অর্থনৈতিক মন্দার আশঙ্কায় ইউরোপীয় ইউনিয়ন ।"

# Tokenize the text into input IDs
input_ids = tokenizer.encode(input_text, return_tensors="pt")

print("input_ids :", input_ids)

Experimental Results

The banT5 model demonstrated strong performance on downstream tasks, as summarized below:

Task                             Precision   Recall    F1
Named Entity Recognition (NER)   0.8882      0.8563    0.8686
Part-of-Speech (POS) Tagging     0.8813      0.8813    0.8791

These results were obtained with the banT5-Base model fine-tuned using a Noisy Label architecture.
