Results

This model is a fine-tuned version of facebook/mbart-large-50-many-to-many-mmt on a custom English-Telugu parallel dataset (see "Training and evaluation data" below).

Model description

This model is fine-tuned to translate text from English to Telugu. It is based on the mBART architecture, which is a multilingual sequence-to-sequence model pre-trained on a large corpus of text in multiple languages. The model has been fine-tuned on a dataset containing English sentences and their corresponding Telugu translations.
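
As a quick usage sketch, the fine-tuned model can be loaded and run as follows. This assumes the repository ID VudhanthiNeeraja/results and relies on the mBART-50 language codes en_XX (English) and te_IN (Telugu) described later in this card; it is an illustration, not the exact code used by the author.

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Assumed repository ID for this fine-tuned checkpoint.
model_id = "VudhanthiNeeraja/results"

tokenizer = MBart50TokenizerFast.from_pretrained(model_id)
model = MBartForConditionalGeneration.from_pretrained(model_id)

# mBART-50 language code for the source language (English).
tokenizer.src_lang = "en_XX"

text = "How are you doing today?"
inputs = tokenizer(text, return_tensors="pt")

# Force the decoder to start with the Telugu language token.
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["te_IN"],
    max_length=128,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```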

Intended uses & limitations

Intended uses:

  • Translating English text to Telugu.
  • Assisting in language learning and translation tasks.
  • Enhancing multilingual applications and services.

Limitations:

  • The model may not perform well on domain-specific or highly technical text.
  • The quality of translation may vary depending on the context and complexity of the input text.

Training and evaluation data

Training data

The training data consists of a custom dataset containing English sentences and their corresponding Telugu translations. The dataset includes a variety of common phrases and sentences used in everyday conversations.

Evaluation data

The dataset is split into training and testing sets, and the held-out test split serves as the evaluation data, used to assess the model's performance on examples not seen during training.

Training procedure

  1. Data Preparation
     • Dataset: The dataset used for training consists of English sentences and their corresponding Telugu translations, covering a variety of common phrases and sentences used in everyday conversations.
     • Splitting the data: The dataset is split into training and testing sets; typically, 80% of the data is used for training and 20% for testing.

  2. Model and Tokenizer Initialization
     • Model: The mBART model (facebook/mbart-large-50-many-to-many-mmt) is used as the base model. It is pre-trained on a large corpus of text in multiple languages and is well suited to multilingual translation tasks.
     • Tokenizer: The mBART tokenizer (MBart50TokenizerFast) is used to tokenize the input and output text, configured with English (en_XX) as the source language and Telugu (te_IN) as the target language.

  3. Data Tokenization
     • Tokenization: The input English sentences and the target Telugu sentences are tokenized with the mBART tokenizer. The tokenized inputs are padded and truncated to a maximum length of 128 tokens to ensure uniformity.
     • Labels: The tokenized target sentences are used as labels for the model and are likewise padded and truncated to a maximum length of 128 tokens.

  4. Training Arguments
     • Learning rate: 2e-5.
     • Batch size: 4 for both training and evaluation, to manage GPU memory usage.
     • Gradient accumulation: 4 steps, allowing the model to simulate a larger effective batch size by accumulating gradients over several smaller batches.
     • Weight decay: 0.01, applied to reduce overfitting.
     • Number of epochs: 3.

  5. Training the Model
     • Trainer initialization: The Seq2SeqTrainer class from the transformers library handles the training process. The trainer is initialized with the model, training arguments, training dataset, evaluation dataset, and tokenizer.
     • Training: The model is trained on the training dataset, learning to map English sentences to their corresponding Telugu translations.

  6. Evaluation
     • Evaluation metrics: The model is evaluated on the test set using metrics such as BLEU score and accuracy, which help assess performance on unseen examples.
     • Evaluation process: Translations are generated for the test set and compared with the reference translations to calculate the metrics.

  7. Saving and Pushing the Model
     • Saving the model: After training, the model is saved with the trainer.save_model() method.
     • Pushing to the Hub: The trained model is pushed to the Hugging Face Hub with the trainer.push_to_hub() method, making it accessible for others to use.
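
The card describes these steps in prose only; the following is a minimal end-to-end sketch of steps 1–5 and 7 under the settings stated above. The CSV file name, the "english"/"telugu" column names, and the output directory are assumptions made for illustration, not details taken from the original training script.

```python
from datasets import load_dataset
from transformers import (
    MBart50TokenizerFast,
    MBartForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# 1. Data preparation: a hypothetical CSV with "english" and "telugu" columns,
#    split 80/20 into training and testing sets.
dataset = load_dataset("csv", data_files="english_telugu_pairs.csv")["train"]
dataset = dataset.train_test_split(test_size=0.2, seed=42)

# 2. Model and tokenizer initialization.
checkpoint = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(
    checkpoint, src_lang="en_XX", tgt_lang="te_IN"
)
model = MBartForConditionalGeneration.from_pretrained(checkpoint)

# 3. Data tokenization: inputs and labels padded/truncated to 128 tokens.
def preprocess(batch):
    return tokenizer(
        batch["english"],
        text_target=batch["telugu"],
        max_length=128,
        padding="max_length",
        truncation=True,
    )

tokenized = dataset.map(
    preprocess, batched=True, remove_columns=dataset["train"].column_names
)

# 4. Training arguments (see "Training hyperparameters" below for the full list).
training_args = Seq2SeqTrainingArguments(
    output_dir="results",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    weight_decay=0.01,
    num_train_epochs=3,
    fp16=True,                   # Native AMP mixed precision
    predict_with_generate=True,  # generate translations during evaluation (step 6)
)

# 5. Training.
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,
)
trainer.train()

# 7. Saving and pushing the model to the Hugging Face Hub.
trainer.save_model()
trainer.push_to_hub()
```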
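
Step 6 names BLEU, but the card does not include the evaluation code itself, so the sketch below shows one common way to report a corpus-level BLEU score by attaching a compute_metrics hook to the trainer from the sketch above. The use of the evaluate library with sacrebleu, and the -100 label handling, are assumptions rather than details from the original script.

```python
import numpy as np
import evaluate  # assumes the `evaluate` and `sacrebleu` packages are installed

bleu = evaluate.load("sacrebleu")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    # Replace any -100 label padding so the labels can be decoded.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_refs = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = bleu.compute(
        predictions=decoded_preds,
        references=[[ref] for ref in decoded_refs],
    )
    return {"bleu": result["score"]}

# Pass compute_metrics=compute_metrics when constructing the Seq2SeqTrainer
# above (predict_with_generate=True is already set), then run:
# metrics = trainer.evaluate(tokenized["test"])
```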

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 3
  • mixed_precision_training: Native AMP
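
For reference, the sketch below shows how these reported values would typically map onto Seq2SeqTrainingArguments, expanding the compact version used in the training sketch above; the output directory is again an assumption.

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="results",            # assumed output/repository name
    learning_rate=2e-5,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=4,    # eval_batch_size
    seed=42,
    gradient_accumulation_steps=4,   # total_train_batch_size = 4 x 4 x 1 GPU = 16
    optim="adamw_torch",             # AdamW with betas=(0.9, 0.999), epsilon=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=3,
    fp16=True,                       # mixed_precision_training: Native AMP
)
```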

Framework versions

  • Transformers 4.48.3
  • PyTorch 2.5.1+cu124
  • Tokenizers 0.21.0