phucnn/mt5-full-v6.1

text2text-generation

Generated from Trainer


	---
	language:
	- vie
	- lao
	license: apache-2.0
	base_model: google/mt5-xl
	tags:
	- generated_from_trainer
	metrics:
	- bleu
	model-index:
	- name: mt5-full-v6.1
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# mt5-full-v6.1

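	This model is a fine-tuned version of [google/mt5-xl](https://huggingface.co/google/mt5-xl) on an unspecified dataset. The card metadata lists Vietnamese (`vie`) and Lao (`lao`) as its languages.
	It achieves the following results on the evaluation set:
	- Loss: 0.8711
	- Bleu: 19.5743
	- Gen Len: 41.77

	The reported loss matches the validation loss logged at step 95,000. The BLEU score and generation length differ from the values logged during training (BLEU around 8, generation length around 19), and the card does not say how this final evaluation was configured.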
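The card does not document how to run the model, so the following is only a minimal sketch of loading it with the standard Transformers seq2seq classes. The repository ID is taken from the model page. The helper name `translate`, the generation settings, and the absence of any task prefix are assumptions, as is the translation direction, which the card never states.

```python
MODEL_ID = "phucnn/mt5-full-v6.1"  # repository name as shown on the model page


def translate(texts, max_new_tokens=128, num_beams=4):
    """Generate outputs for a list of input strings (hypothetical helper).

    Assumptions: no task prefix is needed, and the defaults above are
    reasonable; neither is confirmed by the model card.
    """
    # Imported lazily so that defining this function does not require
    # transformers (or sentencepiece, which the mT5 tokenizer needs).
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Tokenize the batch; padding lets inputs of different lengths share a tensor.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(
        **inputs, max_new_tokens=max_new_tokens, num_beams=num_beams
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


# Example call (downloads the full mt5-xl-sized checkpoint on first use):
# print(translate(["Xin chào"]))
```

Since the base model is mt5-xl, loading it needs several gigabytes of memory. The Vietnamese input in the commented call is a guess at the source language.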

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-06
	- train_batch_size: 2
	- eval_batch_size: 2
	- seed: 3435
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 4
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 3
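To make the batch-size and scheduler settings above concrete, here is a small pure-Python sketch of the learning-rate curve they imply: linear warmup over the first 10% of steps, then cosine decay to zero. The total of 108,000 optimizer steps is an assumption (roughly 36,000 steps per epoch, inferred from the training log), and a single device is assumed because 2 × 2 already equals the reported total batch size of 4.

```python
import math

BASE_LR = 5e-6          # learning_rate from the card
WARMUP_RATIO = 0.1      # lr_scheduler_warmup_ratio from the card
TOTAL_STEPS = 108_000   # ASSUMPTION: about 36,000 steps per epoch x 3 epochs

# Effective batch size = per-device batch x accumulation steps x devices.
# ASSUMPTION: one device, which is consistent with total_train_batch_size: 4.
EFFECTIVE_BATCH = 2 * 2 * 1


def warmup_steps_for(total_steps, warmup_ratio=WARMUP_RATIO):
    """Number of warmup steps implied by a warmup ratio."""
    return math.ceil(total_steps * warmup_ratio)


def lr_at(step, total_steps=TOTAL_STEPS, base_lr=BASE_LR):
    """Learning rate at a given optimizer step (linear warmup, cosine decay)."""
    warmup = warmup_steps_for(total_steps)
    if step < warmup:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup)
    # Fraction of the post-warmup phase completed, from 0 to 1.
    progress = (step - warmup) / max(1, total_steps - warmup)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for s in (0, 5_000, 10_800, 54_000, 108_000):
    print(f"step {s:>7}: lr = {lr_at(s):.3e}")
```

With these assumed totals, the peak rate of 5e-06 is reached at the end of warmup, so the first logged evaluation at step 5,000 still falls inside the warmup phase.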

	### Training results

	| Training Loss | Epoch | Step | Bleu | Gen Len | Validation Loss |
	|:-------------:|:-----:|:------:|:------:|:-------:|:---------------:|
	| 1.5671 | 0.14 | 5000 | 6.5196 | 18.9699 | 1.1691 |
	| 1.2277 | 0.28 | 10000 | 7.082 | 18.9724 | 1.0592 |
	| 1.1316 | 0.42 | 15000 | 7.3283 | 18.9825 | 1.0112 |
	| 1.0833 | 0.56 | 20000 | 7.4462 | 18.977 | 0.9728 |
	| 1.0339 | 0.7 | 25000 | 8.0126 | 18.982 | 0.9546 |
	| 1.025 | 0.83 | 30000 | 7.7648 | 18.9805 | 0.9337 |
	| 0.9733 | 0.97 | 35000 | 7.9496 | 18.9815 | 0.9228 |
	| 0.9035 | 1.11 | 40000 | 7.689 | 18.9795 | 0.9162 |
	| 0.9386 | 1.25 | 45000 | 7.6781 | 18.9825 | 0.9039 |
	| 0.9073 | 1.39 | 50000 | 7.8607 | 18.9805 | 0.8986 |
	| 0.8928 | 1.53 | 55000 | 8.0666 | 18.981 | 0.8942 |
	| 0.884 | 1.67 | 60000 | 8.1679 | 18.9785 | 0.8874 |
	| 0.8786 | 1.81 | 65000 | 7.8516 | 18.9805 | 0.8831 |
	| 0.8899 | 1.95 | 70000 | 7.9392 | 18.9785 | 0.8789 |
	| 0.8638 | 2.09 | 75000 | 8.1623 | 18.979 | 0.8781 |
	| 0.8293 | 2.22 | 80000 | 8.0989 | 18.98 | 0.8752 |
	| 0.8625 | 2.36 | 85000 | 8.176 | 18.979 | 0.8743 |
	| 0.8605 | 2.5 | 90000 | 8.0117 | 18.9805 | 0.8721 |
	| 0.8479 | 2.64 | 95000 | 8.1008 | 18.978 | 0.8711 |
	| 0.8391 | 2.78 | 100000 | 8.2041 | 18.9795 | 0.8708 |
	| 0.8649 | 2.92 | 105000 | 8.1488 | 18.9785 | 0.8710 |
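The epoch and step columns allow a rough estimate of how many optimizer steps make up one epoch, and from that how many training examples the unnamed dataset contains. The sketch below is back-of-the-envelope arithmetic, not a figure from the card. It assumes the logged steps are optimizer steps and that each step consumes the reported total batch size of 4. The logged epochs are rounded to two decimals, so the result is approximate.

```python
# (epoch, step) pairs copied from the training log above.
LOGGED = [(0.14, 5000), (0.97, 35000), (1.95, 70000), (2.92, 105000)]

TOTAL_TRAIN_BATCH_SIZE = 4  # total_train_batch_size from the card
NUM_EPOCHS = 3              # num_epochs from the card


def steps_per_epoch(epoch, step):
    """Steps in one full epoch, implied by a single (epoch, step) pair."""
    return step / epoch


# Later rows give a more stable estimate because the rounding error in the
# epoch column is small relative to the epoch value itself.
last_epoch, last_step = LOGGED[-1]
est_steps_per_epoch = steps_per_epoch(last_epoch, last_step)

# ASSUMPTION: every optimizer step processes TOTAL_TRAIN_BATCH_SIZE examples.
est_examples = est_steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE
est_total_steps = est_steps_per_epoch * NUM_EPOCHS

print(f"steps per epoch  ~ {est_steps_per_epoch:,.0f}")
print(f"training examples ~ {est_examples:,.0f}")
print(f"total steps       ~ {est_total_steps:,.0f}")
```

Under these assumptions the run covers roughly 36,000 steps per epoch, which puts the training set in the region of 140,000 to 150,000 examples.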


	### Framework versions

	- Transformers 4.37.1
	- PyTorch 2.1.2+cu121
	- Datasets 2.16.1
	- Tokenizers 0.15.1