qwen2.5-0.5b-expo-DPO-25-2

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-25-2 on the hZzy/train_pairwise_all_new dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6575
  • Logps: -92.0019
  • Logits: -1.5710
  • Objective: 0.6607
  • DPO Loss: 0.6607
  • Ranking Simple: 0.5576
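
The model can be loaded with the standard transformers API. The snippet below is a minimal sketch using generic Hugging Face usage, not an official example from this repository; the prompt is arbitrary.

```python
# Minimal loading sketch (standard transformers usage; not from this repo's docs).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-25-2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Weights are stored in float32 (F32 safetensors).
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```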

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
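
The training script is not included in this card. As a hedged illustration only, the hyperparameters above map onto a TRL `DPOConfig` roughly as follows; whether TRL's `DPOTrainer` was actually used is an assumption, and `output_dir` is a hypothetical name.

```python
# Hypothetical mapping of the listed hyperparameters onto trl.DPOConfig.
# Assumes TRL's DPOTrainer was used, which this card does not confirm.
from trl import DPOConfig

config = DPOConfig(
    output_dir="qwen2.5-0.5b-expo-DPO-25-2",  # hypothetical
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=4,    # eval_batch_size
    gradient_accumulation_steps=12,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # Effective train batch: 4 per device x 3 GPUs x 12 accumulation steps = 144,
    # matching total_train_batch_size above.
)
```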

Training results

| Training Loss | Epoch  | Step | Validation Loss | Logps     | Logits  | Objective | DPO Loss | Ranking Simple |
|---------------|--------|------|-----------------|-----------|---------|-----------|----------|----------------|
| 0.6788        | 0.0719 | 50   | 0.6864          | -101.0843 | -1.4425 | 0.6873    | 0.6873   | 0.5170         |
| 0.6396        | 0.1438 | 100  | 0.6796          | -96.6355  | -1.6394 | 0.6831    | 0.6831   | 0.5284         |
| 0.6428        | 0.2157 | 150  | 0.6731          | -90.4612  | -1.3249 | 0.6741    | 0.6741   | 0.5422         |
| 0.6108        | 0.2876 | 200  | 0.6792          | -85.5977  | -1.4063 | 0.6795    | 0.6795   | 0.5468         |
| 0.5856        | 0.3595 | 250  | 0.6623          | -87.8055  | -1.3858 | 0.6615    | 0.6615   | 0.5459         |
| 0.5701        | 0.4314 | 300  | 0.6633          | -88.7819  | -1.4784 | 0.6662    | 0.6662   | 0.5449         |
| 0.5296        | 0.5034 | 350  | 0.6642          | -89.5468  | -1.5418 | 0.6664    | 0.6664   | 0.5457         |
| 0.5055        | 0.5753 | 400  | 0.6579          | -89.5613  | -1.4127 | 0.6593    | 0.6593   | 0.5535         |
| 0.5493        | 0.6472 | 450  | 0.6599          | -90.8471  | -1.5076 | 0.6628    | 0.6628   | 0.5557         |
| 0.5056        | 0.7191 | 500  | 0.6561          | -91.5462  | -1.4783 | 0.6591    | 0.6591   | 0.5541         |
| 0.475         | 0.7910 | 550  | 0.6573          | -90.8545  | -1.5168 | 0.6599    | 0.6599   | 0.5581         |
| 0.4881        | 0.8629 | 600  | 0.6582          | -91.8983  | -1.5757 | 0.6610    | 0.6610   | 0.5565         |
| 0.4496        | 0.9348 | 650  | 0.6576          | -91.9025  | -1.5710 | 0.6608    | 0.6608   | 0.5573         |
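
The "DPO Loss" column tracks the standard Direct Preference Optimization objective (Rafailov et al., 2023). The sketch below shows that generic computation for reference; it is not this repository's training code, and the beta value is not reported on this card.

```python
# Generic DPO loss (Rafailov et al., 2023); a sketch, not this repo's code.
# Inputs: summed log-probabilities of the chosen and rejected responses under
# the policy being trained and under a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # beta=0.1 is a common default; the value used here is not reported.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # The loss increases the implicit reward margin of chosen over rejected.
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()
```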

Framework versions

  • Transformers 4.42.0
  • PyTorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1