---
library_name: peft
license: gemma
base_model: google/codegemma-7b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: code-bench-CodeGemma-7B-cg-nv9n
    results: []
---

# code-bench-CodeGemma-7B-cg-nv9n

This model is a fine-tuned version of [google/codegemma-7b](https://huggingface.co/google/codegemma-7b) on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 0.0676

## Model description

More information needed

## Intended uses & limitations

More information needed
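
Since this repository contains only a PEFT adapter, it has to be loaded on top of the google/codegemma-7b base model (which requires accepting the Gemma license). The snippet below is a minimal loading sketch, not a tested recipe; the adapter repo id `Zacktree/code-bench-CodeGemma-7B-cg-nv9n` is inferred from this card and may differ.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the frozen base model; the Gemma license must be accepted first.
base_model = AutoModelForCausalLM.from_pretrained(
    "google/codegemma-7b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/codegemma-7b")

# Attach the PEFT adapter; the repo id below is inferred from this card.
model = PeftModel.from_pretrained(base_model, "Zacktree/code-bench-CodeGemma-7B-cg-nv9n")

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```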

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a training-script sketch reconstructing them follows the list):

- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 3
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 4
- mixed_precision_training: Native AMP
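
The dataset and adapter configuration are not documented on this card, so the sketch below only reconstructs the listed hyperparameters with TRL's `SFTTrainer`; the dataset path and `LoraConfig` values are placeholders, and whether AMP ran in fp16 or bf16 is not stated.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder dataset: the card does not say what data was used.
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")

# Assumed adapter settings; the card only states that PEFT was used.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Hyperparameters taken from the list above. The Adam betas/epsilon shown
# there match the Trainer defaults, so they need no explicit setting.
args = SFTConfig(
    output_dir="code-bench-CodeGemma-7B-cg-nv9n",
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=3,
    gradient_accumulation_steps=8,  # total train batch size 8
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    seed=42,
    fp16=True,  # "Native AMP"; bf16=True is equally plausible
    eval_strategy="steps",
    eval_steps=50,
    logging_steps=50,
)

trainer = SFTTrainer(
    model="google/codegemma-7b",  # SFTTrainer also accepts a model id string
    args=args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```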

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.623         | 0.0530 | 50   | 0.5961          |
| 0.473         | 0.1061 | 100  | 0.4669          |
| 0.4018        | 0.1591 | 150  | 0.3573          |
| 0.3249        | 0.2121 | 200  | 0.2696          |
| 0.2703        | 0.2652 | 250  | 0.2265          |
| 0.2066        | 0.3182 | 300  | 0.1859          |
| 0.1725        | 0.3713 | 350  | 0.1526          |
| 0.1588        | 0.4243 | 400  | 0.1341          |
| 0.1467        | 0.4773 | 450  | 0.1278          |
| 0.1349        | 0.5304 | 500  | 0.1229          |
| 0.1541        | 0.5834 | 550  | 0.1170          |
| 0.1262        | 0.6364 | 600  | 0.1137          |
| 0.1273        | 0.6895 | 650  | 0.1118          |
| 0.1322        | 0.7425 | 700  | 0.1091          |
| 0.1244        | 0.7955 | 750  | 0.1069          |
| 0.1134        | 0.8486 | 800  | 0.1043          |
| 0.1206        | 0.9016 | 850  | 0.1030          |
| 0.117         | 0.9547 | 900  | 0.1016          |
| 0.101         | 1.0077 | 950  | 0.1003          |
| 0.1095        | 1.0607 | 1000 | 0.0999          |
| 0.0989        | 1.1138 | 1050 | 0.0991          |
| 0.1054        | 1.1668 | 1100 | 0.0972          |
| 0.1073        | 1.2198 | 1150 | 0.0964          |
| 0.1057        | 1.2729 | 1200 | 0.0951          |
| 0.111         | 1.3259 | 1250 | 0.0953          |
| 0.0946        | 1.3789 | 1300 | 0.0935          |
| 0.0907        | 1.4320 | 1350 | 0.0926          |
| 0.0989        | 1.4850 | 1400 | 0.0919          |
| 0.1037        | 1.5381 | 1450 | 0.0905          |
| 0.0955        | 1.5911 | 1500 | 0.0899          |
| 0.0893        | 1.6441 | 1550 | 0.0887          |
| 0.1015        | 1.6972 | 1600 | 0.0881          |
| 0.0952        | 1.7502 | 1650 | 0.0874          |
| 0.0918        | 1.8032 | 1700 | 0.0868          |
| 0.0926        | 1.8563 | 1750 | 0.0860          |
| 0.0886        | 1.9093 | 1800 | 0.0852          |
| 0.0989        | 1.9623 | 1850 | 0.0841          |
| 0.0863        | 2.0154 | 1900 | 0.0839          |
| 0.0821        | 2.0684 | 1950 | 0.0836          |
| 0.0964        | 2.1215 | 2000 | 0.0830          |
| 0.0755        | 2.1745 | 2050 | 0.0824          |
| 0.0777        | 2.2275 | 2100 | 0.0817          |
| 0.0761        | 2.2806 | 2150 | 0.0807          |
| 0.0777        | 2.3336 | 2200 | 0.0802          |
| 0.0857        | 2.3866 | 2250 | 0.0795          |
| 0.0883        | 2.4397 | 2300 | 0.0793          |
| 0.0784        | 2.4927 | 2350 | 0.0785          |
| 0.0774        | 2.5457 | 2400 | 0.0779          |
| 0.0776        | 2.5988 | 2450 | 0.0772          |
| 0.0788        | 2.6518 | 2500 | 0.0770          |
| 0.0853        | 2.7064 | 2550 | 0.0768          |
| 0.0836        | 2.7595 | 2600 | 0.0764          |
| 0.0822        | 2.8125 | 2650 | 0.0758          |
| 0.0862        | 2.8656 | 2700 | 0.0755          |
| 0.0753        | 2.9186 | 2750 | 0.0750          |
| 0.0798        | 2.9716 | 2800 | 0.0744          |
| 0.0762        | 3.0247 | 2850 | 0.0741          |
| 0.0884        | 3.0777 | 2900 | 0.0736          |
| 0.0753        | 3.1307 | 2950 | 0.0731          |
| 0.0774        | 3.1838 | 3000 | 0.0727          |
| 0.0753        | 3.2368 | 3050 | 0.0725          |
| 0.0853        | 3.2898 | 3100 | 0.0723          |
| 0.0723        | 3.3429 | 3150 | 0.0718          |
| 0.0762        | 3.3959 | 3200 | 0.0713          |
| 0.0737        | 3.4490 | 3250 | 0.0712          |
| 0.0751        | 3.5020 | 3300 | 0.0705          |
| 0.0737        | 3.5550 | 3350 | 0.0700          |
| 0.069         | 3.6081 | 3400 | 0.0701          |
| 0.0696        | 3.6611 | 3450 | 0.0697          |
| 0.0725        | 3.7141 | 3500 | 0.0692          |
| 0.074         | 3.7672 | 3550 | 0.0686          |
| 0.0655        | 3.8202 | 3600 | 0.0684          |
| 0.0671        | 3.8732 | 3650 | 0.0679          |
| 0.0642        | 3.9263 | 3700 | 0.0676          |
| 0.07          | 3.9793 | 3750 | 0.0676          |

### Framework versions

- PEFT 0.12.0
- Transformers 4.44.2
- Pytorch 2.5.1+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
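
To confirm a matching environment before loading the adapter, the pinned versions above can be checked at runtime; this is a convenience sketch, not part of the original card.

```python
# Compare installed package versions against the ones pinned on this card.
import datasets
import peft
import tokenizers
import torch
import transformers

expected = {
    "peft": "0.12.0",
    "transformers": "4.44.2",
    "torch": "2.5.1+cu121",
    "datasets": "2.21.0",
    "tokenizers": "0.19.1",
}
installed = {
    "peft": peft.__version__,
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    got = installed[name]
    marker = "" if got == want else "  <-- mismatch"
    print(f"{name}: expected {want}, installed {got}{marker}")
```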