Jellywibble committed 51c1150 (parent: 4617606)

Update README.md

Files changed (1): README.md (+2 −1)
README.md CHANGED
@@ -67,4 +67,5 @@ The original dataset contains over 50 million rows of completions (chatbot respo
 </figure>
 
 ### Training procedure
-The `gpt2_base_retry_and_continue_5m_reward_model` was trained using a [gpt2](https://huggingface.co/gpt2) base model and a classification head with single output. Binary Cross Entropy loss was used. The model was trained on 4xA40 GPUs, 16 per device batch size and gradient accumulation of 1 (therefore the effective batch size is 64), with 1e-5 learning rate for 2 epochs for a total of 156,240 steps. Tensor parallelism and pipeline parallelism were used to distribute the model across GPUs.
+The `gpt2_base_retry_and_continue_5m_reward_model` was trained using a [gpt2](https://huggingface.co/gpt2) base model with a single-output classification head, optimized with binary cross-entropy loss. Training ran on 4xA40 GPUs with a per-device batch size of 16 and gradient accumulation of 1 (an effective batch size of 64), at a learning rate of 1e-5 for 2 epochs, 156,240 steps in total. Tensor parallelism and pipeline parallelism were used to distribute the model across the GPUs.
+[Weights and Biases Log](https://wandb.ai/jellywibble/reward)
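The training setup described in the diff — a single reward logit scored with binary cross-entropy, and an effective batch size derived from GPUs × per-device batch × gradient-accumulation steps — can be sketched in plain Python. This is a minimal illustration, not the authors' training code; the function name `bce_loss` is hypothetical (in the Transformers library this setup roughly corresponds to `GPT2ForSequenceClassification` with `num_labels=1`).

```python
import math

def bce_loss(logit: float, label: int) -> float:
    """Binary cross-entropy on a single reward logit.

    label 1 = completion accepted by the user, 0 = rejected
    (hypothetical labeling; a sketch of the loss described in the README).
    """
    p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid turns the logit into P(accepted)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

# Effective batch size from the README's numbers:
# 4 GPUs x per-device batch of 16 x gradient accumulation of 1
effective_batch = 4 * 16 * 1  # = 64
```

For example, a logit of 0.0 gives p = 0.5, so the loss is ln 2 ≈ 0.693 regardless of the label; confident correct predictions drive the loss toward 0.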