efederici committed (verified)
Commit 83eeb95 · Parent(s): de89191

Update README.md

Files changed (1): README.md (+0 -138)

README.md CHANGED (removed section shown below):
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: efederici/alpha_v3.1
    type: sharegpt
    conversation: chatml

dataset_prepared_path: ./alpha_v3
val_set_size: 0.0002
output_dir: ./llama3

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project: maestrale
wandb_entity: mii-llm
wandb_watch:
wandb_name: maestrale_llama3
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 3

optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false

bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 550

save_safetensors: true
ddp_timeout: 14400

evals_per_epoch: 4
eval_sample_packing: False
eval_table_size:
saves_per_epoch: 5
save_total_limit: 3

debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.05
fsdp:
fsdp_config:
special_tokens:
  eos_token: "<|im_end|>"
  pad_token: "<|end_of_text|>"
tokens:
  - "<|im_start|>"
  - "<|im_end|>"

```

</details><br>
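
The `special_tokens` and `tokens` entries add the ChatML markers to the base Llama-3 tokenizer. A minimal sketch of the effect in plain `transformers` calls (this is not axolotl's internal code, and loading the gated base model assumes you have access to it):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(base)

# eos/pad come from special_tokens:, the ChatML markers from tokens:
tokenizer.add_special_tokens({"eos_token": "<|im_end|>", "pad_token": "<|end_of_text|>"})
tokenizer.add_tokens(["<|im_start|>", "<|im_end|>"], special_tokens=True)

model = AutoModelForCausalLM.from_pretrained(base)
# Newly added tokens need rows in the embedding matrix before training.
model.resize_token_embeddings(len(tokenizer))
```

Axolotl applies these changes itself when the run is launched; the snippet only shows the effect of the two config fields.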

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/mii-llm/maestrale/runs/zdghwbfe)

# llama3

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the `efederici/alpha_v3.1` dataset (see the axolotl config above).
It achieves the following results on the evaluation set:
- Loss: 0.6122

## Model description

A fine-tune of Meta-Llama-3-8B produced with Axolotl, trained on ChatML-formatted conversations at a sequence length of 8192 tokens with sample packing. See the axolotl config above for the complete setup.

## Intended uses & limitations

More information needed
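
The one thing the config does pin down is the prompt format: the model was trained on ChatML conversations, with `<|im_end|>` as the end-of-turn/eos token. A minimal generation sketch, assuming a local copy of the fine-tuned weights at the training `output_dir` (`./llama3`; substitute the published checkpoint id if you are not working locally):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "./llama3" is the output_dir from the config above (hypothetical local path).
path = "./llama3"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map="auto")

# Training used ChatML, so prompts should follow the same layout and stop at <|im_end|>.
prompt = "<|im_start|>user\nWhat is the capital of Italy?<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```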

## Training and evaluation data

Training data comes from `efederici/alpha_v3.1`, a ShareGPT-formatted dataset rendered with the ChatML conversation template during preprocessing (a sketch of that layout follows below); `val_set_size: 0.0002` holds out 0.02% of it for evaluation.
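
A minimal sketch of how a ShareGPT-style record looks once the ChatML template is applied (illustrative only; `render_chatml` is a hypothetical helper, not axolotl's actual preprocessing code):

```python
# Roughly how a ShareGPT conversation is laid out after ChatML rendering.
def render_chatml(conversations: list[dict]) -> str:
    role_map = {"system": "system", "human": "user", "gpt": "assistant"}
    turns = [
        f"<|im_start|>{role_map.get(t['from'], t['from'])}\n{t['value']}<|im_end|>"
        for t in conversations
    ]
    return "\n".join(turns) + "\n"

example = [
    {"from": "human", "value": "Hello!"},
    {"from": "gpt", "value": "Hi! How can I help you today?"},
]
print(render_chatml(example))
```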

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (the derived batch-size totals are checked in the snippet after this list):
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 550
- num_epochs: 3
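
The totals reported above are just the per-device settings multiplied across accumulation steps and GPUs; a quick sanity check:

```python
# Sanity check for the totals reported by the Trainer.
micro_batch_size = 2             # train/eval batch size per device
gradient_accumulation_steps = 8
num_devices = 2

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no accumulation at eval time

assert total_train_batch_size == 32
assert total_eval_batch_size == 4
```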

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.0734        | 0.0003 | 1    | 1.2164          |
| 0.7852        | 0.2501 | 787  | 0.7175          |
| 0.7538        | 0.5001 | 1574 | 0.6812          |
| 0.8083        | 0.7502 | 2361 | 0.6669          |
| 0.7832        | 1.0003 | 3148 | 0.6431          |
| 0.5858        | 1.2312 | 3935 | 0.6371          |
| 0.5811        | 1.4813 | 4722 | 0.6185          |
| 0.5568        | 1.7314 | 5509 | 0.5966          |
| 0.5758        | 1.9815 | 6296 | 0.5824          |
| 0.3457        | 2.2124 | 7083 | 0.6227          |
| 0.3379        | 2.4625 | 7870 | 0.6171          |
| 0.3398        | 2.7126 | 8657 | 0.6122          |

  ### Framework versions
 