Crystalcareai committed
Commit aa5c5d0 · verified · 1 Parent(s): 333d17c

Update README.md

Files changed (1)
  1. README.md +1 -186
README.md CHANGED
@@ -1,186 +1 @@
- ---
- library_name: transformers
- license: apache-2.0
- base_model: arcee-ai/Arcee-Blitz
- tags:
- - axolotl
- - generated_from_trainer
- datasets:
- - arcee-ai/toolcalling-llmjudge-hermes-sharegpt
- - chargoddard/toolcalling-llmjudge-hermes-sharegpt-scrumbled
- - chargoddard/toolace-sharegpt
- model-index:
- - name: blitz-caller
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.8.0.dev0`
- ```yaml
- base_model: arcee-ai/Arcee-Blitz
-
- load_in_8bit: false
- load_in_4bit: false
- strict: false
-
- plugins:
-   - axolotl.integrations.liger.LigerPlugin
- liger_rope: true
- liger_rms_norm: true
- liger_glu_activation: true # Changed from liger_swiglu
- liger_fused_linear_cross_entropy: true
-
- datasets:
-   - path: arcee-ai/toolcalling-llmjudge-hermes-sharegpt
-     type: chat_template
-     field_messages: conversations
-     message_property_mappings: # Changed from message_field_role/content
-       role: from
-       content: value
-     roles:
-       system:
-         - system
-       user:
-         - human
-       assistant:
-         - gpt
-       tool:
-         - tool
-   - path: chargoddard/toolcalling-llmjudge-hermes-sharegpt-scrumbled
-     type: chat_template
-     field_messages: conversations
-     message_property_mappings: # Changed from message_field_role/content
-       role: from
-       content: value
-     roles:
-       system:
-         - system
-       user:
-         - human
-       assistant:
-         - gpt
-       tool:
-         - tool
-   - path: chargoddard/toolace-sharegpt
-     type: chat_template
-     field_messages: conversations
-     message_property_mappings: # Changed from message_field_role/content
-       role: from
-       content: value
-     roles:
-       system:
-         - system
-       user:
-         - human
-         - user
-       assistant:
-         - gpt
-         - assistant
-       tool:
-         - tool
- dataset_prepared_path: /workspace/data/prepared_datasets
-
- chat_template: chatml
- shuffle_merged_datasets: true
- output_dir: blitz-caller-v1
-
- sequence_len: 8192
- sample_packing: true
- eval_sample_packing: false
- pad_to_sequence_len: true
-
- wandb_project: blitz-caller-v1
- wandb_entity:
- wandb_watch:
- wandb_name:
- wandb_log_model:
-
- gradient_accumulation_steps: 1
- micro_batch_size: 8
- num_epochs: 2
- optimizer: paged_adamw_8bit
- lr_scheduler: cosine
- learning_rate: 0.00002
- max_grad_norm: 3
-
- train_on_inputs: true
- group_by_length: false
- bf16: auto
- fp16:
- tf32: false
-
- gradient_checkpointing: "unsloth"
- early_stopping_patience:
- resume_from_checkpoint:
- local_rank:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- warmup_ratio: 0.05
- saves_per_epoch: 4
- save_safetensors: true
- hub_model_id: blitz-caller
- hub_strategy: every_save
- debug:
- deepspeed: deepspeed_configs/zero3_bf16.json
- weight_decay: 0.1
-
- seed: 496083530
- tokens:
-   - <|im_start|>
- special_tokens:
-   eos_token: <|im_end|>
- ```
-
- </details><br>
-
- # blitz-caller
-
- This model is a fine-tuned version of [arcee-ai/Arcee-Blitz](https://huggingface.co/arcee-ai/Arcee-Blitz) on the arcee-ai/toolcalling-llmjudge-hermes-sharegpt, the chargoddard/toolcalling-llmjudge-hermes-sharegpt-scrumbled and the chargoddard/toolace-sharegpt datasets.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 496083530
- - distributed_type: multi-GPU
- - num_devices: 8
- - total_train_batch_size: 64
- - total_eval_batch_size: 64
- - optimizer: Use paged_adamw_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 27
- - num_epochs: 2.0
-
- ### Training results
-
-
-
- ### Framework versions
-
- - Transformers 4.49.0
- - Pytorch 2.6.0+cu124
- - Datasets 3.2.0
- - Tokenizers 0.21.0
+ s
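
The removed card documents a ChatML-template tool-calling fine-tune of arcee-ai/Arcee-Blitz with `<|im_end|>` as the EOS token, but it never included a usage section. A minimal, non-authoritative inference sketch under those assumptions follows; the hub repo id `arcee-ai/blitz-caller` is itself an assumption and is not confirmed by this commit.

```python
# Hedged sketch only: assumes the fine-tune is published as "arcee-ai/blitz-caller"
# (repo id not confirmed by this commit) and that the tokenizer ships the ChatML
# chat template / <|im_end|> EOS described in the removed axolotl config.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/blitz-caller"  # assumption: the actual hub id may differ

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a function-calling assistant."},
    {"role": "user", "content": "What's the weather like in Paris right now?"},
]

# With chat_template: chatml, apply_chat_template wraps each turn in
# <|im_start|> ... <|im_end|> before tokenizing.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```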