Crystalcareai committed
Commit aa5c5d0 · verified · 1 Parent(s): 333d17c

Update README.md

Files changed (1)
  1. README.md +1 -186
README.md CHANGED
@@ -1,186 +1 @@
- ---
- library_name: transformers
- license: apache-2.0
- base_model: arcee-ai/Arcee-Blitz
- tags:
- - axolotl
- - generated_from_trainer
- datasets:
- - arcee-ai/toolcalling-llmjudge-hermes-sharegpt
- - chargoddard/toolcalling-llmjudge-hermes-sharegpt-scrumbled
- - chargoddard/toolace-sharegpt
- model-index:
- - name: blitz-caller
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.8.0.dev0`
- ```yaml
- base_model: arcee-ai/Arcee-Blitz
-
- load_in_8bit: false
- load_in_4bit: false
- strict: false
-
- plugins:
-   - axolotl.integrations.liger.LigerPlugin
- liger_rope: true
- liger_rms_norm: true
- liger_glu_activation: true # Changed from liger_swiglu
- liger_fused_linear_cross_entropy: true
-
- datasets:
-   - path: arcee-ai/toolcalling-llmjudge-hermes-sharegpt
-     type: chat_template
-     field_messages: conversations
-     message_property_mappings: # Changed from message_field_role/content
-       role: from
-       content: value
-     roles:
-       system:
-         - system
-       user:
-         - human
-       assistant:
-         - gpt
-       tool:
-         - tool
-   - path: chargoddard/toolcalling-llmjudge-hermes-sharegpt-scrumbled
-     type: chat_template
-     field_messages: conversations
-     message_property_mappings: # Changed from message_field_role/content
-       role: from
-       content: value
-     roles:
-       system:
-         - system
-       user:
-         - human
-       assistant:
-         - gpt
-       tool:
-         - tool
-   - path: chargoddard/toolace-sharegpt
-     type: chat_template
-     field_messages: conversations
-     message_property_mappings: # Changed from message_field_role/content
-       role: from
-       content: value
-     roles:
-       system:
-         - system
-       user:
-         - human
-         - user
-       assistant:
-         - gpt
-         - assistant
-       tool:
-         - tool
- dataset_prepared_path: /workspace/data/prepared_datasets
-
- chat_template: chatml
- shuffle_merged_datasets: true
- output_dir: blitz-caller-v1
-
- sequence_len: 8192
- sample_packing: true
- eval_sample_packing: false
- pad_to_sequence_len: true
-
- wandb_project: blitz-caller-v1
- wandb_entity:
- wandb_watch:
- wandb_name:
- wandb_log_model:
-
- gradient_accumulation_steps: 1
- micro_batch_size: 8
- num_epochs: 2
- optimizer: paged_adamw_8bit
- lr_scheduler: cosine
- learning_rate: 0.00002
- max_grad_norm: 3
-
- train_on_inputs: true
- group_by_length: false
- bf16: auto
- fp16:
- tf32: false
-
- gradient_checkpointing: "unsloth"
- early_stopping_patience:
- resume_from_checkpoint:
- local_rank:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- warmup_ratio: 0.05
- saves_per_epoch: 4
- save_safetensors: true
- hub_model_id: blitz-caller
- hub_strategy: every_save
- debug:
- deepspeed: deepspeed_configs/zero3_bf16.json
- weight_decay: 0.1
-
- seed: 496083530
- tokens:
-   - <|im_start|>
- special_tokens:
-   eos_token: <|im_end|>
- ```
-
- </details><br>
-
- # blitz-caller
-
- This model is a fine-tuned version of [arcee-ai/Arcee-Blitz](https://huggingface.co/arcee-ai/Arcee-Blitz) on the arcee-ai/toolcalling-llmjudge-hermes-sharegpt, the chargoddard/toolcalling-llmjudge-hermes-sharegpt-scrumbled and the chargoddard/toolace-sharegpt datasets.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 496083530
- - distributed_type: multi-GPU
- - num_devices: 8
- - total_train_batch_size: 64
- - total_eval_batch_size: 64
- - optimizer: Use paged_adamw_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 27
- - num_epochs: 2.0
-
- ### Training results
-
-
-
- ### Framework versions
-
- - Transformers 4.49.0
- - Pytorch 2.6.0+cu124
- - Datasets 3.2.0
- - Tokenizers 0.21.0
+ s
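
The removed card documents a ChatML-template tool-calling fine-tune of arcee-ai/Arcee-Blitz with `<|im_end|>` as the EOS token, but it never included a usage section. A minimal, non-authoritative inference sketch under those assumptions follows; the hub repo id `arcee-ai/blitz-caller` is itself an assumption and is not confirmed by this commit.

```python
# Hedged sketch only: assumes the fine-tune is published as "arcee-ai/blitz-caller"
# (repo id not confirmed by this commit) and that the tokenizer ships the ChatML
# chat template / <|im_end|> EOS described in the removed axolotl config.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/blitz-caller"  # assumption: the actual hub id may differ

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a function-calling assistant."},
    {"role": "user", "content": "What's the weather like in Paris right now?"},
]

# With chat_template: chatml, apply_chat_template wraps each turn in
# <|im_start|> ... <|im_end|> before tokenizing.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```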