Update README.md
README.md CHANGED
@@ -11,144 +11,6 @@ model-index:
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: efederici/alpha_v3.1
    type: sharegpt
    conversation: chatml

dataset_prepared_path: ./alpha_v3
val_set_size: 0.0002
output_dir: ./llama3

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project: maestrale
wandb_entity: mii-llm
wandb_watch:
wandb_name: maestrale_llama3
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 3

optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false

bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 550

save_safetensors: true
ddp_timeout: 14400

evals_per_epoch: 4
eval_sample_packing: false
eval_table_size:
saves_per_epoch: 5
save_total_limit: 3

debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.05
fsdp:
fsdp_config:
special_tokens:
  eos_token: "<|im_end|>"
  pad_token: "<|end_of_text|>"
tokens:
  - "<|im_start|>"
  - "<|im_end|>"

```

</details><br>
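
The `conversation: chatml` setting above means each ShareGPT sample is rendered with the ChatML template before tokenization, using the `<|im_start|>` and `<|im_end|>` tokens added under `special_tokens`/`tokens`. A minimal sketch of that layout (whitespace details in axolotl's actual template may differ):

```python
# Minimal sketch of the ChatML layout implied by `conversation: chatml`.
# Axolotl's actual sharegpt/chatml serialization may differ in whitespace.
def to_chatml(messages):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    return "".join(parts)

print(to_chatml([
    {"role": "user", "content": "Ciao!"},
    {"role": "assistant", "content": "Ciao! Come posso aiutarti?"},
]))
```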

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/mii-llm/maestrale/runs/zdghwbfe)

# llama3

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the efederici/alpha_v3.1 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6122
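
Because the training data was rendered as ChatML and `eos_token` was remapped to `<|im_end|>`, inference prompts should use the same format. A hedged usage sketch with 🤗 Transformers; the repo id below is a placeholder, since this card does not state where the trained model is published:

```python
# Hedged usage sketch: MODEL_ID is a placeholder, not confirmed by this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/llama3"  # substitute the actual repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Qual è la capitale d'Italia?"}]
# If the repo ships a ChatML chat template this renders the prompt with
# <|im_start|>/<|im_end|>; otherwise format the string manually as above.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Since `eos_token` is `<|im_end|>`, generation should stop at the end of the assistant turn.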

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

The model was fine-tuned on the efederici/alpha_v3.1 dataset declared in the axolotl config above (ShareGPT-format conversations rendered as ChatML); a `val_set_size` of 0.0002 (0.02%) was held out as the evaluation set.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 550
- num_epochs: 3
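
For reference, `total_train_batch_size` and `total_eval_batch_size` are derived from the other settings rather than set directly; a quick sanity check of the arithmetic:

```python
# Effective batch sizes follow from the per-device batch, the
# gradient accumulation steps, and the number of GPUs.
micro_batch_size = 2
gradient_accumulation_steps = 8
num_devices = 2

# train: 2 micro-batches x 8 accumulation steps x 2 GPUs = 32
assert micro_batch_size * gradient_accumulation_steps * num_devices == 32
# eval: no gradient accumulation, so 2 x 2 GPUs = 4
assert micro_batch_size * num_devices == 4
```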

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.0734        | 0.0003 | 1    | 1.2164          |
| 0.7852        | 0.2501 | 787  | 0.7175          |
| 0.7538        | 0.5001 | 1574 | 0.6812          |
| 0.8083        | 0.7502 | 2361 | 0.6669          |
| 0.7832        | 1.0003 | 3148 | 0.6431          |
| 0.5858        | 1.2312 | 3935 | 0.6371          |
| 0.5811        | 1.4813 | 4722 | 0.6185          |
| 0.5568        | 1.7314 | 5509 | 0.5966          |
| 0.5758        | 1.9815 | 6296 | 0.5824          |
| 0.3457        | 2.2124 | 7083 | 0.6227          |
| 0.3379        | 2.4625 | 7870 | 0.6171          |
| 0.3398        | 2.7126 | 8657 | 0.6122          |

### Framework versions