izzcw/filtered_cooking_train_data


izzcw committed on Apr 28

Commit defb92c (verified) · 1 Parent(s): 9574a4d

Create README.md

Files changed (1)

README.md +2 -64


@@ -1,66 +1,4 @@
 ---
-library_name: transformers
-license: llama3
-base_model: meta-llama/Meta-Llama-3-8B-Instruct
-tags:
-- llama-factory
-- full
-- generated_from_trainer
-model-index:
-- name: filtered_cooking_train_data
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# filtered_cooking_train_data
-This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the identity and the filtered_cooking_train_data datasets.
-It achieves the following results on the evaluation set:
-- Loss: 0.3951
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 1e-05
-- train_batch_size: 1
-- eval_batch_size: 2
-- seed: 42
-- distributed_type: multi-GPU
-- num_devices: 8
-- gradient_accumulation_steps: 16
-- total_train_batch_size: 128
-- total_eval_batch_size: 16
-- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 3.0
-### Training results
-| Training Loss | Epoch  | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| 0.2544        | 1.5783 | 50   | 0.3808          |
-### Framework versions
-- Transformers 4.49.0
-- Pytorch 2.5.1+cu124
-- Datasets 3.2.0
-- Tokenizers 0.21.0

 ---
+license: mit
 ---
+https://arxiv.org/abs/2504.17950
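
Note on the removed hyperparameters: the `total_train_batch_size` and `total_eval_batch_size` values in the deleted model card appear to follow from the per-device batch sizes, the device count, and the gradient accumulation steps. The sketch below reproduces that arithmetic. The helper name `effective_batch_size` is hypothetical and is not part of Transformers or LLaMA-Factory. It also assumes the listed batch sizes are per-device and that gradient accumulation applies only to training.

```python
# Values copied from the removed "Training hyperparameters" list.
train_batch_size = 1              # assumed to be per-device
eval_batch_size = 2               # assumed to be per-device
num_devices = 8
gradient_accumulation_steps = 16


def effective_batch_size(per_device, devices, accumulation_steps=1):
    """Hypothetical helper: examples consumed per optimizer step (or per eval pass step)."""
    return per_device * devices * accumulation_steps


# Training accumulates gradients over several micro-batches per optimizer step.
total_train_batch_size = effective_batch_size(
    train_batch_size, num_devices, gradient_accumulation_steps
)

# Evaluation has no gradient accumulation, so only the device count multiplies.
total_eval_batch_size = effective_batch_size(eval_batch_size, num_devices)

print(total_train_batch_size, total_eval_batch_size)  # prints "128 16"
```

Both results match the totals reported in the deleted card (128 and 16), which is consistent with the assumptions stated above.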