See axolotl config

axolotl version: 0.4.1

base_model: Qwen/Qwen2-1.5B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

trust_remote_code: true

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: MangyMango/CivitAIslop
    type: sharegpt
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/out
sequence_len: 2048
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true

adapter: 
lora_model_dir:
lora_r: 
lora_alpha: 
lora_dropout: 
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: Mango-SDprompt-qwen
wandb_entity:
wandb_watch:
wandb_name: qwen1.5b-2
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 4
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
#deepspeed: deepspeed_configs/zero2.json
#deepspeed: /training/axolotl/axolotl/deepspeed_configs/zero2.json
weight_decay: 0.0
#fsdp:
#fsdp_config:
#  fsdp_limit_all_gathers: true
#  fsdp_sync_module_states: true
#  fsdp_offload_params: true
#  fsdp_use_orig_params: false
#  fsdp_cpu_ram_efficient_loading: true
#  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
#  fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
#  fsdp_state_dict_type: FULL_STATE_DICT
special_tokens:
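The config above combines `learning_rate: 0.0002`, `lr_scheduler: cosine`, and `warmup_steps: 10`. As a rough sketch of what that schedule looks like, the function below implements linear warmup followed by a half-cosine decay to zero. The name `lr_at_step` and the `TOTAL_STEPS` value are illustrative assumptions; the card does not state the total number of optimizer steps, and the trainer's own scheduler may differ in detail.

```python
import math

BASE_LR = 2e-4       # learning_rate from the config
WARMUP_STEPS = 10    # warmup_steps from the config
TOTAL_STEPS = 2380   # ASSUMPTION: illustrative only, not stated in the card


def lr_at_step(step, total_steps=TOTAL_STEPS, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS):
    """Hypothetical helper: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 5, 10, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(s, lr_at_step(s))
```

With these settings the peak rate of 0.0002 is reached at step 10, and the rate decays smoothly toward zero by the last step.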

outputs/out

This model is a fine-tuned version of Qwen/Qwen2-1.5B on the MangyMango/CivitAIslop dataset. It achieves the following results on the evaluation set:

Loss: 2.2909

Model description

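A fine-tune of Qwen/Qwen2-1.5B trained with axolotl 0.4.1. The config sets no adapter and disables both 8-bit and 4-bit loading. More information needed.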

Intended uses & limitations

More information needed

Training and evaluation data

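The config trains on MangyMango/CivitAIslop (sharegpt format) and holds out 5% of it for evaluation (val_set_size: 0.05). More information needed.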

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 1
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 4
optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 10
num_epochs: 4
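The batch-size figures in this list are related by simple arithmetic: the total train batch size is the per-device batch size times the gradient accumulation steps times the number of devices. The short sketch below back-solves the device count from the reported values and, assuming every packed sequence is padded to `sequence_len: 2048` as the config requests, gives an upper bound on tokens per optimizer step. The variable names are illustrative.

```python
# Values reported in the hyperparameter list and the axolotl config.
train_batch_size = 1             # per-device micro batch size
gradient_accumulation_steps = 4
total_train_batch_size = 4
sequence_len = 2048              # from the config

# total = per_device * accumulation * devices, so solve for devices.
num_devices = total_train_batch_size // (train_batch_size * gradient_accumulation_steps)

# Upper bound on tokens per optimizer step, ASSUMING each packed sequence
# is filled or padded to sequence_len (sample_packing and
# pad_to_sequence_len are both true in the config).
max_tokens_per_step = total_train_batch_size * sequence_len

print("devices implied:", num_devices)
print("max tokens per optimizer step:", max_tokens_per_step)
```

The reported numbers are consistent with a single-device run.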

Training results

Training Loss	Epoch	Step	Validation Loss
2.3349	0.0017	1	2.1700
1.7686	0.2504	149	2.0528
1.7567	0.5008	298	1.9892
1.8998	0.7513	447	1.8909
1.7896	1.0017	596	1.8518
1.1352	1.0664	745	1.8844
1.2847	1.3168	894	1.8449
1.1088	1.5672	1043	1.8047
1.1994	1.8176	1192	1.7896
1.2558	2.0681	1341	1.7503
0.4277	2.1307	1490	2.1652
0.3487	2.3811	1639	2.2419
0.4145	2.6315	1788	2.2375
0.2941	2.8819	1937	2.2510
0.2934	3.1324	2086	2.2517
0.2899	3.1933	2235	2.2909
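The table shows validation loss falling until step 1341 and then rising while training loss keeps dropping, a pattern usually read as overfitting. A minimal sketch for locating the best evaluation point from these rows follows. The rows are copied from the table; the variable names are illustrative. Whether a checkpoint was actually saved near that step is not stated in the card.

```python
# (training_loss, epoch, step, validation_loss), copied from the table above.
rows = [
    (2.3349, 0.0017, 1, 2.1700),
    (1.7686, 0.2504, 149, 2.0528),
    (1.7567, 0.5008, 298, 1.9892),
    (1.8998, 0.7513, 447, 1.8909),
    (1.7896, 1.0017, 596, 1.8518),
    (1.1352, 1.0664, 745, 1.8844),
    (1.2847, 1.3168, 894, 1.8449),
    (1.1088, 1.5672, 1043, 1.8047),
    (1.1994, 1.8176, 1192, 1.7896),
    (1.2558, 2.0681, 1341, 1.7503),
    (0.4277, 2.1307, 1490, 2.1652),
    (0.3487, 2.3811, 1639, 2.2419),
    (0.4145, 2.6315, 1788, 2.2375),
    (0.2941, 2.8819, 1937, 2.2510),
    (0.2934, 3.1324, 2086, 2.2517),
    (0.2899, 3.1933, 2235, 2.2909),
]

# Row with the lowest validation loss.
best = min(rows, key=lambda r: r[3])
final = rows[-1]

print("best step:", best[2], "validation loss:", best[3])
print("final step:", final[2], "validation loss:", final[3])
# Gap between validation and training loss at the final evaluation.
print("final val - train gap:", round(final[3] - final[0], 4))
```

By this measure the final evaluation, which is the one reported at the top of the card, is noticeably worse than the best one observed during training.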

Framework versions

Transformers 4.41.1
Pytorch 2.1.2+cu118
Datasets 2.19.1
Tokenizers 0.19.1

Edens-Gate/civitaiqwen1.5b
