Update README.md
README.md CHANGED
````diff
@@ -8,74 +8,4 @@ Initial evaluation loss on 1k subset of HuggingFaceTB/cosmopedia-100k dataset wa
 
 Comparison to control: cosmo-1b started out with 1.003 loss on (a different subset of) dataset, increasing to 1.024 at 100 steps.
 
-Axolotl config:
-
-```
-base_model: HuggingFaceTB/cosmo-1b
-model_type: LlamaForCausalLM
-tokenizer_type: LlamaTokenizer
-
-load_in_8bit: false
-load_in_4bit: false
-strict: false
-
-datasets:
-  - path: Vezora/Tested-22k-Python-Alpaca
-    type: alpaca
-dataset_prepared_path: prepared-qlora
-val_set_size: 0.05
-output_dir: ./lisa-out
-
-sequence_len: 2048
-sample_packing: true
-pad_to_sequence_len: true
-
-adapter:
-lora_model_dir:
-lora_r:
-lora_alpha:
-lora_dropout:
-lora_target_linear:
-lora_fan_in_fan_out:
-
-lisa_n_layers: 4
-lisa_step_interval: 10
-lisa_layers_attribute: model.layers
-
-wandb_project: cosmo-python-lisa
-wandb_entity:
-wandb_watch:
-wandb_name:
-wandb_log_model:
-
-gradient_accumulation_steps: 4
-micro_batch_size: 2
-num_epochs: 1
-optimizer: adamw_bnb_8bit
-lr_scheduler: cosine
-learning_rate: 0.0005
-
-train_on_inputs: false
-group_by_length: false
-bf16: auto
-fp16:
-tf32: false
-
-gradient_checkpointing: true
-early_stopping_patience:
-resume_from_checkpoint:
-local_rank:
-logging_steps: 1
-xformers_attention:
-flash_attention: true
-
-warmup_steps: 10
-evals_per_epoch: 4
-saves_per_epoch: 1
-debug:
-deepspeed:
-weight_decay: 0.0
-fsdp:
-fsdp_config:
-special_tokens:
-```
+Axolotl config: Same as qdora version but without dora.
````
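
For readers skimming the removed config: the `lisa_*` keys come from Axolotl's LISA (Layerwise Importance Sampled AdamW) support. As a rough gloss of what they control, based on my reading of the LISA recipe rather than an authoritative spec:

```yaml
# LISA freezes most of the model and unfreezes only a small, randomly chosen
# subset of transformer blocks, re-drawing that subset periodically during training.
lisa_n_layers: 4                     # how many blocks are unfrozen at any one time
lisa_step_interval: 10               # re-sample the active blocks every 10 optimizer steps
lisa_layers_attribute: model.layers  # attribute path to the model's list of decoder blocks
```

This also explains the empty `adapter:` and `lora_*` fields in the removed config: that run was a full-model fine-tune with LISA doing the layer selection, not a LoRA run.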
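The one-liner that replaces the config points at a sibling "qdora version" in the repo. Recent Axolotl versions expose DoRA on top of a (Q)LoRA adapter via the `peft_use_dora` flag, so "without dora" plausibly means the same QLoRA adapter block with that flag dropped; that reading is my assumption, since the qdora config is not shown here. A minimal sketch under that assumption, with illustrative `lora_*` placeholder values rather than the repo's actual settings:

```yaml
adapter: qlora            # quantized-LoRA setup assumed shared by both variants
load_in_4bit: true

lora_r: 32                # illustrative rank/alpha/dropout, not taken from the repo
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

# peft_use_dora: true     # the qdora variant would set this; omitting it (or
                          # setting it false) yields the plain QLoRA run meant here
```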