RDson/LIMO-R1-Distill-Qwen-7B

RDson committed 8 days ago

Commit 18bf4a8 · verified · 1 Parent(s): 33690ea

Update README.md

Files changed (1)

README.md +55 -3

@@ -1,3 +1,55 @@
----
-license: mit
----

+---
+license: mit
+datasets:
+- GAIR/LIMO
+language:
+- en
+base_model:
+- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+tags:
+- R1
+- DeepSeek
+- Distill
+- Qwen
+- 7B
+- LIMO
+---
+# LIMO-R1-Distill-Qwen-7B
+This model uses [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) as its base model.
+It was fine-tuned on [GAIR/LIMO](https://huggingface.co/GAIR/LIMO).
+Training was done with LLaMA-Factory using the following config:
+```python
+max_seq_length = 6*1024
+lora_rank = 32
+lora_alpha = lora_rank * 2
+lora_target = ["q_proj", "k_proj", "v_proj", "o_proj",
+        "gate_proj", "up_proj", "down_proj"]
+args = dict(
+  stage="sft",
+  do_train=True,
+  model_name_or_path="unsloth/DeepSeek-R1-Distill-Qwen-7B-bnb-4bit",
+  dataset="limo_restructured",
+  template="custom_template",
+  finetuning_type="lora",
+  lora_target=lora_target,
+  output_dir="qwen_distill_7b_lora",
+  per_device_train_batch_size=1,
+  gradient_accumulation_steps=3,
+  lr_scheduler_type="cosine",
+  logging_steps=1,
+  warmup_ratio=0.1,
+  save_steps=100,
+  learning_rate=1e-4,
+  num_train_epochs=1.0,
+  max_grad_norm=1.0,
+  loraplus_lr_ratio=16.0,
+  fp16=True,
+  report_to="none",
+  preprocessing_num_workers=16,
+  cutoff_len=max_seq_length,
+)
+```
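
The config above leaves a few derived values implicit, so here is a minimal sketch that spells them out: the token cutoff, the LoRA alpha, and the number of sequences per optimizer step. It also shows one plausible way to serialize such arguments to JSON for `llamafactory-cli train`. Several things here are assumptions rather than part of the commit: the `effective_batch_size` helper and the `num_devices=1` default are hypothetical, the `args` dict is a reduced subset, and passing `lora_rank` and `lora_alpha` explicitly is a guess, since the config shown defines both variables but does not pass them in `args`.

```python
import json

# Values copied from the config in the README diff.
max_seq_length = 6 * 1024      # token cutoff, 6144
lora_rank = 32
lora_alpha = lora_rank * 2     # 64

per_device_train_batch_size = 1
gradient_accumulation_steps = 3


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Number of sequences that contribute to one optimizer step.

    Hypothetical helper; num_devices=1 is an assumption, the commit
    does not say how many GPUs were used.
    """
    return per_device * accumulation * num_devices


# Reduced, illustrative subset of the training arguments.
# Assumption: lora_rank and lora_alpha are passed explicitly here;
# the config in the diff does not show them inside `args`.
args = dict(
    stage="sft",
    finetuning_type="lora",
    lora_rank=lora_rank,
    lora_alpha=lora_alpha,
    cutoff_len=max_seq_length,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
)

# Serialize to JSON text; writing this to a file gives a config that
# could be handed to `llamafactory-cli train <file>`.
config_json = json.dumps(args, indent=2)

if __name__ == "__main__":
    print(effective_batch_size(per_device_train_batch_size, gradient_accumulation_steps))
    print(config_json)
```

If the rank and alpha were in fact not passed during training, LLaMA-Factory would have fallen back to its own defaults for them, so the values above should be treated as the intended settings rather than confirmed ones.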