LogicFlow-Llama-3B


🚀 Introducing LogicFlow-Llama-3B: Exploring Open Access to Chain-of-Thought Reasoning

Ever wished your AI could not just tell you the answer, but show you its thinking? LogicFlow-Llama-3B is an attempt to instill Chain-of-Thought (CoT) reasoning in meta-llama/Llama-3.2-3B-Instruct, which does not show strong CoT reasoning out of the box. It is a LoRA fine-tune designed to explore the potential of CoT on accessible hardware.

Using the open-thoughts/OpenThoughts-114k dataset and the LLaMA-Factory training library, LogicFlow-Llama-3B has been trained to break down intricate problems and articulate its reasoning step by step. The entire fine-tuning run was completed on a single GPU, demonstrating a pathway to more accessible CoT model development, even with limited resources.

Model Details

  • Base Model: meta-llama/Llama-3.2-3B-Instruct (initially without strong CoT capabilities)
  • Fine-tuning Goal: To imbue Chain-of-Thought (CoT) reasoning abilities.
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Fine-tuning Library: LLaMA-Factory
  • Dataset: open-thoughts/OpenThoughts-114k (for Chain-of-Thought enhancement)
  • Training Hardware: Single A6000 GPU
  • LoRA Rank: 8
  • LoRA Alpha: 16
  • LoRA Dropout: 0
  • Learning Rate: 5e-5 (initial)
  • Optimizer: AdamW (torch)
  • Batch Size: 2
  • Gradient Accumulation Steps: 8
  • Number of Training Epochs: 3.0
  • Total Training Steps: 18750
  • Cutoff Length: 2048
  • Compute Type: bf16
  • Rope Scaling: llama3
  • Booster: flashattn2
  • Training Stage: Supervised Fine-Tuning
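
To make the LoRA settings above concrete, the following minimal sketch computes the update scaling (alpha divided by rank) and the number of trainable parameters a rank-8 adapter adds to a single weight matrix. The 3072 x 3072 projection shape is an illustrative assumption, not a value stated in this card, and the helper names are hypothetical.

```python
def lora_scaling(alpha: float, rank: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one d_out x d_in weight matrix.

    LoRA adds A (rank x d_in) and B (d_out x rank), so the count is
    rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


# Values from the Model Details list.
RANK, ALPHA = 8, 16

# Assumed shape of a square projection matrix (illustrative only).
D_IN = D_OUT = 3072

print(lora_scaling(ALPHA, RANK))             # 2.0
print(lora_extra_params(D_IN, D_OUT, RANK))  # 49152
print(D_IN * D_OUT)                          # 9437184 weights in the frozen matrix
```

Under that assumed shape, the adapter trains about 0.5% as many weights as the frozen matrix it modifies, which is a large part of why the run fits on one GPU.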

Intended Use

LogicFlow-Llama-3B is intended for tasks that demand step-by-step reasoning and transparent thought processes. It's a good fit for:

  • Complex Question Answering
  • Logical Deduction and Problem Solving
  • Generating Explanations and Justifications
  • Any application where understanding how an AI reaches a conclusion is as important as the conclusion itself.

How to Use

Unleash the power of LogicFlow-Llama-3B with the Hugging Face transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hugging Face Hub ID of the fine-tuned model
model_name_or_path = "RekklesAI/LogicFlow-Llama-3B"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)

# Example prompt for Chain-of-Thought
prompt = "Q: Natalia sold 48 clips to her friends. She had 30 clips left. How many clips did she have at first? A: Let's think step by step:"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text showcasing the thought process.
# Beam search is deterministic, so `temperature` is omitted: it only takes
# effect when sampling is enabled with do_sample=True.
outputs = model.generate(**inputs, max_new_tokens=150, num_beams=5, early_stopping=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training Process

The model was fine-tuned for 3.0 epochs over a total of 18,750 steps on a single A6000 GPU. Training employed a linear learning rate scheduler, starting from an initial rate of 5e-5, with gradual decay toward zero. The process leveraged LoRA with bf16 precision and FlashAttention2 for efficient memory use and speed.
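
The step count can be cross-checked from the batch settings. The sketch below is a rough calculation, assuming that all 100,000 samples allowed by `max_samples` in the configuration were used for training and that a single GPU was involved; the function name is hypothetical.

```python
import math


def total_optimizer_steps(num_samples, per_device_batch, grad_accum, epochs, num_gpus=1):
    """Return (effective batch size, total optimizer steps) for a training run."""
    # One optimizer step consumes this many samples.
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return effective_batch, int(steps_per_epoch * epochs)


effective_batch, steps = total_optimizer_steps(
    num_samples=100_000,   # max_samples from the training configuration
    per_device_batch=2,    # Batch Size
    grad_accum=8,          # Gradient Accumulation Steps
    epochs=3.0,            # Number of Training Epochs
)
print(effective_batch)  # 16
print(steps)            # 18750
```

The result matches the 18,750 steps reported for the run.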

Here's a glimpse into the training progression:

  • Initial Phase (First ~2000 steps): The training loss started around 1.05 and rapidly decreased, indicating the model was quickly learning from the open-thoughts/OpenThoughts-114k dataset. For example, at step 5, the loss was 1.0536, and by step 100, it had dropped to 0.7666. The learning rate was close to the initial 5e-5 during this phase.
  • Middle Phase (~2000 to ~15000 steps): The loss continued to decrease, albeit at a slower pace, and started to stabilize, generally fluctuating between approximately 0.60 and 0.75. This shows the model consolidating its learning. The learning rate linearly decayed throughout this period. For instance, around step 10000, the loss was approximately 0.6230 and the learning rate was around 2.3e-5.
  • Final Phase (Last ~3750 steps): In the final stages of training, the loss showed further slight reduction and stabilization, with values often hovering in the ~0.58 to ~0.65 range. The learning rate continued its linear decay, approaching zero towards the end of training. At step 18750 (final step), the loss was recorded as 0.5886, with a learning rate close to 0.
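
The learning-rate values quoted in these phases follow from a linear schedule with no warmup. As a minimal sketch (the function name is hypothetical and this is not the trainer's own code), the rate at any step can be estimated like this:

```python
def linear_lr(step, total_steps=18750, initial_lr=5e-5, warmup_steps=0):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up linearly during warmup (unused here, since warmup_steps is 0).
        return initial_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return initial_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 10_000, 18_750):
    print(step, f"{linear_lr(step):.2e}")
```

At step 10,000 this gives about 2.33e-5, consistent with the value of roughly 2.3e-5 noted for the middle phase, and it reaches zero at the final step.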

The gradient norm stayed mostly between 0.15 and 0.40 across the logged steps, suggesting stable training dynamics.

Below is a visualization of the training loss curve:

[Figure: Training Loss]

📊 Final Training Metrics

| Metric | Value |
|---|---|
| Epochs | 3.0 |
| Input Tokens Seen | 613,609,008 |
| Total FLOPs | 9,706,625,883 GFLOPs |
| Final Train Loss | 0.435 |
| Total Runtime | 1 day, 22 hours, 12 minutes |
| Samples per Second | 1.803 |
| Steps per Second | 0.113 |
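
The throughput figures can be sanity-checked against the reported runtime. The following sketch is a back-of-the-envelope calculation: it assumes 16 samples per optimizer step (batch size 2 with 8 gradient accumulation steps) and that "Input Tokens Seen" covers the whole run. The tokens-per-second figure is derived here and is not reported in the metrics.

```python
# Values copied from the Final Training Metrics.
runtime_s = (1 * 24 + 22) * 3600 + 12 * 60   # 1 day, 22 hours, 12 minutes
total_steps = 18_750
tokens_seen = 613_609_008

# Assumption: 16 samples per optimizer step (batch size 2 x 8 accumulation steps).
samples_per_step = 2 * 8

steps_per_s = total_steps / runtime_s
samples_per_s = steps_per_s * samples_per_step
tokens_per_s = tokens_seen / runtime_s

print(f"steps/s   ~ {steps_per_s:.3f}")
print(f"samples/s ~ {samples_per_s:.3f}")
print(f"tokens/s  ~ {tokens_per_s:.0f}")
```

This yields roughly 0.113 steps per second and 1.80 samples per second, in line with the reported values, and on the order of 3,700 input tokens per second. Small differences are expected because the runtime is only given to the minute.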

Training Configuration (from llamaboard_config.yaml):

top:
  booster: flashattn2
  finetuning_type: lora
  model_name: Llama-3.2-3B-Instruct # Base model before LoRA merge
  rope_scaling: llama3
  template: llama3
train:
  additional_target: ''''
  batch_size: 2
  compute_type: bf16
  cutoff_len: 2048
  dataset:
  - open_thoughts # Mapped to open-thoughts/OpenThoughts-114k
  dataset_dir: data
  extra_args: '{"optim": "adamw_torch"}'
  gradient_accumulation_steps: 8
  learning_rate: 5e-5 # Initial learning rate
  logging_steps: 5
  lora_alpha: 16
  lora_dropout: 0
  lora_rank: 8
  lora_target: ''''
  lr_scheduler_type: linear
  max_grad_norm: '1.0'
  max_samples: '100000' # Max samples from the dataset used
  num_train_epochs: '3.0'
  save_steps: 100
  training_stage: Supervised Fine-Tuning
  warmup_steps: 0 # No warmup steps were used

Disclaimer

LogicFlow-Llama-3B is a research artifact. While powerful, it may have limitations or biases. Please use it responsibly and critically evaluate its outputs.
