si-summarization-r1 / README.md
parthsarin's picture
Update README.md
65f9f58 verified
metadata
library_name: transformers
datasets:
  - DataSeer/si-summarization-votes-r1-081725
base_model: Qwen/Qwen3-32B
tags:
  - lora
  - supervised-fine-tuning
  - summarization
  - qwen3

Qwen3-32B Summarization LoRA Adapter

A LoRA (Low-Rank Adaptation) fine-tuned adapter for the Qwen3-32B model, specifically trained for summarizing supplemental information for articles. We used multi-turn reinforcement learning based on the rollouts in the DataSeer summarization votes dataset (human preference data).

Model Details

Model Description

This adapter fine-tunes the Qwen3-32B base model for improved summarization capabilities using LoRA technique.

  • Developed by: DataSeer
  • Model type: Causal Language Model (LoRA Adapter)
  • Language: English
  • Base model: Qwen/Qwen3-32B
  • Training approach: Multi-turn RL with LoRA
  • Dataset: DataSeer/si-summarization-votes-r1-081725

Model Architecture

  • Base Model: Qwen3-32B (32.8B parameters)
  • LoRA Configuration:
    • Rank (r): 8
    • Alpha: 32
    • Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
    • Dropout: 0
  • Precision: bfloat16

Training Details

Training Data

The model was trained on the DataSeer/si-summarization-votes-r1-081725 dataset, which contains summarization rollouts with annotator votes. The dataset was filtered to include only positively-voted examples (label=True).

Training Configuration

  • Training epochs: 2
  • Learning rate: 1e-3 (0.001)
  • Batch size: 1 per device
  • Gradient accumulation steps: 8
  • Effective batch size: 8
  • Learning rate scheduler: Cosine
  • Optimizer: AdamW (torch fused)
  • Precision: bfloat16
  • Gradient checkpointing: Enabled
  • Max sequence length: 18,893 tokens

Training Results

  • Final training loss: 0.3414
  • Mean token accuracy: 88.13%
  • Total training steps: 62
  • Training runtime: 37.9 minutes (2,273 seconds)
  • Training samples per second: 0.216
  • Final learning rate: 5.77e-6

Hardware & Performance

  • Hardware: 8x NVIDIA H100 80GB HBM3
  • Training time: ~38 minutes
  • Memory optimization: Gradient checkpointing, bfloat16 precision

Usage

Loading the Model

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "path/to/adapter")

Environmental Impact

Training was conducted on high-performance H100 GPUs for approximately 38 minutes, representing a relatively efficient fine-tuning process thanks to the LoRA approach which only trains ~0.1% of the total model parameters.