library_name: transformers
datasets:
- DataSeer/si-summarization-votes-r1-081725
base_model: Qwen/Qwen3-32B
tags:
- lora
- supervised-fine-tuning
- summarization
- qwen3
Qwen3-32B Summarization LoRA Adapter
A LoRA (Low-Rank Adaptation) fine-tuned adapter for the Qwen3-32B model, specifically trained for summarizing supplemental information for articles. We used multi-turn reinforcement learning based on the rollouts in the DataSeer summarization votes dataset (human preference data).
Model Details
Model Description
This adapter fine-tunes the Qwen3-32B base model for improved summarization capabilities using LoRA technique.
- Developed by: DataSeer
- Model type: Causal Language Model (LoRA Adapter)
- Language: English
- Base model: Qwen/Qwen3-32B
- Training approach: Multi-turn RL with LoRA
- Dataset: DataSeer/si-summarization-votes-r1-081725
Model Architecture
- Base Model: Qwen3-32B (32.8B parameters)
- LoRA Configuration:
- Rank (r): 8
- Alpha: 32
- Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
- Dropout: 0
- Precision: bfloat16
Training Details
Training Data
The model was trained on the DataSeer/si-summarization-votes-r1-081725
dataset, which contains summarization rollouts with annotator votes. The dataset was filtered to include only positively-voted examples (label=True).
Training Configuration
- Training epochs: 2
- Learning rate: 1e-3 (0.001)
- Batch size: 1 per device
- Gradient accumulation steps: 8
- Effective batch size: 8
- Learning rate scheduler: Cosine
- Optimizer: AdamW (torch fused)
- Precision: bfloat16
- Gradient checkpointing: Enabled
- Max sequence length: 18,893 tokens
Training Results
- Final training loss: 0.3414
- Mean token accuracy: 88.13%
- Total training steps: 62
- Training runtime: 37.9 minutes (2,273 seconds)
- Training samples per second: 0.216
- Final learning rate: 5.77e-6
Hardware & Performance
- Hardware: 8x NVIDIA H100 80GB HBM3
- Training time: ~38 minutes
- Memory optimization: Gradient checkpointing, bfloat16 precision
Usage
Loading the Model
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen3-32B",
torch_dtype=torch.bfloat16,
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")
# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "path/to/adapter")
Environmental Impact
Training was conducted on high-performance H100 GPUs for approximately 38 minutes, representing a relatively efficient fine-tuning process thanks to the LoRA approach which only trains ~0.1% of the total model parameters.