metadata

library_name: transformers
datasets:
  - DataSeer/si-summarization-votes-r1-081725
base_model: Qwen/Qwen3-32B
tags:
  - lora
  - supervised-fine-tuning
  - summarization
  - qwen3

Qwen3-32B Summarization LoRA Adapter

A LoRA (Low-Rank Adaptation) fine-tuned adapter for the Qwen3-32B model, specifically trained for summarizing supplemental information for articles. We used multi-turn reinforcement learning based on the rollouts in the DataSeer summarization votes dataset (human preference data).

Model Details

Model Description

This adapter fine-tunes the Qwen3-32B base model for improved summarization capabilities using LoRA technique.

Developed by: DataSeer
Model type: Causal Language Model (LoRA Adapter)
Language: English
Base model: Qwen/Qwen3-32B
Training approach: Multi-turn RL with LoRA
Dataset: DataSeer/si-summarization-votes-r1-081725

Model Architecture

Base Model: Qwen3-32B (32.8B parameters)
LoRA Configuration:
- Rank (r): 8
- Alpha: 32
- Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
- Dropout: 0
Precision: bfloat16

Training Details

Training Data

The model was trained on the DataSeer/si-summarization-votes-r1-081725 dataset, which contains summarization rollouts with annotator votes. The dataset was filtered to include only positively-voted examples (label=True).

Training Configuration

Training epochs: 2
Learning rate: 1e-3 (0.001)
Batch size: 1 per device
Gradient accumulation steps: 8
Effective batch size: 8
Learning rate scheduler: Cosine
Optimizer: AdamW (torch fused)
Precision: bfloat16
Gradient checkpointing: Enabled
Max sequence length: 18,893 tokens

Training Results

Final training loss: 0.3414
Mean token accuracy: 88.13%
Total training steps: 62
Training runtime: 37.9 minutes (2,273 seconds)
Training samples per second: 0.216
Final learning rate: 5.77e-6

Hardware & Performance

Hardware: 8x NVIDIA H100 80GB HBM3
Training time: ~38 minutes
Memory optimization: Gradient checkpointing, bfloat16 precision

Usage

Loading the Model

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "path/to/adapter")

Environmental Impact

Training was conducted on high-performance H100 GPUs for approximately 38 minutes, representing a relatively efficient fine-tuning process thanks to the LoRA approach which only trains ~0.1% of the total model parameters.