---
library_name: transformers
datasets:
- DataSeer/si-summarization-votes-r1-081725
base_model: Qwen/Qwen3-32B
tags:
- lora
- supervised-fine-tuning
- summarization
- qwen3
---
# Qwen3-32B Summarization LoRA Adapter
A LoRA (Low-Rank Adaptation) adapter for the Qwen3-32B model, trained specifically to summarize the supplemental information (SI) that accompanies articles. Training used multi-turn reinforcement learning over the rollouts in the DataSeer summarization-votes dataset (human preference data).
## Model Details
### Model Description
This adapter fine-tunes the Qwen3-32B base model for improved summarization capabilities using the LoRA technique.
- **Developed by:** DataSeer
- **Model type:** Causal Language Model (LoRA Adapter)
- **Language:** English
- **Base model:** Qwen/Qwen3-32B
- **Training approach:** Multi-turn RL with LoRA
- **Dataset:** DataSeer/si-summarization-votes-r1-081725
### Model Architecture
- **Base Model:** Qwen3-32B (32.8B parameters)
- **LoRA Configuration** (sketched as a `peft` config below):
- Rank (r): 8
- Alpha: 32
- Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
- Dropout: 0
- **Precision:** bfloat16
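For reference, the settings above map onto a `peft` `LoraConfig` roughly as follows. This is a minimal sketch; the `task_type` is an assumption, since the card does not state it explicitly.
```python
from peft import LoraConfig

# Sketch of the adapter configuration described above (task_type assumed).
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```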
## Training Details
### Training Data
The model was trained on the `DataSeer/si-summarization-votes-r1-081725` dataset, which contains summarization rollouts with annotator votes. The dataset was filtered to include only positively-voted examples (label=True).
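A minimal sketch of that filtering step, assuming the dataset exposes a boolean `label` column and a `train` split:
```python
from datasets import load_dataset

# Load the preference-vote dataset and keep only positively-voted rollouts.
ds = load_dataset("DataSeer/si-summarization-votes-r1-081725", split="train")
positive = ds.filter(lambda example: bool(example["label"]))
print(f"Kept {len(positive)} of {len(ds)} examples")
```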
### Training Configuration
- **Training epochs:** 2
- **Learning rate:** 1e-3 (0.001)
- **Batch size:** 1 per device
- **Gradient accumulation steps:** 8
- **Effective batch size:** 8
- **Learning rate scheduler:** Cosine
- **Optimizer:** AdamW (torch fused)
- **Precision:** bfloat16
- **Gradient checkpointing:** Enabled
- **Max sequence length:** 18,893 tokens
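The hyperparameters above correspond roughly to the following `transformers` `TrainingArguments`. This is a sketch: the output directory is hypothetical, and the exact trainer and max-sequence-length handling (e.g. via TRL's `SFTTrainer`) are not stated in the card.
```python
from transformers import TrainingArguments

# Sketch of the training configuration listed above; output_dir is hypothetical.
training_args = TrainingArguments(
    output_dir="qwen3-32b-si-summarization-lora",
    num_train_epochs=2,
    learning_rate=1e-3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    lr_scheduler_type="cosine",
    optim="adamw_torch_fused",
    bf16=True,
    gradient_checkpointing=True,
)
```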
### Training Results
- **Final training loss:** 0.3414
- **Mean token accuracy:** 88.13%
- **Total training steps:** 62
- **Training runtime:** 37.9 minutes (2,273 seconds)
- **Training samples per second:** 0.216
- **Final learning rate:** 5.77e-6
### Hardware & Performance
- **Hardware:** 8x NVIDIA H100 80GB HBM3
- **Training time:** ~38 minutes
- **Memory optimization:** Gradient checkpointing, bfloat16 precision
## Usage
### Loading the Model
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")

# Load the LoRA adapter (replace with the local path or Hub ID of this adapter)
model = PeftModel.from_pretrained(base_model, "path/to/adapter")
```
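### Generating a Summary
A minimal inference sketch, continuing from the loading snippet above (it reuses `model`, `tokenizer`, and `torch`). The prompt wording is only illustrative; the exact prompt format used during training is not documented here.
```python
# Continues from the loading snippet above; the prompt wording is an assumption.
supplemental_text = "(paste the supplemental information to summarize here)"

messages = [
    {"role": "user", "content": "Summarize the following supplemental information:\n\n" + supplemental_text}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,  # Qwen3 chat-template flag; disabled here for direct summaries
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=512)

summary = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(summary)
```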
## Environmental Impact
Training was conducted on high-performance H100 GPUs for approximately 38 minutes, a relatively efficient fine-tuning process thanks to the LoRA approach, which trains only ~0.1% of the total model parameters.
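The trainable-parameter fraction can be checked directly after loading the adapter as in the Usage section:
```python
# Continues from the Usage section; prints trainable vs. total parameter counts.
model.print_trainable_parameters()
```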