---
library_name: transformers
datasets:
- DataSeer/si-summarization-votes-r1-081725
base_model: Qwen/Qwen3-32B
tags:
- lora
- supervised-fine-tuning
- summarization
- qwen3
---

# Qwen3-32B Summarization LoRA Adapter

A LoRA (Low-Rank Adaptation) adapter for the Qwen3-32B model, fine-tuned to summarize the supplemental information (SI) that accompanies articles. We used multi-turn reinforcement learning over rollouts from the DataSeer summarization votes dataset (human preference data).

## Model Details

### Model Description

This adapter fine-tunes the Qwen3-32B base model for improved summarization capabilities using the LoRA technique.

- **Developed by:** DataSeer
- **Model type:** Causal Language Model (LoRA Adapter)
- **Language:** English
- **Base model:** Qwen/Qwen3-32B
- **Training approach:** Multi-turn RL with LoRA
- **Dataset:** DataSeer/si-summarization-votes-r1-081725

### Model Architecture

- **Base Model:** Qwen3-32B (32.8B parameters)
- **LoRA Configuration** (mirrored in the sketch below this list):
  - Rank (r): 8
  - Alpha: 32
  - Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
  - Dropout: 0
- **Precision:** bfloat16
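
For reference, the adapter settings above correspond to a `peft` `LoraConfig` along these lines. This is a sketch reconstructed from the listed values, not the original training script; `bias` and `task_type` are assumed defaults rather than values taken from the training run.

```python
from peft import LoraConfig

# Sketch of the adapter configuration described above.
# bias="none" and task_type="CAUSAL_LM" are assumptions, not taken from the training run.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```

With rank 8 on these projection layers, only a small fraction of the 32.8B parameters is trainable; the rest of the base model stays frozen.
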
## Training Details

### Training Data

The model was trained on the `DataSeer/si-summarization-votes-r1-081725` dataset, which contains summarization rollouts with annotator votes. The dataset was filtered to include only positively voted examples (`label=True`).
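
A minimal sketch of that filtering step, assuming the Hugging Face `datasets` library, a `train` split, and a boolean `label` column:

```python
from datasets import load_dataset

# Load the votes dataset; the "train" split name is an assumption.
dataset = load_dataset("DataSeer/si-summarization-votes-r1-081725", split="train")

# Keep only rollouts that annotators voted positively on (label=True).
train_dataset = dataset.filter(lambda example: example["label"] is True)
print(f"Kept {len(train_dataset)} of {len(dataset)} examples")
```
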
### Training Configuration

The full configuration (mirrored in the sketch after this list):

- **Training epochs:** 2
- **Learning rate:** 1e-3 (0.001)
- **Batch size:** 1 per device
- **Gradient accumulation steps:** 8
- **Effective batch size:** 8
- **Learning rate scheduler:** Cosine
- **Optimizer:** AdamW (torch fused)
- **Precision:** bfloat16
- **Gradient checkpointing:** Enabled
- **Max sequence length:** 18,893 tokens
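
The list above maps roughly onto Hugging Face `TrainingArguments` as follows. This is a reconstruction for readability, not the original training script: the output directory is a placeholder, the exact trainer is not shown, and the 18,893-token maximum sequence length is enforced by the data pipeline rather than by `TrainingArguments` itself.

```python
from transformers import TrainingArguments

# Approximate training configuration reconstructed from the list above.
# "sft-output" is a placeholder output directory.
training_args = TrainingArguments(
    output_dir="sft-output",
    num_train_epochs=2,
    learning_rate=1e-3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    lr_scheduler_type="cosine",
    optim="adamw_torch_fused",
    bf16=True,
    gradient_checkpointing=True,
)
```
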
### Training Results

- **Final training loss:** 0.3414
- **Mean token accuracy:** 88.13%
- **Total training steps:** 62
- **Training runtime:** 37.9 minutes (2,273 seconds)
- **Training samples per second:** 0.216
- **Final learning rate:** 5.77e-6

### Hardware & Performance

- **Hardware:** 8x NVIDIA H100 80GB HBM3
- **Training time:** ~38 minutes
- **Memory optimizations:** Gradient checkpointing, bfloat16 precision

## Usage

### Loading the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")

# Attach the LoRA adapter to the base model
model = PeftModel.from_pretrained(base_model, "path/to/adapter")
```
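
Once the adapter is loaded, summaries can be generated with the usual chat-template workflow. The sketch below is illustrative only: `si_text` is a placeholder for the supplemental information to summarize, and the prompt wording and generation settings are assumptions, not the prompt format used during training.

```python
# Minimal generation sketch; continues from the loading snippet above.
si_text = "..."  # placeholder: the supplemental information to summarize

messages = [
    {"role": "user", "content": f"Summarize the following supplemental information:\n\n{si_text}"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
).to(base_model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=512)

# Decode only the newly generated tokens.
summary = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(summary)
```

If a standalone checkpoint is more convenient, `model.merge_and_unload()` folds the adapter weights into the base model so it can be saved and served without `peft`.
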
### Environmental Impact

Training was conducted on 8x NVIDIA H100 GPUs for approximately 38 minutes. The LoRA approach trains only ~0.1% of the total model parameters, which keeps the fine-tuning process relatively efficient.