---
library_name: transformers
datasets:
- DataSeer/si-summarization-votes-r1-081725
base_model: Qwen/Qwen3-32B
tags:
- lora
- supervised-fine-tuning
- summarization
- qwen3
---
# Qwen3-32B Summarization LoRA Adapter
A LoRA (Low-Rank Adaptation) adapter for the Qwen3-32B model, trained specifically to summarize the supplemental information (SI) that accompanies articles. Training used multi-turn reinforcement learning over the rollouts in the DataSeer summarization-votes dataset (human preference data).
## Model Details
### Model Description
This adapter fine-tunes the Qwen3-32B base model for improved summarization capabilities using the LoRA technique.
- **Developed by:** DataSeer
- **Model type:** Causal Language Model (LoRA Adapter)
- **Language:** English
- **Base model:** Qwen/Qwen3-32B
- **Training approach:** Multi-turn RL with LoRA
- **Dataset:** DataSeer/si-summarization-votes-r1-081725
### Model Architecture
- **Base Model:** Qwen3-32B (32.8B parameters)
- **LoRA Configuration** (sketched as a `peft` config below):
- Rank (r): 8
- Alpha: 32
- Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
- Dropout: 0
- **Precision:** bfloat16
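For reference, the settings above map onto a `peft` `LoraConfig` roughly as follows. This is a minimal sketch; the `task_type` is an assumption, since the card does not state it explicitly.
```python
from peft import LoraConfig

# Sketch of the adapter configuration described above (task_type assumed).
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```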
## Training Details
### Training Data
The model was trained on the `DataSeer/si-summarization-votes-r1-081725` dataset, which contains summarization rollouts with annotator votes. The dataset was filtered to include only positively-voted examples (label=True).
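A minimal sketch of that filtering step, assuming the dataset exposes a boolean `label` column and a `train` split:
```python
from datasets import load_dataset

# Load the preference-vote dataset and keep only positively-voted rollouts.
ds = load_dataset("DataSeer/si-summarization-votes-r1-081725", split="train")
positive = ds.filter(lambda example: bool(example["label"]))
print(f"Kept {len(positive)} of {len(ds)} examples")
```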
### Training Configuration
- **Training epochs:** 2
- **Learning rate:** 1e-3 (0.001)
- **Batch size:** 1 per device
- **Gradient accumulation steps:** 8
- **Effective batch size:** 8
- **Learning rate scheduler:** Cosine
- **Optimizer:** AdamW (torch fused)
- **Precision:** bfloat16
- **Gradient checkpointing:** Enabled
- **Max sequence length:** 18,893 tokens
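The hyperparameters above correspond roughly to the following `transformers` `TrainingArguments`. This is a sketch: the output directory is hypothetical, and the exact trainer and max-sequence-length handling (e.g. via TRL's `SFTTrainer`) are not stated in the card.
```python
from transformers import TrainingArguments

# Sketch of the training configuration listed above; output_dir is hypothetical.
training_args = TrainingArguments(
    output_dir="qwen3-32b-si-summarization-lora",
    num_train_epochs=2,
    learning_rate=1e-3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    lr_scheduler_type="cosine",
    optim="adamw_torch_fused",
    bf16=True,
    gradient_checkpointing=True,
)
```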
### Training Results
- **Final training loss:** 0.3414
- **Mean token accuracy:** 88.13%
- **Total training steps:** 62
- **Training runtime:** 37.9 minutes (2,273 seconds)
- **Training samples per second:** 0.216
- **Final learning rate:** 5.77e-6
### Hardware & Performance
- **Hardware:** 8x NVIDIA H100 80GB HBM3
- **Training time:** ~38 minutes
- **Memory optimization:** Gradient checkpointing, bfloat16 precision
## Usage
### Loading the Model
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")

# Load the LoRA adapter (replace with the local path or Hub ID of this adapter)
model = PeftModel.from_pretrained(base_model, "path/to/adapter")
```
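### Generating a Summary
A minimal inference sketch, continuing from the loading snippet above (it reuses `model`, `tokenizer`, and `torch`). The prompt wording is only illustrative; the exact prompt format used during training is not documented here.
```python
# Continues from the loading snippet above; the prompt wording is an assumption.
supplemental_text = "(paste the supplemental information to summarize here)"

messages = [
    {"role": "user", "content": "Summarize the following supplemental information:\n\n" + supplemental_text}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,  # Qwen3 chat-template flag; disabled here for direct summaries
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=512)

summary = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(summary)
```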
## Environmental Impact
Training was conducted on high-performance H100 GPUs for approximately 38 minutes, a relatively efficient fine-tuning process thanks to the LoRA approach, which trains only ~0.1% of the total model parameters.
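The trainable-parameter fraction can be checked directly after loading the adapter as in the Usage section:
```python
# Continues from the Usage section; prints trainable vs. total parameter counts.
model.print_trainable_parameters()
```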