trentmkelly committed on
Commit 242817c · verified · 1 Parent(s): 915d88b

Create README.md

  1. README.md +33 -0
README.md ADDED
@@ -0,0 +1,33 @@
+ ---
+ license: apache-2.0
+ base_model: Qwen/Qwen3-14B
+ tags:
+ - peft
+ - lora
+ - grpo
+ - political-rewriting
+ - fine-tuned
+ library_name: transformers
+ pipeline_tag: text-generation
+ ---
+
+ # Qwen3-14B-MechaStalin
+
+ This model is a fine-tuned version of [Qwen/Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B), trained with GRPO and the RULER reward system to encourage left-wing beliefs.
+
+ Like this model? Be sure to check out its cousin, [MechaHitler](https://huggingface.co/trentmkelly/Qwen3-14B-MechaHitler).
+
+ ## Training Details
+
+ - **Base Model**: Qwen/Qwen3-14B
+ - **Training Method**: GRPO with LoRA adapters
+ - LoRA rank: 32
+ - LoRA alpha: 32
+ - Learning rate: 2e-5
+ - Batch size: 2 (per device) × 4 (gradient accumulation steps) = 8 effective
+ - Generations per prompt: 8
+ - Max completion length: 2048 tokens
+
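The hyperparameters listed under Training Details imply a few derived quantities, which the sketch below works out. It is only an illustration and not the author's training script: the `TrainingConfig` class, its field names, and the `num_devices` parameter are hypothetical. The single-device default is an assumption chosen because it reproduces the "8 effective" figure in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the values in the Training Details list."""

    lora_rank: int = 32
    lora_alpha: int = 32
    learning_rate: float = 2e-5
    per_device_batch_size: int = 2
    grad_accumulation_steps: int = 4
    generations_per_prompt: int = 8
    max_completion_length: int = 2048

    def lora_scale(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank

    def effective_batch_size(self, num_devices: int = 1) -> int:
        # Assumption: the README does not state the device count, so it defaults to 1.
        return self.per_device_batch_size * self.grad_accumulation_steps * num_devices

    def max_tokens_per_prompt(self) -> int:
        # Upper bound on generated tokens across one prompt's group of completions.
        return self.generations_per_prompt * self.max_completion_length


config = TrainingConfig()
print(config.lora_scale())             # 1.0
print(config.effective_batch_size())   # 8
print(config.max_tokens_per_prompt())  # 16384
```

With rank and alpha both at 32, the LoRA scaling factor is 1.0, so the adapter update is applied unscaled. How the 8 generations per prompt count against the batch size depends on the trainer used, which the README does not specify.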
+ ## Disclaimer
+
+ This model was trained for research purposes to study political bias in text generation. Use it responsibly and be aware that its outputs are intentionally politically biased.