---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B-Base
tags:
- peft
- lora
- ai-detection
- text-classification
- raid-dataset
- qwen
- unsloth
language:
- en
pipeline_tag: text-classification
library_name: peft
datasets:
- liamdugan/raid
metrics:
- accuracy
- precision
- recall
---
# Qwen3-0.6B AI Content Detector (LoRA)
## Model Description
This is a LoRA (Low-Rank Adaptation) fine-tuned version of Qwen3-0.6B-Base for AI-generated content detection. The model is trained to classify text as either human-written (class 0) or AI-generated (class 1) using the RAID dataset.
## Model Details
- **Base Model**: Qwen/Qwen3-0.6B-Base
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Task**: Binary text classification (Human vs AI content detection)
- **Dataset**: RAID Dataset (train_none.csv)
- **Training Framework**: Unsloth + Transformers
- **Model Type**: Parameter-efficient fine-tuning adapter
## Training Details
### Dataset
- **Source**: RAID Dataset for AI content detection
- **Training Samples**: 24,000 (balanced: 12,000 human + 12,000 AI)
- **Validation Samples**: 2,000 (balanced: 1,000 human + 1,000 AI)
- **Class Balance**: 50% Human (class 0) / 50% AI (class 1); a sampling sketch follows this list
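For reference, here is a minimal sketch of how a balanced split like this could be drawn from `train_none.csv` with pandas. The column names (`model`, `generation`) follow the public RAID schema and are assumptions here, as are the random seed and helper names; this is not the exact preprocessing script used for this card.
```python
import pandas as pd

# Assumed RAID columns: "model" ("human" for human-written rows) and "generation" (the text).
df = pd.read_csv("train_none.csv")

human = df[df["model"] == "human"].sample(13_000, random_state=42)
ai = df[df["model"] != "human"].sample(13_000, random_state=42)

def split_rows(frame, label):
    rows = [{"text": t, "label": label} for t in frame["generation"]]
    return rows[:12_000], rows[12_000:]  # 12k train + 1k validation per class

human_train, human_val = split_rows(human, 0)
ai_train, ai_val = split_rows(ai, 1)

train_rows = human_train + ai_train  # 24,000 balanced training samples
val_rows = human_val + ai_val        # 2,000 balanced validation samples
```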
### Training Configuration
- **LoRA Rank**: 16
- **LoRA Alpha**: 16
- **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Learning Rate**: 1e-4
- **Batch Size**: 2 per device
- **Epochs**: 1
- **Optimizer**: AdamW 8-bit
- **Max Sequence Length**: 2048 (the full configuration is sketched in code below)
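A minimal sketch of how these hyperparameters map onto Unsloth and TRL calls, assuming a `train_dataset` built from the split above. The classification-specific pieces (reduced vocabulary and last-token collator, see Technical Implementation) are omitted, and exact `SFTTrainer` arguments vary between `trl` versions:
```python
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-0.6B-Base",
    max_seq_length=2048,
    load_in_4bit=False,
)

# Attach the LoRA adapter with the rank, alpha, and target modules listed above
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=True,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,  # placeholder: tokenized balanced RAID split
    args=TrainingArguments(
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=1e-4,
        fp16=True,
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()
```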
### Hardware
- **GPU**: Tesla T4 (Google Colab)
- **Precision**: FP16
- **Memory Optimization**: Gradient checkpointing enabled
## Usage
### Loading the Model
```python
from unsloth import FastLanguageModel
import torch

# Load the merged checkpoint (base model with the LoRA weights already applied)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="subhashbs36/qwen3-0.6-ai-detector-merged",
    max_seq_length=4096,
    dtype=torch.float16,
    load_in_4bit=False,
)

# To use the standalone adapter instead, load Qwen/Qwen3-0.6B-Base above and attach it:
# model.load_adapter("subhashbs36/qwen3-0.6-ai-detector-lora")

# Enable inference mode
FastLanguageModel.for_inference(model)
```
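Alternatively, the adapter can be attached with plain Transformers + PEFT. A minimal sketch, assuming the adapter repository name used in the citation below:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B-Base", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B-Base")

# Apply the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base, "subhashbs36/qwen3-0.6-ai-detector-lora")
model.eval()
```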
### Classifying Text
```python
import os
import torch
import torch.nn.functional as F

# Enable CUDA debugging for an accurate stack trace if needed
# os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

# Token IDs for the class labels "0" and "1" (the reduced vocabulary described
# under Technical Implementation). Assumes each label maps to a single token.
number_token_ids = [
    tokenizer("0", add_special_tokens=False).input_ids[0],
    tokenizer("1", add_special_tokens=False).input_ids[0],
]

def classify_text_fixed(text_sample):
    prompt = f"""Here is a text sample:
{text_sample}
Classify this text into one of the following:
class 0: Human
class 1: AI
SOLUTION
The correct answer is: class """

    inputs = tokenizer(prompt, return_tensors="pt")
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model(**inputs)

    # Get the last token index as a scalar, not a tensor
    last_token_idx = (inputs['attention_mask'].sum(1) - 1).item()
    last_logits = outputs.logits[0, last_token_idx, :]

    # Debug information
    print(f"Logits shape: {last_logits.shape}")
    print(f"Number token ids: {number_token_ids}")
    print(f"Vocab size: {last_logits.shape[0]}")

    # Guard against class-token indices that fall outside the vocabulary
    vocab_size = last_logits.shape[0]
    for i, idx in enumerate(number_token_ids):
        if idx >= vocab_size:
            print(f"ERROR: Index {idx} (class {i}) is out of bounds for vocab size {vocab_size}")
            return None, None

    probs_all = F.softmax(last_logits, dim=-1)
    probs = probs_all[number_token_ids]
    predicted_class = torch.argmax(probs).item()
    confidence = probs[predicted_class].item()
    return predicted_class, confidence
```
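Example call (the input string is illustrative):
```python
label, confidence = classify_text_fixed(
    "Large language models generate fluent text by predicting one token at a time."
)
if label is not None:
    print("AI-generated" if label == 1 else "Human-written", f"({confidence:.1%} confidence)")
```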
## Performance
- **Task**: Binary classification (Human vs AI content detection)
- **Classes**:
  - Class 0: Human-written content
  - Class 1: AI-generated content
- **Evaluation**: Tested on a balanced validation set from the RAID dataset (a scoring sketch follows this list)
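A sketch of how such an evaluation could be scored with scikit-learn, assuming the `val_rows` split and the `classify_text_fixed` helper from the sketches above; this is not the exact evaluation script behind this card:
```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true, y_pred = [], []
for row in val_rows:  # balanced: 1,000 human + 1,000 AI samples
    pred, _ = classify_text_fixed(row["text"])
    if pred is not None:
        y_true.append(row["label"])
        y_pred.append(pred)

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
```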
## Limitations
- Trained specifically on RAID dataset distribution
- Performance may vary on out-of-domain text
- Designed for English text classification
- Requires the specific prompt format shown in the Usage section for optimal performance
## Technical Implementation
This model uses a custom approach with:
- **Reduced vocabulary**: Only uses token IDs for classes 0 and 1
- **Custom data collator**: Trains only on the last token of each sequence (sketched below)
- **Token mapping**: Maps original vocabulary to reduced classification head
- **Parameter-efficient training**: Uses LoRA for efficient fine-tuning
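The training code itself is not included in this repository; the following is a minimal sketch of the last-token idea, assuming each training example is the Usage prompt followed by the class token ("0" or "1") so that the loss is masked everywhere except that final position:
```python
import torch

def last_token_collator(examples, tokenizer, max_len=2048):
    """Pad a batch and keep the loss only on each sequence's final token."""
    enc = tokenizer(
        [ex["text"] for ex in examples],  # prompt + class token ("0" or "1")
        padding=True,
        truncation=True,
        max_length=max_len,
        return_tensors="pt",
    )
    labels = torch.full_like(enc["input_ids"], -100)  # -100 is ignored by the loss
    last_idx = enc["attention_mask"].sum(dim=1) - 1   # index of the class token per row
    rows = torch.arange(enc["input_ids"].size(0))
    labels[rows, last_idx] = enc["input_ids"][rows, last_idx]
    enc["labels"] = labels
    return enc
```
At inference time the same position is read back out, which is why `classify_text_fixed` scores only the logits of the final prompt token against the two class-token IDs.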
## Citation
If you use this model in your research, please cite:
```
@misc{qwen3-ai-detector-2025,
  title={Qwen3-0.6B AI Content Detector},
  author={subhashbs36},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/subhashbs36/qwen3-0.6-ai-detector-lora}
}
```
## License
This model is released under the Apache 2.0 license, following the base model's licensing terms.
## Acknowledgments
- Built using [Unsloth](https://github.com/unslothai/unsloth) for efficient training
- Based on Qwen3-0.6B-Base by Alibaba Cloud
- Trained on RAID dataset for AI content detection research
- Utilizes LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning