---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B-Base
tags:
- peft
- lora
- ai-detection
- text-classification
- raid-dataset
- qwen
- unsloth
language:
- en
pipeline_tag: text-classification
library_name: peft
datasets:
- liamdugan/raid
metrics:
- accuracy
- precision
- recall
---
# Qwen3-0.6B AI Content Detector (LoRA)
## Model Description
This is a LoRA (Low-Rank Adaptation) fine-tuned version of Qwen3-0.6B-Base for AI-generated content detection. The model is trained to classify text as either human-written (class 0) or AI-generated (class 1) using the RAID dataset.
## Model Details
- **Base Model**: Qwen/Qwen3-0.6B-Base
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Task**: Binary text classification (Human vs AI content detection)
- **Dataset**: RAID Dataset (train_none.csv)
- **Training Framework**: Unsloth + Transformers
- **Model Type**: Parameter-efficient fine-tuning adapter
## Training Details
### Dataset
- **Source**: RAID Dataset for AI content detection
- **Training Samples**: 24,000 (balanced: 12,000 human + 12,000 AI)
- **Validation Samples**: 2,000 (balanced: 1,000 human + 1,000 AI)
- **Class Balance**: 50% Human (class 0) / 50% AI (class 1)
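A hedged sketch of how such a balanced split can be constructed (the exact preprocessing script is not published with this card, and the column names below are illustrative, not RAID's actual schema):
```python
import pandas as pd

# Assumption: train_none.csv carries a binary label column (0 = human, 1 = AI);
# the real RAID column names may differ.
df = pd.read_csv("train_none.csv")
human = df[df["label"] == 0].sample(12_000, random_state=42)
ai = df[df["label"] == 1].sample(12_000, random_state=42)
train_df = pd.concat([human, ai]).sample(frac=1, random_state=42)  # shuffle
```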
### Training Configuration
- **LoRA Rank**: 16
- **LoRA Alpha**: 16
- **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Learning Rate**: 1e-4
- **Batch Size**: 2 per device
- **Epochs**: 1
- **Optimizer**: AdamW 8-bit
- **Max Sequence Length**: 2048
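These settings correspond roughly to the following Unsloth setup (a minimal sketch; the actual training script is not published with this card):
```python
from unsloth import FastLanguageModel
from transformers import TrainingArguments
import torch

# Load the base model and attach LoRA adapters with the settings listed above
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-0.6B-Base",
    max_seq_length=2048,
    dtype=torch.float16,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                  # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=True,       # memory optimization (see Hardware)
)

# Trainer arguments matching the listed hyperparameters
training_args = TrainingArguments(
    per_device_train_batch_size=2,
    num_train_epochs=1,
    learning_rate=1e-4,
    optim="adamw_8bit",                    # AdamW 8-bit via bitsandbytes
    fp16=True,
    output_dir="outputs",
)
```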
### Hardware
- **GPU**: Tesla T4 (Google Colab)
- **Precision**: FP16
- **Memory Optimization**: Gradient checkpointing enabled
## Usage
### Loading the Model
```python
from unsloth import FastLanguageModel
import torch

# Load the merged model (base weights with the LoRA adapter already applied)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="subhashbs36/qwen3-0.6-ai-detector-merged",
    max_seq_length=4096,
    dtype=torch.float16,
    load_in_4bit=False,
)

# Alternatively, load the base model and attach the standalone LoRA adapter:
# model.load_adapter("subhashbs36/qwen3-0.6-ai-detector-lora")

# Switch Unsloth into optimized inference mode
FastLanguageModel.for_inference(model)
```
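### Running Classification
The detector compares the next-token probabilities of the label tokens `0` and `1` at the final prompt position. In the snippet below, `number_token_ids` holds the vocabulary IDs of those two label tokens (this assumes each label encodes to a single token, which holds for the Qwen tokenizer).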
```python
import torch
import torch.nn.functional as F

# Vocabulary IDs of the label tokens "0" and "1"
number_token_ids = [
    tokenizer("0", add_special_tokens=False)["input_ids"][0],
    tokenizer("1", add_special_tokens=False)["input_ids"][0],
]

def classify_text(text_sample):
    # The model was trained on this prompt format; keep it verbatim
    prompt = f"""Here is a text sample:
{text_sample}
Classify this text into one of the following:
class 0: Human
class 1: AI
SOLUTION
The correct answer is: class """
    inputs = tokenizer(prompt, return_tensors="pt")
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model(**inputs)

    # Index of the last non-padding token as a Python scalar
    last_token_idx = (inputs["attention_mask"].sum(1) - 1).item()
    last_logits = outputs.logits[0, last_token_idx, :]

    # Sanity check: label token IDs must fall inside the vocabulary
    vocab_size = last_logits.shape[0]
    for i, idx in enumerate(number_token_ids):
        if idx >= vocab_size:
            raise ValueError(
                f"Token id {idx} (class {i}) is out of bounds "
                f"for vocab size {vocab_size}"
            )

    # Softmax over the full vocabulary, then read off the two label tokens
    probs_all = F.softmax(last_logits, dim=-1)
    probs = probs_all[number_token_ids]
    predicted_class = torch.argmax(probs).item()
    confidence = probs[predicted_class].item()
    return predicted_class, confidence
```
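Example:
```python
label, confidence = classify_text("The mitochondria is the powerhouse of the cell.")
print(f"Predicted: {'AI' if label == 1 else 'Human'} ({confidence:.2%} confidence)")
```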
## Performance
- **Task**: Binary classification (Human vs AI content detection)
- **Classes**:
- Class 0: Human-written content
- Class 1: AI-generated content
- **Evaluation**: Evaluated on the balanced 2,000-sample validation split from the RAID dataset
## Limitations
- Trained specifically on RAID dataset distribution
- Performance may vary on out-of-domain text
- Designed for English text classification
- Requires specific prompt format for optimal performance
## Technical Implementation
This model uses a custom approach with:
- **Reduced vocabulary**: Only uses token IDs for classes 0 and 1
- **Custom data collator**: Computes the training loss only on the final token of each sequence (see the sketch after this list)
- **Token mapping**: Maps original vocabulary to reduced classification head
- **Parameter-efficient training**: Uses LoRA for efficient fine-tuning
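As a rough illustration of the last-token objective (a sketch assuming right-padding and standard `-100` label masking, not the card's actual training code):
```python
import torch

def mask_labels_to_last_token(input_ids, attention_mask):
    # Labels of -100 are ignored by PyTorch's cross-entropy loss
    labels = torch.full_like(input_ids, -100)
    batch_idx = torch.arange(input_ids.size(0))
    last_idx = attention_mask.sum(dim=1) - 1   # final non-pad position per row
    labels[batch_idx, last_idx] = input_ids[batch_idx, last_idx]
    return labels
```
In this setup, the final position is the answer token after `The correct answer is: class `, so only the class-label token contributes to the loss.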
## Citation
If you use this model in your research, please cite:
```
@misc{qwen3-ai-detector-2025,
  title={Qwen3-0.6B AI Content Detector},
  author={subhashbs36},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/subhashbs36/qwen3-0.6-ai-detector-lora}
}
```
## License
This model is released under the Apache 2.0 license, following the base model's licensing terms.
## Acknowledgments
- Built using [Unsloth](https://github.com/unslothai/unsloth) for efficient training
- Based on Qwen3-0.6B-Base by Alibaba Cloud
- Trained on RAID dataset for AI content detection research
- Utilizes LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning