# AI Content Detection with Qwen3-0.6B using Unsloth

This notebook demonstrates fine-tuning a Qwen3-0.6B model for AI content detection using the RAID dataset. We use LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning and implement a custom token mapping approach for binary classification.

## Table of Contents
1. [Setup and Installation](#setup)
2. [Model Configuration](#model-config)
3. [Model Architecture Modification](#model-arch)
4. [Data Preparation](#data-prep)
5. [Training](#training)
6. [Evaluation](#evaluation)
7. [Model Deployment](#deployment)

---

## 1. Setup and Installation {#setup}

First, we install the required dependencies including Unsloth for efficient training.


In [None]:
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
    !pip install --no-deps unsloth

### Import Required Libraries

We import all necessary libraries for model training, data processing, and evaluation.


In [None]:
# needed as this function doesn't like it when the lm_head has its size changed
from unsloth import tokenizer_utils
def do_nothing(*args, **kwargs):
    pass
tokenizer_utils.fix_untrained_tokens = do_nothing

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


2025-05-27 08:25:28.067389: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1748334328.287741      35 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1748334328.352047      35 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


🦥 Unsloth Zoo will now patch everything to make training faster!


In [None]:
import os
import warnings
from typing import Any, Dict, List, Tuple, Union

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import datasets
from datasets import load_dataset
from sklearn.model_selection import train_test_split
from unsloth import FastLanguageModel
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments
from trl import SFTTrainer

# Report the GPU compute capability (e.g. 7.5 for a Tesla T4)
major_version, minor_version = torch.cuda.get_device_capability()
print(f"Major: {major_version}, Minor: {minor_version}")

---

## 2. Model Configuration {#model-config}

We configure the model parameters and load the base Qwen3-0.6B model. This section sets up the foundation for our AI content detection task.

### Key Parameters:
- **NUM_CLASSES**: 2 (Human vs AI)
- **max_seq_length**: 4096 tokens
- **dtype**: float16 for Tesla T4 compatibility


In [None]:
NUM_CLASSES = 2 # number of classes in the csv

max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = torch.float16 # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+

model_name = "Qwen/Qwen3-0.6B-Base"
# model_name = "unsloth/Qwen3-4B-Base"
load_in_4bit = False

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    load_in_4bit = load_in_4bit,
    max_seq_length = max_seq_length,
    dtype = dtype,
)


---

## 3. Model Architecture Modification {#model-arch}

We modify the model architecture to create a custom classification head that only outputs predictions for our 2 classes (Human vs AI). This approach uses token mapping to convert the language modeling task into a classification task.

### Custom Classification Head
We trim the classification head so the model can only predict numbers 0-1 corresponding to our classes.


In [None]:
import torch.nn as nn

number_token_ids = []
for i in range(NUM_CLASSES):
    number_token_ids.append(tokenizer.encode(str(i), add_special_tokens=False)[0])

# Extract the weights for your number tokens
par = torch.nn.Parameter(model.lm_head.weight[number_token_ids, :])

# Replace lm_head with reduced size
model.lm_head = nn.Linear(model.config.hidden_size, NUM_CLASSES, bias=False)

# Initialize with the extracted weights
model.lm_head.weight.data = par.data

reverse_map = {value: idx for idx, value in enumerate(number_token_ids)}
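
To see why slicing rows out of the output projection is enough, here is a minimal pure-Python sketch with a toy 5-token vocabulary. The matrix `full_head`, the token ids `[3, 4]`, and the helper `matvec` are illustrative assumptions, not values from the model. The cell above performs the same row selection on `model.lm_head.weight`, using the token ids of "0" and "1".

```python
# Toy illustration of trimming an output head to the rows of the class tokens.
# All numbers are made up; only the row-selection idea mirrors the cell above.

def matvec(matrix, vector):
    """Multiply a (rows x hidden) matrix by a hidden-size vector."""
    return [sum(w * h for w, h in zip(row, vector)) for row in matrix]

# Hypothetical head: 5 vocabulary tokens, hidden size 3.
full_head = [
    [0.1, 0.0, 0.2],
    [0.3, 0.1, 0.0],
    [0.0, 0.5, 0.1],
    [0.2, 0.2, 0.2],   # row of the token standing in for "0"
    [0.4, 0.0, 0.6],   # row of the token standing in for "1"
]
toy_number_token_ids = [3, 4]

# Keep only the class-token rows, in class order.
trimmed_head = [full_head[t] for t in toy_number_token_ids]
# Map original token id -> class index, as reverse_map does.
toy_reverse_map = {tok: cls for cls, tok in enumerate(toy_number_token_ids)}

hidden_state = [1.0, 2.0, 3.0]
full_logits = matvec(full_head, hidden_state)
class_logits = matvec(trimmed_head, hidden_state)

# The trimmed head reproduces the full head's logits at the class-token positions.
same = all(abs(class_logits[c] - full_logits[t]) < 1e-12
           for t, c in toy_reverse_map.items())
print(class_logits, same)
```

Because the trimmed head starts from the pretrained rows for the digit tokens, the classifier begins training from the base model's existing preference between "0" and "1" rather than from random weights.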


### LoRA Configuration

We apply LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning, targeting key attention and MLP layers while excluding the custom classification head.


In [None]:
from peft import LoftQConfig

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = [
        # "lm_head", # can easily be trained because it now has a small size
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = True,  # We support rank stabilized LoRA
    # init_lora_weights = 'loftq',
    # loftq_config = LoftQConfig(loftq_bits = 4, loftq_iter = 1), # And LoftQ
)
print("trainable parameters:", sum(p.numel() for p in model.parameters() if p.requires_grad))

Unsloth 2025.5.7 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.


trainable parameters: 10092544
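
The trainable-parameter count can be checked by hand. The sketch below assumes the Qwen3-0.6B projection shapes listed in `shapes`: only the `q_proj` shape (1024 to 2048) appears in the model printout later in this notebook, and the key/value and MLP widths are assumptions about the model config. Each LoRA adapter adds two matrices, `A` (r x in) and `B` (out x r). With these shapes the total comes out to the 10,092,544 reported above.

```python
import math

r = 16
lora_alpha = 16
num_layers = 28          # matches "patched 28 layers" in the log above

# (in_features, out_features) per targeted projection.
# Assumed Qwen3-0.6B shapes: hidden 1024, query width 2048,
# key/value width 1024, MLP width 3072.
shapes = {
    "q_proj": (1024, 2048),
    "k_proj": (1024, 1024),
    "v_proj": (1024, 1024),
    "o_proj": (2048, 1024),
    "gate_proj": (1024, 3072),
    "up_proj": (1024, 3072),
    "down_proj": (3072, 1024),
}

# Each adapter adds A (r x in) and B (out x r): r * (in + out) weights.
per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
total = per_layer * num_layers
print(total)

# Scaling applied to the low-rank update.
standard_scale = lora_alpha / r            # classic LoRA
rslora_scale = lora_alpha / math.sqrt(r)   # rank-stabilized LoRA (use_rslora = True)
print(standard_scale, rslora_scale)
```

With `use_rslora = True` the update is scaled by `lora_alpha / sqrt(r)` instead of `lora_alpha / r`, so at `r = 16` the adapter output is weighted four times more heavily than under the classic scaling.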


---

## 4. Data Preparation {#data-prep}

We load and prepare the RAID dataset for training. The dataset contains text samples labeled as either human-written or AI-generated.

### Dataset Loading and Balancing


In [None]:
kaggle = os.getcwd() == "/kaggle/working"
input_dir = "/kaggle/input/raid-dataset/" if kaggle else "data/"
data = pd.read_csv(input_dir + "train_none.csv")[['generation', 'model']]
data.rename(columns={'generation': 'text'}, inplace=True)
data['label'] = (data['model'] != 'human').astype(int)
data.drop('model', axis=1, inplace=True)

# Check current distribution
print("Original distribution:")
print(data['label'].value_counts())
print(f"Total samples available: {len(data)}")

# Create a balanced dataset with exactly 5,000 samples of each class
class_0_samples = data[data['label'] == 0]  # Human samples
class_1_samples = data[data['label'] == 1]  # AI samples

print(f"\nAvailable samples:")
print(f"Class 0 (Human): {len(class_0_samples)} samples")
print(f"Class 1 (AI): {len(class_1_samples)} samples")

# Sample exactly 5,000 from each class
class_0_count = 5000
class_1_count = 5000

# Sample from each class (you have enough samples for both)
sampled_class_0 = class_0_samples.sample(n=class_0_count, random_state=42)
sampled_class_1 = class_1_samples.sample(n=class_1_count, random_state=42)

# Combine the samples
balanced_data = pd.concat([sampled_class_0, sampled_class_1], ignore_index=True)

# Shuffle the combined dataset
balanced_data = balanced_data.sample(frac=1, random_state=42).reset_index(drop=True)

print(f"\nNew balanced distribution:")
print(balanced_data['label'].value_counts())
print(f"Total samples in balanced dataset: {len(balanced_data)}")

# Split the 10,000 balanced rows into train and validation
train_df, val_df = train_test_split(
    balanced_data,
    train_size=8000,  # 8,000 for training; the remaining 2,000 form the validation set
    stratify=balanced_data['label'],
    random_state=42
)

print(f"\nTrain distribution:")
print(train_df['label'].value_counts())
print(f"Validation distribution:")
print(val_df['label'].value_counts())

train_df.head()
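
The balancing and stratified split above rely on pandas and scikit-learn. As a dependency-free sketch of the same idea, the hypothetical helper `balanced_split` below downsamples each class to a fixed size and then divides each class between train and validation in the same proportion. The toy corpus and all names are assumptions for illustration, and the random draws will not match those of `DataFrame.sample` or `train_test_split`.

```python
import random

def balanced_split(rows, per_class, train_size, seed=42):
    """rows: list of (text, label). Returns (train, val), each class-balanced."""
    rng = random.Random(seed)
    labels = sorted({label for _, label in rows})
    train, val = [], []
    train_per_class = train_size // len(labels)
    for label in labels:
        pool = [row for row in rows if row[1] == label]
        picked = rng.sample(pool, per_class)        # downsample this class
        train.extend(picked[:train_per_class])      # rng.sample returns a random order
        val.extend(picked[train_per_class:])
    rng.shuffle(train)
    rng.shuffle(val)
    return train, val

# Hypothetical toy corpus: 30 human rows (label 0) and 70 AI rows (label 1).
toy_rows = [(f"human text {i}", 0) for i in range(30)] + \
           [(f"ai text {i}", 1) for i in range(70)]
train, val = balanced_split(toy_rows, per_class=20, train_size=32)
print(len(train), len(val))
```

Balancing before splitting keeps a 50% majority-class baseline on both splits, which makes the validation accuracy easier to interpret.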


### Prompt Template Design

We design a structured prompt template that clearly defines the classification task for the model.


In [None]:
prompt = """Here is a text sample:
{}

Classify this text into one of the following:
class 0: Human
class 1: AI

SOLUTION
The correct answer is: class {}"""


def formatting_prompts_func(dataset_):
    texts = []
    for i in range(len(dataset_['text'])):
        text_ = dataset_['text'].iloc[i]
        label_ = str(dataset_['label'].iloc[i])

        # Format prompt + label (no EOS is appended here, so the label digit is the final token)
        text = prompt.format(text_, label_)
        texts.append(text)
    return texts

# apply formatting_prompts_func to train_df
train_df['text'] = formatting_prompts_func(train_df)
train_dataset = datasets.Dataset.from_pandas(train_df,preserve_index=False)
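
A short, self-contained check of how the template behaves may help here. The sample sentence is made up, and `template` simply repeats the prompt string from the cell above. It shows that a formatted training example ends in the label digit, and that the inference prefix built later in this notebook with `split("class {}")` is the same string minus that digit.

```python
template = """Here is a text sample:
{}

Classify this text into one of the following:
class 0: Human
class 1: AI

SOLUTION
The correct answer is: class {}"""

# Training text: both placeholders filled, label last.
training_text = template.format("The weather was lovely today.", 1)

# Inference prefix: drop the label placeholder, keep the text placeholder.
inference_prefix = template.split("class {}")[0] + "class "
inference_text = inference_prefix.format("The weather was lovely today.")

print(training_text[-1])
print(training_text == inference_text + "1")
```

Keeping the two strings identical up to the final character means the model sees the same context at inference time as it did when the loss was computed on the label token.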

### Custom Data Collator

We implement a custom data collator that focuses training on the last token of each sequence, which contains the classification prediction.


In [None]:
from typing import List, Union, Any, Dict
from transformers import DataCollatorForLanguageModeling

class DataCollatorForLastTokenLM(DataCollatorForLanguageModeling):
    def __init__(
        self,
        *args,
        mlm: bool = False,
        ignore_index: int = -100,
        **kwargs,
    ):
        super().__init__(*args, mlm=mlm, **kwargs)
        self.ignore_index = ignore_index

    def torch_call(self, examples: List[Union[List[int], Any, Dict[str, Any]]]) -> Dict[str, Any]:
        batch = super().torch_call(examples)

        for i in range(len(examples)):
            # Find the last non-padding token
            last_token_idx = (batch["labels"][i] != self.ignore_index).nonzero()[-1].item()
            # Set all labels to ignore_index except for the last token
            batch["labels"][i, :last_token_idx] = self.ignore_index

            # Get the current token ID
            current_token_id = batch["labels"][i, last_token_idx].item()

            # Check if token exists in reverse_map before mapping
            if current_token_id in reverse_map:
                batch["labels"][i, last_token_idx] = reverse_map[current_token_id]
            else:
                # Handle missing token IDs gracefully
                print(f"Warning: Token ID {current_token_id} ({tokenizer.decode([current_token_id]) if hasattr(tokenizer, 'decode') else 'unknown'}) not found in reverse_map")
                # You can choose one of these strategies:
                # Option 1: Use a default mapping (e.g., map to a special token)
                batch["labels"][i, last_token_idx] = 0  # or tokenizer.unk_token_id
                # Option 2: Skip this example entirely
                # continue
                # Option 3: Keep the original token (no mapping)
                # pass

        return batch

# Initialize the collator with your tokenizer
collator = DataCollatorForLastTokenLM(tokenizer=tokenizer)
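
The label masking can be illustrated without tensors. The sketch below is a simplified stand-in for the collator's loop, not the collator itself: `mask_to_last_token` and the toy batch are hypothetical, and an unmapped token raises `KeyError` here instead of falling back to class 0. Token ids 15 and 16 are the ids this notebook prints for "0" and "1".

```python
IGNORE_INDEX = -100

def mask_to_last_token(labels, token_to_class):
    """Keep only the last real label in each row and remap it to a class index."""
    masked = []
    for row in labels:
        real_positions = [i for i, tok in enumerate(row) if tok != IGNORE_INDEX]
        last = real_positions[-1]
        new_row = [IGNORE_INDEX] * len(row)
        new_row[last] = token_to_class[row[last]]
        masked.append(new_row)
    return masked

# Hypothetical batch: -100 marks padding positions that the base collator already ignores.
toy_labels = [
    [101, 102, 103, 16, -100, -100],
    [101, 104, 105, 106, 107, 15],
]
masked_labels = mask_to_last_token(toy_labels, {15: 0, 16: 1})
print(masked_labels)
```

Only one position per sequence contributes to the loss, and its target is a class index in `[0, NUM_CLASSES)`, which is what the two-row `lm_head` expects.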


---

## 5. Training {#training}

We configure and execute the training process using Hugging Face's SFTTrainer with optimized settings for our classification task.

### Training Configuration
- **Batch size**: 2 per device
- **Learning rate**: 1e-4
- **Optimizer**: AdamW 8-bit for memory efficiency
- **Epochs**: 1 (sufficient for this task)


In [None]:
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    max_seq_length = max_seq_length,
    dataset_num_proc = 1,
    packing = False, # not needed because group_by_length is True
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 1,
        warmup_steps = 10,
        learning_rate = 1e-4,
        fp16 = True,
        bf16 = False,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs",
        num_train_epochs = 1,
        # report_to = "wandb",
        report_to = "none",
        group_by_length = True,
    ),
    data_collator=collator,
    dataset_text_field="text",
)

Unsloth: Tokenizing ["text"]:   0%|          | 0/8000 [00:00<?, ? examples/s]
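
For intuition about the schedule configured above, the following sketch evaluates the usual linear-warmup-then-cosine-decay formula at a few steps. The function `lr_at` is a hypothetical helper written for this illustration, and the step count assumes a single device, so 8,000 rows at batch size 2 with no accumulation give 4,000 optimizer steps.

```python
import math

def lr_at(step, base_lr=1e-4, warmup_steps=10, total_steps=4000):
    """Linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# 8000 training rows / (batch size 2 * 1 accumulation step), assuming one device.
total_steps = 8000 // (2 * 1)
for step in (0, 5, 10, 2005, 4000):
    print(step, lr_at(step, total_steps=total_steps))
```

The learning rate peaks at `1e-4` right after the 10 warmup steps, is about half of that at the midpoint, and reaches zero at the final step.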

### Memory Usage Monitoring

Track GPU memory usage before training to optimize resource allocation.


In [None]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

### Execute Training

Run the training process and monitor performance metrics.


In [None]:
trainer_stats = trainer.train()

### Training Statistics

Display final memory usage and training time statistics.


In [None]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

1063.9641 seconds used for training.
17.73 minutes used for training.
Peak reserved memory = 3.744 GB.
Peak reserved memory for training = 2.316 GB.
Peak reserved memory % of max memory = 25.399 %.
Peak reserved memory for training % of max memory = 15.711 %.


---

## 6. Evaluation {#evaluation}

We evaluate the trained model on the validation set using batched inference for efficiency.

### Model Preparation for Inference


In [None]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
print()




In [None]:
# Save the fine-tuned model
model.save_pretrained("./qwen-classification-model")
tokenizer.save_pretrained("./qwen-classification-model")

# If using LoRA, save the adapter
if hasattr(model, 'save_pretrained'):
    model.save_pretrained("./qwen-lora-adapter")


### Single Text Classification Function

Create a function for classifying individual text samples.


In [None]:
# Load the saved model
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "./qwen-classification-model",
    max_seq_length = max_seq_length,
    dtype = dtype,
)

# Test function
def classify_text(text_sample):
    test_prompt = f"""Here is a text sample:
{text_sample}

Classify this text into one of the following:
class 0: Human
class 1: AI

SOLUTION
The correct answer is: class """

    inputs = tokenizer(test_prompt, return_tensors="pt")

    # Move inputs to the same device as the model
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model(**inputs)
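        logits = outputs.logits[0, -1, :]  # Logits at the last position
        # A trimmed head emits NUM_CLASSES logits; a full-vocabulary head needs the digit-token columns
        if logits.shape[-1] > NUM_CLASSES:
            logits = logits[number_token_ids]
        predicted_class = torch.argmax(logits).item()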

    return predicted_class


# Test examples
test_texts = [
    "This is a sample human-written text about daily life.",
    "The algorithm processes data through multiple neural network layers."
]

for text in test_texts:
    prediction = classify_text(text)
    print(f"Text: {text[:50]}...")
    print(f"Prediction: {'AI' if prediction == 1 else 'Human'}\n")


==((====))==  Unsloth 2025.5.7: Fast Qwen3 patching. Transformers: 4.51.3.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Text: This is a sample human-written text about daily li...
Prediction: Human

Text: The algorithm processes data through multiple neur...
Prediction: Human



### Comprehensive Batch Evaluation

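Evaluate a labelled DataFrame one row at a time with `classify_text`, reporting overall accuracy and scikit-learn's per-class report. The batched loop further below is the faster option for the full validation set.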


In [None]:
# If you have a test dataset
def evaluate_model(test_df):
    predictions = []
    true_labels = []

    for idx, row in test_df.iterrows():
        pred = classify_text(row['text'])
        predictions.append(pred)
        true_labels.append(row['label'])

    from sklearn.metrics import accuracy_score, classification_report

    accuracy = accuracy_score(true_labels, predictions)
    report = classification_report(true_labels, predictions,
                                 target_names=['Human', 'AI'])

    print(f"Accuracy: {accuracy:.4f}")
    print("\nClassification Report:")
    print(report)

    return predictions

# Run evaluation (expects a DataFrame with 'text' and 'label' columns)
# predictions = evaluate_model(val_df)
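
For readers who want to see what the reported metrics measure, here is a minimal pure-Python sketch. `binary_report` is a hypothetical helper and the label lists are made up; it is not a replacement for `classification_report`, only an illustration of accuracy, precision, and recall for the AI class.

```python
def binary_report(true_labels, predictions, positive=1):
    """Accuracy plus precision and recall for the `positive` class."""
    pairs = list(zip(true_labels, predictions))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical labels: 0 = Human, 1 = AI.
toy_true = [0, 0, 1, 1, 1, 0]
toy_pred = [0, 1, 1, 1, 0, 0]
report = binary_report(toy_true, toy_pred)
print(report)
```

For a detector, precision on the AI class reflects how often a flagged text really is machine-generated, while recall reflects how much machine-generated text gets caught.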


### Batched Inference on Validation Set

In [None]:
import torch
import torch.nn.functional as F
from tqdm import tqdm
import random

# Prepare inference prompt template using your existing prompt structure
inference_prompt_template = prompt.split("class {}")[0] + "class "

# Sort validation set by length for efficient batching
val_df['token_length'] = val_df['text'].apply(lambda x: len(tokenizer.encode(x, add_special_tokens=False)))
val_df_sorted = val_df.sort_values(by='token_length').reset_index(drop=True)

# Parameters
display = 50
batch_size = 4
device = next(model.parameters()).device  # More robust device detection
correct = 0
results = []

# Evaluation loop with inference mode
with torch.inference_mode():
    for i in tqdm(range(0, len(val_df_sorted), batch_size), desc="Evaluating"):
        batch = val_df_sorted.iloc[i:i+batch_size]
        prompts = [inference_prompt_template.format(text) for text in batch['text']]

        # Tokenize and move to device
        inputs = tokenizer(
            prompts,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=max_seq_length
        ).to(device)

        # Get model predictions
        logits = model(**inputs).logits
        last_idxs = inputs.attention_mask.sum(1) - 1
        last_logits = logits[torch.arange(len(batch)), last_idxs, :]

        # Softmax over the full output dimension, then keep the entries for the class tokens
        probs_all = F.softmax(last_logits, dim=-1)
        probs = probs_all[:, number_token_ids]  # Probabilities of the "0" and "1" tokens (need not sum to 1)
        preds = torch.argmax(probs, dim=-1).cpu().numpy()

        # Calculate accuracy
        true_labels = batch['label'].tolist()
        correct += sum([p == t for p, t in zip(preds, true_labels)])

        # Store results for analysis
        for j in range(len(batch)):
            results.append({
                "text": batch['text'].iloc[j][:200],  # Truncate for display
                "true": true_labels[j],
                "pred": preds[j],
                "probs": probs[j].float().cpu().numpy(),  # All class probabilities
                "ok": preds[j] == true_labels[j]
            })

# Calculate and display accuracy
accuracy = 100 * correct / len(val_df_sorted)
print(f"\nValidation accuracy: {accuracy:.2f}% ({correct}/{len(val_df_sorted)})")

# Display random sample results
print(f"\n--- Random samples (showing {min(display, len(results))} out of {len(results)}) ---")
for s in random.sample(results, min(display, len(results))):
    print(f"\nText: {s['text']}")
    print(f"True: {s['true']} ({'Human' if s['true'] == 0 else 'AI'})  "
          f"Pred: {s['pred']} ({'Human' if s['pred'] == 0 else 'AI'}) "
          f"{'✅' if s['ok'] else '❌'}")
    print("Probs:", ", ".join([f"class {k}: {v:.3f}" for k, v in enumerate(s['probs'])]))

# Additional metrics for better evaluation
correct_by_class = {0: 0, 1: 0}
total_by_class = {0: 0, 1: 0}

for result in results:
    true_label = result['true']
    total_by_class[true_label] += 1
    if result['ok']:
        correct_by_class[true_label] += 1

print(f"\n--- Per-class accuracy ---")
for class_id in [0, 1]:
    class_name = 'Human' if class_id == 0 else 'AI'
    class_acc = 100 * correct_by_class[class_id] / total_by_class[class_id] if total_by_class[class_id] > 0 else 0
    print(f"Class {class_id} ({class_name}): {class_acc:.2f}% ({correct_by_class[class_id]}/{total_by_class[class_id]})")

# Clean up
if 'token_length' in val_df:
    del val_df['token_length']


Evaluating: 100%|██████████| 500/500 [01:29<00:00,  5.58it/s]


Validation accuracy: 91.50% (1830/2000)

--- Random samples (showing 50 out of 2000) ---

Text: Ingredients (servings 4): 1 pound chorizio sausage; 2 tablespoons olive oil divided use ; 3 cloves garlic minced or crushed with presser tool in kitchen set, salt and pepper to taste if desired. Direc
True: 1 (AI)  Pred: 1 (AI) ✅
Probs: class 0: 0.000, class 1: 0.999

Text:  A seemingly shy and humble country boy named Luther Sellers is discovered to have a magnificent voice and mesmerizing stage presence. He is given the stage name Stag Preston and after a short time on
True: 0 (Human)  Pred: 0 (Human) ✅
Probs: class 0: 0.134, class 1: 0.000

Text: The story opens with Bathsheba Everdeen, an independent and capable young woman who inherits her uncle's farm in the English countryside after his death. With no prior experience managing such a large
True: 1 (AI)  Pred: 1 (AI) ✅
Probs: class 0: 0.000, class 1: 0.998

Text: Ukraine has agreed to pay 30% more for natural gas supplied by Tur
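
The class probabilities printed above do not always sum to 1 (for example `class 0: 0.134, class 1: 0.000`), because the softmax is taken over the whole vocabulary before the two class entries are picked out. The sketch below uses made-up logits over a 6-token vocabulary to show the difference between that and renormalizing over the class tokens only. The helper `softmax` and all values are assumptions for illustration.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical last-position logits over a 6-token vocabulary;
# positions 2 and 3 stand in for the "0" and "1" tokens.
toy_logits = [2.0, 0.5, 1.0, -3.0, 2.5, 0.0]
class_token_ids = [2, 3]

full_probs = softmax(toy_logits)
picked = [full_probs[t] for t in class_token_ids]                 # what the loop reports
renormalized = softmax([toy_logits[t] for t in class_token_ids])  # sums to 1

print(sum(picked), sum(renormalized))
print(picked.index(max(picked)) == renormalized.index(max(renormalized)))
```

Both variants pick the same class, since renormalizing does not change the ordering. The renormalized values are the ones to use if the score is to be read as a two-class confidence.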




In [None]:
# stop running all cells
1/0

---

## 7. Model Deployment {#deployment}

Deploy the trained model to Hugging Face Hub for easy access and sharing.

### Upload to Hugging Face Hub


In [None]:
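# Push to the Hugging Face Hub (replace the repo name with your own and supply your token)
print("Uploading to Hugging Face...")
# Note: the reload in the next section returns a PeftModelForCausalLM, which suggests this call
# uploaded the LoRA adapter; Unsloth's push_to_hub_merged is the documented route for merged weights.
model.push_to_hub("subhashbs36/qwen3-0.6-ai-detector-merged", tokenizer, save_method = "merged_16bit", token="")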
# tokenizer.push_to_hub("subhashbs36/qwen3-0.6-ai-detector-lora", token="")

print("Model saved successfully!")

Uploading to Hugging Face...
Saved model to https://huggingface.co/subhashbs36/qwen3-0.6-ai-detector-merged
Model saved successfully!


## Inference

### Load and Test Deployed Model

Load the deployed model from Hugging Face Hub and test its functionality.


In [None]:
from unsloth import FastLanguageModel
import torch

# Load the deployed model from the Hugging Face Hub
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="subhashbs36/qwen3-0.6-ai-detector-merged",
    max_seq_length=4096,
    dtype=torch.float16,
    load_in_4bit=False,
)

# Load your LoRA adapter
# model.load_adapter("subhashbs36/qwen3-0.6-ai-detector-lora")

# Enable inference mode
FastLanguageModel.for_inference(model)

==((====))==  Unsloth 2025.5.7: Fast Qwen3 patching. Transformers: 4.51.3.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): Qwen3ForCausalLM(
      (model): Qwen3Model(
        (embed_tokens): Embedding(151936, 1024, padding_idx=151654)
        (layers): ModuleList(
          (0-27): 28 x Qwen3DecoderLayer(
            (self_attn): Qwen3Attention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=1024, out_features=2048, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=1024, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=2048, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora.Linear

In [None]:
import os
import torch
import torch.nn.functional as F

# Enable CUDA debugging for accurate stack trace
# os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

def classify_text_fixed(text_sample):
    prompt = f"""Here is a text sample:
{text_sample}

Classify this text into one of the following:
class 0: Human
class 1: AI

SOLUTION
The correct answer is: class """

    inputs = tokenizer(prompt, return_tensors="pt")
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model(**inputs)

        # Fix: Get the last token index as a scalar, not tensor
        last_token_idx = (inputs['attention_mask'].sum(1) - 1).item()
        last_logits = outputs.logits[0, last_token_idx, :]

        # Debug information
        print(f"Logits shape: {last_logits.shape}")
        print(f"Number token ids: {number_token_ids}")
        print(f"Vocab size: {last_logits.shape[0]}")

        # Check if any index is out of bounds
        vocab_size = last_logits.shape[0]
        for i, idx in enumerate(number_token_ids):
            if idx >= vocab_size:
                print(f"ERROR: Index {idx} (class {i}) is out of bounds for vocab size {vocab_size}")
                return None, None

        probs_all = F.softmax(last_logits, dim=-1)
        probs = probs_all[number_token_ids]
        predicted_class = torch.argmax(probs).item()
        confidence = probs[predicted_class].item()

    return predicted_class, confidence

### Final Testing

Test the deployed model with diverse examples to verify its performance across different text types.


In [None]:

NUM_CLASSES = 2
number_token_ids = []
for i in range(NUM_CLASSES):
    number_token_ids.append(tokenizer.encode(str(i), add_special_tokens=False)[0])


test_texts = [
    # AI-generated content examples
    "In quiet moments, when I close my eyes,",
    "They're both sitting next to each other, looking out the window at the rain. One starts to sniff the other's butt, and the other one just kind of tolerates it for a few seconds before finally pushing them away. It's like they're just trying to pass the time until the storm breaks.",

    # Human-written examples
    "This was the biggest surprise of the year - bar none.<br/><br/>A great comedy - as in laughs, feel good, and just plain enjoyable.<br/><br/>The plot of the loser who makes good at rock'n'roll second time around is very Jack Black and Rainn Wilson does a GREAT job - there is no recourse to gross or sarcastic humor here - rather it plays on its rock roots, chucks in some stupidity, and some kick-ass tunes, and lots of excellent one liners and like I say totally surprised us as to how genuinely funny and warm-hearted this is.<br/><br/>Great cast and a great script - sure it's not perfect, but after all the pseudo-comedy and angst of 2008 it was refreshing just to sit back and enjoy an entertaining movie.<br/><br/>we loved it and are normally really cynical about comedies - but this rocks and would recommend to anyone as a real effort by all involved to stop being smarter than the audience and just enjoy life - in two and a half words? - it rocks!",
    "Château Vaudreuil was a stately residence and college in Montreal, Quebec, Canada. It was constructed between 1723 and 1726 for Philippe de Rigaud, Marquis de Vaudreuil, as his private residence by Gaspard-Joseph Chaussegros de Léry. Though the Château Saint-Louis in Quebec City remained the official residence of the Governors General of New France, the Château Vaudreuil was to remain as their official home in Montreal up until the British Conquest in 1763. In 1767, it was purchased by the Marquis de Lotbinière. He sold it in 1773, when it became the Collège Saint-Raphaël. It was destroyed by a fire in 1803. Completed in 1726, it was built in the classical style of the French Hôtel Particulier by King Louis XV's chief engineer in New France, Gaspard-Joseph Chaussegros de Léry. The central building was flanked by two wings with two sets of semi-circular stairs leading up to a terrace and the main entrance. It stood beyond the end of Rue Saint-Paul, which was kept clear of buildings on that side to afford it a clear view, while formal gardens led up to Notre-Dame Street.",
    ]

# Test the fixed function
for text in test_texts:
  # pred, conf = classify_text_fixed(val_df.iloc[4]["text"])
  pred, conf = classify_text_fixed(text)
  label = 'Human' if pred == 0 else 'AI'
  print(f"Text: {text[:50]}")
  print(f"Prediction: {label} (confidence: {conf:.3f})\n")

Logits shape: torch.Size([151936])
Number token ids: [15, 16]
Vocab size: 151936
Text: In quiet moments, when I close my eyes,
Prediction: AI (confidence: 0.992)

Logits shape: torch.Size([151936])
Number token ids: [15, 16]
Vocab size: 151936
Text: They're both sitting next to each other, looking o
Prediction: AI (confidence: 0.992)

Logits shape: torch.Size([151936])
Number token ids: [15, 16]
Vocab size: 151936
Text: This was the biggest surprise of the year - bar no
Prediction: Human (confidence: 0.992)

Logits shape: torch.Size([151936])
Number token ids: [15, 16]
Vocab size: 151936
Text: Château Vaudreuil was a stately residence and coll
Prediction: Human (confidence: 0.992)



---

## Conclusion

This notebook demonstrates a complete pipeline for fine-tuning a language model for AI content detection using:

1. **Custom Architecture**: Modified classification head with token mapping
2. **Parameter-Efficient Training**: LoRA for reduced computational requirements
3. **Balanced Dataset**: 5,000 human-written and 5,000 AI-generated samples drawn from RAID
4. **Comprehensive Evaluation**: Batch inference with detailed metrics
5. **Model Deployment**: Easy sharing via Hugging Face Hub

### Key Results:
- 91.50% accuracy (1830/2000) on the held-out validation split
- About 17.7 minutes of training on a Tesla T4 with 3.744 GB peak reserved GPU memory
- Correct labels on the four spot-check texts in the final test cell
- Upload to and reload from the Hugging Face Hub

### Next Steps:
- Test on additional domains and text types
- Experiment with different model sizes
- Implement adversarial robustness testing
- Deploy for production use cases


### Saving to float16 for vLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

In [None]:
# # Merge to 16bit
# if False: model.save_pretrained_merged("hf/model", tokenizer, save_method = "merged_16bit",)
# if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# # Merge to 4bit
# if False: model.save_pretrained_merged("hf/model", tokenizer, save_method = "merged_4bit",)
# if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# # Just LoRA adapters
# if False: model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
# if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")

### GGUF / llama.cpp Conversion
Saving to `GGUF` / `llama.cpp` is now supported natively. We clone `llama.cpp` and save to `q8_0` by default. All methods, such as `q4_k_m`, are allowed. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

In [None]:
# # Save to 8bit Q8_0
# if False: model.save_pretrained_gguf("model", tokenizer,)
# if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# # Save to 16bit GGUF
# if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
# if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# # Save to q4_k_m GGUF
# if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
# if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

Now, use the `model-unsloth.gguf` file or `model-unsloth-Q4_K_M.gguf` file in `llama.cpp` or a UI based system like `GPT4All`. You can install GPT4All by going [here](https://gpt4all.io/index.html).

And we're done! If you have any questions about Unsloth, we have a [Discord](https://discord.gg/u54VK8m8tk) channel. Feel free to join it if you find bugs, want to keep up with the latest LLM news, need help, or want to join projects.

Some other links:
1. Zephyr DPO 2x faster [free Colab](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing)
2. Llama 7b 2x faster [free Colab](https://colab.research.google.com/drive/1lBzz5KeZJKXjvivbYvmGarix9Ao6Wxe5?usp=sharing)
3. TinyLlama 4x faster full Alpaca 52K in 1 hour [free Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing)
4. CodeLlama 34b 2x faster [A100 on Colab](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing)
5. Llama 7b [free Kaggle](https://www.kaggle.com/danielhanchen/unsloth-alpaca-t4-ddp)
6. We also did a [blog](https://huggingface.co/blog/unsloth-trl) with 🤗 HuggingFace, and we're in the TRL [docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth)!

<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Support our work if you can! Thanks!
</div>