---
language: en
license: mit
library_name: transformers
tags:
- text-classification
- character-analysis
- plot-arc
- narrative-analysis
- deberta-v3
- binary-classification
datasets:
- custom
metrics:
- accuracy
- f1
model-index:
- name: plot-arc-classifier
  results:
  - task:
      type: text-classification
      name: Character Plot Arc Classification
    dataset:
      type: custom
      name: Character Arc Dataset
    metrics:
    - type: accuracy
      value: 0.796
      name: Accuracy
    - type: f1
      value: 0.796
      name: F1 Score (Strong Class)
    - type: precision
      value: 0.777
      name: Precision (Strong Class)
    - type: recall
      value: 0.816
      name: Recall (Strong Class)
base_model: microsoft/deberta-v3-xsmall
---
# Plot Arc Character Classifier
A DeBERTa-v3-XSmall model fine-tuned to classify fictional characters based on their plot arc potential.
## Model Description
This model classifies character descriptions into two categories:
- **STRONG** (label 1): Characters with both internal conflict and external responsibilities/events
- **WEAK** (label 0): Characters with no plot arc, pure internal conflict only, or pure external events only
This model corrects a critical bias in previous versions, where simple background characters (shopkeepers, guards) were incorrectly classified as plot-significant.
## Training Data
- **Dataset Size**: 11,888 balanced examples (50/50 split)
- **Training Examples**: 9,510
- **Validation Examples**: 2,378
- **Source**: Custom character descriptions from literature, annotated with a 4-way classification
### Label Mapping
- **STRONG (1)**: Characters classified as "BOTH" (internal conflict + external events)
- **WEAK (0)**: Characters classified as "NONE", "INTERNAL", or "EXTERNAL"
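The collapse from the 4-way taxonomy to the binary target can be expressed as a one-line mapping. A minimal sketch (the `arc_type` field name is illustrative, not part of the released dataset schema):
```python
# Sketch of the 4-way -> binary label collapse described above.
# The "arc_type" string values are the 4-way labels; the field name is illustrative.
def to_binary_label(arc_type: str) -> int:
    """BOTH -> 1 (STRONG); NONE, INTERNAL, EXTERNAL -> 0 (WEAK)."""
    return 1 if arc_type.upper() == "BOTH" else 0

assert to_binary_label("BOTH") == 1
assert to_binary_label("INTERNAL") == 0
```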
## Training Details
- **Base Model**: microsoft/deberta-v3-xsmall (22M parameters)
- **Training Time**: ~15 minutes
- **Batch Size**: 8 (with gradient accumulation = 2)
- **Max Sequence Length**: 384 tokens
- **Learning Rate**: 5e-5 with warmup
- **Early Stopping**: Enabled (training stopped at epoch 3.7 of 5)
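For reference, a configuration sketch that mirrors the hyperparameters above using the 🤗 `Trainer` API. This is an assumption-laden reconstruction, not the released training script; in particular the warmup ratio, evaluation cadence, and early-stopping patience are guesses:
```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Illustrative reconstruction of the hyperparameters listed above.
# warmup_ratio, eval/save cadence, patience, and metric_for_best_model are assumptions.
training_args = TrainingArguments(
    output_dir="plot-arc-classifier",
    num_train_epochs=5,                    # early stopping ended training around epoch 3.7
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,         # effective batch size of 16
    learning_rate=5e-5,
    warmup_ratio=0.1,                      # "with warmup" -- exact schedule not specified
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)
early_stopping = EarlyStoppingCallback(early_stopping_patience=2)
```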
## Performance
### Validation Metrics
| Metric | Score |
|--------|-------|
| Accuracy | 79.6% |
| F1 (Strong) | 79.6% |
| Precision (Strong) | 77.7% |
| Recall (Strong) | 81.6% |
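These are standard binary-classification metrics over the 2,378-example validation split, with STRONG (1) as the positive class. A sketch of how they can be computed with scikit-learn (`preds` and `labels` are placeholders for the predicted and gold classes):
```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(preds, labels):
    # STRONG (1) is treated as the positive class for precision/recall/F1.
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary", pos_label=1
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision_strong": precision,
        "recall_strong": recall,
        "f1_strong": f1,
    }
```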
### Synthetic Test Results
**100% accuracy** on diverse test cases including previously problematic examples:
| Character Type | Example | Prediction | Confidence |
|----------------|---------|------------|------------|
| Background (NONE) | Baker, Guard | WEAK ✅ | 98.9%, 98.5% |
| Pure Internal | Haunted Artist | WEAK ✅ | 93.9% |
| Pure External | Military Commander | WEAK ✅ | 94.5% |
| Both (Internal+External) | Conflicted King | STRONG ✅ | 95.1% |
| Both (Trauma+Mission) | PTSD Captain | STRONG ✅ | 95.5% |
| Both (Doubt+Quest) | Uncertain Prophet | STRONG ✅ | 96.0% |
**Key Achievement**: Fixed critical bias where simple background characters were incorrectly classified as plot-significant.
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("plot-arc-classifier")
model = AutoModelForSequenceClassification.from_pretrained("plot-arc-classifier")
# Example usage
def classify_character(description):
    inputs = tokenizer(description, return_tensors="pt", truncation=True, max_length=384)
    
    with torch.no_grad():
        outputs = model(**inputs)
        probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
        predicted_class = torch.argmax(probabilities, dim=-1).item()
    
    labels = {0: "WEAK", 1: "STRONG"}
    confidence = probabilities[0][predicted_class].item()
    
    return labels[predicted_class], confidence
# Test examples
examples = [
    "A baker who makes fresh bread daily and serves customers with a smile.",
    "A warrior haunted by past failures who must lead a desperate battle to save his homeland while confronting his inner demons.",
]
for desc in examples:
    label, conf = classify_character(desc)
    print(f"'{desc[:50]}...': {label} ({conf:.3f})")
```
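Alternatively, the `pipeline` helper wraps the same steps. Note that the label names it reports depend on the `id2label` mapping stored in the model config; if that mapping was not saved as `{0: "WEAK", 1: "STRONG"}`, the generic `LABEL_0`/`LABEL_1` names appear instead:
```python
from transformers import pipeline

# Convenience wrapper; truncation settings mirror the training max length.
classifier = pipeline(
    "text-classification",
    model="plot-arc-classifier",
    truncation=True,
    max_length=384,
)
print(classifier("A baker who makes fresh bread daily and serves customers with a smile."))
# e.g. [{'label': 'WEAK', 'score': 0.989}]
```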
## Model Improvements
This model addresses critical issues from previous versions:
1. **Fixed Bias**: No longer classifies simple background characters as STRONG
2. **Proper Discrimination**: Requires both internal and external elements for STRONG classification  
3. **Balanced Training**: 50/50 split prevents class imbalance issues
4. **Clean Taxonomy**: Based on proper 4-way character analysis
## Limitations
- Trained on English literary character descriptions
- May not generalize well to other domains (screenwriting, gaming, etc.)
- Performance may degrade on very short or very long descriptions
- Cultural bias toward Western narrative structures
## Ethical Considerations
This model is designed for narrative analysis and creative writing assistance. It should not be used to make judgments about real people or for any discriminatory purposes.
## Citation
If you use this model, please cite:
```bibtex
@misc{plot-arc-classifier-2024,
  title={Plot Arc Character Classifier},
  author={Generated with Claude Code},
  year={2024},
  url={https://huggingface.co/plot-arc-classifier}
}
```
## Training Infrastructure
- **Framework**: 🤗 Transformers
- **Hardware**: Apple Silicon (MPS)
- **Optimization**: Memory-optimized for MPS training
- **Early Stopping**: Enabled to prevent overfitting
---
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
