---
library_name: transformers
tags:
- lora
- sequence-classification
- end-of-utterance
- multilingual
- english
- spanish
license: apache-2.0
datasets:
- marc-es/orga-dynamic-dataset
model_type: llama
language:
- es
- en
base_model:
- HuggingFaceTB/SmolLM2-135M-Instruct
metrics:
- accuracy
---

# Orga Dynamic (1) — Bilingual End-of-Utterance Classifier

**Orga Dynamic (1)** is a LoRA (Low-Rank Adaptation) adapter trained to automatically detect the **end of turn** (End of Utterance, EOU) in conversations.

- **Base model:** `HuggingFaceTB/SmolLM2-135M-Instruct`
- **Method:** LoRA r=16 / α=32 on `q_proj`, `k_proj`, `v_proj`, `o_proj` (see the configuration sketch after the metrics table)
- **Training data:** 4,000 dialogue turns
- **Metrics (20 % test split):**

| Metric | EN + ES |
|--------|---------|
| Accuracy | **0.951** |
| F1 | **0.948** |
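
For reference, the adapter setup above maps onto a `peft` configuration roughly as follows. This is a minimal sketch: the rank, α, and target modules are taken from this card, while the task type and `lora_dropout` value are illustrative assumptions rather than the recorded training settings.

```python
from peft import LoraConfig, TaskType

# Sketch of the adapter configuration. r, lora_alpha, and target_modules come from
# this card; task_type and lora_dropout are assumptions for illustration.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
)
```

Passing this config to `peft.get_peft_model` together with the base model recreates the trainable setup.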

---

## Model Details

| | |
|---|---|
| **Languages** | English (en), Spanish (es) |
| **Labels** | `0 = NO_EOU`, `1 = EOU` |
| **Precision** | fp16 (LoRA weights ≈ 5 MB) |
| **License** | Apache 2.0 |
| **Author** | @marc-es |
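
The adapter can stay separate (as in the Quick Start below) or be folded into the base weights for deployment. A minimal sketch using `peft`'s `merge_and_unload`, where `model` is the `PeftModel` from the Quick Start and the output path is hypothetical:

```python
# Merge the LoRA weights into the base model and drop the PEFT wrappers,
# leaving a plain transformers model that no longer needs `peft` at load time.
merged = model.merge_and_unload()
merged.save_pretrained("orga-dynamic-1-merged")  # hypothetical local path
```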

---

## Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

# Load the base model with a 2-label classification head, then attach the LoRA adapter.
base = AutoModelForSequenceClassification.from_pretrained(
    "HuggingFaceTB/SmolLM2-135M-Instruct", num_labels=2)
model = PeftModel.from_pretrained(base, "marc-es/orga-dynamic-1")
tok = AutoTokenizer.from_pretrained("marc-es/orga-dynamic-1")
model.eval()

def is_end(text: str) -> bool:
    """Return True when the model classifies `text` as a finished utterance (EOU)."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(-1).item() == 1  # 1 = EOU
```
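
A quick bilingual sanity check (the inputs are illustrative; the expected outputs are what the classifier should produce, not guaranteed results):

```python
print(is_end("Can you help me with my order?"))  # expected True: complete question
print(is_end("I wanted to ask you about"))       # expected False: utterance trails off
print(is_end("Gracias, eso es todo."))           # expected True: complete Spanish turn
```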