SpanMarker
This is a SpanMarker model that can be used for Named Entity Recognition.
Model Details
Model Description
- Model Type: SpanMarker
- Maximum Sequence Length: 512 tokens
- Maximum Entity Length: 12 words
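The entity length limit can be made concrete with a short sketch. It assumes the limit works by only considering candidate spans of at most 12 words; the `candidate_spans` helper is hypothetical and is not part of the SpanMarker API.

```python
from typing import List, Tuple


def candidate_spans(words: List[str], max_entity_length: int = 12) -> List[Tuple[int, int]]:
    """Hypothetical helper: list every (start, end) word span, end exclusive,
    that is no longer than max_entity_length words."""
    spans = []
    for start in range(len(words)):
        # A span cannot run past the end of the sentence.
        longest = min(max_entity_length, len(words) - start)
        for length in range(1, longest + 1):
            spans.append((start, start + length))
    return spans


words = "Remind my manager about the Budget Planning meeting".split()
spans = candidate_spans(words)
print(len(words), "words,", len(spans), "candidate spans")
```

For sentences shorter than the limit, every contiguous span is a candidate; an entity longer than 12 words would fall outside the candidate set under this assumption.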
Model Labels
| Label | Examples |
|:--|:--|
| action | "Remind", "scheduled", "review" |
| app_data_type | "items", "images", "videos" |
| app_name | "Camera", "phone", "Slack" |
| contact_info | "sarah . lee @ company . org", "123 Maple Street , Springfield", "home address" |
| date | "20 . 10 . 1999", "before", "January 18 - June 15" |
| event_title | "team sync", "Marketing Strategy Meeting", "Budget Planning" |
| file_name | "notes", "budget_overview . xlsx", "project_plan . docx" |
| file_size | "under 500 kb", "smaller than 50 kb", "exceeding 100 mb" |
| file_type | "documents", "document", "image" |
| folder_name | "Projects", "Work", "Photos" |
| in_file_data | "appendix section", "page 10", "section 5" |
| limits | "top 8", "all", "every" |
| location | "Room 204", "server room", "library" |
| person_name | "Jonathan Kim", "Mr . Osei", "Lucas Müller" |
| relationship | "manager", "brother", "cousin" |
| setting | "brightness", "airplane mode", "notifications" |
| system_command | "disable", "move", "switch on" |
| time | "9 : 00 AM", "10 : 45", "10 : 00 AM" |
Evaluation
Metrics
| Label | Precision | Recall | F1 |
|:--|:--|:--|:--|
| all | 0.8559 | 0.8813 | 0.8684 |
| action | 0.8173 | 0.9245 | 0.8676 |
| app_data_type | 0.7960 | 0.6828 | 0.7351 |
| app_name | 0.9432 | 0.9432 | 0.9432 |
| contact_info | 0.8722 | 0.9091 | 0.8903 |
| date | 0.9160 | 0.8993 | 0.9076 |
| event_title | 0.8659 | 0.9107 | 0.8877 |
| file_name | 0.9371 | 0.9280 | 0.9326 |
| file_size | 0.7810 | 0.7810 | 0.7810 |
| file_type | 0.7731 | 0.8786 | 0.8225 |
| folder_name | 0.9618 | 0.8968 | 0.9282 |
| in_file_data | 0.7486 | 0.7867 | 0.7672 |
| limits | 0.9048 | 0.6786 | 0.7755 |
| location | 0.8917 | 0.8571 | 0.8741 |
| person_name | 0.9885 | 0.9885 | 0.9885 |
| relationship | 0.9505 | 0.9541 | 0.9523 |
| setting | 0.8974 | 0.9255 | 0.9112 |
| system_command | 0.7889 | 0.7441 | 0.7659 |
| time | 0.9076 | 0.8587 | 0.8825 |
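The F1 column is the harmonic mean of precision and recall. The sketch below recomputes it for three rows copied from the metrics table as a sanity check; the `f1_score` helper is written here for illustration and is not taken from the evaluation code used for this model.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the metrics table
reported = {
    "all": (0.8559, 0.8813, 0.8684),
    "action": (0.8173, 0.9245, 0.8676),
    "limits": (0.9048, 0.6786, 0.7755),
}

for label, (precision, recall, f1) in reported.items():
    computed = f1_score(precision, recall)
    print(f"{label}: computed {computed:.4f}, reported {f1:.4f}")
```

The `limits` row shows why F1 is useful: precision is high (0.9048) but recall is low (0.6786), and the harmonic mean is pulled toward the weaker of the two.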
Uses
Direct Use for Inference
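from span_marker import SpanMarkerModel

# Load the model; "span_marker_model_id" is a placeholder for the actual model ID
model = SpanMarkerModel.from_pretrained("span_marker_model_id")
# Run inference on a single sentence
entities = model.predict("Text my mother at + 44 7911 123456 the summary from paragraph 4, and then enable bluetooth")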
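Predictions usually need some post-processing before an application can act on them. The following is a minimal sketch, assuming `model.predict` returns a list of dictionaries with `span`, `label`, and `score` keys. The `group_by_label` helper is hypothetical, and the sample predictions and scores are invented for illustration; they are not real output from this model.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_label(entities: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Hypothetical helper: collect entity texts per label, dropping low-confidence spans."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


# Invented predictions in the assumed output format (not real model output)
sample_entities = [
    {"span": "Text", "label": "action", "score": 0.97},
    {"span": "mother", "label": "relationship", "score": 0.99},
    {"span": "+ 44 7911 123456", "label": "contact_info", "score": 0.93},
    {"span": "paragraph 4", "label": "in_file_data", "score": 0.41},
    {"span": "enable", "label": "system_command", "score": 0.88},
    {"span": "bluetooth", "label": "setting", "score": 0.95},
]

grouped = group_by_label(sample_entities, min_score=0.5)
for label, texts in grouped.items():
    print(f"{label}: {texts}")
```

The 0.5 threshold is arbitrary; a suitable cutoff depends on how costly false positives are for the application.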
Downstream Use
You can finetune this model on your own dataset.
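from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

# Load the pretrained model; "span_marker_model_id" is a placeholder for the actual model ID
model = SpanMarkerModel.from_pretrained("span_marker_model_id")

# Load a dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("conll2003")

# Fine-tune with the default training arguments, then save the result
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("span_marker_model_id-finetuned")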
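The CoNLL-2003 dataset used above stores each sentence as a `tokens` list and a parallel `ner_tags` list. The sketch below shows one way to bring custom annotations into that shape, assuming word-level spans and IOB2 tags. The `spans_to_iob2` and `build_example` helpers, the span format, and the example sentence are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical annotation format: (first word index, end word index (exclusive), label)
Span = Tuple[int, int, str]


def spans_to_iob2(tokens: List[str], spans: List[Span]) -> List[str]:
    """Turn word-level entity spans into one IOB2 tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span {(start, end)} is outside the sentence")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def build_example(tokens: List[str], spans: List[Span], tag_to_id: Dict[str, int]) -> Dict[str, list]:
    """Build one row with the "tokens" and "ner_tags" columns."""
    tags = spans_to_iob2(tokens, spans)
    return {"tokens": tokens, "ner_tags": [tag_to_id[tag] for tag in tags]}


tokens = ["Open", "Slack", "and", "message", "Jonathan", "Kim"]
spans = [(0, 1, "system_command"), (1, 2, "app_name"), (4, 6, "person_name")]
tag_names = [
    "O",
    "B-system_command", "I-system_command",
    "B-app_name", "I-app_name",
    "B-person_name", "I-person_name",
]
tag_to_id = {tag: i for i, tag in enumerate(tag_names)}

example = build_example(tokens, spans, tag_to_id)
print(example)
```

Rows like this can then be collected into a dataset, for instance with `datasets.Dataset.from_list`, and passed to the trainer in place of CoNLL-2003.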
Training Details
Training Set Metrics
| Training set | Min | Median | Max |
|:--|:--|:--|:--|
| Sentence length | 3 | 19.0206 | 53 |
| Entities per sentence | 1 | 5.7015 | 13 |
Training Hyperparameters
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
- mixed_precision_training: Native AMP
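To illustrate how `lr_scheduler_type: linear` and `lr_scheduler_warmup_ratio: 0.1` interact, the sketch below computes the learning rate at a given optimizer step. It rests on two assumptions: the total step count is inferred from the Training Results table (step 1000 at epoch 1.8553 implies about 539 steps per epoch), and warmup steps are rounded up from the ratio. The `linear_schedule_lr` function is a standalone illustration, not the scheduler code used in training.

```python
import math


def linear_schedule_lr(step: int, total_steps: int,
                       base_lr: float = 5e-5, warmup_ratio: float = 0.1) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumption: rounded up
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Inferred from the Training Results table, not stated in the card
STEPS_PER_EPOCH = round(1000 / 1.8553)
TOTAL_STEPS = STEPS_PER_EPOCH * 5  # num_epochs: 5

for step in (0, 135, 270, 1000, 2000, TOTAL_STEPS):
    print(f"step {step:>4}: lr = {linear_schedule_lr(step, TOTAL_STEPS):.2e}")
```

Under these assumptions the learning rate peaks at 5e-05 once warmup ends and has already decayed noticeably by the two evaluation checkpoints at steps 1000 and 2000.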
Training Results
| Epoch | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
|:--|:--|:--|:--|:--|:--|:--|
| 1.8553 | 1000 | 0.0344 | 0.8301 | 0.8650 | 0.8472 | 0.9204 |
| 3.7106 | 2000 | 0.0271 | 0.8524 | 0.8804 | 0.8662 | 0.9316 |
Framework Versions
- Python: 3.12.12
- SpanMarker: 1.7.0
- Transformers: 4.51.3
- PyTorch: 2.8.0+cu126
- Datasets: 3.6.0
- Tokenizers: 0.21.4
Citation
BibTeX
@software{Aarsen_SpanMarker,
author = {Aarsen, Tom},
license = {Apache-2.0},
title = {{SpanMarker for Named Entity Recognition}},
url = {https://github.com/tomaarsen/SpanMarkerNER}
}