SpanMarker

This is a SpanMarker model that can be used for Named Entity Recognition.

Model Details

Model Description

  • Model Type: SpanMarker
  • Maximum Sequence Length: 512 tokens
  • Maximum Entity Length: 12 words
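The entity length limit bounds how many candidate spans a sentence can yield. The sketch below is a rough illustration, assuming candidates are contiguous word sequences of at most 12 words; the helper `count_candidate_spans` is hypothetical and not part of the SpanMarker API. The sentence lengths 19 and 53 are the rounded typical value and the maximum from the training set metrics.

```python
def count_candidate_spans(num_words: int, max_entity_length: int = 12) -> int:
    """Count contiguous word spans of length 1..max_entity_length in a sentence."""
    longest = min(num_words, max_entity_length)
    # A span of a given length can start at (num_words - length + 1) positions.
    return sum(num_words - length + 1 for length in range(1, longest + 1))


# Illustrative sentence lengths: a typical and the longest training sentence.
typical_spans = count_candidate_spans(19)
longest_spans = count_candidate_spans(53)
print(typical_spans, longest_spans)
```

Any entity longer than 12 words falls outside this candidate set, so under this assumption it could not be predicted as a single span.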

Model Sources

Model Labels

Label Examples
action "Remind", "scheduled", "review"
app_data_type "items", "images", "videos"
app_name "Camera", "phone", "Slack"
contact_info "sarah . lee @ company . org", "123 Maple Street , Springfield", "home address"
date "20 . 10 . 1999", "before", "January 18 - June 15"
event_title "team sync", "Marketing Strategy Meeting", "Budget Planning"
file_name "notes", "budget_overview . xlsx", "project_plan . docx"
file_size "under 500 kb", "smaller than 50 kb", "exceeding 100 mb"
file_type "documents", "document", "image"
folder_name "Projects", "Work", "Photos"
in_file_data "appendix section", "page 10", "section 5"
limits "top 8", "all", "every"
location "Room 204", "server room", "library"
person_name "Jonathan Kim", "Mr . Osei", "Lucas Müller"
relationship "manager", "brother", "cousin"
setting "brightness", "airplane mode", "notifications"
system_command "disable", "move", "switch on"
time "9 : 00 AM", "10 : 45", "10 : 00 AM"

Evaluation

Metrics

Label Precision Recall F1
all 0.8559 0.8813 0.8684
action 0.8173 0.9245 0.8676
app_data_type 0.7960 0.6828 0.7351
app_name 0.9432 0.9432 0.9432
contact_info 0.8722 0.9091 0.8903
date 0.9160 0.8993 0.9076
event_title 0.8659 0.9107 0.8877
file_name 0.9371 0.9280 0.9326
file_size 0.7810 0.7810 0.7810
file_type 0.7731 0.8786 0.8225
folder_name 0.9618 0.8968 0.9282
in_file_data 0.7486 0.7867 0.7672
limits 0.9048 0.6786 0.7755
location 0.8917 0.8571 0.8741
person_name 0.9885 0.9885 0.9885
relationship 0.9505 0.9541 0.9523
setting 0.8974 0.9255 0.9112
system_command 0.7889 0.7441 0.7659
time 0.9076 0.8587 0.8825
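As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall. The following minimal sketch recomputes it for a few rows; the helper name `f1_score` is illustrative, and small differences in the last digit are possible because the reported precision and recall are already rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the metrics table.
reported = {
    "all": (0.8559, 0.8813, 0.8684),
    "limits": (0.9048, 0.6786, 0.7755),
    "person_name": (0.9885, 0.9885, 0.9885),
}

for label, (precision, recall, f1) in reported.items():
    recomputed = f1_score(precision, recall)
    print(f"{label}: recomputed F1 = {recomputed:.4f}, reported F1 = {f1:.4f}")
```

The "limits" row shows how a large gap between precision and recall pulls F1 toward the lower of the two.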

Uses

Direct Use for Inference

from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("span_marker_model_id")
# Run inference
entities = model.predict("Text my mother at + 44 7911 123456 the summary from paragraph 4, and then enable bluetooth")
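The returned `entities` can then be post-processed. The sketch below assumes each prediction is a dictionary with at least "span", "label" and "score" keys, which is how SpanMarker typically reports predictions for a single string input. The sample list is hand-written for illustration and is not actual model output; `group_by_label` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group predicted span texts by label, dropping low-confidence predictions."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


# Hypothetical predictions, written by hand; scores are made up.
sample_entities = [
    {"span": "Text", "label": "action", "score": 0.97},
    {"span": "mother", "label": "relationship", "score": 0.99},
    {"span": "paragraph 4", "label": "in_file_data", "score": 0.62},
    {"span": "enable", "label": "system_command", "score": 0.91},
    {"span": "bluetooth", "label": "setting", "score": 0.95},
]

grouped = group_by_label(sample_entities, min_score=0.7)
print(grouped)
```

A score threshold like this may be worth tuning per label, since the evaluation metrics vary noticeably between labels.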

Downstream Use

You can fine-tune this model on your own dataset.

from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("span_marker_model_id")

# Specify a Dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("conll2003") # For example CoNLL2003

# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("span_marker_model_id-finetuned")

Training Details

Training Set Metrics

Training set Min Median Max
Sentence length 3 19.0206 53
Entities per sentence 1 5.7015 13

Training Hyperparameters

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
  • mixed_precision_training: Native AMP
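To make the scheduler settings concrete, here is a minimal pure-Python sketch of a linear schedule with warmup, following the usual formulation: the rate rises linearly from 0 to the peak during warmup, then decays linearly to 0. The function name `linear_lr_with_warmup` is illustrative. The total of 2695 steps is an estimate, assuming about 539 steps per epoch (1000 steps at epoch 1.8553 in the training results) over 5 epochs.

```python
import math


def linear_lr_with_warmup(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at a given step for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


TOTAL_STEPS = 2695  # Estimated, not reported: roughly 539 steps/epoch * 5 epochs.

for step in (0, 135, 270, 1000, 2000, TOTAL_STEPS):
    print(step, f"{linear_lr_with_warmup(step, TOTAL_STEPS):.2e}")
```

Under these assumptions the peak rate of 5e-05 is reached around step 270, well before the first logged evaluation at step 1000.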

Training Results

Epoch Step Validation Loss Validation Precision Validation Recall Validation F1 Validation Accuracy
1.8553 1000 0.0344 0.8301 0.8650 0.8472 0.9204
3.7106 2000 0.0271 0.8524 0.8804 0.8662 0.9316

Framework Versions

  • Python: 3.12.12
  • SpanMarker: 1.7.0
  • Transformers: 4.51.3
  • PyTorch: 2.8.0+cu126
  • Datasets: 3.6.0
  • Tokenizers: 0.21.4

Citation

BibTeX

@software{Aarsen_SpanMarker,
    author = {Aarsen, Tom},
    license = {Apache-2.0},
    title = {{SpanMarker for Named Entity Recognition}},
    url = {https://github.com/tomaarsen/SpanMarkerNER}
}