BioBERT Disease NER

A biomedical named entity recognition (NER) model, fine-tuned from BioBERT on the NCBI Disease dataset, that extracts disease mentions from biomedical text.

🔗 Live Demo (Disease-Extraction-System)

https://disease-extraction-system.vercel.app/

📂 GitHub

https://github.com/IshanSalunkhe6/disease-extraction-system

📊 Performance

Metric	Score
Precision	86.80%
Recall	91.39%
F1-score	89.04%
Accuracy	98.64%
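
As a quick consistency check, the reported F1-score can be recomputed from the precision and recall above. This is a minimal sketch that assumes F1 is the usual harmonic mean of precision and recall; the model card does not say how the metrics were computed, and the variable names are illustrative.

```python
# Reported scores from the table above, expressed as fractions.
precision = 0.8680
recall = 0.9139

# F1 is the harmonic mean of precision and recall (assumed standard definition).
f1 = 2 * precision * recall / (precision + recall)

print(f"F1-score: {f1 * 100:.2f}%")
```

Under that assumption the result agrees with the 89.04% listed in the table.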

📚 Training Data

Dataset: NCBI Disease
Size: 6,800+ annotated mentions from 793 PubMed abstracts

🛠️ How to Use

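from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
nlp = pipeline(
    "ner",
    model="Ishan0612/biobert-ner-disease-ncbi",
    tokenizer="Ishan0612/biobert-ner-disease-ncbi",
    aggregation_strategy="simple",  # merge adjacent tokens that share a label
)

text = "The patient has signs of diabetes mellitus and chronic obstructive pulmonary disease."
results = nlp(text)

# Print each aggregated span with its predicted label.
print("Extracted Medical Entities:")
for entity in results:
    print(f"{entity['word']} - ({entity['entity_group']})")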

This should output:

Extracted Medical Entities:
the patient has signs of - (LABEL_0)
diabetes - (LABEL_1)
mellitus - (LABEL_2)
and - (LABEL_0)
chronic - (LABEL_1)
obstructive pulmonary disease - (LABEL_2)
. - (LABEL_0)
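
In the output above, each disease mention is split into a LABEL_1 piece followed by a LABEL_2 piece. The following is a minimal sketch for joining those pieces into whole mentions. It assumes LABEL_0 means outside, LABEL_1 marks the beginning of a disease mention, and LABEL_2 marks its continuation; that mapping is inferred from the sample output and is not stated in the model card. The helper `merge_disease_spans` is hypothetical and not part of the model or of transformers.

```python
from typing import Dict, List

# Assumed label scheme (inferred from the sample output, not confirmed):
# LABEL_0 = outside, LABEL_1 = begin disease, LABEL_2 = inside disease.
BEGIN, INSIDE = "LABEL_1", "LABEL_2"


def merge_disease_spans(entities: List[Dict]) -> List[str]:
    """Join begin/inside pieces from the pipeline output into full mentions."""
    mentions: List[str] = []
    current: List[str] = []
    for ent in entities:
        label = ent["entity_group"]
        if label == BEGIN:
            # A new mention starts; flush any mention still in progress.
            if current:
                mentions.append(" ".join(current))
            current = [ent["word"]]
        elif label == INSIDE and current:
            # Continue the mention that is in progress.
            current.append(ent["word"])
        else:
            # Outside label (or an inside piece with no start): close the mention.
            if current:
                mentions.append(" ".join(current))
            current = []
    if current:
        mentions.append(" ".join(current))
    return mentions


# Entries copied from the sample output above, reduced to the two keys used here.
sample = [
    {"word": "the patient has signs of", "entity_group": "LABEL_0"},
    {"word": "diabetes", "entity_group": "LABEL_1"},
    {"word": "mellitus", "entity_group": "LABEL_2"},
    {"word": "and", "entity_group": "LABEL_0"},
    {"word": "chronic", "entity_group": "LABEL_1"},
    {"word": "obstructive pulmonary disease", "entity_group": "LABEL_2"},
    {"word": ".", "entity_group": "LABEL_0"},
]

print(merge_disease_spans(sample))
# → ['diabetes mellitus', 'chronic obstructive pulmonary disease']
```

With the real pipeline, `merge_disease_spans(results)` could be called directly on the list returned by `nlp(text)`, since each entry carries the same `word` and `entity_group` keys.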

License

This model is licensed under the Apache 2.0 License, the same license as the original BioBERT (dmis-lab/biobert-base-cased-v1.1).

Citation

@article{lee2020biobert,
  title={BioBERT: a pre-trained biomedical language representation model for biomedical text mining},
  author={Lee, Jinhyuk and Yoon, Wonjin and Kim, Sungdong and Kim, Donghyeon and So, Chan Ho and Kang, Jaewoo},
  journal={Bioinformatics},
  volume={36},
  number={4},
  pages={1234--1240},
  year={2020},
  publisher={Oxford University Press}
}
