ricklisz123 committed
Commit f4d2faf · verified · 1 Parent(s): abe08e1

Update README.md

Files changed (1): README.md (+84, −3)

README.md CHANGED

---
license: apache-2.0
---

# Model Card for MedDINOv3

MedDINOv3 is a medical vision foundation model pretrained on CT-3M, a collection of 2D axial CT slices covering diverse anatomical regions. It produces high-quality dense features that achieve strong performance on various CT segmentation tasks, significantly surpassing previous supervised CNN and transformer models.

## Model Details

### Model Description

We provide a ViT-B/16 backbone pretrained on CT-3M using the three-stage DINOv3 objective.

- **Developed by:** Yuheng Li, Yizhou Wu, Yuxiang Lai, Mingzhe Hu, Xiaofeng Yang
- **Model type:** Vision Transformer
- **License:** apache-2.0

### Model Sources

- **Repository:** [GitHub – MedDINOv3](https://github.com/ricklisz/MedDINOv3)
- **Paper:** [arXiv:2509.02379](https://arxiv.org/abs/2509.02379)

## Uses

The model is a vision backbone providing multi-purpose features for downstream medical imaging tasks.

### Direct Use
- Use as a **frozen feature extractor** for medical imaging tasks (e.g., segmentation, classification); a minimal sketch follows this list.
- Fine-tune within **nnU-Net** or other medical segmentation frameworks.
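
To make the frozen-feature workflow concrete, here is a minimal sketch that pulls dense features from a single slice. It assumes the repository is set up as described under "How to Get Started with the Model" below; the 3-channel repeat, the 512×512 input size, and the DINOv2/DINOv3-style `get_intermediate_layers` call are illustrative assumptions, not confirmed details of this model.

```python
import torch
from nnunetv2.training.nnUNetTrainer.dinov3.dinov3.models.vision_transformer import vit_base

# Build the backbone and load weights (see "How to Get Started" below)
model = vit_base(drop_path_rate=0.2, layerscale_init=1e-5)
chkpt = torch.load("MedDINOv3-B-CT3M.pth", map_location="cpu")
model.load_state_dict(chkpt, strict=False)
model.eval()

# One intensity-normalized axial slice; repeating the single CT channel
# to 3 channels is an assumption, not documented preprocessing
ct_slice = torch.randn(1, 1, 512, 512).repeat(1, 3, 1, 1)

with torch.no_grad():
    # Assumed DINOv2/DINOv3-style API: reshape=True returns a dense
    # (B, C, H/16, W/16) patch-token feature map for a /16 ViT
    feats = model.get_intermediate_layers(ct_slice, n=1, reshape=True)[0]

print(feats.shape)  # expected: torch.Size([1, 768, 32, 32]) for ViT-B/16
```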

### Out-of-Scope Use
- The model is trained only on **CT images**. Direct use for MRI, ultrasound, or natural images without adaptation is not recommended.
- Not validated for **clinical decision-making** without extensive downstream validation.

## Bias, Risks, and Limitations
- Training data is limited to CT scans from 16 public datasets; the model may not generalize to underrepresented scanners, populations, or pathologies.
- The model was not designed to ensure fairness across demographic subgroups.
- Clinical deployment requires further validation to mitigate the risk of false positives/negatives.

### Recommendations
- Perform **task-specific fine-tuning** before clinical use.
- Validate on **local datasets** to assess generalization.

## How to Get Started with the Model

Please follow the instructions at https://github.com/ricklisz/MedDINOv3.

After setting up the repo, you can load the pretrained backbone:

```python
import torch
from nnunetv2.training.nnUNetTrainer.dinov3.dinov3.models.vision_transformer import vit_base

# Initialize the ViT-B/16 backbone
model = vit_base(drop_path_rate=0.2, layerscale_init=1e-5)

# Load the MedDINOv3 CT-3M checkpoint; strict=False skips any
# checkpoint keys that do not match the backbone definition
chkpt = torch.load("MedDINOv3-B-CT3M.pth", map_location="cpu")
model.load_state_dict(chkpt, strict=False)
```
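
Because `strict=False` silently ignores any mismatch between the checkpoint and the model, it is worth checking what was actually loaded. One way to do that (the counts you see will depend on the checkpoint contents):

```python
# load_state_dict returns the keys it could not match; a long list here
# usually means the checkpoint and the backbone definition are out of sync
missing, unexpected = model.load_state_dict(chkpt, strict=False)
print(f"{len(missing)} missing keys, {len(unexpected)} unexpected keys")

# Freeze the backbone and switch to inference mode for feature extraction
model.eval()
for p in model.parameters():
    p.requires_grad = False
```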

## Training Details

### Training Data

- **Dataset:** CT-3M (3,868,833 axial slices from 16 public CT datasets)
- **Coverage:** over 100 anatomical structures across abdominal, thoracic, and pelvic regions

## Citation

```
@article{li2025meddinov3,
  title={MedDINOv3: How to Adapt Vision Foundation Models for Medical Image Segmentation?},
  author={Li, Yuheng and Wu, Yizhou and Lai, Yuxiang and Hu, Mingzhe and Yang, Xiaofeng},
  journal={arXiv preprint arXiv:2509.02379},
  year={2025},
  url={https://arxiv.org/abs/2509.02379}
}
```