ricklisz123 committed
Commit f4d2faf · verified · 1 Parent(s): abe08e1

Update README.md

Files changed (1): README.md (+84, −3)

README.md CHANGED

---
license: apache-2.0
---

# Model Card for MedDINOv3

MedDINOv3 is a medical vision foundation model pretrained on CT-3M, a collection of 2D axial CT slices covering diverse anatomical regions. It produces high-quality dense features that achieve strong performance on various CT segmentation tasks, significantly surpassing previous supervised CNN and transformer models.

## Model Details

### Model Description

We provide a ViT-B/16 backbone pretrained on CT-3M using the three-stage DINOv3 objective.

- **Developed by:** Yuheng Li, Yizhou Wu, Yuxiang Lai, Mingzhe Hu, Xiaofeng Yang
- **Model type:** Vision Transformer
- **License:** apache-2.0

### Model Sources

- **Repository:** [GitHub – MedDINOv3](https://github.com/ricklisz/MedDINOv3)
- **Paper:** [arXiv:2509.02379](https://arxiv.org/abs/2509.02379)

## Uses

The model is a vision backbone providing multi-purpose features for downstream medical imaging tasks.

### Direct Use
- Use as a **frozen feature extractor** for medical imaging tasks (e.g., segmentation, classification); a minimal sketch follows this list.
- Fine-tune within **nnU-Net** or other medical segmentation frameworks.
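
To make the frozen-feature workflow concrete, here is a minimal sketch that pulls dense features from a single slice. It assumes the repository is set up as described under "How to Get Started with the Model" below; the 3-channel repeat, the 512×512 input size, and the DINOv2/DINOv3-style `get_intermediate_layers` call are illustrative assumptions, not confirmed details of this model.

```python
import torch
from nnunetv2.training.nnUNetTrainer.dinov3.dinov3.models.vision_transformer import vit_base

# Build the backbone and load weights (see "How to Get Started" below)
model = vit_base(drop_path_rate=0.2, layerscale_init=1e-5)
chkpt = torch.load("MedDINOv3-B-CT3M.pth", map_location="cpu")
model.load_state_dict(chkpt, strict=False)
model.eval()

# One intensity-normalized axial slice; repeating the single CT channel
# to 3 channels is an assumption, not documented preprocessing
ct_slice = torch.randn(1, 1, 512, 512).repeat(1, 3, 1, 1)

with torch.no_grad():
    # Assumed DINOv2/DINOv3-style API: reshape=True returns a dense
    # (B, C, H/16, W/16) patch-token feature map for a /16 ViT
    feats = model.get_intermediate_layers(ct_slice, n=1, reshape=True)[0]

print(feats.shape)  # expected: torch.Size([1, 768, 32, 32]) for ViT-B/16
```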

### Out-of-Scope Use
- The model is trained only on **CT images**. Direct use for MRI, ultrasound, or natural images without adaptation is not recommended.
- Not validated for **clinical decision-making** without extensive downstream validation.

## Bias, Risks, and Limitations
- Training data is limited to CT scans from 16 public datasets; the model may not generalize to underrepresented scanners, populations, or pathologies.
- The model was not designed to ensure fairness across demographic subgroups.
- Clinical deployment requires further validation to mitigate the risk of false positives/negatives.

### Recommendations
- Perform **task-specific fine-tuning** before clinical use.
- Validate on **local datasets** to assess generalization.

## How to Get Started with the Model

Please follow the instructions at https://github.com/ricklisz/MedDINOv3.

After setting up the repo, you can load the pretrained backbone:

```python
import torch
from nnunetv2.training.nnUNetTrainer.dinov3.dinov3.models.vision_transformer import vit_base

# Initialize the ViT-B/16 backbone
model = vit_base(drop_path_rate=0.2, layerscale_init=1e-5)

# Load the MedDINOv3 CT-3M checkpoint; strict=False skips any
# checkpoint keys that do not match the backbone definition
chkpt = torch.load("MedDINOv3-B-CT3M.pth", map_location="cpu")
model.load_state_dict(chkpt, strict=False)
```
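
Because `strict=False` silently ignores any mismatch between the checkpoint and the model, it is worth checking what was actually loaded. One way to do that (the counts you see will depend on the checkpoint contents):

```python
# load_state_dict returns the keys it could not match; a long list here
# usually means the checkpoint and the backbone definition are out of sync
missing, unexpected = model.load_state_dict(chkpt, strict=False)
print(f"{len(missing)} missing keys, {len(unexpected)} unexpected keys")

# Freeze the backbone and switch to inference mode for feature extraction
model.eval()
for p in model.parameters():
    p.requires_grad = False
```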

## Training Details

### Training Data

- **Dataset:** CT-3M (3,868,833 axial slices from 16 public CT datasets)
- **Coverage:** over 100 anatomical structures across abdominal, thoracic, and pelvic regions

## Citation

```
@article{li2025meddinov3,
  title={MedDINOv3: How to Adapt Vision Foundation Models for Medical Image Segmentation?},
  author={Li, Yuheng and Wu, Yizhou and Lai, Yuxiang and Hu, Mingzhe and Yang, Xiaofeng},
  journal={arXiv preprint arXiv:2509.02379},
  year={2025},
  url={https://arxiv.org/abs/2509.02379}
}
```