---
license: mit
tags:
- chest-xray
- medical
- multimodal
- retrieval
- explanation
- clinicalbert
- swin-transformer
- deep-learning
- image-text
datasets:
- openi
language:
- en
---

# Multimodal Chest X-ray Retrieval & Diagnosis (ClinicalBERT + Swin)

This model jointly encodes chest X-rays (DICOM) and radiology reports (XML) to:

- Predict medical conditions from multimodal input (image + text)
- Retrieve similar cases using shared disease-aware embeddings
- Provide visual explanations using attention and Integrated Gradients (IG)

> Developed as a final project at HCMUS.

---

## Model Architecture

- **Image Encoder:** Swin Transformer (pretrained, then fine-tuned)
- **Text Encoder:** ClinicalBERT
- **Fusion Module:** Cross-modal attention with optional hybrid FFN layers
- **Losses:** Binary cross-entropy (BCE) + Focal Loss for multi-label classification
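
As a rough illustration of how a BCE term and a focal term can be combined for multi-label outputs, the sketch below computes both per label in plain Python. The focusing parameter `gamma`, the class weight `alpha`, and the mixing weight `focal_weight` are assumptions chosen for illustration; the values and exact weighting used in this project are not stated here.

```python
import math

def bce_term(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one label, given probability p and target y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def focal_term(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25,
               eps: float = 1e-7) -> float:
    """Focal loss for one label: down-weights labels that are already easy."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def combined_loss(probs, targets, focal_weight: float = 0.5) -> float:
    """Mean over labels of (1 - w) * BCE + w * focal (hypothetical mix)."""
    per_label = [
        (1.0 - focal_weight) * bce_term(p, y) + focal_weight * focal_term(p, y)
        for p, y in zip(probs, targets)
    ]
    return sum(per_label) / len(per_label)

# Toy example: three disease labels, per-label sigmoid probabilities.
probs = [0.9, 0.2, 0.6]
targets = [1, 0, 1]
loss = combined_loss(probs, targets)
```

With `gamma=0` and `alpha=0.5` the focal term reduces to half the BCE term. That makes the relationship between the two losses easy to check.
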

Embeddings from both modalities are projected into a **shared joint space**, enabling retrieval and explanation.
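
Retrieval in a shared space can be sketched as a nearest-neighbour search over stored case embeddings. The example below uses toy 3-dimensional vectors and cosine similarity in plain Python. The case IDs, the vectors, and the choice of cosine similarity are assumptions for illustration, not a description of the project's actual index.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_top_k(query, index, k=2):
    """Return the k (case_id, score) pairs most similar to the query."""
    scored = [(case_id, cosine_similarity(query, emb))
              for case_id, emb in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical joint-space embeddings for three stored cases.
index = {
    "case_a": [0.9, 0.1, 0.0],
    "case_b": [0.0, 1.0, 0.2],
    "case_c": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # embedding of a new image, report, or fused pair
results = retrieve_top_k(query, index, k=2)
```

Because image and text embeddings live in the same space, the same function can serve image-to-text, text-to-image, or fused queries, provided all vectors come from the same projection heads.
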

---

## Training Data

- **Dataset:** [NIH Open-i Chest X-ray Dataset](https://openi.nlm.nih.gov/)
- **Input Modalities:**
  - Chest X-ray DICOMs
  - Associated XML radiology reports
- **Labels:** MeSH-derived disease categories (multi-label)
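
The sketch below shows one way report text and MeSH-derived labels could be pulled from an XML report and turned into a multi-hot vector. The sample report is made up, the element names (`AbstractText` with a `Label` attribute, `MeSH`/`major`) are an assumption about the report layout, and the `LABELS` vocabulary is hypothetical. The project's actual preprocessing may differ.

```python
import xml.etree.ElementTree as ET

# Made-up report following an assumed layout.
SAMPLE_REPORT = """<eCitation>
  <MedlineCitation>
    <Article>
      <Abstract>
        <AbstractText Label="FINDINGS">Heart size is mildly enlarged. Lungs are clear.</AbstractText>
        <AbstractText Label="IMPRESSION">Mild cardiomegaly.</AbstractText>
      </Abstract>
    </Article>
  </MedlineCitation>
  <MeSH>
    <major>Cardiomegaly/mild</major>
  </MeSH>
</eCitation>"""

# Hypothetical label vocabulary; the real category list is not given here.
LABELS = ["Cardiomegaly", "Pleural Effusion", "normal"]

def parse_report(xml_text):
    """Return (sections by label, list of MeSH head terms) for one report."""
    root = ET.fromstring(xml_text)
    sections = {
        node.get("Label"): (node.text or "").strip()
        for node in root.iter("AbstractText")
    }
    # Keep only the head term before any "/" qualifier ("Cardiomegaly/mild").
    mesh_terms = [
        (node.text or "").split("/")[0].strip()
        for node in root.iter("major")
    ]
    return sections, mesh_terms

def to_multi_hot(mesh_terms, labels=LABELS):
    """Encode the MeSH terms as a 0/1 vector aligned with `labels`."""
    present = {term.lower() for term in mesh_terms}
    return [1 if label.lower() in present else 0 for label in labels]

sections, mesh_terms = parse_report(SAMPLE_REPORT)
label_vector = to_multi_hot(mesh_terms)
```
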

---

## Intended Uses

* Clinical Education: Case similarity search for radiology students
* Research: Baseline for multimodal medical retrieval
* Explainability: Visualize disease evidence in both image and text
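
To make the Integrated Gradients idea behind the explainability use case concrete, here is a minimal sketch on a toy logistic model with an analytic gradient. It does not use Captum or the real encoders. The weights, input, all-zero baseline, and step count are assumptions for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, weights, bias):
    """Toy stand-in for a classifier head: sigmoid of a linear score."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def gradient(x, weights, bias):
    """Analytic gradient of the toy model with respect to its input."""
    out = model(x, weights, bias)
    return [out * (1.0 - out) * w for w in weights]

def integrated_gradients(x, baseline, weights, bias, steps=50):
    """Approximate IG by averaging gradients along the baseline-to-input path."""
    total = [0.0] * len(x)
    for step in range(steps):
        alpha = (step + 0.5) / steps  # midpoint rule for the path integral
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = gradient(point, weights, bias)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the average gradient by the input-minus-baseline difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

weights = [2.0, -1.0, 0.0]
bias = 0.1
x = [1.0, 0.5, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, weights, bias)
```

A useful sanity check is the completeness property: the attributions should sum to approximately `model(x) - model(baseline)`. A feature the model ignores (weight 0 here) receives zero attribution.
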

## Limitations & Risks

* Trained on a public dataset (Open-i), so it may not generalize to data from other hospitals
* Explanations are not clinically validated
* Not for diagnostic use in real-world settings

## Acknowledgments

* NIH Open-i Dataset
* Swin Transformer (timm)
* ClinicalBERT (Emily Alsentzer)
* Captum (for IG explanations)

## Code link: [GitHub](https://github.com/ppddddpp/multi-modal-retrieval-predict-project)