---
license: mit
tags:
- chest-xray
- medical
- multimodal
- retrieval
- explanation
- clinicalbert
- swin-transformer
- deep-learning
- image-text
datasets:
- openi
language:
- en
---
# Multimodal Chest X-ray Retrieval & Diagnosis (ClinicalBERT + Swin)
This model jointly encodes chest X-rays (DICOM) and radiology reports (XML) to:
- Predict medical conditions from multimodal input (image + text)
- Retrieve similar cases using shared disease-aware embeddings
- Provide visual explanations using attention and Integrated Gradients (IG)
> Developed as a final project at HCMUS.
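To make the Integrated Gradients (IG) idea concrete, here is a minimal standalone sketch on a toy logistic model. It is not the project's code and does not use Captum; the function names, the all-zeros baseline, and the step count are assumptions chosen for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy stand-in for a classifier head: sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate IG with a midpoint Riemann sum along the straight
    path from `baseline` to `x` (hypothetical helper, not Captum)."""
    grad_sums = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        p = model(point, w, b)
        slope = p * (1.0 - p)  # derivative of the sigmoid at this point
        for i, wi in enumerate(w):
            grad_sums[i] += slope * wi  # d(model)/d(x_i)
    # Scale the averaged gradient by the input's distance from the baseline.
    return [(xi - bi) * g / steps for xi, bi, g in zip(x, baseline, grad_sums)]


x = [1.0, -2.0, 0.5]
w = [0.8, 0.3, -1.2]
bias = 0.1
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, w, bias)
```

A useful sanity check is the completeness property: the attributions should sum to approximately `model(x) - model(baseline)`.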
---
## Model Architecture
- **Image Encoder:** Swin Transformer (pretrained, fine-tuned)
- **Text Encoder:** ClinicalBERT
- **Fusion Module:** Cross-modal attention with optional hybrid FFN layers
- **Losses:** BCE + Focal Loss for multi-label classification
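The loss term listed above can be sketched in plain Python for a single sample. How the project actually combines the two terms is not stated here, so the simple weighted sum, the `focal_weight` parameter, and the default `gamma` and `alpha` values are assumptions.

```python
import math


def bce_focal_loss(probs, targets, gamma=2.0, alpha=0.25,
                   focal_weight=1.0, eps=1e-7):
    """Mean over labels of BCE + focal_weight * focal loss.

    probs:   predicted probability per label (after a sigmoid)
    targets: 0/1 ground truth per label
    """
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)        # avoid log(0)
        p_t = p if y == 1 else 1.0 - p          # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        bce = -math.log(p_t)
        # Focal loss down-weights labels that are already well classified.
        focal = alpha_t * (1.0 - p_t) ** gamma * bce
        total += bce + focal_weight * focal
    return total / len(probs)


good = bce_focal_loss([0.9, 0.1, 0.8], [1, 0, 1])
bad = bce_focal_loss([0.1, 0.9, 0.2], [1, 0, 1])
```

Confident correct predictions (`good`) give a much smaller loss than confident wrong ones (`bad`), and the focal term widens that gap as `gamma` grows.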
Embeddings from both modalities are projected into a **shared joint space**, enabling retrieval and explanation.
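As a rough illustration of retrieval in a shared space, the sketch below ranks stored cases by cosine similarity to a query embedding. The case IDs, the two-dimensional vectors, and the helper names are hypothetical; the project's real embeddings and index are not shown here.

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def retrieve_top_k(query, gallery, k=3):
    """Return the k (case_id, cosine_similarity) pairs closest to `query`.

    gallery: dict mapping case_id -> embedding in the joint space
    """
    q = l2_normalize(query)
    scored = []
    for case_id, emb in gallery.items():
        g = l2_normalize(emb)
        scored.append((sum(a * b for a, b in zip(q, g)), case_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(case_id, score) for score, case_id in scored[:k]]


# Toy gallery with made-up embeddings.
gallery = {"case_a": [1.0, 0.0], "case_b": [0.0, 1.0], "case_c": [1.0, 1.0]}
results = retrieve_top_k([2.0, 0.1], gallery, k=2)
```

Because both modalities map into the same space, the query could in principle come from an image, a report, or the fused representation.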
---
## Training Data
- **Dataset:** [NIH Open-i Chest X-ray Dataset](https://openi.nlm.nih.gov/)
- **Input Modalities:**
  - Chest X-ray DICOMs
  - Associated XML radiology reports
- **Labels:** MeSH-derived disease categories (multi-label)
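Multi-label targets of this kind are commonly stored as multi-hot vectors. The sketch below shows one way to build them; the three-entry vocabulary is purely illustrative and is not the project's actual label set.

```python
def encode_multi_hot(case_labels, vocabulary):
    """Turn a list of label names into a 0/1 vector over `vocabulary`.
    Labels missing from the vocabulary are ignored."""
    index = {name: i for i, name in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for name in case_labels:
        if name in index:
            vector[index[name]] = 1
    return vector


# Hypothetical vocabulary for illustration only.
vocabulary = ["Cardiomegaly", "Pleural Effusion", "Pneumonia"]
target = encode_multi_hot(["Pneumonia", "Cardiomegaly"], vocabulary)
```

A case with no matching labels encodes to an all-zeros vector, which a multi-label classifier treats as "none of the listed conditions".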
---
## Intended Uses
* Clinical Education: Case similarity search for radiology students
* Research: Baseline for multimodal medical retrieval
* Explainability: Visualize disease evidence in both image and text
## Limitations & Risks
* Trained on a single public dataset (Open-i), so it may not generalize to data from other hospitals
* Explanations are not clinically validated
* Not for diagnostic use in real-world settings
## Acknowledgments
* NIH Open-i Dataset
* Swin Transformer (timm)
* ClinicalBERT (Emily Alsentzer)
* Captum (for IG explanations)
## Code
[GitHub repository](https://github.com/ppddddpp/multi-modal-retrieval-predict-project)