---
license: mit
tags:
- chest-xray
- medical
- multimodal
- retrieval
- explanation
- clinicalbert
- swin-transformer
- deep-learning
- image-text
datasets:
- openi
language:
- en
---
# Multimodal Chest X-ray Retrieval & Diagnosis (ClinicalBERT + Swin)
This model jointly encodes chest X-rays (DICOM) and radiology reports (XML) to:
- Predict medical conditions from multimodal input (image + text)
- Retrieve similar cases using shared disease-aware embeddings
- Provide visual explanations using attention and Integrated Gradients (IG)
> Developed as a final project at HCMUS.
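As an illustration of the explanation step, here is a minimal sketch of attributing one disease prediction back to the input pixels with Captum's Integrated Gradients. `forward_image_logits` is a hypothetical wrapper around the model's image branch (it is not part of this release), and the actual explanation pipeline may differ.

```python
from captum.attr import IntegratedGradients

def explain_image(forward_image_logits, pixel_values, label_idx):
    # `forward_image_logits` is assumed to map a (B, C, H, W) pixel tensor
    # to per-label logits of shape (B, num_labels).
    ig = IntegratedGradients(forward_image_logits)
    # Attribute the chosen disease logit back to the input pixels
    attributions = ig.attribute(pixel_values, target=label_idx)
    return attributions  # same shape as pixel_values
```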
---
## Model Architecture
- **Image Encoder:** Swin Transformer (pretrained, fine-tuned)
- **Text Encoder:** ClinicalBERT
- **Fusion Module:** Cross-modal attention with optional hybrid FFN layers
- **Losses:** BCE + Focal Loss for multi-label classification
Embeddings from both modalities are projected into a **shared joint space**, enabling retrieval and explanation.
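A minimal sketch of the dual-encoder setup, assuming the public `swin_base_patch4_window7_224` timm checkpoint, the `emilyalsentzer/Bio_ClinicalBERT` weights, and an illustrative joint dimension of 256; the released checkpoints, fusion module, and projection heads may differ.

```python
import torch.nn as nn
import torch.nn.functional as F
import timm
from transformers import AutoModel

class JointEncoder(nn.Module):
    """Projects image and text features into a shared joint space."""

    def __init__(self, joint_dim=256):
        super().__init__()
        # Swin Transformer image encoder (timm), classification head removed
        self.image_encoder = timm.create_model(
            "swin_base_patch4_window7_224", pretrained=True, num_classes=0
        )
        # ClinicalBERT text encoder
        self.text_encoder = AutoModel.from_pretrained(
            "emilyalsentzer/Bio_ClinicalBERT"
        )
        # Linear projections into the shared joint space
        self.image_proj = nn.Linear(self.image_encoder.num_features, joint_dim)
        self.text_proj = nn.Linear(self.text_encoder.config.hidden_size, joint_dim)

    def forward(self, pixel_values, input_ids, attention_mask):
        img_feat = self.image_encoder(pixel_values)                # (B, C_img)
        txt_feat = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]                                   # [CLS] token
        img_emb = F.normalize(self.image_proj(img_feat), dim=-1)
        txt_emb = F.normalize(self.text_proj(txt_feat), dim=-1)
        return img_emb, txt_emb
```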
---
## Training Data
- **Dataset:** [NIH Open-i Chest X-ray Dataset](https://openi.nlm.nih.gov/)
- **Input Modalities:**
- Chest X-ray DICOMs
- Associated XML radiology reports
- **Labels:** MeSH-derived disease categories (multi-label)
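A minimal sketch of reading one training pair, assuming the standard Open-i XML report layout with `AbstractText` sections labelled FINDINGS and IMPRESSION; the project's actual preprocessing (windowing, resizing, MeSH label mapping) may differ.

```python
import xml.etree.ElementTree as ET
import pydicom

def load_pair(dicom_path, report_path):
    # Raw pixel array from the chest X-ray DICOM
    image = pydicom.dcmread(dicom_path).pixel_array

    # Findings and impression text from the XML radiology report
    root = ET.parse(report_path).getroot()
    sections = {
        node.get("Label"): (node.text or "")
        for node in root.iter("AbstractText")
    }
    report = " ".join(
        sections.get(key, "") for key in ("FINDINGS", "IMPRESSION")
    ).strip()
    return image, report
```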
---
## Intended Uses
* Clinical Education: Case similarity search for radiology students
* Research: Baseline for multimodal medical retrieval
* Explainability: Visualize disease evidence in both image and text
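For the case-similarity use above, retrieval reduces to nearest-neighbour search in the joint space. A minimal sketch, assuming a precomputed matrix of L2-normalised case embeddings (names here are illustrative, not part of the release):

```python
import torch

def retrieve_similar(query_emb, case_embs, top_k=5):
    # Cosine similarity is a plain dot product for L2-normalised vectors:
    # query_emb has shape (D,), case_embs has shape (N, D).
    scores = case_embs @ query_emb
    values, indices = torch.topk(scores, k=top_k)
    return indices.tolist(), values.tolist()
```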
## Limitations & Risks
* Trained only on the public Open-i dataset; performance may not generalize to data from other hospitals
* Attention and IG explanations are not clinically validated
* Not intended for diagnostic use in real-world clinical settings
## Acknowledgments
* NIH Open-i Dataset
* Swin Transformer (Timm)
* ClinicalBERT (Emily Alsentzer)
* Captum (for IG explanations)
## Code
[GitHub repository](https://github.com/ppddddpp/multi-modal-retrieval-predict-project)