Welcome to the SAMF model [MICCAI '25]!

[MICCAI '25] From Slices to Volumes: Multi-Scale Fusion of 2D and 3D Features for CT Scan Report Generation

| Model | BLEU-1 | BLEU-4 | ROUGE-L | METEOR | BERT F1 | Llama Score |
|---|---|---|---|---|---|---|
| CT2Rep | 0.309 | 0.172 | 0.243 | 0.173 | 0.865 | 6.35 |
| CT-Chat | 0.395 | - | 0.321 | 0.219 | - | 5.664 |
| Our Baseline (SAMF) | 0.423 | 0.203 | 0.338 | 0.356 | 0.879 | 6.792 |
| SAMF + Ao2D | 0.440 | 0.261 | 0.417 | 0.417 | 0.889 | 7.165 |

Introduction

We introduce Slice Attentive Multimodal Fusion (SAMF), a framework that combines the rich, high-resolution information from 2D slices with the spatial coherence of 3D volumetric data. Experimental results demonstrate that our method outperforms existing baselines in both report generation and multiple-choice question answering, highlighting the critical role of multidimensional feature integration.
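To illustrate the general idea of slice-attentive fusion, here is a minimal NumPy sketch: per-slice 2D features are pooled with a learned attention query, and the pooled summary is concatenated with a global 3D volumetric feature. This is a simplified illustration under our own assumptions (the function and parameter names are hypothetical), not the actual SAMF implementation; see the GitHub repository for the real code.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def slice_attentive_fusion(slice_feats, volume_feat, w_query):
    """Illustrative fusion of 2D slice features with a 3D volume feature.

    slice_feats : (num_slices, d) per-slice 2D features (hypothetical)
    volume_feat : (d,) global 3D volumetric feature (hypothetical)
    w_query     : (d,) learned attention query vector (hypothetical)
    """
    scores = slice_feats @ w_query                  # (num_slices,) relevance per slice
    attn = softmax(scores)                          # attention weights over slices
    pooled_2d = attn @ slice_feats                  # (d,) attention-weighted 2D summary
    return np.concatenate([pooled_2d, volume_feat]) # (2d,) fused representation

# toy example with 8 slices and 16-dim features
rng = np.random.default_rng(0)
fused = slice_attentive_fusion(rng.normal(size=(8, 16)),
                               rng.normal(size=16),
                               rng.normal(size=16))
print(fused.shape)  # (32,)
```

In practice the fused representation would be projected into the language model's embedding space before report generation.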

Model Description

  • Model type: 3D Medical Report Generation and Visual Question Answering
  • Language(s) (NLP): English
  • License: apache-2.0
  • Finetuned from model: microsoft/Phi-3-mini-4k-instruct

Training Data

Hardware Utilization

  • Hardware Type: 1 × NVIDIA A100
  • Hours used: ~16 hours

Evaluation

To evaluate this model, please refer to our GitHub repository (serag-ai/SAMF), which provides detailed usage instructions.
