Welcome to the SAMF model [MICCAI '25]!

[MICCAI '25] From Slices to Volumes: Multi-Scale Fusion of 2D and 3D Features for CT Scan Report Generation

| Model | BLEU-1 | BLEU-4 | ROUGE-L | METEOR | BERT F1 | Llama Score |
|---|---|---|---|---|---|---|
| CT2Rep | 0.309 | 0.172 | 0.243 | 0.173 | 0.865 | 6.35 |
| CT-Chat | 0.395 | - | 0.321 | 0.219 | - | 5.664 |
| Our Baseline (SAMF) | 0.423 | 0.203 | 0.338 | 0.356 | 0.879 | 6.792 |
| SAMF + Ao2D | 0.440 | 0.261 | 0.417 | 0.417 | 0.889 | 7.165 |

Introduction

We introduce Slice Attentive Multimodal Fusion (SAMF), a framework that combines the rich, high-resolution information from 2D slices with the spatial coherence of 3D volumetric data. Experimental results demonstrate that our method outperforms existing baselines in both report generation and multiple-choice question answering, highlighting the critical role of multidimensional feature integration.
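To illustrate the general idea of slice-attentive fusion, here is a minimal NumPy sketch: per-slice 2D features are pooled with a learned attention query, and the pooled summary is concatenated with a global 3D volumetric feature. This is a simplified illustration under our own assumptions (the function and parameter names are hypothetical), not the actual SAMF implementation; see the GitHub repository for the real code.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def slice_attentive_fusion(slice_feats, volume_feat, w_query):
    """Illustrative fusion of 2D slice features with a 3D volume feature.

    slice_feats : (num_slices, d) per-slice 2D features (hypothetical)
    volume_feat : (d,) global 3D volumetric feature (hypothetical)
    w_query     : (d,) learned attention query vector (hypothetical)
    """
    scores = slice_feats @ w_query                  # (num_slices,) relevance per slice
    attn = softmax(scores)                          # attention weights over slices
    pooled_2d = attn @ slice_feats                  # (d,) attention-weighted 2D summary
    return np.concatenate([pooled_2d, volume_feat]) # (2d,) fused representation

# toy example with 8 slices and 16-dim features
rng = np.random.default_rng(0)
fused = slice_attentive_fusion(rng.normal(size=(8, 16)),
                               rng.normal(size=16),
                               rng.normal(size=16))
print(fused.shape)  # (32,)
```

In practice the fused representation would be projected into the language model's embedding space before report generation.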

Model Description

  • Model type: 3D Medical Report Generation and Visual Question Answering
  • Language(s) (NLP): English
  • License: apache-2.0
  • Finetuned from model: microsoft/Phi-3-mini-4k-instruct

Training Data

Hardware Utilization

  • Hardware Type: 1 × NVIDIA A100
  • Hours used: ~16 hours

Evaluation

To evaluate this model, please refer to our GitHub repository (serag-ai/SAMF), which provides detailed usage instructions.
