TerraMind — IBM & ESA (TerraTorch, Earth Observation)

How to use only the encoder to extract embeddings on new images (S2/S1)?

#1 by rwilkseo - opened
  • I'm using this notebook (accessing TerraMind via TerraTorch) as a starting point: https://github.com/IBM/terramind/blob/main/notebooks/terramind_v1_base_sen1floods11.ipynb

  • I want to take a single image, or a group of images, and apply the TerraMind encoder to get the feature representation. I've run the code below and get back a list of 12 objects, each of embedding length but seemingly almost identical.

  • Q) Am I doing this right to get the embeddings from the model? If not, what extra steps/changes are required?

Define backbone-only model:

```python
from terratorch.registry import BACKBONE_REGISTRY

backbone = BACKBONE_REGISTRY.build(
    "terramind_v1_base",
    modalities=["S2L1C", "S1GRD"],
    pretrained=True,
)
```

S1 example:

```python
import numpy as np
import rasterio
from torchvision import transforms

image_fp = 'sen1floods11_v1.1/data/S1GRDHand/Bolivia_103757_S1Hand.tif'

# Read image
with rasterio.open(image_fp) as src:
    image = src.read()

image = np.transpose(image, (1, 2, 0))  # CHW -> HWC
image = transforms.ToTensor()(image)    # back to CHW, as a float tensor
image = image[None, :, :, :]            # add batch dimension -> BCHW
image = image.float()
```

Create input in modality format and apply the model:

```python
image_dict = {"S1GRD": image}
s1_out = backbone(image_dict)
```

'Embeddings'?

```python
len(s1_out)  # = 12
```

  • Thanks
IBM ESA Geospatial org

Hi @rwilkseo , very sorry for the late reply! TerraTorch returns the embeddings from all 12 layers of the encoder. To get the final embeddings, just select the last item: s1_out[-1]
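A minimal sketch of what this looks like, using random tensors as a stand-in for the backbone's output (the shapes here are illustrative assumptions, e.g. 196 patch tokens and a 768-dimensional embedding for a ViT-Base-style encoder; your actual patch count depends on the input size):

```python
import torch

# Stand-in for s1_out = backbone(image_dict): a list with one embedding
# tensor per transformer layer (12 for terramind_v1_base).
# Illustrative shape: (batch, num_patches, embed_dim).
s1_out = [torch.randn(1, 196, 768) for _ in range(12)]

# The final-layer embeddings are simply the last item of the list.
embeddings = s1_out[-1]
print(embeddings.shape)  # torch.Size([1, 196, 768])

# A common way to get a single vector per image is to mean-pool
# over the patch dimension.
pooled = embeddings.mean(dim=1)
print(pooled.shape)  # torch.Size([1, 768])
```

Mean pooling over patches is just one common choice for a per-image feature vector; you can also keep the full patch grid if you need spatial features.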
