Collection shoaib6174/video_swin_transformer/1

Collection of Video Swin Transformer feature-extractor models.

Overview

This collection contains several Video Swin Transformer [1] models. The original model weights come from [2]. They were ported to Keras models (tf.keras.Model) and then serialized as TensorFlow SavedModels. The porting steps are available in [3].

About the models

These models can be used directly to extract features from videos. They are accompanied by Colab notebooks that show how to fine-tune them for action recognition and video classification.
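A minimal sketch of loading one of these SavedModels and extracting features. It assumes TensorFlow 2.x is installed, that a model has already been downloaded to a local directory, and that the restored object is callable on a float32 batch of shape [batch_size, 3, 32, 224, 224] and returns a single tensor. `load_feature_extractor` and `extract_features` are hypothetical helpers, not part of the published models.

```python
import numpy as np
import tensorflow as tf


def load_feature_extractor(model_dir: str):
    """Restore a SavedModel from a local directory (hypothetical helper)."""
    # tf.saved_model.load restores the serialized graph without needing the
    # original Python model class, which suits models exported from tf.keras.
    return tf.saved_model.load(model_dir)


def extract_features(model, clip: np.ndarray) -> np.ndarray:
    """Run a batch of clips through the model and return the output as NumPy.

    Assumes `clip` has shape (batch_size, 3, 32, 224, 224) and that calling the
    restored object returns a single tensor (an assumption, see lead-in).
    """
    if clip.ndim != 5 or clip.shape[1:] != (3, 32, 224, 224):
        raise ValueError(f"unexpected clip shape {clip.shape}")
    outputs = model(tf.convert_to_tensor(clip, dtype=tf.float32))
    return outputs.numpy()
```

For example, `extract_features(load_feature_extractor("path/to/model"), clip)`, where the directory is a placeholder. If calling the restored object directly does not work for a given export, its `signatures["serving_default"]` function may be used instead, keeping in mind that it returns a dict of named outputs.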

The table below provides a performance summary:

| Model name | Pre-training dataset | Fine-tuning dataset | acc@1 (%) | acc@5 (%) |
|---|---|---|---|---|
| swin_tiny_patch244_window877_kinetics400_1k | ImageNet-1K | Kinetics-400 | 78.8 | 93.6 |
| swin_small_patch244_window877_kinetics400_1k | ImageNet-1K | Kinetics-400 | 80.6 | 94.5 |
| swin_base_patch244_window877_kinetics400_1k | ImageNet-1K | Kinetics-400 | 80.6 | 96.6 |
| swin_base_patch244_window877_kinetics400_22k | ImageNet-22K | Kinetics-400 | 82.7 | 95.5 |
| swin_base_patch244_window877_kinetics600_22k | ImageNet-22K | Kinetics-600 | 84.0 | 96.5 |
| swin_base_patch244_window1677_sthv2 | Kinetics-400 | Something-Something V2 | 69.6 | 92.7 |

All scores are taken from [2].
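The acc@1 and acc@5 columns are top-1 and top-5 classification accuracy: the percentage of clips whose true label is the highest-scoring class, or among the five highest-scoring classes. As a minimal NumPy sketch of how such a metric is computed from class scores (ignoring ties between scores), `top_k_accuracy` below is a hypothetical helper, not the evaluation code used in [2].

```python
import numpy as np


def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Percentage of samples whose true label is among the k highest scores.

    scores: (num_samples, num_classes) class scores or logits.
    labels: (num_samples,) integer ground-truth class indices.
    """
    # Indices of the k largest scores in each row (their order does not matter).
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return 100.0 * float(hits.mean())


scores = np.array([[0.1, 0.5, 0.4],
                   [0.7, 0.2, 0.1],
                   [0.2, 0.3, 0.5]])
labels = np.array([2, 0, 0])
print(top_k_accuracy(scores, labels, k=1))  # 1 of 3 samples correct
print(top_k_accuracy(scores, labels, k=2))  # 2 of 3 samples correct
```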

Video Swin Transformer feature-extractor models
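The model names encode part of their configuration. Assuming the naming convention of the configs in [2], `patch244` denotes a (2, 4, 4) patch size and `window877` an (8, 7, 7) attention window, each ordered (time, height, width), and a multi-digit leading field such as `window1677` is the temporal size (16, 7, 7). The sketch below decodes a name under that assumption and computes the grid of patch tokens for the default 32x224x224 input; `parse_model_name` and `token_grid` are hypothetical helpers.

```python
import re
from typing import NamedTuple, Tuple


class SwinVideoConfig(NamedTuple):
    variant: str                       # "tiny", "small" or "base"
    patch_size: Tuple[int, int, int]   # (time, height, width)
    window_size: Tuple[int, int, int]  # (time, height, width)
    dataset: str                       # trailing dataset tag, e.g. "sthv2"


_NAME = re.compile(r"swin_(tiny|small|base)_patch(\d+)_window(\d+)_(.+)")


def _split_thw(digits: str) -> Tuple[int, int, int]:
    # Assumed convention: the last two digits are the single-digit height and
    # width; everything before them is the temporal size ("1677" -> (16, 7, 7)).
    return int(digits[:-2]), int(digits[-2]), int(digits[-1])


def parse_model_name(name: str) -> SwinVideoConfig:
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name}")
    variant, patch, window, dataset = match.groups()
    return SwinVideoConfig(variant, _split_thw(patch), _split_thw(window), dataset)


def token_grid(cfg: SwinVideoConfig, frames: int = 32,
               height: int = 224, width: int = 224) -> Tuple[int, int, int]:
    """Number of patch tokens along (time, height, width) after patch embedding."""
    sizes = (frames, height, width)
    if any(s % p for s, p in zip(sizes, cfg.patch_size)):
        raise ValueError("input size must be divisible by the patch size")
    return tuple(s // p for s, p in zip(sizes, cfg.patch_size))
```

For the default input, a (2, 4, 4) patch size gives 32/2 = 16 temporal and 224/4 = 56 spatial positions, i.e. a 16x56x56 token grid.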

Notes

The input shape for these models is [None, 3, 32, 224, 224], representing [batch_size, channels, frames, height, width]. To create models with a different input shape, use this notebook.
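A minimal NumPy sketch of turning a decoded clip of shape (T, H, W, 3) into the channels-first [batch_size, channels, frames, height, width] layout described above. It assumes uniform temporal sampling of 32 frames and a center crop to 224x224; resizing and pixel normalization are deliberately omitted because the expected preprocessing is not stated here, so those should follow the accompanying notebooks. `to_model_input` is a hypothetical helper.

```python
import numpy as np

NUM_FRAMES = 32  # frames per clip expected by the serialized models
CROP = 224       # spatial size expected by the serialized models


def to_model_input(frames: np.ndarray, num_frames: int = NUM_FRAMES,
                   crop: int = CROP) -> np.ndarray:
    """Convert a clip (T, H, W, 3) into a batch of shape (1, 3, num_frames, crop, crop)."""
    if frames.ndim != 4 or frames.shape[-1] != 3:
        raise ValueError("expected frames with shape (T, H, W, 3)")
    t, h, w, _ = frames.shape
    if h < crop or w < crop:
        raise ValueError("frames must be at least crop x crop pixels")
    # Uniformly sample num_frames indices across the clip; short clips repeat frames.
    idx = np.linspace(0, t - 1, num_frames).round().astype(int)
    clip = frames[idx]
    # Center-crop every sampled frame to crop x crop.
    top, left = (h - crop) // 2, (w - crop) // 2
    clip = clip[:, top:top + crop, left:left + crop, :]
    # (T, H, W, C) -> (C, T, H, W), cast to float32, then add the batch axis.
    clip = np.transpose(clip, (3, 0, 1, 2)).astype(np.float32)
    return clip[np.newaxis]
```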

References

[1] Ze Liu et al., Video Swin Transformer.
[2] Video Swin Transformer, GitHub repository.
[3] GSOC-22-Video-Swin-Transformers, GitHub repository.

Acknowledgements
