SujitShelar committed on
Commit 8a92788 · verified · 1 Parent(s): 08fb6de

Update README.md

Files changed (1): README.md (+8 -8)
README.md CHANGED
@@ -18,12 +18,12 @@ tags: []
  V-JEPA 2 is a self-supervised video backbone trained on >1 M h of internet video; Meta released checkpoints with a Something-Something v2 action head. I freeze that backbone and fine-tune only the classifier head on the HMDB-51 benchmark (6 766 clips, 51 classes) for 5 epochs. The resulting model reaches competitive top-1 accuracy (see Evaluation)
 
  - **Developed by:** Sujit Shelar
- - **Funded by [optional]:** self-funded (personal compute credits)
- - **Shared by [optional]:** V-JEPA 2 ViT-Large (16 frame, 256² patch) video-encoder with a 51-way classification head
+ - **Funded by :** self-funded (personal compute credits)
+ - **Shared by :** V-JEPA 2 ViT-Large (16 frame, 256² patch) video-encoder with a 51-way classification head
  - **Model type:** Vision (video); no text inputs
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** MIT – identical to the upstream V-JEPA 2 weights
- - **Finetuned from model [optional]:** facebook/vjepa2-vitl-fpc16-256-ssv2
+ - **Language(s) (NLP) :** [More Information Needed]
+ - **License :** MIT – identical to the upstream V-JEPA 2 weights
+ - **Finetuned from model :** facebook/vjepa2-vitl-fpc16-256-ssv2
 
  ### Model Sources [optional]
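The summary in the hunk above says the V-JEPA 2 backbone is frozen and only the classifier head is fine-tuned on HMDB-51. A minimal sketch of that setup, assuming the checkpoint loads through transformers' `AutoModelForVideoClassification` and that the head parameters carry "classifier" in their names (neither detail is confirmed by the card):

```python
import torch
from transformers import AutoModelForVideoClassification

# Load the SSv2 checkpoint and swap in a fresh 51-way head for HMDB-51.
# (num_labels / ignore_mismatched_sizes re-initialise the classifier weights.)
model = AutoModelForVideoClassification.from_pretrained(
    "facebook/vjepa2-vitl-fpc16-256-ssv2",
    num_labels=51,
    ignore_mismatched_sizes=True,
)

# Freeze the backbone; leave only the classification head trainable.
# Assumption: head parameter names contain "classifier".
for name, param in model.named_parameters():
    param.requires_grad = "classifier" in name

# Only the unfrozen head parameters are handed to the optimiser.
# (The learning rate here is illustrative, not the value used for this model.)
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```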
 
@@ -115,7 +115,7 @@ HMDB-51 (CC BY-4.0, 6 766 clips across 51 classes). I stratify 70 / 15 / 15 % in
  | Hardware | 1× nvidia-a100-80gb |
 
 
- #### Preprocessing [optional]
+ #### Preprocessing
 
  Clips are sampled at 16 frames per video (torchcodec.clips_at_random_indices), resized/cropped to 256², then normalised by the processor.
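The preprocessing line above names `torchcodec.clips_at_random_indices` and the checkpoint's processor. A sketch of how those two pieces could be wired together for a single clip, assuming the public torchcodec sampler API and transformers' `AutoVideoProcessor`; the file path is hypothetical and the exact arguments are not taken from the training script:

```python
from torchcodec.decoders import VideoDecoder
from torchcodec.samplers import clips_at_random_indices
from transformers import AutoVideoProcessor

processor = AutoVideoProcessor.from_pretrained("facebook/vjepa2-vitl-fpc16-256-ssv2")

# Decode one HMDB-51 video and sample 16 frames at random indices.
decoder = VideoDecoder("hmdb51/brush_hair/clip_0001.avi")  # hypothetical path
clip = clips_at_random_indices(decoder, num_clips=1, num_frames_per_clip=16)

# clip.data holds the frames as a uint8 tensor of shape
# (num_clips, frames, channels, height, width).
frames = clip.data[0]

# The processor resizes/crops to 256² and normalises to the model's statistics.
inputs = processor(frames, return_tensors="pt")
print({k: tuple(v.shape) for k, v in inputs.items()})
```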
 
@@ -220,7 +220,7 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
  - **Compute Region:** [More Information Needed]
  - **Carbon Emitted:** [More Information Needed]
 
- ## Technical Specifications [optional]
+ ## Technical Specifications
 
  ### Model Architecture and Objective
 
@@ -242,7 +242,7 @@ Classification head: two MLP layers (hidden 4 096 → 51 classes).
 
  [More Information Needed]
 
- ## Citation [optional]
+ ## Citation
 
  <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
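The hunk header above describes the classification head as two MLP layers (hidden 4 096 → 51 classes). A self-contained PyTorch sketch of such a head; the 1 024-wide ViT-Large features, the GELU activation, and the mean-pooling over tokens are assumptions rather than details from the card:

```python
import torch
from torch import nn


class ClassifierHead(nn.Module):
    """Two-layer MLP head: pooled backbone features -> 4096 hidden -> 51 classes."""

    def __init__(self, embed_dim: int = 1024, hidden_dim: int = 4096, num_classes: int = 51):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.GELU(),  # assumed activation
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, tokens, embed_dim) from the frozen V-JEPA 2 backbone.
        pooled = features.mean(dim=1)  # assumed mean-pooling over tokens
        return self.mlp(pooled)


head = ClassifierHead()
logits = head(torch.randn(2, 2048, 1024))  # dummy token sequence
print(logits.shape)  # torch.Size([2, 51])
```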
 
 