Qwen2-VL-2B-Instruct-GPTQ-Int4-LoRA-SurveillanceVideo-Classification-250210

This model is a fine-tuned version of Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4 on the Surveillance Video Classification dataset.

Model description

This model takes a video as input and classifies it into one of six classes: 1. loitering, 2. breaking and entering, 3. abandonment, 4. falling down, 5. fighting, 6. arson.
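Because the model answers with a single digit, downstream code usually needs to map that digit back to a class name. The following is a minimal sketch of such a mapping; the names `CLASS_NAMES` and `parse_prediction` are hypothetical helpers, not part of the released model, and the parsing rule (take the first digit from 1 to 6 in the output) is an assumption.

```python
import re

# Class ids and names as listed in the model description.
CLASS_NAMES = {
    1: "loitering",
    2: "breaking and entering",
    3: "abandonment",
    4: "falling down",
    5: "fighting",
    6: "arson",
}


def parse_prediction(generated_text):
    """Return (class_id, class_name) for the first digit 1-6 found, or None.

    Hypothetical helper: the model card only states that the answer is a
    single digit, so anything else in the output is ignored here.
    """
    match = re.search(r"[1-6]", generated_text)
    if match is None:
        return None
    class_id = int(match.group())
    return class_id, CLASS_NAMES[class_id]
```

Returning `None` for outputs without a valid digit lets the caller decide how to handle a malformed answer instead of silently assigning a class.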

LLaMA-Factory was used for training, with the hyperparameters listed below.

Intended uses & limitations

The model was fine-tuned with the prompt below. Use the same prompt when running inference.

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video",
                "video": video_path,
                "max_pixels": 640 * 360,
                # "fps": 1.0   # maybe default fps = 1.0
            },
            {
                "type": "text",
                "text": (
                    "<video>\nWatch the video and choose the six behaviours that apply to you. "
                    "[1. loitering, 2. breaking and entering, 3. abandonment, 4. falling down, 5. fighting, 6. arson]. "
                    "Your answer must be a single digit, the number of the behaviour."
                )
            }
        ]
    }
]
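The message structure above could be wired into an inference call roughly as follows. This is a sketch, not the author's script: it assumes the adapter is loaded on top of the base model with PEFT, follows the usage pattern from the Qwen2-VL documentation (`qwen_vl_utils.process_vision_info`, `apply_chat_template`), and needs a GPTQ-capable environment. The function names `build_messages` and `classify_video` and the `max_new_tokens=4` default are hypothetical.

```python
BASE_ID = "Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4"
ADAPTER_ID = "Jeckmu/Qwen2-VL-2B-Instruct-GPTQ-Int4-lora-SurveillanceVideo-250210"

# Prompt text copied verbatim from the model card.
PROMPT = (
    "<video>\nWatch the video and choose the six behaviours that apply to you. "
    "[1. loitering, 2. breaking and entering, 3. abandonment, 4. falling down, 5. fighting, 6. arson]. "
    "Your answer must be a single digit, the number of the behaviour."
)


def build_messages(video_path):
    """Build the chat messages for one video, using the model card's prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "video", "video": video_path, "max_pixels": 640 * 360},
                {"type": "text", "text": PROMPT},
            ],
        }
    ]


def classify_video(video_path, max_new_tokens=4):
    """Run one video through base model + LoRA adapter and return the raw answer.

    Assumption: heavy dependencies are imported lazily so that
    build_messages stays usable without them installed.
    """
    from peft import PeftModel
    from qwen_vl_utils import process_vision_info
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

    model = Qwen2VLForConditionalGeneration.from_pretrained(
        BASE_ID, torch_dtype="auto", device_map="auto"
    )
    model = PeftModel.from_pretrained(model, ADAPTER_ID)
    processor = AutoProcessor.from_pretrained(BASE_ID)

    messages = build_messages(video_path)
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[:, inputs.input_ids.shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0].strip()
```

For more than a handful of videos, it would be better to load the model and processor once and reuse them; the single-function form here is only for brevity.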

Training and evaluation data

The training data was sampled from the original video dataset so that the classes are balanced: 100 videos per class, except for class 6 (arson), which had 65 videos.
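One way to draw such a balanced subset is sketched below. The model card does not say how the sampling was done, so the function name `sample_balanced`, the fixed seed, and the rule of taking every available video when a class has fewer than the cap are all assumptions.

```python
import random


def sample_balanced(videos_by_class, per_class=100, seed=42):
    """Sample up to `per_class` videos from each class without replacement.

    Classes with fewer videos than the cap contribute everything they have.
    Hypothetical helper; the seed only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    sampled = {}
    for label, videos in videos_by_class.items():
        pool = list(videos)
        k = min(per_class, len(pool))
        sampled[label] = rng.sample(pool, k)
    return sampled
```

With five classes capped at 100 and an arson class of 65, this yields 5 * 100 + 65 = 565 training videos.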

Each video was preprocessed to a resolution of 640x360 with the option fps=3.0. Using the metadata, a 10-second segment in which the behavior occurred was cut from each video and used for training, so each training clip contributes about 30 frames.
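The frame count follows from 10 s * 3 fps = 30 frames. The sketch below shows that arithmetic and one possible way to cut and resize a segment with ffmpeg. The model card does not name the tool that was used, so the ffmpeg command, the helper names, and the `start_s` parameter (the segment start taken from the metadata) are assumptions.

```python
def expected_frames(duration_s=10.0, fps=3.0):
    """Number of frames sampled from a clip of the given length."""
    return round(duration_s * fps)


def ffmpeg_cut_command(src, dst, start_s, duration_s=10.0, width=640, height=360):
    """Build an ffmpeg argument list that cuts and rescales one segment.

    Hypothetical helper: seeks to `start_s`, keeps `duration_s` seconds,
    scales to width x height, and drops the audio track.
    """
    return [
        "ffmpeg",
        "-ss", str(start_s),        # seek to the start of the labelled event
        "-i", src,
        "-t", str(duration_s),      # keep only the event segment
        "-vf", f"scale={width}:{height}",
        "-an",                      # audio is not needed for classification
        dst,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is installed.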

At inference time, use the same prompt as above. For training, the same prompt format was used, with the class number added as the answer.
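A training sample in this format might look like the record built below. The layout (a `messages` list plus a `videos` list) mirrors the multimodal demo data shipped with LLaMA-Factory; whether the author's dataset files used exactly these keys is not stated, so treat the structure and the helper name `make_training_record` as assumptions.

```python
import json

# Prompt text copied verbatim from the model card.
PROMPT = (
    "<video>\nWatch the video and choose the six behaviours that apply to you. "
    "[1. loitering, 2. breaking and entering, 3. abandonment, 4. falling down, 5. fighting, 6. arson]. "
    "Your answer must be a single digit, the number of the behaviour."
)


def make_training_record(video_path, class_id):
    """Pair the prompt with the class digit as the assistant's answer."""
    if class_id not in range(1, 7):
        raise ValueError(f"class_id must be between 1 and 6, got {class_id}")
    return {
        "messages": [
            {"role": "user", "content": PROMPT},
            {"role": "assistant", "content": str(class_id)},
        ],
        "videos": [video_path],
    }


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    print(json.dumps(make_training_record("clips/example.mp4", 4), indent=2))
```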

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • num_epochs: 3.0
  • mixed_precision_training: Native AMP
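Two of the values above are linked by simple arithmetic: the total batch size is the per-device batch size times the gradient accumulation steps times the number of devices, so 2 * 8 = 16 suggests a single device. The sketch below shows that calculation and the shape of a cosine learning-rate decay. It assumes one device and no warmup, since the list mentions neither; the function names are hypothetical.

```python
import math


def effective_batch_size(per_device=2, grad_accum=8, num_devices=1):
    """Samples consumed per optimizer step."""
    return per_device * grad_accum * num_devices


def cosine_lr(step, total_steps, base_lr=5e-5):
    """Cosine decay from base_lr at step 0 to zero at total_steps (no warmup)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under this schedule the learning rate starts at 5e-05, reaches half that value at the midpoint of training, and approaches zero at the final step.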

Training results

Framework versions

  • PEFT 0.12.0
  • Transformers 4.48.2
  • Pytorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0

Model tree for Jeckmu/Qwen2-VL-2B-Instruct-GPTQ-Int4-lora-SurveillanceVideo-250210

  • Base model: Qwen/Qwen2-VL-2B
  • Adapter: this model
