Qwen2-VL-2B-Instruct-LoRA-SurveillanceVideo-Classification-250207
This model is a fine-tuned version of Qwen/Qwen2-VL-2B-Instruct on the Surveillance Video Classification dataset.
Model description
This model takes a video as input and classifies it into one of the following six classes, answering with the class number: [1. loitering, 2. breaking and entering, 3. abandonment, 4. falling down, 5. fighting, 6. arson].
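The model is prompted to reply with a single digit (see the prompt under Intended uses & limitations). The sketch below maps that reply back to a class name. The `CLASS_NAMES` dict and the `parse_label` helper are hypothetical illustrations. Treating any reply without a standalone digit from 1 to 6 as unparseable is an assumption, since generation is not guaranteed to follow the requested format.

```python
import re
from typing import Optional

# Class IDs and names as listed in the prompt on this card.
CLASS_NAMES = {
    1: "loitering",
    2: "breaking and entering",
    3: "abandonment",
    4: "falling down",
    5: "fighting",
    6: "arson",
}


def parse_label(answer: str) -> Optional[str]:
    """Hypothetical helper: map the model's reply (e.g. "5") to a class name.

    Looks for the first standalone digit 1-6 (so "10" does not match) and
    returns None when there is none.
    """
    match = re.search(r"\b([1-6])\b", answer)
    if match is None:
        return None
    return CLASS_NAMES[int(match.group(1))]


print(parse_label("5"))  # fighting
```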
The model was trained as a LoRA adapter with LLaMA-Factory, using the hyperparameters listed below.
Intended uses & limitations
The model was fine-tuned with the prompt below, and the same prompt should be used at inference time.
```python
video_path = "path/to/video.mp4"  # placeholder: the clip to classify

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video",
                "video": video_path,
                "max_pixels": 640 * 360,
                # "fps": 3.0,  # training clips were sampled at fps=3.0; if omitted, qwen_vl_utils applies its own default
            },
            {
                "type": "text",
                "text": (
                    "<video>\nWatch the video and choose the six behaviours that apply to you. "
                    "[1. loitering, 2. breaking and entering, 3. abandonment, 4. falling down, 5. fighting, 6. arson]. "
                    "Your answer must be a single digit, the number of the behaviour."
                ),
            },
        ],
    }
]
```
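A minimal inference sketch, assuming the standard Qwen2-VL workflow in `transformers` together with the `qwen_vl_utils` package, and assuming the LoRA adapter from this repository is loaded with PEFT and merged into the base model. `adapter_dir` is a placeholder for wherever the adapter files live (a local directory or a Hub repo id); `build_messages` and `classify_video` are hypothetical helpers, and passing `fps=3.0` to match the training preprocessing is a choice, not something the card prescribes.

```python
from typing import Optional

PROMPT = (
    "<video>\nWatch the video and choose the six behaviours that apply to you. "
    "[1. loitering, 2. breaking and entering, 3. abandonment, 4. falling down, 5. fighting, 6. arson]. "
    "Your answer must be a single digit, the number of the behaviour."
)


def build_messages(video_path: str, fps: Optional[float] = None) -> list:
    """Chat messages in the qwen_vl_utils format used by the prompt on this card."""
    video = {"type": "video", "video": video_path, "max_pixels": 640 * 360}
    if fps is not None:
        video["fps"] = fps  # e.g. 3.0 to match the training preprocessing
    return [{"role": "user", "content": [video, {"type": "text", "text": PROMPT}]}]


def classify_video(video_path: str, adapter_dir: str,
                   base_id: str = "Qwen/Qwen2-VL-2B-Instruct") -> str:
    """Run base model + LoRA adapter on one video and return the raw reply."""
    # Heavy imports are deferred so build_messages works without them installed.
    from peft import PeftModel
    from qwen_vl_utils import process_vision_info
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

    model = Qwen2VLForConditionalGeneration.from_pretrained(
        base_id, torch_dtype="auto", device_map="auto"
    )
    # Attach the LoRA weights, then fold them into the base weights.
    model = PeftModel.from_pretrained(model, adapter_dir).merge_and_unload()
    processor = AutoProcessor.from_pretrained(base_id)

    messages = build_messages(video_path, fps=3.0)
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                       padding=True, return_tensors="pt").to(model.device)

    generated = model.generate(**inputs, max_new_tokens=4)
    # Drop the prompt tokens so only the newly generated answer is decoded.
    trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0].strip()
```

In LLaMA-Factory the `<video>` text acts as a placeholder that is replaced by the video tokens during training, whereas the Qwen2-VL chat template inserts video tokens for the `video` content item by itself; the literal `<video>` string is kept here only to mirror the prompt exactly as given on this card.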
Training and evaluation data
The training data was sampled from the original video dataset with a balanced number of videos per class: 100 videos for each class, except class 6 (arson), which used 65 videos, for a total of 565 training videos.
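The balanced sampling described above can be sketched as drawing a fixed number of videos per class without replacement. This is an illustration under assumptions: the card does not say how videos were selected, `sample_balanced` is hypothetical, reusing the training seed 42 for sampling is an assumption, and the smaller arson count is modeled here as that class simply having fewer available clips.

```python
import random
from typing import Dict, List


def sample_balanced(videos_by_class: Dict[str, List[str]], per_class: int = 100,
                    seed: int = 42) -> Dict[str, List[str]]:
    """Draw up to `per_class` videos per class, without replacement.

    A class with fewer videos than `per_class` contributes all of them.
    """
    rng = random.Random(seed)
    return {
        label: rng.sample(videos, min(per_class, len(videos)))
        for label, videos in videos_by_class.items()
    }


# Hypothetical pool: 150 candidate clips for five classes, 65 for arson.
pool = {
    name: [f"{name}_{i:03d}.mp4" for i in range(150)]
    for name in ["loitering", "breaking and entering", "abandonment",
                 "falling down", "fighting"]
}
pool["arson"] = [f"arson_{i:03d}.mp4" for i in range(65)]

subset = sample_balanced(pool)
print({k: len(v) for k, v in subset.items()})
print(sum(len(v) for v in subset.values()))  # 565
```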
Each video was preprocessed at a resolution of 640x360 with fps=3.0. Using the dataset metadata, the 10-second segment in which the behavior occurs was cut from each video and used for training, which gives about 30 frames per clip (10 s × 3 fps).
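The frame arithmetic above can be made concrete with uniform-stride sampling over a 10-second window. The card does not describe how frames were picked (this was likely handled inside the training framework), so uniform striding is an assumption, and `start_s` stands for a hypothetical event start time read from the metadata; `clip_frame_indices` is an illustrative helper.

```python
from typing import List, Optional


def clip_frame_indices(start_s: float, native_fps: float, clip_len_s: float = 10.0,
                       sample_fps: float = 3.0,
                       total_frames: Optional[int] = None) -> List[int]:
    """Indices of the source frames kept when a clip is sampled at `sample_fps`.

    start_s: hypothetical start time (seconds) of the annotated behavior,
    as it might be read from the dataset metadata.
    """
    n_frames = round(clip_len_s * sample_fps)   # 10 s * 3 fps = 30 frames
    step = native_fps / sample_fps              # source frames between two samples
    first = start_s * native_fps                # first source frame of the clip
    indices = [int(first + i * step) for i in range(n_frames)]
    if total_frames is not None:
        # Drop indices that run past the end of a short video.
        indices = [i for i in indices if i < total_frames]
    return indices


frames = clip_frame_indices(start_s=12.0, native_fps=30.0)
print(len(frames), frames[:3])  # 30 [360, 370, 380]
```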
At inference time, use the same prompt as above. For training, each example paired this prompt with the correct class number as the answer.
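The training format above can be sketched as one record per clip, assuming LLaMA-Factory's ShareGPT-style multimodal layout (a `messages` list plus a `videos` list, as in its `mllm_video_demo.json` sample), where the `<video>` token in the user text marks where the clip goes. The card does not publish its dataset file, so this field layout and the `make_training_record` helper are assumptions.

```python
import json

PROMPT = (
    "<video>\nWatch the video and choose the six behaviours that apply to you. "
    "[1. loitering, 2. breaking and entering, 3. abandonment, 4. falling down, 5. fighting, 6. arson]. "
    "Your answer must be a single digit, the number of the behaviour."
)


def make_training_record(video_path: str, class_id: int) -> dict:
    """One supervised example: the prompt as the user turn, the class digit as the answer."""
    if not 1 <= class_id <= 6:
        raise ValueError(f"class_id must be 1-6, got {class_id}")
    return {
        "messages": [
            {"role": "user", "content": PROMPT},
            {"role": "assistant", "content": str(class_id)},
        ],
        "videos": [video_path],
    }


record = make_training_record("clips/fighting_001.mp4", 5)
print(json.dumps(record, ensure_ascii=False, indent=2))
```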
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 3.0
- mixed_precision_training: Native AMP
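The hyperparameters above can be restated with Hugging Face `TrainingArguments` field names, which a LLaMA-Factory YAML config should also accept since its training arguments build on them. The sketch checks that total_train_batch_size = 2 × 8 = 16 implies a single device and shows the cosine learning-rate curve. Zero warmup is an assumption (no warmup is listed), and mixed precision is omitted because the card does not say whether fp16 or bf16 was used.

```python
import math

# Hyperparameters from the list above, using TrainingArguments field names.
HPARAMS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "gradient_accumulation_steps": 8,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 3.0,
}


def effective_batch_size(hp: dict, num_devices: int = 1) -> int:
    """Samples per optimizer step: per-device batch x accumulation steps x devices."""
    return hp["per_device_train_batch_size"] * hp["gradient_accumulation_steps"] * num_devices


def cosine_lr(step: int, total_steps: int, base_lr: float = 5e-5) -> float:
    """Cosine decay from base_lr to 0 over total_steps, assuming no warmup."""
    progress = min(step / total_steps, 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(HPARAMS))  # 16
```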
Training results
Framework versions
- PEFT 0.12.0
- Transformers 4.48.2
- Pytorch 2.5.1+cu121
- Datasets 3.1.0
- Tokenizers 0.21.0
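To reproduce the setup, it can help to compare the installed packages against the versions listed above. A small sketch using the standard library's `importlib.metadata`; the `compare_versions` helper is hypothetical, and the PyPI distribution names (for example `torch` for PyTorch) are assumed to match how the packages were installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional, Tuple

# Versions listed on this card, keyed by PyPI distribution name.
EXPECTED = {
    "peft": "0.12.0",
    "transformers": "4.48.2",
    "torch": "2.5.1+cu121",
    "datasets": "3.1.0",
    "tokenizers": "0.21.0",
}


def compare_versions(expected: Dict[str, str],
                     lookup: Callable[[str], str] = version
                     ) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for every package that differs."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(EXPECTED).items():
    print(f"{name}: card used {wanted}, found {installed or 'not installed'}")
```

A PyTorch wheel from PyPI or a CPU build reports its version without the `+cu121` suffix, so a `torch` mismatch may reflect only the build tag.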