Nikeytas/Test Upload Model

This model is a fine-tuned version of MCG-NJU/videomae-base, trained on a subset of the UCF Crime dataset for event-based binary classification. It achieves the following results on the evaluation set:

  • Loss: 0.5847
  • Accuracy: 0.5000
  • Precision: 0.2500
  • Recall: 0.5000
  • F1 Score: 0.3333

🎯 Model Overview

This VideoMAE model has been fine-tuned for binary violence detection in video content. The model classifies videos into two categories:

  • Violent Crime (1): Videos containing violent criminal activities
  • Non-Violent Incident (0): Videos with non-violent or normal activities
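
A quick way to verify the label mapping is to read it from the model config. A minimal sketch (the exact label strings depend on how the config was exported and may differ from the names above):

from transformers import AutoConfig

# Inspect the id-to-label mapping stored with the model
config = AutoConfig.from_pretrained("Nikeytas/test-upload-model")
print(config.id2label)  # expected to resemble {0: "Non-Violent Incident", 1: "Violent Crime"}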

The model is based on the VideoMAE architecture and has been specifically trained on a curated subset of the UCF Crime dataset with event-based categorization for realistic crime detection scenarios.

📊 Dataset & Training

Dataset Composition

Total Videos: 20

  • Violent Crime Videos: 10
  • Non-Violent Incident Videos: 10

Class Balance: 50.0% violent crimes

Event Distribution:

  • Arrest: 20 videos
  • Arson: 20 videos

Data Splits:

  • Training: 12 videos
  • Validation: 4 videos
  • Test: 4 videos
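
The 12/4/4 split corresponds to a 60/20/20 stratified split over the 20 videos. A minimal sketch of how such a split could be produced (file names are illustrative; the actual split script was not released with the model):

from sklearn.model_selection import train_test_split

# 20 illustrative video paths: 10 violent (label 1), 10 non-violent (label 0)
video_paths = [f"video_{i:02d}.mp4" for i in range(20)]
labels = [1] * 10 + [0] * 10

# 60% train, then split the remaining 40% evenly into validation and test
train, temp, y_train, y_temp = train_test_split(
    video_paths, labels, test_size=0.4, stratify=labels, random_state=42
)
val, test, y_val, y_test = train_test_split(
    temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42
)
print(len(train), len(val), len(test))  # 12 4 4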

🎯 Performance

Validation Performance:

  • eval_loss: 0.5847
  • eval_accuracy: 0.5000
  • eval_precision: 0.2500
  • eval_recall: 0.5000
  • eval_f1: 0.3333
  • eval_runtime: 0.6636
  • eval_samples_per_second: 6.0270
  • eval_steps_per_second: 3.0140
  • epoch: 1.0000

Test Performance:

  • eval_loss: 0.6700
  • eval_accuracy: 0.5000
  • eval_precision: 0.2500
  • eval_recall: 0.5000
  • eval_f1: 0.3333
  • eval_runtime: 0.4271
  • eval_samples_per_second: 9.3660
  • eval_steps_per_second: 4.6830
  • epoch: 1.0000
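
The identical accuracy/precision/recall/F1 pattern on validation and test is consistent with macro-averaged metrics computed over a small balanced split where every prediction falls in one class. A hedged sketch that reproduces the reported numbers under that assumption (the averaging mode and predictions here are assumptions, not released artifacts):

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# 4-sample balanced split with every prediction in class 0 (assumption)
y_true = [0, 0, 1, 1]
y_pred = [0, 0, 0, 0]

acc = accuracy_score(y_true, y_pred)  # 0.5
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(acc, prec, rec, f1)  # 0.5 0.25 0.5 0.3333...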

Training Information:

  • Training Time: 0.1 minutes
  • Best Accuracy Achieved: 0.5000
  • Model Architecture: VideoMAE Base (fine-tuned)
  • Fine-tuning Approach: Event-based binary classification

🚀 Training Procedure

Training Hyperparameters

The following hyperparameters were used during training:

  • Learning Rate: 5e-05
  • Train Batch Size: 2
  • Eval Batch Size: 2
  • Optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
  • LR Scheduler Type: Linear
  • Training Epochs: 1
  • Weight Decay: 0.01
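
For reference, a minimal sketch of how these hyperparameters map onto HuggingFace TrainingArguments (the output directory name is illustrative, and dataset/model wiring is omitted):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="videomae-ucf-crime",   # illustrative name
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=1,
    weight_decay=0.01,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)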

Training Results

Training Loss | Epoch | Step | Validation Loss | Accuracy
0.5           | 1.00  | N/A  | 0.5847          | 0.5000

Framework Versions

  • Transformers: 4.30.2+
  • PyTorch: 2.0.1+
  • Datasets: Latest
  • Device: Apple Silicon MPS / CUDA / CPU (Auto-detected)
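
The device auto-detection noted above typically follows a pattern like this sketch (assuming PyTorch 2.x, where the MPS backend is available on Apple Silicon):

import torch

# Prefer CUDA, fall back to Apple Silicon MPS, then CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")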

🚀 Quick Start

Installation

pip install transformers torch torchvision opencv-python pillow

Basic Usage

import torch
from transformers import AutoModelForVideoClassification, AutoProcessor
import cv2
import numpy as np

# Load model and processor
model = AutoModelForVideoClassification.from_pretrained("Nikeytas/test-upload-model")
processor = AutoProcessor.from_pretrained("Nikeytas/test-upload-model")

# Process video
def classify_video(video_path, num_frames=16):
    # Sample num_frames frames evenly across the video
    cap = cv2.VideoCapture(video_path)
    frames = []

    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if total_frames <= 0:
        cap.release()
        raise ValueError(f"Could not read frames from {video_path}")
    indices = np.linspace(0, total_frames - 1, num_frames, dtype=int)

    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ret, frame = cap.read()
        if ret:
            # OpenCV decodes to BGR; the processor expects RGB
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

    cap.release()

    # Pad by repeating the last frame if any reads failed;
    # the model expects exactly num_frames frames per clip
    while frames and len(frames) < num_frames:
        frames.append(frames[-1])

    # Process with model
    inputs = processor(frames, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
        predicted_class = torch.argmax(predictions, dim=-1).item()
        confidence = predictions[0][predicted_class].item()

    label = "Violent Crime" if predicted_class == 1 else "Non-Violent"
    return label, confidence

# Example usage
video_path = "path/to/your/video.mp4"
prediction, confidence = classify_video(video_path)
print(f"Prediction: {prediction} (Confidence: {confidence:.3f})")

Batch Processing

from pathlib import Path

def process_video_directory(video_dir, output_file="results.txt"):
    results = []
    
    for video_file in Path(video_dir).glob("*.mp4"):
        try:
            prediction, confidence = classify_video(str(video_file))
            results.append({
                "file": video_file.name,
                "prediction": prediction,
                "confidence": confidence
            })
            print(f"✅ {video_file.name}: {prediction} ({confidence:.3f})")
        except Exception as e:
            print(f"❌ Error processing {video_file.name}: {e}")
    
    # Save results
    with open(output_file, "w") as f:
        for result in results:
            f.write(f"{result['file']}: {result['prediction']} ({result['confidence']:.3f})\n")
    
    return results

# Process all videos in a directory
results = process_video_directory("./videos/")

📈 Technical Specifications

  • Base Model: MCG-NJU/videomae-base
  • Architecture: Vision Transformer (ViT) adapted for video
  • Input Resolution: 224x224 pixels per frame
  • Temporal Resolution: 16 frames per video clip
  • Output Classes: 2 (Binary classification)
  • Training Framework: HuggingFace Transformers
  • Optimization: AdamW optimizer with learning rate 5e-5
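
These specifications imply model inputs of shape (batch, frames, channels, height, width) = (1, 16, 3, 224, 224). A quick sanity-check sketch using a dummy clip (the expected shape is an inference from the specs above, not a released test):

import numpy as np
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Nikeytas/test-upload-model")

# 16 blank 224x224 RGB frames standing in for a real clip
dummy_video = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(16)]
inputs = processor(dummy_video, return_tensors="pt")
print(inputs["pixel_values"].shape)  # expected: torch.Size([1, 16, 3, 224, 224])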

⚠️ Limitations

  1. Dataset Scope: Trained on a small subset of the UCF Crime dataset and may not generalize to all types of violence
  2. Temporal Context: Uses 16-frame clips, which may miss context in longer sequences (see the sliding-window sketch after this list)
  3. Environmental Bias: Performance may vary with different lighting, camera angles, and video quality
  4. False Positives: May misclassify intense but non-violent activities (sports, action movies)
  5. Real-time Performance: Processing time depends on hardware capabilities
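
One common mitigation for the temporal-context limitation is to slide a 16-frame window across the video and aggregate clip-level scores. A minimal sketch reusing the model and processor loaded in Quick Start (the windowing and averaging strategy is an illustration, not part of the released code):

import cv2
import torch

def classify_sliding_windows(video_path, window=16, stride=8):
    # Read the whole video into memory (fine for short clips)
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    if not frames:
        raise ValueError(f"Could not read frames from {video_path}")

    # Classify each overlapping window and average the softmax scores
    probs = []
    for start in range(0, max(len(frames) - window + 1, 1), stride):
        clip = frames[start:start + window]
        while len(clip) < window:          # pad short tails
            clip.append(clip[-1])
        inputs = processor(clip, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        probs.append(torch.softmax(logits, dim=-1)[0])

    avg = torch.stack(probs).mean(dim=0)
    label = "Violent Crime" if avg.argmax().item() == 1 else "Non-Violent"
    return label, avg.max().item()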

🔒 Ethical Considerations

Intended Use

  • Primary: Research and development in video analysis
  • Secondary: Security system enhancement with human oversight
  • Educational: Computer vision and AI safety research

Prohibited Uses

  • Surveillance without consent: Do not use for unauthorized monitoring
  • Discriminatory profiling: Avoid bias against specific groups or communities
  • Automated punishment: Never use for automated legal or disciplinary actions
  • Privacy violation: Respect privacy laws and individual rights

Bias and Fairness

  • Model trained on specific dataset that may not represent all populations
  • Regular evaluation needed for bias detection and mitigation
  • Human oversight required for critical applications
  • Consider demographic representation in deployment scenarios

📝 Model Card Information

  • Developed by: Research Team
  • Model Type: Video Classification (Binary)
  • Training Data: UCF Crime Dataset (Subset)
  • Training Date: 2025-06-08 15:19:08 UTC
  • Evaluation Metrics: Accuracy, Precision, Recall, F1-Score
  • Intended Users: Researchers, Security Professionals, Developers

📚 Citation

If you use this model in your research, please cite:

@misc{Nikeytas_test_upload_model,
    title={VideoMAE Fine-tuned for Crime Detection},
    author={Research Team},
    year={2025},
    publisher={Hugging Face},
    url={https://huggingface.co/Nikeytas/test-upload-model}
}

🤝 Contributing

We welcome contributions to improve the model! Please:

  1. Report issues with specific examples
  2. Suggest improvements for bias reduction
  3. Share evaluation results on new datasets
  4. Contribute to documentation and examples

📞 Contact

For questions, issues, or collaboration opportunities, please open an issue in the model repository or contact the development team.


Last updated: 2025-06-08 15:19:08 UTC
Model version: 1.0
Framework: HuggingFace Transformers
