🧠 Image Classification AI Model (CIFAR-100)

This repository contains a Vision Transformer (ViT) model fine-tuned for image classification on the CIFAR-100 dataset. It is built on google/vit-base-patch16-224 and quantized to FP16 (half precision) for faster, more memory-efficient inference. Reported accuracy on the 100-class task is roughly 70–80%, depending on configuration.


🚀 Features

  • 🖼️ Task: Image Classification
  • 🧠 Base Model: google/vit-base-patch16-224 (Vision Transformer)
  • 🧪 Quantized: FP16 for faster and memory-efficient inference
  • 🎯 Dataset: CIFAR-100 (100 fine-grained object categories)
  • CUDA Enabled: Optimized for GPU acceleration
  • 📈 High Accuracy: Fine-tuned and evaluated on validation split
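
To make the memory claim concrete, here is a rough back-of-the-envelope sketch. It assumes the 86.6M parameter count reported for this model and counts the weights only (activations and framework overhead are ignored). The helper name `weight_megabytes` is made up for illustration.

```python
PARAM_COUNT = 86_600_000  # 86.6M parameters, as reported for this model

# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_megabytes(param_count: int, dtype: str) -> float:
    """Approximate size of the weights alone, in MB (10**6 bytes)."""
    return param_count * BYTES_PER_PARAM[dtype] / 1e6


fp32_mb = weight_megabytes(PARAM_COUNT, "fp32")
fp16_mb = weight_megabytes(PARAM_COUNT, "fp16")

print(f"FP32 weights: ~{fp32_mb:.0f} MB")
print(f"FP16 weights: ~{fp16_mb:.0f} MB")
```

Under these assumptions the weights take about 346 MB in FP32 and about 173 MB in FP16, which is where the memory saving comes from.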

📊 Dataset Used

Hugging Face Dataset: tanganke/cifar100

  • Description: CIFAR-100 is a dataset of 60,000 32×32 color images in 100 classes (600 images per class)
  • Split: 50,000 training images and 10,000 test images
  • Categories: Animals, Vehicles, Food, Household items, etc.
  • License: MIT License (from source)

from datasets import load_dataset

dataset = load_dataset("tanganke/cifar100")

🛠️ Model & Training Configuration

  • Model: google/vit-base-patch16-224

  • Image Size: 224×224 (upscaled from 32×32)

  • Framework: Hugging Face Transformers & Datasets

  • Training Environment: Kaggle Notebook with CUDA

  • Epochs: 5–10 (with early stopping)

  • Batch Size: 32

  • Optimizer: AdamW

  • Loss Function: CrossEntropyLoss
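
The configuration above could be wired together roughly as follows. This is a minimal sketch, not the repository's train.py: a tiny stand-in classifier replaces the ViT model so the code runs without downloading weights, random tensors replace the CIFAR-100 batches, and the learning rate and early-stopping patience are assumptions because this card does not state them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

NUM_CLASSES = 100     # CIFAR-100
BATCH_SIZE = 32       # from the configuration above
MAX_EPOCHS = 10       # upper end of the 5-10 range
PATIENCE = 2          # assumption: not stated in this card
LEARNING_RATE = 1e-3  # assumption: not stated in this card

# Stand-in classifier (hypothetical). The real setup would use the ViT model
# with a 100-class head here.
model = nn.Sequential(
    nn.AdaptiveAvgPool2d(8),
    nn.Flatten(),
    nn.Linear(3 * 8 * 8, NUM_CLASSES),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
criterion = nn.CrossEntropyLoss()


def upscale(batch_32: torch.Tensor) -> torch.Tensor:
    """Resize a batch of 32x32 images to the 224x224 input size."""
    return F.interpolate(batch_32, size=(224, 224), mode="bilinear", align_corners=False)


# Random tensors stand in for one training batch and one validation batch.
train_x = torch.rand(BATCH_SIZE, 3, 32, 32)
train_y = torch.randint(0, NUM_CLASSES, (BATCH_SIZE,))
val_x = torch.rand(BATCH_SIZE, 3, 32, 32)
val_y = torch.randint(0, NUM_CLASSES, (BATCH_SIZE,))

best_val_loss = float("inf")
epochs_without_improvement = 0
history = []

for epoch in range(MAX_EPOCHS):
    # One optimization step per "epoch" keeps the sketch short.
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(upscale(train_x)), train_y)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(upscale(val_x)), val_y).item()
    history.append(val_loss)

    # Early stopping: stop once validation loss has not improved
    # for PATIENCE consecutive epochs.
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= PATIENCE:
            break
```

In a real run the loop body would iterate over a DataLoader built from the dataset, and the stand-in would be swapped for the pretrained ViT checkpoint.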

✅ Evaluation & Scoring

  • Accuracy: ~70–80% (varies by configuration)

  • Validation Tool: evaluate or sklearn.metrics

  • Metrics: Top-1 and Top-5 accuracy

  • Inference Speed: Significantly faster after quantization
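
For readers unfamiliar with the Top-1 and Top-5 scores listed above, the sketch below shows how they can be computed from per-class scores. It is plain Python with made-up toy numbers, not output from this model, and `top_k_accuracy` is an illustrative helper rather than a function from this repository.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy scores for 3 samples over 6 classes (illustrative values only).
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],  # highest score: class 1
    [0.30, 0.10, 0.25, 0.20, 0.10, 0.05],  # highest score: class 0
    [0.05, 0.10, 0.15, 0.20, 0.20, 0.30],  # highest score: class 5
]
labels = [1, 2, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top5 = top_k_accuracy(scores, labels, k=5)
print(f"Top-1: {top1:.2f}, Top-5: {top5:.2f}")
```

Top-5 is always at least as high as Top-1, since a correct top prediction also counts as a top-5 hit. For real evaluation, scikit-learn's `top_k_accuracy_score` in `sklearn.metrics` provides the same metric.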

🔍 Inference Example

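from PIL import Image
import torch
from transformers import ViTForImageClassification, ViTImageProcessor

# ViTImageProcessor replaces the deprecated ViTFeatureExtractor.
feature_extractor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
# Assumption: the fine-tuned FP16 weights live in the vit-cifar100-fp16 folder of this repo.
model = ViTForImageClassification.from_pretrained("vit-cifar100-fp16").half().to("cuda").eval()

def predict(image_path):
    image = Image.open(image_path).convert("RGB")
    pixel_values = feature_extractor(images=image, return_tensors="pt")["pixel_values"]
    with torch.no_grad():
        # Cast pixel values to FP16 so they match the model weights.
        outputs = model(pixel_values=pixel_values.to("cuda", torch.float16))
    logits = outputs.logits
    predicted_class = logits.argmax(-1).item()
    # `dataset` is the CIFAR-100 dataset loaded earlier; adjust "fine_label" if the label column is named differently.
    return dataset["train"].features["fine_label"].int2str(predicted_class)

print(predict("sample_image.jpg"))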

📁 Folder Structure

📦image-classification-vit
 ┣ 📂vit-cifar100-fp16
 ┣ 📜train.py
 ┣ 📜inference.py
 ┣ 📜README.md
 ┗ 📜requirements.txt

Model size: 86.6M params · Tensor type: F16 · Format: Safetensors