DuarteBarbosa committed · verified
Commit 8c3de8f · 1 parent: 8535c93

Update README.md

Files changed (1):
  1. README.md +6 -7
README.md CHANGED
@@ -13,17 +13,17 @@ tags:
This project implements a deep learning model to detect the orientation of images and determine the rotation needed to correct them. It uses a pre-trained EfficientNetV2 model from PyTorch, fine-tuned to classify images into four orientation categories: 0°, 90°, 180°, and 270°.

- The model achieves **97.53% accuracy** on the validation set.

## Training Performance and Model History

- This model was trained on a single NVIDIA RTX 4080 GPU, taking approximately **3 hours and 20 minutes** to complete.

The final model uses `EfficientNetV2-S`, but the project evolved through several iterations:

- **ResNet18:** Achieved ~90% accuracy with a model size of around 30MB.
- **ResNet50:** Improved accuracy to 95.26% with a model size of ~100MB.
- - **EfficientNetV2-S:** Reached the "final" (for now) accuracy of **97.53%** with ~80MB.

## How It Works
 
@@ -42,9 +42,10 @@ The model was trained on several datasets:
- **Microsoft COCO Dataset:** A large-scale object detection, segmentation, and captioning dataset ([link](https://cocodataset.org/)).
- **AI-Generated vs. Real Images:** A dataset from Kaggle ([link](https://www.kaggle.com/datasets/cashbowman/ai-generated-images-vs-real-images)) was included to make the model aware of the typical orientations of the compositions found in art and illustrations.
- **Personal Images:** A small, curated collection of personal photographs covering unique examples and edge cases.
- The combined dataset consists of **45,726** unique images. Each image is augmented by being rotated in four ways (0°, 90°, 180°, 270°), creating a total of **182,904** samples. This augmented dataset was then split into **146,323 samples for training** and **36,581 samples for validation**.

## Usage
@@ -52,13 +53,11 @@ For detailed usage instructions, including how to run predictions, export to ONN
## Performance Comparison (PyTorch vs. ONNX)

- For a dataset of 5055 images, the performance on a RTX 4080 running in **single-thread** was:

- **PyTorch (`predict.py`):** 135.71 seconds
- **ONNX (`predict_onnx.py`):** 60.83 seconds

- This demonstrates a significant performance gain of approximately **55.2%** when using the ONNX model for inference.
-

---

For more in-depth information about the project, including the full source code, training scripts, and detailed documentation, please visit the [GitHub repository](https://github.com/duartebarbosadev/deep-image-orientation-detection).
 
This project implements a deep learning model to detect the orientation of images and determine the rotation needed to correct them. It uses a pre-trained EfficientNetV2 model from PyTorch, fine-tuned to classify images into four orientation categories: 0°, 90°, 180°, and 270°.

+ The model achieves **98.12% accuracy** on the validation set.
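
How a predicted orientation class turns into a corrective rotation can be sketched as follows. This is an illustration only, not the repository's actual code: it assumes class indices 0–3 correspond to the listed orientations 0°, 90°, 180°, 270°, and the function name is hypothetical.

```python
# Hypothetical mapping from the model's predicted class index to the
# rotation (in degrees, counter-clockwise) that restores the image upright.
# Assumes class order [0°, 90°, 180°, 270°] as listed in the README.
CLASSES = [0, 90, 180, 270]

def correction_angle(predicted_class: int) -> int:
    """Degrees to rotate the image to undo the detected rotation."""
    detected = CLASSES[predicted_class]
    return (360 - detected) % 360
```

For example, an image detected as class 1 (rotated 90°) needs a further 270° rotation, i.e. −90°, to come out upright.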

## Training Performance and Model History

+ This model was trained on a single NVIDIA RTX 4080 GPU, taking approximately **4 hours and 56 minutes** to complete.

The final model uses `EfficientNetV2-S`, but the project evolved through several iterations:

- **ResNet18:** Achieved ~90% accuracy with a model size of around 30MB.
- **ResNet50:** Improved accuracy to 95.26% with a model size of ~100MB.
+ - **EfficientNetV2-S:** Reached the "final" (for now) accuracy of **98.12%** with ~78MB.

## How It Works
 
 
- **Microsoft COCO Dataset:** A large-scale object detection, segmentation, and captioning dataset ([link](https://cocodataset.org/)).
- **AI-Generated vs. Real Images:** A dataset from Kaggle ([link](https://www.kaggle.com/datasets/cashbowman/ai-generated-images-vs-real-images)) was included to make the model aware of the typical orientations of the compositions found in art and illustrations.
+ - **TextOCR - Text Extraction from Images Dataset:** A dataset from Kaggle ([link](https://www.kaggle.com/datasets/robikscube/textocr-text-extraction-from-images-dataset?resource=download)) was included to improve the model's ability to detect the orientation of images containing text. (However, over 1,300 of its images needed to have their orientation corrected manually, e.g. 0007a5a18213563f.jpg.)
- **Personal Images:** A small, curated collection of personal photographs covering unique examples and edge cases.

+ The combined dataset consists of **70,732** unique images. Each image is augmented by being rotated in four ways (0°, 90°, 180°, 270°), creating a total of **282,928** samples. This augmented dataset was then split into **226,342 samples for training** and **56,586 samples for validation**.
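
The four-way rotation augmentation described above can be sketched with plain Python lists standing in for pixel grids. This is a minimal illustration of the idea; the project presumably rotates real image files with an imaging library.

```python
def rotate90(grid):
    """Rotate a row-major 2D grid 90° clockwise: reverse rows, transpose."""
    return [list(row) for row in zip(*grid[::-1])]

def four_orientations(grid):
    """Return (label_degrees, rotated_grid) pairs for all four classes."""
    samples, current = [], grid
    for label in (0, 90, 180, 270):
        samples.append((label, current))
        current = rotate90(current)
    return samples

# Every unique image becomes four training samples:
# 70,732 images x 4 orientations = 282,928 samples.
```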

## Usage

 

## Performance Comparison (PyTorch vs. ONNX)

+ For a dataset of 5,055 non-compressed images, the performance on an RTX 4080 running **single-threaded** was:

- **PyTorch (`predict.py`):** 135.71 seconds
- **ONNX (`predict_onnx.py`):** 60.83 seconds
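
As a quick sanity check of what the timings above imply:

```python
# Derive the speedup implied by the two timings listed above.
pytorch_s = 135.71   # predict.py (PyTorch)
onnx_s = 60.83       # predict_onnx.py (ONNX Runtime)

time_reduction_pct = (pytorch_s - onnx_s) / pytorch_s * 100
speedup = pytorch_s / onnx_s

print(f"ONNX reduces inference time by {time_reduction_pct:.1f}% "
      f"({speedup:.2f}x faster)")
```

That is roughly a 55% cut in wall-clock time, or about a 2.2× speedup.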

---

For more in-depth information about the project, including the full source code, training scripts, and detailed documentation, please visit the [GitHub repository](https://github.com/duartebarbosadev/deep-image-orientation-detection).