Update README.md
This project implements a deep learning model to detect the orientation of images and determine the rotation needed to correct them. It uses a pre-trained EfficientNetV2 model from PyTorch, fine-tuned for the task of classifying images into four orientation categories: 0°, 90°, 180°, and 270°.

The model achieves **98.12% accuracy** on the validation set.
## Training Performance and Model History

This model was trained on a single NVIDIA RTX 4080 GPU, taking approximately **4 hours and 56.4 minutes** to complete.

The final model uses `EfficientNetV2-S`, but the project evolved through several iterations:
- **ResNet18:** Achieved ~90% accuracy with a model size of around 30MB.
- **ResNet50:** Improved accuracy to 95.26% with a model size of ~100MB.
- **EfficientNetV2-S:** Reached the "final" (for now) accuracy of **98.12%** with a model size of ~78MB.
## How It Works
The model was trained on several datasets:

- **Microsoft COCO Dataset:** A large-scale object detection, segmentation, and captioning dataset ([link](https://cocodataset.org/)).
- **AI-Generated vs. Real Images:** A dataset from Kaggle ([link](https://www.kaggle.com/datasets/cashbowman/ai-generated-images-vs-real-images)) was included to make the model aware of the typical orientations of the different compositions found in art and illustrations.
- **TextOCR - Text Extraction from Images Dataset:** A dataset from Kaggle ([link](https://www.kaggle.com/datasets/robikscube/textocr-text-extraction-from-images-dataset?resource=download)) was included to improve the model's ability to detect the orientation of images containing text. (However, over 1,300 images needed to have their orientation manually corrected, e.g. `0007a5a18213563f.jpg`.)
- **Personal Images:** A small, curated collection of personal photographs to include unique examples and edge cases.
The combined dataset consists of **70,732** unique images. Each image is augmented by being rotated in four ways (0°, 90°, 180°, 270°), creating a total of **282,928** samples. This augmented dataset was then split into **226,342 samples for training** and **56,586 samples for validation**.
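The four-way rotation augmentation described above can be sketched as follows. The function name and the label convention (class i = rotated i × 90° counter-clockwise) are assumptions for illustration:

```python
from PIL import Image

def make_rotated_samples(source):
    """Turn one image into four labeled samples.

    Assumed label convention: label i means the sample has been
    rotated i * 90 degrees counter-clockwise from the upright original.
    """
    img = Image.open(source).convert("RGB")
    # PIL's rotate(angle, expand=True) rotates counter-clockwise and
    # grows the canvas so nothing is cropped at 90-degree steps.
    return [(img.rotate(90 * i, expand=True), i) for i in range(4)]
```

With 70,732 originals this yields the 282,928 samples quoted above.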
## Usage

For detailed usage instructions, including how to run predictions and export to ONNX, see the [GitHub repository](https://github.com/duartebarbosadev/deep-image-orientation-detection).
## Performance Comparison (PyTorch vs. ONNX)
For a dataset of 5,055 uncompressed images, performance on an RTX 4080 running **single-threaded** was:

- **PyTorch (`predict.py`):** 135.71 seconds
- **ONNX (`predict_onnx.py`):** 60.83 seconds
This demonstrates a significant performance gain, approximately a **55.2%** reduction in inference time, when using the ONNX model.

---
For more in-depth information about the project, including the full source code, training scripts, and detailed documentation, please visit the [GitHub repository](https://github.com/duartebarbosadev/deep-image-orientation-detection).