Update README.md
This project implements a deep learning model to detect the orientation of images and determine the rotation needed to correct them. It uses a pre-trained EfficientNetV2 model from PyTorch, fine-tuned for the task of classifying images into four orientation categories: 0°, 90°, 180°, and 270°.
The model achieves **98.82% accuracy** on the validation set.
## Training Performance
This model was trained on a single NVIDIA H100 GPU, taking **5 hours, 5 minutes and 37 seconds** to complete.
The final model uses `EfficientNetV2-S`, but the project evolved through several iterations:
- **ResNet18:** Achieved ~90% accuracy with a model size of around 30MB.
- **ResNet50:** Improved accuracy to 95.26% with a model size of ~100MB.
- **EfficientNetV2-S:** Reached the "final" (for now) accuracy of **98.12%** with ~78MB; the current checkpoint reaches **98.82%**.
## How It Works
The model is trained on a dataset of images, where each image is rotated by 0°, 90°, 180°, and 270°.
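The four-way rotation step might look like this with Pillow. This is a sketch only; the label convention assumed here follows the class mapping listed below, and `Image.rotate` turns the image counter-clockwise for positive angles.

```python
# Sketch: derive the four orientation-labelled samples from one image.
# Assumed label convention: rotating k * 90 deg counter-clockwise yields
# an image whose correction is k * 90 deg clockwise, i.e. class label k.
from PIL import Image

def make_rotated_samples(img: Image.Image) -> list:
    return [
        (img.rotate(ccw_degrees, expand=True), label)
        for label, ccw_degrees in enumerate((0, 90, 180, 270))
    ]
```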
The four classes correspond to the following rotations:
- **Class 0:** Image is correctly oriented (0°).
- **Class 1:** Image needs to be rotated **90° Clockwise** to be correct.
- **Class 2:** Image needs to be rotated **180°** to be correct.
- **Class 3:** Image needs to be rotated **90° Counter-Clockwise** to be correct.
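Applying these corrections in code is a small lookup. The helper below is illustrative only (the name is made up); angles are expressed in counter-clockwise degrees, the convention used by PIL's `Image.rotate`.

```python
# Corrective rotation per predicted class, in counter-clockwise degrees
# (the convention used by PIL's Image.rotate). Illustrative helper only.
CORRECTION_CCW_DEGREES = {
    0: 0,    # already upright
    1: 270,  # "90° clockwise" == 270° counter-clockwise
    2: 180,  # upside down
    3: 90,   # 90° counter-clockwise
}

def correction_for(predicted_class: int) -> int:
    """Counter-clockwise degrees that make the image upright."""
    return CORRECTION_CCW_DEGREES[predicted_class]
```

For example, `img.rotate(correction_for(1), expand=True)` would turn a class-1 image upright.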
## Dataset
The model was trained on several datasets:
- **TextOCR - Text Extraction from Images Dataset:** A dataset from Kaggle ([link](https://www.kaggle.com/datasets/robikscube/textocr-text-extraction-from-images-dataset?resource=download)) was included to improve the model's ability to detect the orientation of images containing text. (However, over 1,300 images needed to have their orientation manually corrected, such as 0007a5a18213563f.jpg.)
- **Personal Images:** A small, curated collection of personal photographs to include unique examples and edge cases.
The model was trained on a dataset of **189,018** unique images. Each image is augmented by being rotated in four ways (0°, 90°, 180°, 270°), creating a total of **756,072** samples. This augmented dataset was then split into **604,857 samples for training** and **151,215 samples for validation**.
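A quick sanity check of the dataset counts (189,018 images, four rotations each):

```python
# Sanity-check the dataset arithmetic quoted above.
unique_images = 189_018
rotations = 4                        # 0°, 90°, 180°, 270°
total = unique_images * rotations
train, val = 604_857, 151_215        # reported split
assert total == 756_072
assert train + val == total
print(round(train / total, 2))  # the split is roughly 80/20: prints 0.8
```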
## Usage