Commit 9159665 by DuarteBarbosa · verified · 1 Parent(s): 669cc53

Update README.md

Files changed (1)
  1. README.md +8 -13
README.md CHANGED
@@ -13,17 +13,11 @@ tags:

  This project implements a deep learning model to detect the orientation of images and determine the rotation needed to correct them. It uses a pre-trained EfficientNetV2 model from PyTorch, fine-tuned for the task of classifying images into four orientation categories: 0°, 90°, 180°, and 270°.

- The model achieves **98.12% accuracy** on the validation set.

- ## Training Performance and Model History

- This model was trained on a single NVIDIA RTX 4080 GPU, taking approximately **4 hours and 56.4 minutes** to complete.
-
- The final model is using `EfficientNetV2-S`, but the project evolved through several iterations:
-
- - **ResNet18:** Achieved ~90% accuracy with a model size of around 30MB.
- - **ResNet50:** Improved accuracy to 95.26% with a model size of ~100MB.
- - **EfficientNetV2-S:** Reached the "final" (for now) accuracy of **98.12%** with ~78MB.

  ## How It Works

@@ -32,9 +26,10 @@ The model is trained on a dataset of images, where each image is rotated by 0°,
  The four classes correspond to the following rotations:

  - **Class 0:** Image is correctly oriented (0°).
- - **Class 1:** Image needs to be rotated 90° Counter-Clockwise to be correct.
- - **Class 2:** Image needs to be rotated 180° to be correct.
- - **Class 3:** Image needs to be rotated 90° Clockwise to be correct.

  ## Dataset

@@ -45,7 +40,7 @@ The model was trained on several datasets:
  - **TextOCR - Text Extraction from Images Dataset:** A dataset from Kaggle ([link](https://www.kaggle.com/datasets/robikscube/textocr-text-extraction-from-images-dataset?resource=download)) was included to improve the model's ability to detect the orientation of images containing text. (However, over 1300 images needed to have their orientation manually corrected, for example 0007a5a18213563f.jpg.)
  - **Personal Images:** A small, curated collection of personal photographs to include unique examples and edge cases.

- The combined dataset consists of **70,732** unique images. Each image is augmented by being rotated in four ways (0°, 90°, 180°, 270°), creating a total of **282,928** samples. This augmented dataset was then split into **226,342 samples for training** and **56,586 samples for validation**.

  ## Usage

 

  This project implements a deep learning model to detect the orientation of images and determine the rotation needed to correct them. It uses a pre-trained EfficientNetV2 model from PyTorch, fine-tuned for the task of classifying images into four orientation categories: 0°, 90°, 180°, and 270°.

+ The model achieves **98.82% accuracy** on the validation set.

+ ## Training Performance

+ This model was trained on a single NVIDIA H100 GPU, taking **5 hours, 5 minutes and 37 seconds** to complete.

  ## How It Works

 
  The four classes correspond to the following rotations:

  - **Class 0:** Image is correctly oriented (0°).
+ - **Class 1:** Image needs to be rotated **90° Clockwise** to be correct.
+ - **Class 2:** Image needs to be rotated **180°** to be correct.
+ - **Class 3:** Image needs to be rotated **90° Counter-Clockwise** to be correct.
+
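
The updated class mapping can be sketched in code as follows. This is a minimal illustration, not the project's actual inference code: the names `CORRECTION_CW_DEGREES`, `correction_clockwise`, `to_counter_clockwise`, and `rotate_grid_clockwise` are hypothetical, and a nested list stands in for an image so the example needs no imaging library.

```python
# Hypothetical helpers illustrating the class-to-rotation mapping from the list above.
# Angles are the CLOCKWISE degrees needed to correct the image.
CORRECTION_CW_DEGREES = {
    0: 0,    # already upright
    1: 90,   # rotate 90 degrees clockwise
    2: 180,  # rotate 180 degrees
    3: 270,  # 270 degrees clockwise == 90 degrees counter-clockwise
}


def correction_clockwise(class_index: int) -> int:
    """Return the clockwise rotation (in degrees) that corrects the image."""
    if class_index not in CORRECTION_CW_DEGREES:
        raise ValueError(f"unknown orientation class: {class_index}")
    return CORRECTION_CW_DEGREES[class_index]


def to_counter_clockwise(cw_degrees: int) -> int:
    """Convert a clockwise angle for APIs where positive angles rotate
    counter-clockwise (Pillow's Image.rotate follows that convention)."""
    return (360 - cw_degrees) % 360


def rotate_grid_clockwise(grid, quarter_turns: int):
    """Rotate a 2D list clockwise by the given number of 90-degree turns."""
    for _ in range(quarter_turns % 4):
        # Reverse the row order, then transpose: one clockwise quarter turn.
        grid = [list(row) for row in zip(*grid[::-1])]
    return grid


# Example: a prediction of class 1 means one clockwise quarter turn.
predicted_class = 1
turns = correction_clockwise(predicted_class) // 90
corrected = rotate_grid_clockwise([[1, 2], [3, 4]], turns)
```

Keeping the mapping in clockwise degrees and converting at the call site makes the direction explicit, which matters here because this commit swaps the directions of classes 1 and 3.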

  ## Dataset

 
  - **TextOCR - Text Extraction from Images Dataset:** A dataset from Kaggle ([link](https://www.kaggle.com/datasets/robikscube/textocr-text-extraction-from-images-dataset?resource=download)) was included to improve the model's ability to detect the orientation of images containing text. (However, over 1300 images needed to have their orientation manually corrected, for example 0007a5a18213563f.jpg.)
  - **Personal Images:** A small, curated collection of personal photographs to include unique examples and edge cases.

+ The model was trained on a huge dataset of **189,018** unique images. Each image is augmented by being rotated in four ways (0°, 90°, 180°, 270°), creating a total of **756,072** samples. This augmented dataset was then split into **604,857 samples for training** and **151,215 samples for validation**.
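
The sample counts above can be checked with a short calculation. The README does not state the split ratio; the 80/20 split with the training count rounded down is an assumption inferred from the reported numbers, and `augmented_split` is a hypothetical helper name.

```python
def augmented_split(unique_images: int, rotations: int = 4, train_percent: int = 80):
    """Return (total, train, validation) sample counts.

    Assumes every image is kept once per rotation and that the training
    share is rounded down (an assumption, not confirmed by the README).
    """
    total = unique_images * rotations
    train = total * train_percent // 100  # integer arithmetic avoids float rounding
    validation = total - train
    return total, train, validation


# Figures from the updated README line.
total, train, validation = augmented_split(189_018)
print(total, train, validation)  # 756072 604857 151215
```

The same helper reproduces the figures on the removed line: 70,732 unique images give 282,928 samples, split into 226,342 for training and 56,586 for validation.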

  ## Usage