MatteoCargnelutti committed on
Commit ac86603 · verified · 1 Parent(s): 4995454

Update README.md

Files changed (1)
  1. README.md +4 -5
README.md CHANGED
@@ -13,9 +13,9 @@ license: apache-2.0
 
 This model was trained as part of the analysis and experiments performed in preparation of the release of the [Institutional Books 1.0 dataset](https://huggingface.co/collections/instdin/institutional-books-68366258bfb38364238477cf).
 
-It is a text classifier, that we used to assign 1 of 20 topics, derived from the first level of the [Library of Congress' Classification Outline](https://www.loc.gov/catdir/cpso/lcco/), to individual volumes.
+We used this text classifier to assign 1 of 20 topics, derived from the first level of the [Library of Congress' Classification Outline](https://www.loc.gov/catdir/cpso/lcco/), to individual volumes.
 
-Complete experimental setup and results are available in our [technical report]() (Section 4.5).
+Complete experimental setup and results are available in our [technical report](TBD) (Section 4.5).
 
 ## Base model
 [google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased)
@@ -57,8 +57,7 @@ All of the fields listed in this example are optional.
 ## Training data
 - Train split: 80,830 samples
 - Test split: 5,000 samples
-
-An additional set of 1,000 samples was set aside for benchmarking purposes.
+- An additional set of 1,000 samples was set aside for benchmarking purposes
 
 ## Validation Metrics
 | Metric | Value |
@@ -75,7 +74,7 @@ An additional set of 1,000 samples was set aside for benchmarking purposes.
 | recall_weighted | 0.9694 |
 | accuracy | 0.9694 |
 
-**Benchmark accuracy:** 97.2% (920)
+**Post-training benchmark accuracy:** 97.2% (920)
 
 ## Cite
 ```