license: apache-2.0
---

# 📚 Institutional Books Topic Classifier

This model was trained as part of the analysis and experiments performed in preparation for the release of the [Institutional Books 1.0 dataset](https://huggingface.co/collections/instdin/institutional-books-68366258bfb38364238477cf).

It is a text classifier that we used to assign one of 20 topics, derived from the first level of the [Library of Congress Classification Outline](https://www.loc.gov/catdir/cpso/lcco/), to individual volumes.

The complete experimental setup and results are available in our [technical report]() (Section 4.5).

## Base model

[google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased)
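
The fine-tuning script itself is not part of this card. As a rough sketch only, a 20-label classification head on this base model could be initialized as follows; the label count comes from the Categories section below, and everything else is an assumption rather than the actual training setup.

```python
# Sketch only: initializing a 20-label sequence-classification head on the
# base checkpoint. This is not the exact training setup used for this model.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

BASE = "google-bert/bert-base-multilingual-uncased"

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForSequenceClassification.from_pretrained(
    BASE,
    num_labels=20,  # one output label per first-level LCC category
)
```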

## Input format

Book metadata, formatted as follows:
```
Title: Full title of the book
Author: Lorem Ipsum
...
General Note: A great book
```

All of the fields listed in this example are optional.
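
As a usage illustration (the Hub repo id below is a placeholder, since this card does not state it), one way to assemble the metadata string and classify a volume with the `transformers` pipeline:

```python
# Illustration only: the repo id below is a placeholder, not necessarily the
# actual Hub id of this model.
from transformers import pipeline

MODEL_ID = "instdin/institutional-books-topic-classifier"  # hypothetical id

metadata = {
    "Title": "A Treatise on the Steam Engine",
    "Author": "Lorem Ipsum",
}
# Every field is optional; include only what you have, one "Key: value" per line.
text = "\n".join(f"{key}: {value}" for key, value in metadata.items())

classifier = pipeline("text-classification", model=MODEL_ID)
print(classifier(text))  # top predicted category with its score
```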

## Categories

First level of the [Library of Congress Classification Outline](https://www.loc.gov/catdir/cpso/lcco/):

- GENERAL WORKS
- PHILOSOPHY. PSYCHOLOGY. RELIGION
- AUXILIARY SCIENCES OF HISTORY
- ...
- NAVAL SCIENCE
- BIBLIOGRAPHY. LIBRARY SCIENCE. INFORMATION RESOURCES (GENERAL)

## Training data

- Train split: 80,830 samples
- Test split: 5,000 samples

An additional set of 1,000 samples was set aside for benchmarking purposes.
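
The sampling code is not included in this card. Purely as an illustration of carving out subsets of these sizes with the `datasets` library, where the dataset id and seed are assumptions:

```python
# Illustration only: the dataset id and seed are assumptions; only the split
# sizes (80,830 / 5,000 / 1,000) come from this card.
from datasets import load_dataset

ds = load_dataset("instdin/institutional-books-1.0", split="train")  # hypothetical id
ds = ds.shuffle(seed=42)

benchmark = ds.select(range(1_000))              # 1,000 benchmark samples
test = ds.select(range(1_000, 6_000))            # 5,000 test samples
train = ds.select(range(6_000, 6_000 + 80_830))  # 80,830 train samples
```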

## Validation Metrics
| Metric | Value |
| --- | --- |
| ... | ... |
| recall_macro | 0.9560667596679707 |
| recall_micro | 0.9694 |
| recall_weighted | 0.9694 |
| accuracy | 0.9694 |

**Benchmark accuracy:** 97.2% (920)
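
For reference on how the recall variants and accuracy reported above relate, here is a small scikit-learn sketch with illustrative labels:

```python
# Sketch: how the reported metric variants are computed from true and
# predicted category labels (y_true / y_pred here are illustrative).
from sklearn.metrics import accuracy_score, recall_score

y_true = ["GENERAL WORKS", "NAVAL SCIENCE", "GENERAL WORKS"]
y_pred = ["GENERAL WORKS", "NAVAL SCIENCE", "NAVAL SCIENCE"]

print(recall_score(y_true, y_pred, average="macro"))     # recall_macro: unweighted mean over classes
print(recall_score(y_true, y_pred, average="micro"))     # recall_micro: pooled over all samples
print(recall_score(y_true, y_pred, average="weighted"))  # recall_weighted: weighted by class support
print(accuracy_score(y_true, y_pred))                    # accuracy
```

For single-label multiclass classification, micro-averaged recall equals plain accuracy, which is why those two rows report the same value.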

## Cite
```
TBD
```