institutional/institutional-books-topic-classifier-bert

Text Classification

Trained with AutoTrain

MatteoCargnelutti committed on May 30

Commit 4995454 (verified) · 1 Parent(s): 9fb05b3

Update README.md

Files changed (1)

README.md +24 -6

@@ -9,12 +9,19 @@ widget:
 license: apache-2.0
 ---
-# 📚 Institutional Books Pipeline
-## Training data
 ## Input format
-Text, formatted as follows:
 ```
 Title: Full title of the book
 Author: Lorem Ipsum
@@ -26,8 +33,6 @@ General Note: A great book
 All of the fields listed in this example are optional.
 ## Categories
-First level of the [Library of Congress Classification Outline](https://www.loc.gov/catdir/cpso/lcco/)
 - GENERAL WORKS
 - PHILOSOPHY. PSYCHOLOGY. RELIGION
 - AUXILIARY SCIENCES OF HISTORY
@@ -49,6 +54,12 @@ First level of the [Library of Congress Classification Outline](https://www.loc.
 - NAVAL SCIENCE
 - BIBLIOGRAPHY. LIBRARY SCIENCE. INFORMATION RESOURCES (GENERAL)
 ## Validation Metrics
 | Metric | Value |
 | --- | --- |
@@ -62,4 +73,11 @@ First level of the [Library of Congress Classification Outline](https://www.loc.
 | recall_macro | 0.9560667596679707 |
 | recall_micro | 0.9694 |
 | recall_weighted | 0.9694 |
-| accuracy | 0.9694 |

 license: apache-2.0
 ---
+# 📚 Institutional Books Topic Classifier
+This model was trained as part of the analysis and experiments performed in preparation for the release of the [Institutional Books 1.0 dataset](https://huggingface.co/collections/instdin/institutional-books-68366258bfb38364238477cf).
+It is a text classifier that we used to assign each volume 1 of 20 topics, derived from the first level of the [Library of Congress Classification Outline](https://www.loc.gov/catdir/cpso/lcco/).
+The complete experimental setup and results are available in our [technical report]() (Section 4.5).
+## Base model
+[google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased)
 ## Input format
+Book metadata, formatted as follows:
 ```
 Title: Full title of the book
 Author: Lorem Ipsum
 All of the fields listed in this example are optional.
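
The input format above can be sketched in Python. This is a minimal illustration, not the authors' preprocessing code: `format_book_metadata`, `classify`, and `KNOWN_FIELDS` are hypothetical names, only the three field labels visible in this README excerpt are handled (the full list is not shown here), and the sketch assumes the checkpoint loads with the standard `transformers` text-classification pipeline.

```python
from typing import Mapping, Optional

MODEL_ID = "institutional/institutional-books-topic-classifier-bert"

# Only the field labels visible in this README excerpt; the full list is not shown.
KNOWN_FIELDS = ("Title", "Author", "General Note")


def format_book_metadata(record: Mapping[str, Optional[str]]) -> str:
    """Render metadata as 'Field: value' lines, skipping missing or empty fields."""
    lines = []
    for field in KNOWN_FIELDS:
        value = record.get(field)
        if value is not None and value.strip():
            lines.append(f"{field}: {value.strip()}")
    return "\n".join(lines)


def classify(record: Mapping[str, Optional[str]]):
    """Hypothetical helper: assumes the model works with the standard pipeline."""
    # Imported lazily so the formatting helper can be used without transformers.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=MODEL_ID)
    return classifier(format_book_metadata(record))
```

Because every field is optional, a record with only a title still produces valid input; calling `classify({"Title": "Full title of the book"})` would download the model and return the pipeline's label and score.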
 ## Categories
 - GENERAL WORKS
 - PHILOSOPHY. PSYCHOLOGY. RELIGION
 - AUXILIARY SCIENCES OF HISTORY
 - NAVAL SCIENCE
 - BIBLIOGRAPHY. LIBRARY SCIENCE. INFORMATION RESOURCES (GENERAL)
+## Training data
+- Train split: 80,830 samples
+- Test split: 5,000 samples
+An additional set of 1,000 samples was set aside for benchmarking purposes.
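
As a quick sanity check on the split sizes listed above, the relative share of each split can be computed as follows. This sketch assumes the three sets are disjoint, which the README implies but does not state; the variable names are illustrative.

```python
# Split sizes as reported in the README.
SPLITS = {"train": 80_830, "test": 5_000, "benchmark": 1_000}

# Assumption: the splits do not overlap, so the total is a plain sum.
total = sum(SPLITS.values())
shares = {name: count / total for name, count in SPLITS.items()}

for name, count in SPLITS.items():
    print(f"{name:>9}: {count:>6,} samples ({shares[name]:.1%})")
print(f"{'total':>9}: {total:>6,} samples")
```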
 ## Validation Metrics
 | Metric | Value |
 | --- | --- |
 | recall_macro | 0.9560667596679707 |
 | recall_micro | 0.9694 |
 | recall_weighted | 0.9694 |
+| accuracy | 0.9694 |
+**Benchmark accuracy:** 97.2% (920)
+## Cite
+```
+TBD
+```