institutional/institutional-books-topic-classifier-bert

Text Classification

Trained with AutoTrain

MatteoCargnelutti committed on Jun 11

Commit f3b6f0d · verified · 1 Parent(s): 2cc39e5

Update README.md

Files changed (1)

README.md +11 -3

@@ -15,7 +15,7 @@ This model was trained as part of the analysis and refinements performed in prep
 We used this text classifier to assign a topic, derived from the first level of the [Library of Congress' Classification Outline](https://www.loc.gov/catdir/cpso/lcco/), to individual volumes.
-Complete experimental setup and results are available in our [technical report](TBD) (Section 4.5).
 ## Base model
 [google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased)
@@ -95,6 +95,14 @@ print(result[0]) # {'label': 'SCIENCE', 'score': 0.9996894598007202}
 ```
 ## Cite
-```
-TBD
 ```

 We used this text classifier to assign a topic, derived from the first level of the [Library of Congress' Classification Outline](https://www.loc.gov/catdir/cpso/lcco/), to individual volumes.
+Complete experimental setup and results are available in our [technical report](https://arxiv.org/abs/2506.08300) (Section 4.5).
 ## Base model
 [google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased)
 ```
 ## Cite
+```bibtext
+@misc{cargnelutti2025institutionalbooks10242b,
+      title={Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability},
+      author={Matteo Cargnelutti and Catherine Brobston and John Hess and Jack Cushman and Kristi Mukk and Aristana Scourtas and Kyle Courtney and Greg Leppert and Amanda Watson and Martha Whitehead and Jonathan Zittrain},
+      year={2025},
+      eprint={2506.08300},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2506.08300},
+}
 ```