MatteoCargnelutti committed on
Commit f3b6f0d (verified) · 1 parent: 2cc39e5

Update README.md

Files changed (1)
  1. README.md +11 -3
README.md CHANGED
@@ -15,7 +15,7 @@ This model was trained as part of the analysis and refinements performed in prep
  
  We used this text classifier to assign a topic, derived from the first level of the [Library of Congress' Classification Outline](https://www.loc.gov/catdir/cpso/lcco/), to individual volumes.
  
- Complete experimental setup and results are available in our [technical report](TBD) (Section 4.5).
+ Complete experimental setup and results are available in our [technical report](https://arxiv.org/abs/2506.08300) (Section 4.5).
  
  ## Base model
  [google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased)
@@ -95,6 +95,14 @@ print(result[0]) # {'label': 'SCIENCE', 'score': 0.9996894598007202}
  ```
  
  ## Cite
- ```
- TBD
+ ```bibtext
+ @misc{cargnelutti2025institutionalbooks10242b,
+ title={Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability},
+ author={Matteo Cargnelutti and Catherine Brobston and John Hess and Jack Cushman and Kristi Mukk and Aristana Scourtas and Kyle Courtney and Greg Leppert and Amanda Watson and Martha Whitehead and Jonathan Zittrain},
+ year={2025},
+ eprint={2506.08300},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2506.08300},
+ }
  ```