Update README.md
README.md
CHANGED
@@ -15,7 +15,7 @@ This model was trained as part of the analysis and refinements performed in prep
 
 We used this text classifier to assign a topic, derived from the first level of the [Library of Congress' Classification Outline](https://www.loc.gov/catdir/cpso/lcco/), to individual volumes.
 
-Complete experimental setup and results are available in our [technical report](
+Complete experimental setup and results are available in our [technical report](https://arxiv.org/abs/2506.08300) (Section 4.5).
 
 ## Base model
 [google-bert/bert-base-multilingual-uncased](https://huggingface.co/google-bert/bert-base-multilingual-uncased)
@@ -95,6 +95,14 @@ print(result[0]) # {'label': 'SCIENCE', 'score': 0.9996894598007202}
 ```
 
 ## Cite
-```
-
+```bibtext
+@misc{cargnelutti2025institutionalbooks10242b,
+title={Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability},
+author={Matteo Cargnelutti and Catherine Brobston and John Hess and Jack Cushman and Kristi Mukk and Aristana Scourtas and Kyle Courtney and Greg Leppert and Amanda Watson and Martha Whitehead and Jonathan Zittrain},
+year={2025},
+eprint={2506.08300},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2506.08300},
+}
 ```