Books from the Survivor Library (mostly ~1920s & earlier) OCR'd with recent VLMs
BEEspoke Data
community
AI & ML interests
'an LLM is only as good as the dataset it was trained on' - Sun Tzu
Organization Card
🐝📊💁
🚧"raw" pretrained smol_llama checkpoints - WIP 🚧
BEE-spoke-data/smol_llama-101M-GQA
Text Generation • 0.1B • Updated • 1.9k • 33
BEE-spoke-data/smol_llama-81M-tied
Text Generation • 81.3M • Updated • 875 • 10
BEE-spoke-data/smol_llama-220M-GQA
Text Generation • 0.2B • Updated • 2.37k • 13
BEE-spoke-data/verysmol_llama-v11-KIx2
Text Generation • 58.1M • Updated • 962 • 4
Models (58)
BEE-spoke-data/NVIDIA-Nemotron-Parse-v1.2
Image-Text-to-Text • 0.9B • Updated • 25
BEE-spoke-data/neobert-100k-test
Fill-Mask • 0.1B • Updated • 1
BEE-spoke-data/tiny-random-MPNetForMaskedLM
Fill-Mask • 237k • Updated • 2
BEE-spoke-data/bpe-tokenizer-32k-smolNeoX
Updated
BEE-spoke-data/wordpiece-tokenizer-32k-en_code-orig
Updated
BEE-spoke-data/wordpiece-tokenizer-32k-en_code-msp
Updated
BEE-spoke-data/pegasus-x-base-synthsumm_open-16k
Summarization • 0.3B • Updated • 29 • 2
BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L2
Text Generation • 0.7B • Updated • 3
BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan
0.7B • Updated
BEE-spoke-data/tFINE-900m-instruct-orpo
0.9B • Updated • 1
Datasets (82)
BEE-spoke-data/SurvivorLib-Nanonets-OCR-s
Viewer • Updated • 14.4k • 20 • 2
BEE-spoke-data/SurvivorLib-rolmOCR
Viewer • Updated • 14.6k • 50 • 1
BEE-spoke-data/govdocs1-pdf-source
Viewer • Updated • 235k • 854 • 4
BEE-spoke-data/napierone-pdf-nanonets-s
Viewer • Updated • 9.96k • 9
BEE-spoke-data/napierone-pdf-olmOCR
Viewer • Updated • 19k • 21
BEE-spoke-data/LONGCOT-merged-1M
Viewer • Updated • 1.7M • 38 • 2
BEE-spoke-data/cosmopedia-v2-mincols
Viewer • Updated • 39.1M • 21 • 1
BEE-spoke-data/reddit-title-body-hf
Viewer • Updated • 251M • 138 • 4
BEE-spoke-data/bigpatent-all
Viewer • Updated • 2.43M • 209
BEE-spoke-data/google_wellformed_query-hf
Viewer • Updated • 25.1k • 13