Pre-training Dataset Samples • Collection • A collection of pre-training dataset samples of sizes 10M, 100M and 1B tokens. Ideal for quick experimentation and ablations. • 19 items • Updated 23 days ago • 18
Article • Adaptive Classifier: Dynamic Text Classification with Continuous Learning • Jun 20, 2025 • 18
💧 LFM2 • Collection • LFM2 is a new generation of hybrid models designed for on-device deployment. • 27 items • Updated 5 days ago • 136
Article • Model2Vec: Distill a Small Fast Model from any Sentence Transformer • Oct 14, 2024 • 100
ibm-granite/granite-embedding-125m-english • Sentence Similarity • 0.1B • Updated Aug 19, 2025 • 19k • 33
Snowflake/snowflake-arctic-embed-l-v2.0 • Sentence Similarity • 0.6B • Updated Jul 28, 2025 • 1.02M • 223