TensorFlow Model Garden LMs: FineWeb WordPiece Tokenizer
This WordPiece tokenizer was trained as part of the TensorFlow Model Garden LMs project.
The tokenizer was trained on the sample-10BT
subsets of the FineWeb and FineWeb-Edu datasets, using a vocabulary size of 64,000 subtokens.
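For reference, a minimal usage sketch, assuming the tokenizer is published in a `transformers`-compatible format; the repository ID below is a placeholder, not the actual one:

```python
# Minimal usage sketch. The repository ID is hypothetical; substitute
# the actual Hub ID of this tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("model-garden-lms/fineweb-wordpiece-tokenizer")

tokens = tokenizer.tokenize("TensorFlow Model Garden language models")
print(tokens)                                  # WordPiece subtokens, e.g. with "##" continuations
print(tokenizer.convert_tokens_to_ids(tokens)) # IDs within the 64,000-entry vocabulary
```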
A script for training the tokenizer can be found here.
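The linked script is not reproduced here, but a sketch of BERT-style WordPiece training with the Hugging Face `tokenizers` library looks like the following; the input file paths are illustrative assumptions, while the 64,000 vocabulary size mirrors the description above:

```python
# Training sketch with the `tokenizers` library. File paths are
# hypothetical text dumps of the sample-10BT subsets; this is not the
# project's actual training script.
from tokenizers import BertWordPieceTokenizer

tokenizer = BertWordPieceTokenizer(lowercase=True)
tokenizer.train(
    files=["fineweb_sample_10bt.txt", "fineweb_edu_sample_10bt.txt"],
    vocab_size=64_000,
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)
tokenizer.save_model(".")  # writes vocab.txt to the current directory
```

`BertWordPieceTokenizer` is used here because it produces standard BERT-style WordPiece output (a `vocab.txt` with `##` continuation prefixes), which keeps the result loadable by common BERT tooling.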