
Pile of Law Tokenizer

This tokenizer is intended as a drop-in replacement for the GPT2Tokenizer. It uses the same special tokens, but was trained on a random sample of 1M documents from the Pile of Law train split.

Its vocabulary contains exactly 52,000 tokens, which differs from GPT-2's vocabulary size of 50,257.

Usage:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sam-mosaic/pile-of-law-tokenizer")
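
Because the special tokens mirror GPT-2's, existing GPT-2 pipelines should work unchanged. Below is a minimal sketch, continuing from the snippet above, that round-trips a piece of legal text; the exact token IDs produced depend on the trained vocabulary.

# Encode a sentence into token IDs and decode it back
text = "The parties hereby agree to the terms set forth below."
ids = tokenizer.encode(text)

print(len(tokenizer))         # vocabulary size: 52000
print(tokenizer.eos_token)    # same end-of-text special token as GPT2Tokenizer
print(tokenizer.decode(ids))  # round-trips back to the original text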