Update README.md
README.md CHANGED
@@ -82,6 +82,8 @@ python3 scripts/cross_tokenizer_distill.py \
 name=llama3_to_byte_20k
 ```
 
+Training took ~26 hours on a TPU v4-32.
+
 ## Future Work
 
 The current version of this model is trained for 20k steps with 32*2048 bytes per batch (= 1.3B bytes ≈ 328M subword tokens total). It was unexpected that it performs as well as it does after such a short training run. We plan to train a new version for more steps (you can also do so yourself using [`tokenkit`](https://github.com/bminixhofer/tokenkit)).
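
For reference, a minimal back-of-the-envelope sketch (plain Python, not part of tokenkit) reproducing the totals quoted in the paragraph above; the ~4 bytes-per-subword-token ratio is an assumption inferred from the quoted 1.3B bytes ≈ 328M tokens, not a value taken from the library:

```python
# Check the training totals quoted in the README paragraph.
steps = 20_000    # training steps
batch_size = 32   # sequences per batch
seq_len = 2_048   # bytes per sequence

total_bytes = steps * batch_size * seq_len
print(f"total bytes: {total_bytes / 1e9:.2f}B")  # -> 1.31B bytes

# Assumed average of ~4 bytes per subword token (inferred, see note above).
bytes_per_subword_token = 4.0
approx_tokens = total_bytes / bytes_per_subword_token
print(f"approx subword tokens: {approx_tokens / 1e6:.0f}M")  # -> 328M
```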