Checkpoint file corrupt?
Trying to load a checkpoint, I get this error:
(tone) shmyrev@local:~/tink/T-one$ python3 /home/shmyrev/tink/t-one-repo/tone/scripts/export.py --config config.json --checkpoint model.safetensors --output_path out.onnx
Traceback (most recent call last):
File "/home/shmyrev/tink/t-one-repo/tone/scripts/export.py", line 261, in <module>
model = ModelToExport(config, args.checkpoint, args.chunk_duration_ms)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/shmyrev/tink/t-one-repo/tone/scripts/export.py", line 91, in __init__
checkpoint = torch.load(checkpoint_path, map_location=torch.device("cpu"), weights_only=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/shmyrev/tink/tone/lib/python3.11/site-packages/torch/serialization.py", line 1554, in load
return _legacy_load(
^^^^^^^^^^^^^
File "/home/shmyrev/tink/tone/lib/python3.11/site-packages/torch/serialization.py", line 1802, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_pickle.UnpicklingError: unpickling stack underflow
Usually it means the checkpoint file is corrupt.
md5sum model.safetensors
a111792853b72b5bad4502e31753c126 model.safetensors
Hi, thanks for reaching out!
The issue you're encountering is due to a change in the checkpoint format that our export script expects. Older versions of the script expected model.ckpt files saved with torch.save(). The current version on the main branch is designed to work with model.safetensors checkpoints, which are the standard output from the Hugging Face Trainer. These require a different loading method and do not use torch.load(). Your traceback shows the older script calling torch.load() on model.safetensors, which is not a pickle file, hence the UnpicklingError. The file itself is most likely not corrupt.
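If you want to confirm that the file is intact rather than corrupt, checking its first bytes is usually enough. The sketch below is not part of the T-one repository: the function name sniff_checkpoint_format, the labels it returns, and the header size cap are made up for illustration. It relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header) and on torch.save() writing zip archives by default in recent PyTorch versions.

```python
import json
import struct
from pathlib import Path

# Arbitrary sanity cap for this sketch, so a random file cannot trigger a huge read.
MAX_HEADER_BYTES = 100 * 1024 * 1024


def sniff_checkpoint_format(path):
    """Guess a checkpoint's on-disk format from its first bytes."""
    data_path = Path(path)
    size = data_path.stat().st_size
    with data_path.open("rb") as f:
        head = f.read(8)
        # Zip archives (the default torch.save() container) start with "PK\x03\x04".
        if head[:4] == b"PK\x03\x04":
            return "torch-zip"
        if len(head) == 8:
            # safetensors: 8-byte little-endian header length, then a JSON object.
            (header_len,) = struct.unpack("<Q", head)
            if 0 < header_len <= min(size - 8, MAX_HEADER_BYTES):
                try:
                    header = json.loads(f.read(header_len).decode("utf-8"))
                    if isinstance(header, dict):
                        return "safetensors"
                except (UnicodeDecodeError, json.JSONDecodeError):
                    pass
        # Legacy torch.save() files are raw pickle streams (protocol 2+ starts with 0x80).
        if head[:1] == b"\x80":
            return "torch-legacy-pickle"
    return "unknown"
```

Calling sniff_checkpoint_format("model.safetensors") on a healthy file should return "safetensors". In that case the file is fine and only the loader in the older script is wrong for it.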
To resolve this, please update your local code and run the updated export script:
python3 tone/scripts/export.py --path-to-pretrained /path/to/your/model_directory --output_path model.onnx
The /path/to/your/model_directory should contain files like config.json, model.safetensors, etc.
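Before running the export, you could sanity-check the directory with something like the following. This is only a sketch and not part of the repo: the function name missing_model_files and the list of required files are assumptions based on the two files named above, and the export script may need more than these.

```python
from pathlib import Path

# Assumed from the two files named above; the real script may require others too.
REQUIRED_FILES = ("config.json", "model.safetensors")


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the names from `required` that are not regular files in model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]
```

An empty list means both files are present and the directory can be passed to --path-to-pretrained.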
This new script handles the safetensors format correctly. If you're interested, you can see the updated loading logic in the source code.
Please let us know if that solves the problem for you!