Checkpoint file corrupt?

#1, opened by nshmyrevgmail

Trying to load a checkpoint, I get this error:

(tone) shmyrev@local:~/tink/T-one$ python3 /home/shmyrev/tink/t-one-repo/tone/scripts/export.py --config config.json --checkpoint model.safetensors --output_path out.onnx
Traceback (most recent call last):
  File "/home/shmyrev/tink/t-one-repo/tone/scripts/export.py", line 261, in <module>
    model = ModelToExport(config, args.checkpoint, args.chunk_duration_ms)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/shmyrev/tink/t-one-repo/tone/scripts/export.py", line 91, in __init__
    checkpoint = torch.load(checkpoint_path, map_location=torch.device("cpu"), weights_only=False)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/shmyrev/tink/tone/lib/python3.11/site-packages/torch/serialization.py", line 1554, in load
    return _legacy_load(
           ^^^^^^^^^^^^^
  File "/home/shmyrev/tink/tone/lib/python3.11/site-packages/torch/serialization.py", line 1802, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_pickle.UnpicklingError: unpickling stack underflow

Usually this error means the checkpoint file is corrupt.

md5sum model.safetensors 
a111792853b72b5bad4502e31753c126  model.safetensors
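
For reference, one quick way to distinguish a truncated or corrupt download from a format mismatch is to try listing the tensor names with the safetensors library itself. This is just a minimal diagnostic sketch (it assumes the safetensors package is installed):

from safetensors import safe_open  # pip install safetensors

# Lazily open the file and list its tensor names; a corrupt or
# truncated file would fail here with a header/parsing error.
with safe_open("model.safetensors", framework="pt", device="cpu") as f:
    keys = list(f.keys())
    print(f"{len(keys)} tensors, e.g. {keys[:3]}")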

Hi, thanks for reaching out!

The issue you're encountering is due to a change in the checkpoint format expected by the export script. Older versions of the script expected model.ckpt files saved with torch.save(). The current version on the main branch is designed to work with model.safetensors checkpoints, which are the standard output of the Hugging Face Trainer; these require a different loading method and are not read with torch.load().
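
For illustration, safetensors checkpoints are read roughly like this (a minimal sketch using the safetensors package, not the exact code from the export script):

from safetensors.torch import load_file

# load_file returns a plain dict of parameter name -> tensor, which can
# then be passed to model.load_state_dict(); no pickle is involved, so
# torch.load() cannot parse this format.
state_dict = load_file("model.safetensors", device="cpu")
print(f"loaded {len(state_dict)} tensors")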

To resolve this, please update your local code and run the updated export script:

python3 tone/scripts/export.py --path-to-pretrained /path/to/your/model_directory --output_path model.onnx

The /path/to/your/model_directory should contain files like config.json, model.safetensors, etc.
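
For example, a typical Trainer output directory looks like this (file names other than config.json and model.safetensors are illustrative):

/path/to/your/model_directory
├── config.json
├── model.safetensors
└── ...  (tokenizer files, training state, and anything else the Trainer saved)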

This new script handles the safetensors format correctly. If you're interested, you can see the updated loading logic in the export script's source code.

Please let us know if that solves the problem for you!
