I get a RuntimeError when I run the code below on 4 V100 (16GB) GPUs.

error:

/.venv/lib/python3.10/site-packages/transformers/generation/utils.py:2505: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
  warnings.warn(
Traceback (most recent call last):
  File "/main.py", line 43, in <module>
    outputs = model.generate(**inputs, max_new_tokens=500)
  File "/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/.venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2633, in generate
    result = self._sample(
  File "/.venv/lib/python3.10/site-packages/transformers/generation/utils.py", line 3614, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/.venv/lib/python3.10/site-packages/accelerate/hooks.py", line 175, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/.venv/lib/python3.10/site-packages/transformers/utils/generic.py", line 961, in wrapper
    output = func(self, *args, **kwargs)
  File "/.venv/lib/python3.10/site-packages/transformers/models/voxtral/modeling_voxtral.py", line 512, in forward
    inputs_embeds[audio_token_mask] = audio_embeds
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3!
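
The failing line writes `audio_embeds` into `inputs_embeds` through a mask, so the two tensors appear to have been produced by modules that `device_map="auto"` placed on different GPUs (cuda:0 and cuda:3 here are indices within `CUDA_VISIBLE_DEVICES`, not physical GPU IDs). Below is a minimal sketch for checking how the modules were split. It assumes the loaded model exposes the `hf_device_map` dict that Transformers sets when a device map is used. The module names in `example_map` are hypothetical placeholders for illustration, not the actual Voxtral layout.

```python
from collections import defaultdict


def summarize_device_map(device_map):
    """Group module names by the device they were placed on."""
    by_device = defaultdict(list)
    for module_name, device in device_map.items():
        # Devices may be ints (GPU index) or strings such as "cpu" or "disk".
        by_device[str(device)].append(module_name)
    return dict(by_device)


def devices_for(device_map, prefix):
    """Return the sorted devices holding modules whose name starts with `prefix`."""
    return sorted(
        {str(dev) for name, dev in device_map.items() if name.startswith(prefix)}
    )


# Hypothetical map for illustration only. With a real model this would be:
#     device_map = model.hf_device_map
example_map = {
    "audio_tower": 0,
    "language_model.model.embed_tokens": 0,
    "language_model.model.layers.0": 0,
    "language_model.model.layers.39": 3,
    "language_model.lm_head": 3,
}

summary = summarize_device_map(example_map)
for device, modules in sorted(summary.items()):
    print(f"device {device}: {len(modules)} module(s)")
    for name in modules:
        print(f"  {name}")

print("language_model spans devices:", devices_for(example_map, "language_model"))
```

Running the same two helpers on `model.hf_device_map` right after `from_pretrained` would show which device each top-level module landed on, which is useful context to attach to this report.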

run command: CUDA_VISIBLE_DEVICES=4,5,6,7 uv run main.py

code:

from transformers import VoxtralForConditionalGeneration, AutoProcessor
import torch

repo_id = "mistralai/Voxtral-Small-24B-2507"

processor = AutoProcessor.from_pretrained(repo_id)
model = VoxtralForConditionalGeneration.from_pretrained(repo_id, torch_dtype=torch.bfloat16, device_map="auto")

conversation = [
    {
        "role": "user",
        "content": [
            {
                "type": "audio",
                "path": "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/obama.mp3",
            },
            {
                "type": "audio",
                "path": "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/bcn_weather.mp3",
            },
            {"type": "text", "text": "Describe briefly what you can hear."},
        ],
    },
    {
        "role": "assistant",
        "content": "The audio begins with the speaker delivering a farewell address in Chicago, reflecting on his eight years as president and expressing gratitude to the American people. The audio then transitions to a weather report, stating that it was 35 degrees in Barcelona the previous day, but the temperature would drop to minus 20 degrees the following day.",
    },
    {
        "role": "user",
        "content": [
            {
                "type": "audio",
                "path": "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/winning_call.mp3",
            },
            {"type": "text", "text": "Ok, now compare this new audio with the previous one."},
        ],
    },
]

inputs = processor.apply_chat_template(conversation)
inputs = inputs.to("cuda", dtype=torch.bfloat16)

outputs = model.generate(**inputs, max_new_tokens=500)
decoded_outputs = processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)

print("\nGenerated response:")
print("=" * 80)
print(decoded_outputs[0])
print("=" * 80)

packages:

$ uv pip list
Package                   Version
------------------------- ------------
accelerate                1.9.0
annotated-types           0.7.0
attrs                     25.3.0
audioread                 3.0.1
certifi                   2025.7.14
cffi                      1.17.1
charset-normalizer        3.4.2
decorator                 5.2.1
filelock                  3.13.1
fsspec                    2024.6.1
hf-xet                    1.1.5
huggingface-hub           0.34.1
idna                      3.10
jinja2                    3.1.4
joblib                    1.5.1
jsonschema                4.25.0
jsonschema-specifications 2025.4.1
lazy-loader               0.4
librosa                   0.11.0
llvmlite                  0.44.0
markupsafe                2.1.5
mistral-common            1.8.2
mpmath                    1.3.0
msgpack                   1.1.1
networkx                  3.3
numba                     0.61.2
numpy                     2.2.6
nvidia-cublas-cu11        11.11.3.6
nvidia-cuda-cupti-cu11    11.8.87
nvidia-cuda-nvrtc-cu11    11.8.89
nvidia-cuda-runtime-cu11  11.8.89
nvidia-cudnn-cu11         9.1.0.70
nvidia-cufft-cu11         10.9.0.58
nvidia-curand-cu11        10.3.0.86
nvidia-cusolver-cu11      11.4.1.48
nvidia-cusparse-cu11      11.7.5.86
nvidia-nccl-cu11          2.21.5
nvidia-nvtx-cu11          11.8.86
packaging                 25.0
pillow                    11.3.0
platformdirs              4.3.8
pooch                     1.8.2
psutil                    7.0.0
pycountry                 24.6.1
pycparser                 2.22
pydantic                  2.11.7
pydantic-core             2.33.2
pydantic-extra-types      2.10.5
pyyaml                    6.0.2
referencing               0.36.2
regex                     2024.11.6
requests                  2.32.4
rpds-py                   0.26.0
safetensors               0.5.3
scikit-learn              1.7.1
scipy                     1.15.3
sentencepiece             0.2.0
setuptools                70.2.0
soundfile                 0.13.1
soxr                      0.5.0.post1
sympy                     1.13.3
threadpoolctl             3.6.0
tiktoken                  0.9.0
tokenizers                0.21.2
torch                     2.7.1+cu118
torchaudio                2.7.1+cu118
torchvision               0.22.1+cu118
tqdm                      4.67.1
transformers              4.54.0.dev0
triton                    3.3.1
typing-extensions         4.14.1
typing-inspection         0.4.1
urllib3                   2.5.0