Use Negative Feature Layer Indices

#9
by abrooks9944 - opened

There is a misalignment in the vision feature layers currently used between transformers and vLLM (the current config values are correct for vLLM but off by one for transformers). In transformers, the first entry of the hidden states is the input embedding (here), but in vLLM this is not the case for the way the hidden states pool is formed (here).

In other words, the hidden states array in transformers contains 28 entries:
[emb, h0, h1, ..., h26]

while the hidden states pool in vLLM contains 27 entries:
[h0, h1, ..., h26]

The config reflects the correct values for what is used in vLLM but is off by one for transformers. Both projects support negative indexing into the hidden states (with offset handling in vLLM, since only the layers up to the deepest requested feature layer are loaded). This PR changes the vision feature layers to use negative indices, which fixes the misalignment in transformers without changing the output in vLLM (no code changes needed).
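To illustrate the off-by-one, here is a toy sketch (the list names and string labels are hypothetical stand-ins, not actual transformers/vLLM structures): with the embedding output prepended in transformers but absent in vLLM, the same positive index selects different layers, while the same negative index selects the same layer in both.

```python
# Hypothetical stand-ins for the two hidden-states containers:
# transformers prepends the embedding output; vLLM does not.
transformers_states = ["emb"] + [f"h{i}" for i in range(27)]  # 28 entries
vllm_states = [f"h{i}" for i in range(27)]                    # 27 entries

# A positive index is off by one between the two conventions:
# index 16 picks h15 in transformers but h16 in vLLM.
assert transformers_states[16] != vllm_states[16]

# A negative index counts from the end, so it agrees in both:
# index -12 picks h15 in transformers and in vLLM.
assert transformers_states[-12] == vllm_states[-12] == "h15"
```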

I will also submit a PR to vLLM to add the embeddings to the hidden states pool when all hidden states are requested from the visual encoder.

IBM Granite org

Thanks @abrooks9944

aarbelle changed pull request status to merged

The model uses negative feature layers now that this PR is merged; here is the link to the corresponding vLLM PR enabling positive feature layers: https://github.com/vllm-project/vllm/pull/13514

IBM Granite org

@abrooks9944
So future models can also use positive features?
Do we need to update anything in this model?

Hey @aarbelle - yes, in future releases of vLLM, it will align with transformers, and the corrected positive feature layers ([4, 8, 16, 27]) can be used to get the same output as the corresponding negative layers ([-24, -20, -12, -1]).

However, I would still suggest using negative feature layers so that the outputs are consistent if the model is used with earlier versions of vLLM. The PR above also fixes a bug in vLLM that caused positive feature layers to load one more layer than needed to reach the deepest feature, which throws an error if the last layer is requested (as it is for this model). Keeping the indices negative ensures the model works correctly with older vLLM versions!

Also no, nothing needs to be changed in the model :)
