Hello, the FP16 KV cache memory requirements listed for the 405B model (0.984 GB, 15.38 GB, and 123.05 GB) look to me like FP32 values. Could you double-check them? Also, how can I work out the KV cache memory when I pick up a new LLM?
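For context, this is the estimate I have been using to sanity-check the table: a minimal sketch, assuming a standard grouped-query-attention cache with no quantization. The Hugging Face-style config field names and the 405B shape values (126 layers, 8 KV heads, head dim 128) are taken from the public model config, not from the article itself.

```python
# Minimal sketch, assuming a standard grouped-query-attention KV cache with no
# quantization. Field names mirror the usual Hugging Face config conventions
# (num_hidden_layers, num_key_value_heads, ...); the Llama 3.1 405B shape values
# below come from its public config, not from this article.

def kv_cache_bytes(num_layers: int,
                   num_kv_heads: int,
                   head_dim: int,
                   seq_len: int,
                   bytes_per_value: int = 2) -> int:
    """2 tensors (K and V) x layers x KV heads x head dim x dtype bytes x tokens."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * seq_len


# Example: Llama 3.1 405B (126 layers, 8 KV heads, head_dim = 16384 / 128 = 128)
# at a 128k-token context in FP16 (2 bytes per value).
size_gib = kv_cache_bytes(num_layers=126, num_kv_heads=8, head_dim=128,
                          seq_len=128_000) / 2**30
print(f"{size_gib:.2f} GiB")
```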
aHaric (Haric HU) commented on "Llama 3.1 - 405B, 70B & 8B with multilinguality and long context" · 13 days ago