The speed is too slow on the A800, only 4.8 s/token

#24
by wc-llm - opened

I used the example code, and the speed is too slow on the A800: only 4.8 s/token.
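For reference, a seconds-per-token figure like this is usually obtained by timing one generation call and dividing by the number of newly generated tokens. Below is a minimal sketch of that measurement. `seconds_per_token`, `generate_fn`, and `fake_generate` are hypothetical names used only for illustration and are not part of the repository's example code; the stand-in generator simply sleeps instead of running a model.

```python
import time
from typing import Callable, Sequence


def seconds_per_token(generate_fn: Callable[[], Sequence[int]]) -> float:
    """Time one generation call and return wall-clock seconds per new token.

    `generate_fn` is a hypothetical zero-argument callable that runs
    generation and returns only the newly generated token ids.
    """
    start = time.perf_counter()
    new_tokens = generate_fn()
    elapsed = time.perf_counter() - start
    if len(new_tokens) == 0:
        raise ValueError("generate_fn returned no tokens")
    return elapsed / len(new_tokens)


def fake_generate() -> list:
    # Stand-in for a real model call (assumption): sleep briefly per token.
    tokens = []
    for i in range(5):
        time.sleep(0.01)
        tokens.append(i)
    return tokens


if __name__ == "__main__":
    print(f"{seconds_per_token(fake_generate):.3f} s/token")
```

With a real model, `generate_fn` would wrap the actual generation call and strip the prompt tokens from its output. It also helps to exclude model loading and the first warm-up call from the timing, so the figure reflects steady-state decoding.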

MiniMax org

Hello @wc-llm ,

Thank you for bringing this to our attention. The slow speed you are seeing on the A800 is expected with the current implementation in the repository.

We have been actively working on an optimized version using vLLM, and a pull request has been submitted for it. You can follow the ongoing work here: #13454. This update is intended to significantly improve speed and overall performance.

Once the vLLM PR is reviewed and approved, we will promptly update the repository with the optimized code. We appreciate your patience and understanding during this process.

In the meantime, if you have any questions or need further assistance, please feel free to reach out. We are committed to providing the best experience possible and value your feedback greatly.
