Max-Batch-Size, max-num-sequence, and fp_cache fp8_e4m3
#11 opened by BenFogerty
What are the guidelines for setting the max batch size and max num sequences to get optimal ITL (inter-token latency) and TTFT (time to first token)? Additionally, would setting the cache dtype to e4m3 improve performance?
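To compare settings, it helps to pin down how the two metrics are measured. TTFT is the delay from submitting a request to receiving its first output token. ITL is the gap between consecutive output tokens, usually reported as a mean. The sketch below assumes you have recorded the request submission time and per-token arrival timestamps yourself. The function name `ttft_and_itl` and its inputs are hypothetical and are not part of any serving framework's API.

```python
from statistics import mean


def ttft_and_itl(request_start: float, token_times: list[float]) -> tuple[float, float]:
    """Return (TTFT, mean ITL) in seconds for a single request.

    Hypothetical helper: request_start is when the request was submitted,
    token_times are the arrival times of each output token, in order.
    """
    if not token_times:
        raise ValueError("need at least one generated token")
    # Time to first token: submission -> first output token.
    ttft = token_times[0] - request_start
    # Inter-token latency: gaps between consecutive output tokens.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    itl = mean(gaps) if gaps else 0.0
    return ttft, itl


ttft, itl = ttft_and_itl(0.0, [0.25, 0.30, 0.35, 0.40])
print(f"TTFT={ttft:.2f}s ITL={itl:.3f}s")
```

Running the same measurement across different max batch size, max num sequences, and cache dtype settings gives a like-for-like comparison.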