Feedback on vision capabilities.

#21
by babytifa - opened

Hello,
I have briefly tested the text extraction capability of this model. It does fine for English, but it leaves room for improvement for both Simplified and Traditional Chinese. In fact, the accuracy seems to have decreased compared to the previous Mistral-Small-2503, which itself did not score perfectly on the test. I have verified the results both locally and on LMArena (via the API, I assume). The test was done on simple news snippets. For reference, Google Gemma 3 27B and Qwen 2.5 VL (even at 7B!) handled the task flawlessly.
I love what Mistral and you guys are doing, and I hope this feedback helps. Thank you.
