Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes?
hbXNov/qwen_2p5_1p5b_instruct_distill_qwen_1p5b_gpt_4o_verify_1e-5_3072_e6-checkpoint-7536-merged Updated 25 days ago • 2.04k
hbXNov/qwen_2p5_1p5b_instruct_distill_qwen_1p5b_gpt_4o_verify_5e-7_3072_merged Updated 25 days ago • 8