The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published 11 days ago • 181
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer Paper • 2501.15570 • Published 29 days ago • 23
Optimizing Large Language Model Training Using FP4 Quantization Paper • 2501.17116 • Published 26 days ago • 35
SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model Paper • 2501.18636 • Published 26 days ago • 27
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published Jan 21 • 63
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques Paper • 2501.14492 • Published about 1 month ago • 30
iFormer: Integrating ConvNet and Transformer for Mobile Application Paper • 2501.15369 • Published 29 days ago • 12
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Paper • 2412.18525 • Published Dec 24, 2024 • 75
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics Paper • 2501.04686 • Published Jan 8 • 50
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens Paper • 2501.07730 • Published Jan 13 • 16
PokerBench: Training Large Language Models to become Professional Poker Players Paper • 2501.08328 • Published Jan 14 • 17