GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Paper
• 2503.14734
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Paper
• 2401.02117
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
Paper
• 2506.01844
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Paper
• 2506.16035
Deep Researcher with Test-Time Diffusion
Paper
• 2507.16075
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm
Paper
• 2507.18553
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
Paper
• 2507.19478
CLEAR: Error Analysis via LLM-as-a-Judge Made Easy
Paper
• 2507.18392
PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving
Paper
• 2507.17596
Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement
Paper
• 2507.18742
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI
Paper
• 2507.10510
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Paper
• 2507.19457
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report
Paper
• 2507.16534
A Survey of Context Engineering for Large Language Models
Paper
• 2507.13334
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper
• 2507.01006
Group Sequence Policy Optimization
Paper
• 2507.18071
Scaling RL to Long Videos
Paper
• 2507.07966
MemOS: A Memory OS for AI System
Paper
• 2507.03724
Kwai Keye-VL Technical Report
Paper
• 2507.01949
GUI-G^2: Gaussian Reward Modeling for GUI Grounding
Paper
• 2507.15846
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Paper
• 2507.16784
T-LoRA: Single Image Diffusion Model Customization Without Overfitting
Paper
• 2507.05964
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization
Paper
• 2507.14683
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
Paper
• 2410.10813
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
Paper
• 2506.11928
Defeating Prompt Injections by Design
Paper
• 2503.18813
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents
Paper
• 2505.22954
Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis
Paper
• 2505.11581
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Paper
• 2408.06292
Evaluating Large Language Models Trained on Code
Paper
• 2107.03374
Self-Refine: Iterative Refinement with Self-Feedback
Paper
• 2303.17651
Gorilla: Large Language Model Connected with Massive APIs
Paper
• 2305.15334
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace
Paper
• 2303.17580
Communicative Agents for Software Development
Paper
• 2307.07924
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
Paper
• 2308.08155
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
Paper
• 2509.09677
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper
• 2510.05592
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
• 2505.03335
Inference-Time Scaling for Generalist Reward Modeling
Paper
• 2504.02495
BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues
Paper
• 2501.10836
Executable Code Actions Elicit Better LLM Agents
Paper
• 2402.01030
DynaSaur: Large Language Agents Beyond Predefined Actions
Paper
• 2411.01747
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents
Paper
• 2401.00812
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
Paper
• 2510.24702
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLM
Paper
• 2509.18058
Speculative Safety-Aware Decoding
Paper
• 2508.17739
Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs
Paper
• 2508.10029
Context Misleads LLMs: The Role of Context Filtering in Maintaining Safe Alignment of LLMs
Paper
• 2508.10031
Poison Once, Refuse Forever: Weaponizing Alignment for Injecting Bias in LLMs
Paper
• 2508.20333
Mitigating Jailbreaks with Intent-Aware LLMs
Paper
• 2508.12072
D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models
Paper
• 2509.17938
A Simple and Efficient Jailbreak Method Exploiting LLMs' Helpfulness
Paper
• 2509.14297
Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
Paper
• 2412.21199
Solving Inequality Proofs with Large Language Models
Paper
• 2506.07927
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization
Paper
• 2510.24592
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
Paper
• 2506.20920
GAIA: a benchmark for General AI Assistants
Paper
• 2311.12983
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance
Paper
• 2506.03828
MMGR: Multi-Modal Generative Reasoning
Paper
• 2512.14691
Next-Embedding Prediction Makes Strong Vision Learners
Paper
• 2512.16922
mHC: Manifold-Constrained Hyper-Connections
Paper
• 2512.24880
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper
• 2502.02737