Collections

Discover the best community collections!

Collections including paper arxiv:2508.18076

Snowflake/Arctic-Text2SQL-R1-7B

8B • Updated May 29 • 5.87k • 42
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 271
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 260
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Paper • 2506.16406 • Published Jun 19 • 126

Neither Valid nor Reliable? Investigating the Use of LLMs as Judges

Paper • 2508.18076 • Published 12 days ago • 5

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Paper • 2506.06395 • Published Jun 5 • 130
Magistral

Paper • 2506.10910 • Published Jun 12 • 64
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs

Paper • 2506.07240 • Published Jun 8 • 7
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

Paper • 2506.09991 • Published Jun 11 • 56

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities

Paper • 2408.00765 • Published Aug 1, 2024 • 14
Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent

Paper • 2407.21646 • Published Jul 31, 2024 • 18
LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection

Paper • 2408.04284 • Published Aug 8, 2024 • 26
Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability

Paper • 2408.07852 • Published Aug 14, 2024 • 16

Neither Valid nor Reliable? Investigating the Use of LLMs as Judges

Paper • 2508.18076 • Published 12 days ago • 5

Deep Think with Confidence

Paper • 2508.15260 • Published 16 days ago • 81
Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation

Paper • 2508.12040 • Published 21 days ago • 14
InternalInspector I^2: Robust Confidence Estimation in LLMs through Internal States

Paper • 2406.12053 • Published Jun 17, 2024
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges

Paper • 2508.18076 • Published 12 days ago • 5

Applications and Uses

ComfyUI-R1: Exploring Reasoning Models for Workflow Generation

Paper • 2506.09790 • Published Jun 11 • 54
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance

Paper • 2506.06444 • Published Jun 6 • 74
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents

Paper • 2506.11763 • Published Jun 13 • 70
Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research

Paper • 2502.04644 • Published Feb 7 • 3
