JuanRafap's Collections
Interest
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Paper
•
2411.02337
•
Published
•
35
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Paper
•
2411.04996
•
Published
•
51
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper
•
2411.03562
•
Published
•
66
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper
•
2410.08815
•
Published
•
48
Game-theoretic LLM: Agent Workflow for Negotiation Games
Paper
•
2411.05990
•
Published
•
7
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
Paper
•
2411.10640
•
Published
•
45
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
Paper
•
2411.19146
•
Published
•
17
Snowflake/snowflake-arctic-embed-m-v2.0
Sentence Similarity
•
Updated
•
340k
•
58
Snowflake/snowflake-arctic-embed-l-v2.0
Sentence Similarity
•
Updated
•
90.9k
•
124
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases
Paper
•
2412.04862
•
Published
•
50
ruliad/deepthought-8b-llama-v0.01-alpha
Text Generation
•
Updated
•
534
•
143
Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability
Paper
•
2411.19943
•
Published
•
58
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
Paper
•
2412.02592
•
Published
•
22
RL Zero: Zero-Shot Language to Behaviors without any Supervision
Paper
•
2412.05718
•
Published
•
5
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Paper
•
2412.10704
•
Published
•
15
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
Paper
•
2412.13746
•
Published
•
9
Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture
Paper
•
2412.11834
•
Published
•
7
Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents
Paper
•
2412.13194
•
Published
•
12
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
Paper
•
2412.14711
•
Published
•
16
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning
Paper
•
2412.15797
•
Published
•
18
Progressive Multimodal Reasoning via Active Retrieval
Paper
•
2412.14835
•
Published
•
73
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design
Paper
•
2412.14590
•
Published
•
14
Learned Compression for Compressed Learning
Paper
•
2412.09405
•
Published
•
13
Token-Budget-Aware LLM Reasoning
Paper
•
2412.18547
•
Published
•
46
ericsonwillians/distilbert-base-uncased-steam-sentiment
Text Classification
•
Updated
•
24
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper
•
2412.18319
•
Published
•
37
Personalized Graph-Based Retrieval for Large Language Models
Paper
•
2501.02157
•
Published
•
29
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper
•
2412.18925
•
Published
•
97
Multi-task retriever fine-tuning for domain-specific and efficient RAG
Paper
•
2501.04652
•
Published
•
10
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper
•
2501.05366
•
Published
•
95
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
Paper
•
2501.02576
•
Published
•
15
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper
•
2501.03262
•
Published
•
90
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning
Paper
•
2501.03226
•
Published
•
39
Evolving Deeper LLM Thinking
Paper
•
2501.09891
•
Published
•
106
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper
•
2501.09686
•
Published
•
36
RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation
Paper
•
2501.08617
•
Published
•
10
The Lessons of Developing Process Reward Models in Mathematical Reasoning
Paper
•
2501.07301
•
Published
•
91
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot
Paper
•
2501.09012
•
Published
•
10
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning
Paper
•
2501.06590
•
Published
•
9
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Paper
•
2501.01257
•
Published
•
49
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
Paper
•
2501.10799
•
Published
•
15
Control LLM: Controlled Evolution for Intelligence Retention in LLM
Paper
•
2501.10979
•
Published
•
6
Autonomy-of-Experts Models
Paper
•
2501.13074
•
Published
•
41
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective
Paper
•
2501.11110
•
Published
•
2
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning
Paper
•
2412.09078
•
Published
LLM2: Let Large Language Models Harness System 2 Reasoning
Paper
•
2412.20372
•
Published
TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge Internalization with Self-Reflection
Paper
•
2412.08024
•
Published
Table as Thought: Exploring Structured Thoughts in LLM Reasoning
Paper
•
2501.02152
•
Published
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper
•
2501.12948
•
Published
•
329
Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models
Paper
•
2501.18119
•
Published
•
24
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Paper
•
2502.01534
•
Published
•
37
The Differences Between Direct Alignment Algorithms are a Blur
Paper
•
2502.01237
•
Published
•
111
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
Paper
•
2501.13200
•
Published
•
64
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles
Paper
•
2502.01081
•
Published
•
14
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging
Paper
•
2502.05664
•
Published
•
22
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
Paper
•
2502.06060
•
Published
•
32
Paper
•
2502.06049
•
Published
•
28
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
Paper
•
2502.06781
•
Published
•
59
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper
•
2502.06703
•
Published
•
133
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
Paper
•
2502.04416
•
Published
•
11
Goku: Flow Based Video Generative Foundation Models
Paper
•
2502.04896
•
Published
•
88
In-Context Retrieval-Augmented Language Models
Paper
•
2302.00083
•
Published
•
1