Claude
D-YZ
AI & ML interests
None yet
Recent Activity
updated a collection 5 days ago: waiting
updated a collection 6 days ago: waiting
updated a collection 6 days ago: waiting
Organizations
None yet
Paper
- OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
  Paper • 2402.14658 • Published • 84
- KAN: Kolmogorov-Arnold Networks
  Paper • 2404.19756 • Published • 114
- Understanding the performance gap between online and offline alignment algorithms
  Paper • 2405.08448 • Published • 20
- NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
  Paper • 2405.17428 • Published • 20
RL
Reasoning
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 110
- Teaching Large Language Models to Reason with Reinforcement Learning
  Paper • 2403.04642 • Published • 51
- PERL: Parameter Efficient Reinforcement Learning from Human Feedback
  Paper • 2403.10704 • Published • 60
- MathScale: Scaling Instruction Tuning for Mathematical Reasoning
  Paper • 2403.02884 • Published • 17
Multimodal
Model Architecture
waiting
- Adapting Vision-Language Models Without Labels: A Comprehensive Survey
  Paper • 2508.05547 • Published • 11
- Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
  Paper • 2508.10751 • Published • 27
- SSRL: Self-Search Reinforcement Learning
  Paper • 2508.10874 • Published • 91
- Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation
  Paper • 2508.12040 • Published • 14