Lei Wang

demolei

https://demoleiwang.github.io/HomePage/

AI & ML interests

LLMs

demolei's activity

updated a model 6 minutes ago

demolei/Qwen2.5-1.5B-Open-R1-Distill

Text Generation • Updated 6 minutes ago

updated a model about 5 hours ago

demolei/Qwen-2.5-7B-Simple-RL

Text Generation • Updated about 5 hours ago

published a model about 7 hours ago

demolei/Qwen-2.5-7B-Simple-RL

Text Generation • Updated about 5 hours ago

published a model about 8 hours ago

demolei/DeepSeek-R1-Distill-Qwen-1.5B-GRPO

Updated about 8 hours ago

upvoted 3 papers 2 days ago

How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Paper • 2502.14502 • Published 3 days ago • 64

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published 3 days ago • 146

S*: Test Time Scaling for Code Generation

Paper • 2502.14382 • Published 3 days ago • 49

published a model 2 days ago

demolei/Qwen2.5-1.5B-Open-R1-Distill

Text Generation • Updated 6 minutes ago

upvoted a paper 4 days ago

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published 7 days ago • 133

upvoted 5 papers 5 days ago

Expect the Unexpected: FailSafe Long Context QA for Finance

Paper • 2502.06329 • Published 13 days ago • 122

Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance

Paper • 2502.08127 • Published 11 days ago • 49

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Paper • 2502.09621 • Published 10 days ago • 27

The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding

Paper • 2502.08946 • Published 10 days ago • 181

The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks

Paper • 2502.08235 • Published 11 days ago • 53

upvoted 6 papers 11 days ago

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published 16 days ago • 114

Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models

Paper • 2502.04404 • Published 17 days ago • 21

SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators

Paper • 2502.06394 • Published 13 days ago • 85

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

Paper • 2502.06781 • Published 13 days ago • 59

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published 13 days ago • 134

Teaching Language Models to Critique via Reinforcement Learning

Paper • 2502.03492 • Published 19 days ago • 23