
Junlin Zhou

jlzhou

edwardzjl

AI & ML interests

None yet

jlzhou's activity

liked a dataset 2 days ago

nvidia/HelpSteer2

Updated Dec 18, 2024 • 21.4k • 6.94k • 404

commented on Distributed SFT with trl and DeepSpeed Part 2: Scaling Locally 4 days ago

I'm glad you found it helpful!

Yes, this is planned. I originally intended to write an article about training with the Kubeflow Training Operator, but now I'm considering skipping that and focusing on training with the new Kubeflow Trainer instead.

PS: Kubeflow is migrating its training component from v1 (Kubeflow Training Operator) to v2 (Kubeflow Trainer).

upvoted a paper 6 days ago

s1: Simple test-time scaling

Paper • 2501.19393 • Published 23 days ago • 105

liked a dataset 7 days ago

cognitivecomputations/dolphin-r1

Updated 24 days ago • 814k • 5.75k • 263

upvoted a paper 10 days ago

The Differences Between Direct Alignment Algorithms are a Blur

Paper • 2502.01237 • Published 20 days ago • 111

updated a model 10 days ago

tablegpt/TableGPT2-7B

Updated 10 days ago • 1.99k • 162

updated a collection 10 days ago

TableGPT2

Collection

3 items • Updated 10 days ago • 5

commented on a paper 12 days ago

PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

Paper • 2502.01584 • Published 20 days ago • 9

reacted to schuler's post with 👍 12 days ago

📢 New Research Alert: Making Language Models Smaller & Smarter!

Thrilled to share the latest technical report demonstrating how to reduce language model parameters by 77% while maintaining performance.

The secret? Grouped pointwise convolutions. Yes. We brought a method from computer vision to the transformers arena.

🔑 Key Findings:
• 77% parameter reduction.
• Maintained model capabilities.
• Improved generalization.

Paper: https://www.researchgate.net/publication/388835829_SAVING_77_OF_THE_PARAMETERS_IN_LARGE_LANGUAGE_MODELS_TECHNICAL_REPORT
Code: https://github.com/joaopauloschuler/less-parameters-llm