Junlin Zhou

jlzhou

AI & ML interests

None yet

Recent Activity

Organizations

TableGPT's profile picture

jlzhou's activity


I'm glad you found it helpful!

Yes, this is planned. I was originally planning to write an article about training with the training operator, but now I'm wondering if I should skip that and focus on training with the new trainer instead.

PS: Kubeflow is migrating their training component from v1 (Kubeflow Training Operator) to v2 (Kubeflow Trainer).

reacted to schuler's post with 👍 12 days ago
Post
7217
📢 New Research Alert: Making Language Models Smaller & Smarter!

Thrilled to share the latest technical report demonstrating how to reduce language model parameters by 77% while maintaining performance.

The secret? Grouped pointwise convolutions. Yes. We brought a method from computer vision to the transformers arena.

🔑 Key Findings:
• 77% parameter reduction.
• Maintained model capabilities.
• Improved generalization.

Paper: https://www.researchgate.net/publication/388835829_SAVING_77_OF_THE_PARAMETERS_IN_LARGE_LANGUAGE_MODELS_TECHNICAL_REPORT
Code: https://github.com/joaopauloschuler/less-parameters-llm
  • 2 replies
upvoted an article 12 days ago
Article

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

By NormalUhr •
• 42
New activity in jlzhou/Qwen2.5-3B-Infinity-Instruct-0625 16 days ago

Adding Evaluation Results

#1 opened 16 days ago by
jlzhou
published an article 16 days ago
Article

Distributed SFT with trl and DeepSpeed Part 2: Scaling Locally

By jlzhou •
• 2