
Bui Van Hop

hllj

AI & ML interests

Computer Vision, Deep Learning, NLP

Organizations

Vietnamese VLM, Hugging Face Discord Community, Open Medical

hllj's activity

upvoted an article 11 days ago

From Zero to Reasoning Hero: How DeepSeek-R1 Leverages Reinforcement Learning to Master Complex Reasoning

By NormalUhr
upvoted an article 7 months ago

Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth

By mlabonne
reacted to kenshinn's post with ❤️ 7 months ago
Sparse MoE (SMoE) has an unavoidable drawback: its performance depends heavily on hyper-parameters such as the number of activated experts per token (top-k) and the total number of experts.

Moreover, identifying optimal hyper-parameters without extensive ablation studies is challenging. As models continue to grow, this limitation can waste significant computational resources and, in turn, hinder the efficiency of training MoE-based models in practice.

Now, our DynMoE addresses these challenges! 🙌 DynMoE incorporates:
(1) a novel gating method that enables each token to automatically determine the number of experts to activate;
(2) an adaptive process that automatically adjusts the number of experts during training.

Extensive results across Vision, Language, and Vision-Language tasks demonstrate that DynMoE achieves performance competitive with GMoE on vision and language tasks and with MoE-LLaVA on vision-language tasks, while maintaining efficiency by activating fewer parameters (a minimal gating sketch follows this post).

Our code is available at https://github.com/LINs-lab/DynMoE; checkpoints are at LINs-lab/dynmoe-family-665ed5a331a7e84463cab01a.
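
The post describes the gating idea only at a high level. As a rough illustration of "each token decides how many experts to activate", here is a minimal, hypothetical PyTorch sketch: the class name DynamicTopAnyGate, the sigmoid-score-vs-learnable-threshold rule, and all parameter choices are assumptions made for illustration, not the authors' implementation (see the linked repo for that).

```python
import torch
import torch.nn as nn


class DynamicTopAnyGate(nn.Module):
    """Hypothetical gate: a token activates every expert whose score clears a
    learnable threshold, so the effective top-k varies per token (illustration only)."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.score = nn.Linear(d_model, num_experts)              # per-expert gating scores
        self.threshold = nn.Parameter(torch.zeros(num_experts))   # learnable activation thresholds

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, d_model)
        logits = self.score(x)                                    # (num_tokens, num_experts)
        mask = (torch.sigmoid(logits) > torch.sigmoid(self.threshold)).float()
        # Guarantee at least one active expert per token by falling back to the argmax expert.
        top1 = logits.argmax(dim=-1, keepdim=True)
        mask.scatter_(-1, top1, 1.0)
        # Renormalize combination weights over the selected experts only.
        weights = torch.softmax(logits.masked_fill(mask == 0, float("-inf")), dim=-1)
        return weights, mask                                      # mix expert outputs with `weights`


# Different tokens end up activating different numbers of experts.
gate = DynamicTopAnyGate(d_model=16, num_experts=8)
tokens = torch.randn(4, 16)
weights, mask = gate(tokens)
print(mask.sum(dim=-1))  # per-token count of activated experts
```

Because the number of active experts is a per-token outcome of learned thresholds rather than a fixed top-k hyper-parameter, this kind of gate sidesteps the top-k tuning problem the post describes; DynMoE's actual formulation and its expert-count adaptation during training are in the repository above.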