ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection • Paper 2601.09195 • Published 6 days ago
Revisiting Model Interpolation for Efficient Reasoning • Paper 2510.10977 • Published Oct 13, 2025
Timber: Training-free Instruct Model Refining with Base via Effective Rank • Paper 2509.23595 • Published Sep 28, 2025
LiT: Delving into a Simplified Linear Diffusion Transformer for Image Generation • Paper 2501.12976 • Published Jan 22, 2025
PhyX: Does Your Model Have the "Wits" for Physical Reasoning? • Paper 2505.15929 • Published May 21, 2025
LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models • Paper 2411.06839 • Published Nov 11, 2024
TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities • Paper 2212.06385 • Published Dec 13, 2022
RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer • Paper 2304.05659 • Published Apr 12, 2023
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast • Paper 2405.14507 • Published May 23, 2024
Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models • Paper 2404.02657 • Published Apr 3, 2024
Weight-Inherited Distillation for Task-Agnostic BERT Compression • Paper 2305.09098 • Published May 16, 2023