LLM Training - a mphielipp Collection

LLM Training

updated 27 days ago

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

Paper • 2403.13372 • Published Mar 20, 2024 • 135

On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

Paper • 2508.05629 • Published 29 days ago • 170