Benchmarking Optimizers for Large Language Model Pretraining Paper • 2509.01440 • Published 5 days ago • 21 • 1
Gradient Clipping Improves AdaGrad when the Noise Is Heavy-Tailed Paper • 2406.04443 • Published Jun 6, 2024
Just a Simple Transformation is Enough for Data Protection in Vertical Federated Learning Paper • 2412.11689 • Published Dec 16, 2024 • 2
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published Oct 28, 2024 • 84
Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization Paper • 2409.00492 • Published Aug 31, 2024 • 11