Collections including paper arxiv:2402.00742

- Learn Your Reference Model for Real Good Alignment
  Paper • 2404.09656 • Published • 84
- Aligning Teacher with Student Preferences for Tailored Training Data Generation
  Paper • 2406.19227 • Published • 25
- Self-Play Preference Optimization for Language Model Alignment
  Paper • 2405.00675 • Published • 27
- CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
  Paper • 2404.03820 • Published • 25

- Transforming and Combining Rewards for Aligning Large Language Models
  Paper • 2402.00742 • Published • 12
- UltraFeedback: Boosting Language Models with High-quality Feedback
  Paper • 2310.01377 • Published • 5
- Learn Your Reference Model for Real Good Alignment
  Paper • 2404.09656 • Published • 84
- Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
  Paper • 2405.01535 • Published • 121

- Fine-Tuning Language Models from Human Preferences
  Paper • 1909.08593 • Published • 3
- Transforming and Combining Rewards for Aligning Large Language Models
  Paper • 2402.00742 • Published • 12
- Leverage the Average: an Analysis of KL Regularization in RL
  Paper • 2003.14089 • Published • 2
- Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
  Paper • 2404.01258 • Published • 12

- BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
  Paper • 2403.18421 • Published • 23
- Long-form factuality in large language models
  Paper • 2403.18802 • Published • 25
- stanford-crfm/BioMedLM
  Text Generation • Updated • 5.31k • 416
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 53

- Lumiere: A Space-Time Diffusion Model for Video Generation
  Paper • 2401.12945 • Published • 85
- Long-form factuality in large language models
  Paper • 2403.18802 • Published • 25
- ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
  Paper • 2403.18818 • Published • 26
- TC4D: Trajectory-Conditioned Text-to-4D Generation
  Paper • 2403.17920 • Published • 18

- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 83
- Efficient Exploration for LLMs
  Paper • 2402.00396 • Published • 22
- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- Transforming and Combining Rewards for Aligning Large Language Models
  Paper • 2402.00742 • Published • 12

- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 19
- Transforming and Combining Rewards for Aligning Large Language Models
  Paper • 2402.00742 • Published • 12
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 97
- Specialized Language Models with Cheap Inference from Limited Domain Data
  Paper • 2402.01093 • Published • 46

- A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
  Paper • 2312.08578 • Published • 20
- ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
  Paper • 2312.08583 • Published • 12
- Vision-Language Models as a Source of Rewards
  Paper • 2312.09187 • Published • 14
- StemGen: A music generation model that listens
  Paper • 2312.08723 • Published • 48

- S-LoRA: Serving Thousands of Concurrent LoRA Adapters
  Paper • 2311.03285 • Published • 31
- Tailoring Self-Rationalizers with Multi-Reward Distillation
  Paper • 2311.02805 • Published • 7
- Ultra-Long Sequence Distributed Transformer
  Paper • 2311.02382 • Published • 6
- OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
  Paper • 2309.11235 • Published • 15