PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment Paper • 2410.13785 • Published Oct 17, 2024 • 19
Aligning Large Language Models via Self-Steering Optimization Paper • 2410.17131 • Published Oct 22, 2024 • 23
SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation Paper • 2410.14745 • Published Oct 17, 2024 • 48
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style Paper • 2410.16184 • Published Oct 21, 2024 • 25
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs Paper • 2410.18451 • Published Oct 24, 2024 • 20
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch Paper • 2410.18693 • Published Oct 24, 2024 • 43
A Critical Evaluation of AI Feedback for Aligning Large Language Models Paper • 2402.12366 • Published Feb 19, 2024 • 3
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning Paper • 2411.02337 • Published Nov 4, 2024 • 38
Constraint Back-translation Improves Complex Instruction Following of Large Language Models Paper • 2410.24175 • Published Oct 31, 2024 • 18
Accelerating Direct Preference Optimization with Prefix Sharing Paper • 2410.20305 • Published Oct 27, 2024 • 6
Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model Paper • 2411.04496 • Published Nov 7, 2024 • 24
Direct Preference Optimization Using Sparse Feature-Level Constraints Paper • 2411.07618 • Published Nov 12, 2024 • 17
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs Paper • 2411.14199 • Published Nov 21, 2024 • 32
SynFinTabs: A Dataset of Synthetic Financial Tables for Information and Table Extraction Paper • 2412.04262 • Published Dec 5, 2024 • 5
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases Paper • 2412.04862 • Published Dec 6, 2024 • 51
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published Dec 6, 2024 • 48
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token Paper • 2412.06676 • Published Dec 9, 2024 • 9
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published Jan 22, 2025 • 62
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28, 2025 • 123
Data Valuation using Neural Networks for Efficient Instruction Fine-Tuning Paper • 2502.09969 • Published Feb 14, 2025 • 1
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization Paper • 2508.14460 • Published Aug 2025 • 80
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling Paper • 2508.17445 • Published Aug 2025 • 77
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions? Paper • 2509.04292 • Published Sep 2025 • 41