Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic Paper • 2408.16326 • Published Aug 29, 2024
Scalable Oversight for Superhuman AI via Recursive Self-Critiquing Paper • 2502.04675 • Published Feb 7, 2025
Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch Paper • 2502.17173 • Published Feb 24, 2025
On-Policy Self-Alignment with Fine-grained Knowledge Feedback for Hallucination Mitigation Paper • 2406.12221 • Published Jun 18, 2024
Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree? Paper • 2410.05584 • Published Oct 8, 2024
Offline Pseudo Relevance Feedback for Efficient and Effective Single-pass Dense Retrieval Paper • 2308.10191 • Published Aug 20, 2023
The Devil Is in the Details: Tackling Unimodal Spurious Correlations for Generalizable Multimodal Reward Models Paper • 2503.03122 • Published Mar 5, 2025
Type-supervised sequence labeling based on the heterogeneous star graph for named entity recognition Paper • 2210.10240 • Published Oct 19, 2022