Social Bias Benchmark for Generation: A Comparison of Generation and QA-Based Evaluations Paper • 2503.06987 • Published Mar 10, 2025
MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language Paper • 2505.14395 • Published May 20, 2025
When Does Classical Chinese Help? Quantifying Cross-Lingual Transfer in Hanja and Kanbun Paper • 2411.04822 • Published Nov 7, 2024
HERITAGE: An End-to-End Web Platform for Processing Korean Historical Documents in Hanja Paper • 2501.11951 • Published Jan 21, 2025
Survey of Cultural Awareness in Language Models: Text and Beyond Paper • 2411.00860 • Published Oct 30, 2024
Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis Paper • 2308.16705 • Published Aug 31, 2023
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages Paper • 2406.09948 • Published Jun 14, 2024
Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models Paper • 2407.06004 • Published Jul 8, 2024