LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations Paper • 2509.03405 • Published 3 days ago
Jump to Conclusions: Short-Cutting Transformers With Linear Transformations Paper • 2303.09435 • Published Mar 16, 2023
DiscoFuse: A Large-Scale Dataset for Discourse-Based Sentence Fusion Paper • 1902.10526 • Published Feb 27, 2019
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies Paper • 2101.02235 • Published Jan 6, 2021
What's in your Head? Emergent Behaviour in Multi-Task Transformer Models Paper • 2104.06129 • Published Apr 13, 2021
A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains Paper • 2402.00559 • Published Feb 1, 2024
SCROLLS: Standardized CompaRison Over Long Language Sequences Paper • 2201.03533 • Published Jan 10, 2022
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space Paper • 2203.14680 • Published Mar 28, 2022
Backward Lens: Projecting Language Model Gradients into the Vocabulary Space Paper • 2402.12865 • Published Feb 20, 2024
Inferring Implicit Relations in Complex Questions with Language Models Paper • 2204.13778 • Published Apr 28, 2022
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations Paper • 2402.17700 • Published Feb 27, 2024
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models Paper • 2206.04615 • Published Jun 9, 2022
From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP Paper • 2406.12618 • Published Jun 18, 2024
Estimating Knowledge in Large Language Models Without Generating a Single Token Paper • 2406.12673 • Published Jun 18, 2024
Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries Paper • 2406.12775 • Published Jun 18, 2024