Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
Abstract
LLMs struggle with understanding the nuanced, context-dependent meanings of Drivelological text, which appears nonsensical but contains deeper semantic layers.
We introduce Drivelology, a unique linguistic phenomenon characterised as "nonsense with depth": utterances that are syntactically coherent yet pragmatically paradoxical, emotionally loaded, or rhetorically subversive. While such expressions may resemble surface-level nonsense, they encode implicit meaning requiring contextual inference, moral reasoning, or emotional interpretation. We find that current large language models (LLMs), despite excelling at many natural language processing (NLP) tasks, consistently fail to grasp the layered semantics of Drivelological text. To investigate this, we construct a small but diverse benchmark dataset of over 1,200 meticulously curated examples, with select instances in English, Mandarin, Spanish, French, Japanese, and Korean. Annotation was especially challenging: each example required careful expert review to verify that it truly reflected Drivelological characteristics. The process involved multiple rounds of discussion and adjudication to address disagreements, highlighting the subtle and subjective nature of Drivelology. We evaluate a range of LLMs on classification, generation, and reasoning tasks. Our results reveal clear limitations of LLMs: models often confuse Drivelology with shallow nonsense, produce incoherent justifications, or miss the implied rhetorical function altogether. These findings highlight a deeper representational gap in LLMs' pragmatic understanding and challenge the assumption that statistical fluency implies cognitive comprehension. We release our dataset and code to facilitate further research in modelling linguistic depth beyond surface-level coherence.
Community
Introducing Drivelology (幹話文學): a new linguistic phenomenon we define as "nonsense with depth." Our EMNLP 2025 (oral) paper presents a stress test with 1,200+ examples across 5 Drivelology types, revealing distinct failure modes across state-of-the-art LLMs.
Very impressive paper. There seems to be a large gap when translating Chinese into English. I hope there are further studies in this area.
@Harikyusocials Thanks for the thoughtful question! The gap you're noticing isn't just a Mandarin → English issue. Many Drivelology examples are deliberately "nonsense with depth": syntactically coherent but culturally loaded, paradoxical, or rhetorically subversive. That means some phrases depend heavily on prior cultural knowledge, social cues, or even irony embedded in everyday life.
When such examples are translated, the literal words can cross languages, but the Drivelological sense (the multi-layered humour, paradox, or social critique) often does not. For example, a pun, proverb inversion, or culturally embedded reference may only resonate with readers who share that cultural background. This isn't limited to Mandarin; similar issues arise in other languages as well.
So the difficulty is less about "translation quality" and more about how Drivelology encodes meaning at multiple levels, with implicit cultural or rhetorical signals that don't always carry over neatly. That's exactly why the paper emphasises Drivelology as a benchmark: it highlights the deep gap between surface fluency and genuine cultural-semantic understanding.