SQL-of-Thought: Multi-agentic Text-to-SQL with Guided Error Correction
Abstract
A multi-agent framework decomposes the Text2SQL task into several components, using in-context learning and taxonomy-guided error modification to achieve state-of-the-art results.
Converting natural language queries into SQL is a crucial challenge in both industry and academia, and solving it would broaden access to databases and large-scale applications. This work examines how in-context learning and chain-of-thought reasoning can be used to build a robust text-to-SQL system. We propose SQL-of-Thought, a multi-agent framework that decomposes the text-to-SQL task into schema linking, subproblem identification, query plan generation, SQL generation, and a guided correction loop. Unlike prior systems that rely only on static, execution-based correction, we introduce taxonomy-guided dynamic error modification informed by in-context learning. SQL-of-Thought achieves state-of-the-art results on the Spider dataset and its variants by combining a guided error taxonomy with reasoning-based query planning.
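The staged decomposition described in the abstract can be pictured as a chain of agents followed by a retry loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the `Agent` signature, the `Agents` container, `try_execute`, `text_to_sql`, and the stub lambdas are all hypothetical names, and the loop here triggers only on SQLite execution errors, which is a simplification of the taxonomy-guided correction the paper describes.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical agent signature: receives a context dict, returns text.
# In a real system each agent would wrap an LLM call.
Agent = Callable[[dict], str]


@dataclass
class Agents:
    schema_linker: Agent
    subproblem: Agent
    planner: Agent
    sql_writer: Agent
    corrector: Agent


def try_execute(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return None if the query runs, otherwise the database error message."""
    try:
        conn.execute(sql).fetchall()
        return None
    except sqlite3.Error as exc:
        return str(exc)


def text_to_sql(question: str, schema: str, conn: sqlite3.Connection,
                agents: Agents, max_rounds: int = 3) -> str:
    # Each stage adds its output to the shared context for later stages.
    ctx = {"question": question, "schema": schema}
    ctx["linked_schema"] = agents.schema_linker(ctx)
    ctx["subproblems"] = agents.subproblem(ctx)
    ctx["plan"] = agents.planner(ctx)
    sql = agents.sql_writer(ctx)

    # Correction loop (assumption: only execution errors trigger a retry).
    for _ in range(max_rounds):
        error = try_execute(conn, sql)
        if error is None:
            break
        sql = agents.corrector({**ctx, "sql": sql, "error": error})
    # If max_rounds is exhausted, the last candidate is returned unverified.
    return sql


# Usage with stub agents standing in for model calls.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")

stub = Agents(
    schema_linker=lambda ctx: "singer(name)",
    subproblem=lambda ctx: "SELECT clause only",
    planner=lambda ctx: "1. read singer 2. project name",
    sql_writer=lambda ctx: "SELECT nme FROM singer",   # deliberate typo
    corrector=lambda ctx: "SELECT name FROM singer",
)

final_sql = text_to_sql("List all singer names.", "singer(name, age)", conn, stub)
print(final_sql)
```

Passing the agents in as plain callables keeps the control flow testable without any model access; swapping a stub for a real model call should not change the orchestration.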
Community
SQL-of-Thought introduces a multi-agent Text-to-SQL framework that decomposes queries into subproblems, plans before generating SQL, and applies a compact error taxonomy in a correction loop to surpass prior SOTA on Spider benchmarks.
➡️ Key Highlights of our SQL-of-Thought Framework:
🧩 Multi-Agent Schema-To-SQL Pipeline: Specialized agents handle schema linking, clause-level subproblem decomposition, query planning, SQL synthesis, and correction, mirroring how humans iteratively draft and refine SQL. The separation of planning and generation reduces hallucination and enforces logical structure.
🧠 Plan-Then-Generate with Chain-of-Thought: SQL is derived not directly from text but from an intermediate query plan, enabling more faithful reasoning. Chain-of-thought ensures clause consistency and facilitates transparent debugging when corrections are required.
🛠️ Error Taxonomy-Guided Correction Loop: Instead of blind regeneration, errors are classified into a concise taxonomy (e.g., missing joins, agg-no-groupby, having-vs-where). Correction agents use this taxonomy to target logical flaws that execution errors alone cannot detect.
📊 State-of-the-Art Results with Robust Generalization: Achieves 91.59% execution accuracy on Spider dev, 90.16% on Spider-Realistic, and 82.01% on Spider-SYN. Ablations confirm that removing the correction loop or intermediate planning significantly degrades accuracy, validating both contributions.
⚡ Practical Efficiency + Cost Hybridization: Uses a hybrid setup where stronger models (Claude 3 Opus) are invoked selectively in planning/correction, while lighter models handle subproblem decomposition and synthesis, balancing accuracy with token and cost efficiency.
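To make the taxonomy-guided correction idea from the highlights more concrete, here is a rough sketch of how error labels could be attached to a failing query and turned into a targeted correction prompt. Everything in it is an assumption for illustration: the label descriptions, the regex heuristics (a crude stand-in for the LLM-based classification the paper describes), and the names `TAXONOMY`, `heuristic_labels`, and `build_correction_prompt`. Only the three label ideas (missing joins, agg-no-groupby, having-vs-where) come from the text above.

```python
import re
from typing import List

# Label ideas come from the post; the descriptions are illustrative assumptions.
TAXONOMY = {
    "missing-join": "A table needed by the question is used without being joined.",
    "agg-no-groupby": "An aggregate is mixed with plain columns but GROUP BY is absent.",
    "having-vs-where": "An aggregate condition sits in WHERE instead of HAVING.",
}

AGG = re.compile(r"\b(count|sum|avg|min|max)\s*\(", re.IGNORECASE)


def heuristic_labels(sql: str) -> List[str]:
    """Crude regex stand-in for a classifier; it ignores subqueries and
    functions whose arguments contain commas."""
    labels = []
    lowered = sql.lower()

    # Aggregate inside the WHERE clause suggests having-vs-where.
    where = re.search(
        r"\bwhere\b(.*?)(\bgroup by\b|\bhaving\b|\border by\b|$)",
        lowered, re.DOTALL)
    if where and AGG.search(where.group(1)):
        labels.append("having-vs-where")

    # Aggregate next to a plain column without GROUP BY suggests agg-no-groupby.
    select = re.search(r"\bselect\b(.*?)\bfrom\b", lowered, re.DOTALL)
    if select:
        cols = [c.strip() for c in select.group(1).split(",")]
        has_agg = any(AGG.search(c) for c in cols)
        has_plain = any(not AGG.search(c) for c in cols)
        if has_agg and has_plain and "group by" not in lowered:
            labels.append("agg-no-groupby")
    return labels


def build_correction_prompt(sql: str, labels: List[str]) -> str:
    """Render the suspected error types as hints for a correction agent."""
    hints = "\n".join(f"- {code}: {TAXONOMY[code]}" for code in labels)
    return (
        "The query below is suspected to be wrong.\n"
        f"Query: {sql}\n"
        f"Suspected error types:\n{hints}\n"
        "Rewrite the query to fix only these issues."
    )


bad_sql = "SELECT name, COUNT(*) FROM singer"
labels = heuristic_labels(bad_sql)
prompt = build_correction_prompt(bad_sql, labels)
print(prompt)
```

The point of the sketch is the shape of the loop input: the corrector receives named error categories with guidance rather than a bare "try again", which is what distinguishes guided correction from blind regeneration.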