arxiv:2509.00581

SQL-of-Thought: Multi-agentic Text-to-SQL with Guided Error Correction

Published on Aug 30

Submitted by amanchadha on Sep 3


Authors:

Saumya Chaturvedi, Aman Chadha, Laurent Bindschaedler

Abstract

AI-generated summary: A multi-agent framework decomposes the Text2SQL task into several components, using in-context learning and taxonomy-guided error modification to achieve state-of-the-art results.

Converting natural language queries into SQL queries is a crucial challenge in both industry and academia, aiming to increase access to databases and large-scale applications. This work examines how in-context learning and chain-of-thought can be utilized to develop a robust solution for text-to-SQL systems. We propose SQL-of-Thought: a multi-agent framework that decomposes the Text2SQL task into schema linking, subproblem identification, query plan generation, SQL generation, and a guided correction loop. Unlike prior systems that rely only on execution-based static correction, we introduce taxonomy-guided dynamic error modification informed by in-context learning. SQL-of-Thought achieves state-of-the-art results on the Spider dataset and its variants, combining guided error taxonomy with reasoning-based query planning.


Community

amanchadha

Paper author Paper submitter 3 days ago

SQL-of-Thought introduces a multi-agent Text-to-SQL framework that decomposes queries into subproblems, plans before generating SQL, and applies a compact error taxonomy in a correction loop to surpass prior SOTA on Spider benchmarks.

➡️ 𝐊𝐞𝐲 𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬 𝐨𝐟 𝐨𝐮𝐫 𝐒𝐐𝐋-𝐨𝐟-𝐓𝐡𝐨𝐮𝐠𝐡𝐭 𝐅𝐫𝐚𝐦𝐞𝐰𝐨𝐫𝐤:

🧩 𝑴𝒖𝒍𝒕𝒊-𝑨𝒈𝒆𝒏𝒕 𝑺𝒄𝒉𝒆𝒎𝒂-𝑻𝒐-𝑺𝑸𝑳 𝑷𝒊𝒑𝒆𝒍𝒊𝒏𝒆: Specialized agents handle schema linking, clause-level subproblem decomposition, query planning, SQL synthesis, and correction, mirroring how humans iteratively draft and refine SQL. The separation of planning and generation reduces hallucination and enforces logical structure.
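To make the staged hand-off concrete, here is a minimal sketch of such a pipeline as a chain of functions over a shared state object. Everything in it is an assumption for illustration: the names `PipelineState` and `run_pipeline` are hypothetical, and each stage body is a keyword heuristic standing in for what would be an LLM agent call. The correction stage is left out of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PipelineState:
    """Shared state passed from one agent to the next (hypothetical structure)."""
    question: str
    schema: Dict[str, List[str]]  # table name -> column names
    linked_tables: List[str] = field(default_factory=list)
    subproblems: List[str] = field(default_factory=list)
    plan: List[str] = field(default_factory=list)
    sql: str = ""
    trace: List[str] = field(default_factory=list)


Stage = Callable[[PipelineState], PipelineState]


def schema_linking(state: PipelineState) -> PipelineState:
    # Placeholder heuristic: keep tables whose name appears in the question.
    q = state.question.lower()
    state.linked_tables = [t for t in state.schema if t.lower() in q]
    return state


def subproblem_identification(state: PipelineState) -> PipelineState:
    # Placeholder heuristic: derive clause-level subproblems from keyword cues.
    state.subproblems = ["SELECT", "FROM"]
    if "how many" in state.question.lower():
        state.subproblems.append("AGGREGATE")
    return state


def query_planning(state: PipelineState) -> PipelineState:
    # Placeholder: one plan step per subproblem, written before any SQL exists.
    state.plan = [f"Resolve {c} using tables {state.linked_tables}"
                  for c in state.subproblems]
    return state


def sql_generation(state: PipelineState) -> PipelineState:
    # Placeholder: build SQL from the earlier stages' output, not from the raw text.
    table = state.linked_tables[0] if state.linked_tables else next(iter(state.schema))
    target = "COUNT(*)" if "AGGREGATE" in state.subproblems else "*"
    state.sql = f"SELECT {target} FROM {table}"
    return state


STAGES: List[Stage] = [schema_linking, subproblem_identification,
                       query_planning, sql_generation]


def run_pipeline(state: PipelineState, stages: List[Stage]) -> PipelineState:
    """Run each stage in order and record which stages ran."""
    for stage in stages:
        state = stage(state)
        state.trace.append(stage.__name__)
    return state


result = run_pipeline(
    PipelineState("How many singers are there?",
                  {"singer": ["id", "name"], "concert": ["id"]}),
    STAGES,
)
print(result.sql)
```

Keeping each stage behind the same signature means one agent can be swapped or ablated without touching the others.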

🧠 𝑷𝒍𝒂𝒏-𝑻𝒉𝒆𝒏-𝑮𝒆𝒏𝒆𝒓𝒂𝒕𝒆 𝒘𝒊𝒕𝒉 𝑪𝒉𝒂𝒊𝒏-𝒐𝒇-𝑻𝒉𝒐𝒖𝒈𝒉𝒕: SQL is derived not directly from text but from an intermediate query plan, enabling more faithful reasoning. Chain-of-thought ensures clause consistency and facilitates transparent debugging when corrections are required.
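One way to picture the plan-then-generate split is to treat the plan as a clause-by-clause structure and derive SQL only from it. The sketch below is an illustration under loud assumptions. `QueryPlan` and `render_sql` are hypothetical names, the plan is written by hand, and the renderer is deterministic. In the framework itself, the plan comes from chain-of-thought reasoning and a generation agent writes the SQL.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueryPlan:
    """Hypothetical clause-level plan; each field is one decision made before writing SQL."""
    select: List[str]
    from_table: str
    joins: List[str] = field(default_factory=list)  # e.g. "b ON b.a_id = a.id"
    where: Optional[str] = None
    group_by: List[str] = field(default_factory=list)
    having: Optional[str] = None
    order_by: Optional[str] = None


def render_sql(plan: QueryPlan) -> str:
    """Emit clauses in the fixed order SQL requires, skipping empty ones."""
    parts = [f"SELECT {', '.join(plan.select)}", f"FROM {plan.from_table}"]
    parts += [f"JOIN {j}" for j in plan.joins]
    if plan.where:
        parts.append(f"WHERE {plan.where}")
    if plan.group_by:
        parts.append(f"GROUP BY {', '.join(plan.group_by)}")
    if plan.having:
        parts.append(f"HAVING {plan.having}")
    if plan.order_by:
        parts.append(f"ORDER BY {plan.order_by}")
    return " ".join(parts)


# Plan for: "Which singers performed in more than one concert?"
plan = QueryPlan(
    select=["singer.name", "COUNT(*)"],
    from_table="singer",
    joins=["concert ON concert.singer_id = singer.id"],
    group_by=["singer.name"],
    having="COUNT(*) > 1",
)
print(render_sql(plan))
```

Because every clause is a separate field, a wrong query can be traced to the plan step that produced it, which is the debugging benefit described above.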

🛠️ 𝑬𝒓𝒓𝒐𝒓 𝑻𝒂𝒙𝒐𝒏𝒐𝒎𝒚-𝑮𝒖𝒊𝒅𝒆𝒅 𝑪𝒐𝒓𝒓𝒆𝒄𝒕𝒊𝒐𝒏 𝑳𝒐𝒐𝒑: Instead of blind regeneration, errors are classified into a concise taxonomy (e.g., missing joins, agg-no-groupby, having-vs-where). Correction agents use this taxonomy to target logical flaws that execution errors alone cannot detect.
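A rough sketch of how a taxonomy code can steer a correction loop follows. It is not the paper's implementation. The two codes `agg-no-groupby` and `having-vs-where` come from the highlight above, but the hint texts, the regex-based `classify`, and `toy_correct` are assumptions standing in for LLM agents. The toy corrector only repairs `agg-no-groupby`.

```python
import re
from typing import Callable, List, Optional, Tuple

# Taxonomy codes mentioned above; the hint wording is illustrative only.
TAXONOMY = {
    "agg-no-groupby": "Aggregate mixed with bare columns; add a GROUP BY.",
    "having-vs-where": "Aggregate used in WHERE; move the predicate to HAVING.",
}

AGG = re.compile(r"\b(COUNT|SUM|AVG|MIN|MAX)\s*\(", re.IGNORECASE)
FLAGS = re.IGNORECASE | re.DOTALL


def select_columns(sql: str) -> List[str]:
    """Naive split of the SELECT list on commas (breaks on nested commas)."""
    m = re.search(r"SELECT\s+(.*?)\s+FROM\b", sql, FLAGS)
    return [c.strip() for c in m.group(1).split(",")] if m else []


def classify(sql: str) -> Optional[str]:
    """Heuristic stand-in for a classifier agent: first flaw found, or None."""
    where = re.search(r"\bWHERE\s+(.*?)(?=\bGROUP BY\b|\bORDER BY\b|$)", sql, FLAGS)
    if where and AGG.search(where.group(1)):
        return "having-vs-where"
    cols = select_columns(sql)
    bare = [c for c in cols if not AGG.search(c)]
    has_agg = any(AGG.search(c) for c in cols)
    if has_agg and bare and not re.search(r"\bGROUP BY\b", sql, re.IGNORECASE):
        return "agg-no-groupby"
    return None


def toy_correct(sql: str, code: str, hint: str) -> str:
    """Stand-in for a correction agent, which would receive `hint` in its prompt."""
    if code == "agg-no-groupby":
        bare = [c for c in select_columns(sql) if not AGG.search(c)]
        group = " GROUP BY " + ", ".join(bare)
        order = re.search(r"\s+ORDER BY\b", sql, re.IGNORECASE)
        if order:  # GROUP BY must precede ORDER BY
            return sql[:order.start()] + group + sql[order.start():]
        return sql + group
    return sql  # other codes are left to a real corrector


def correction_loop(sql: str,
                    classify_fn: Callable[[str], Optional[str]],
                    correct_fn: Callable[[str, str, str], str],
                    max_attempts: int = 3) -> Tuple[str, List[str]]:
    """Classify, correct with the matching hint, and repeat until clean or stuck."""
    history: List[str] = []
    for _ in range(max_attempts):
        code = classify_fn(sql)
        if code is None:
            break
        history.append(code)
        revised = correct_fn(sql, code, TAXONOMY[code])
        if revised == sql:  # corrector made no progress; stop early
            break
        sql = revised
    return sql, history


fixed, seen = correction_loop("SELECT name, COUNT(*) FROM singer", classify, toy_correct)
print(fixed, seen)
```

The uncorrected query in this example is accepted by some engines (SQLite, for instance) and returns a misleading single row, which is the kind of logical flaw that an execution error alone would not surface.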

📊 𝑺𝒕𝒂𝒕𝒆-𝒐𝒇-𝒕𝒉𝒆-𝑨𝒓𝒕 𝑹𝒆𝒔𝒖𝒍𝒕𝒔 𝒘𝒊𝒕𝒉 𝑹𝒐𝒃𝒖𝒔𝒕 𝑮𝒆𝒏𝒆𝒓𝒂𝒍𝒊𝒛𝒂𝒕𝒊𝒐𝒏: Achieves 91.59% execution accuracy on Spider dev, 90.16% on Spider-Realistic, and 82.01% on Spider-SYN. Ablations confirm that removing the correction loop or intermediate planning significantly degrades accuracy, validating both contributions.
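For readers unfamiliar with the metric, execution accuracy counts a prediction as correct when running it returns the same result as the gold query. The sketch below is a simplified version using Python's built-in `sqlite3`. It is not the official Spider evaluation script: it compares rows as an order-insensitive multiset, and the toy table, queries, and function names are assumptions made up for the example.

```python
import sqlite3
from collections import Counter
from typing import List, Tuple


def execution_match(conn: sqlite3.Connection, predicted: str, gold: str) -> bool:
    """True if both queries return the same multiset of rows (row order ignored)."""
    try:
        pred_rows = conn.execute(predicted).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    gold_rows = conn.execute(gold).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)


def execution_accuracy(conn: sqlite3.Connection,
                       pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


# Toy in-memory database (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer (name, age) VALUES (?, ?)",
                 [("Ana", 25), ("Bo", 30), ("Cy", 41)])

pairs = [
    # Different SQL text, same result: counted as correct.
    ("SELECT COUNT(*) FROM singer", "SELECT COUNT(id) FROM singer"),
    # Runs fine but returns the wrong rows: counted as wrong.
    ("SELECT name FROM singer WHERE age > 30",
     "SELECT name FROM singer WHERE age >= 30"),
    # Fails to execute (misspelled column): counted as wrong.
    ("SELECT nme FROM singer", "SELECT name FROM singer"),
]
print(execution_accuracy(conn, pairs))
```

The second pair shows why execution feedback alone is limited: the query executes without error, so only a comparison against gold results or a logical check can flag it.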

⚡ 𝑷𝒓𝒂𝒄𝒕𝒊𝒄𝒂𝒍 𝑬𝒇𝒇𝒊𝒄𝒊𝒆𝒏𝒄𝒚 + 𝑪𝒐𝒔𝒕 𝑯𝒚𝒃𝒓𝒊𝒅𝒊𝒛𝒂𝒕𝒊𝒐𝒏: Uses a hybrid setup where stronger models (Claude 3 Opus) are invoked selectively in planning/correction, while lighter models handle subproblem decomposition and synthesis, balancing accuracy with token and cost efficiency.
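The hybrid setup can be pictured as a routing table from pipeline stage to model tier. The sketch below assumes placeholder tier names, hypothetical stage keys, and made-up prices. It routes planning and correction to the stronger tier and decomposition and synthesis to the lighter one, as described above. Defaulting any unlisted stage to the lighter tier is an assumption of this sketch.

```python
from typing import Dict

# Placeholder tier names; substitute real model identifiers in practice.
STRONG = "strong-model"
LIGHT = "light-model"

# Stage keys are hypothetical labels for the agents described above.
ROUTING: Dict[str, str] = {
    "query_planning": STRONG,
    "correction": STRONG,
    "subproblem_decomposition": LIGHT,
    "sql_synthesis": LIGHT,
}


def pick_model(stage: str, default: str = LIGHT) -> str:
    """Return the model tier for a stage; unlisted stages fall back to `default`."""
    return ROUTING.get(stage, default)


def estimate_cost(tokens_per_stage: Dict[str, int],
                  price_per_1k: Dict[str, float]) -> float:
    """Sum per-stage cost as tokens / 1000 * price of the routed tier."""
    return sum(tokens / 1000 * price_per_1k[pick_model(stage)]
               for stage, tokens in tokens_per_stage.items())


# Made-up prices per 1k tokens, purely to show the arithmetic.
prices = {STRONG: 10.0, LIGHT: 1.0}
usage = {"query_planning": 2000, "sql_synthesis": 1000}
print(estimate_cost(usage, prices))
```

With these illustrative numbers, moving synthesis to the strong tier would raise the estimate from 21.0 to 30.0, which is the trade-off the selective routing is meant to avoid.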
