--- license: apache-2.0 language: - en base_model: - seeklhy/OmniSQL-7B --- # GradeSQL-7B — Outcome Reward Model for Text-to-SQL ## Model Description **GradeSQL-7B** is an Outcome Reward Model (ORM) designed to evaluate the semantic correctness of SQL queries generated from natural language questions in Text-to-SQL tasks. Rather than relying on syntactic heuristics or majority votes, GradeSQL-7B assigns a confidence score indicating whether a candidate SQL query faithfully answers the user's question based on the database schema. Built on top of the **OmniSQL-7B** base model and finetuned on the **SPIDER** dataset, GradeSQL-7B provides a robust semantic scoring mechanism to improve query selection and alignment with user intent. ## Intended Use - **Reranking Candidate SQL Queries:** Use GradeSQL-7B to assign semantic correctness scores and select the best SQL query among multiple candidates generated by LLMs. - **Enhancing Text-to-SQL Pipelines:** Integrate as a reward or reranking model to improve execution accuracy and semantic fidelity in Text-to-SQL systems. - **Evaluation and Research:** Analyze the semantic alignment of SQL queries with natural language questions to identify and mitigate errors. ## Finetuning Configuration The following **LoRA configuration** was used to train this model: - **R**: `16` (rank of the low-rank matrices) - **Lora Alpha**: `64` (scaling factor for the low-rank update) - **Target Modules**: `{q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj}` - **Lora Dropout**: `0.05` - **Bias**: `"none"` (bias terms are frozen) - **FP16**: `True` (half-precision training) - **Learning Rate**: `7e-5` - **Train Batch Size**: `5` - **Num. Train Epochs**: `50` ## Usage Example ```python import torch from transformers import AutoModelForCausalLM, AutoTokenizer from peft import PeftModel prompt = """Question: What is the total horses record for each farm, sorted ascending? CREATE TABLE competition_record ( Competition_ID number, -- example: [1, 2] Farm_ID number, -- example: [2, 3] Rank number, -- example: [1, 2] PRIMARY KEY (Competition_ID), CONSTRAINT fk_competition_record_competition_id FOREIGN KEY (Competition_ID) REFERENCES farm_competition (Competition_ID), CONSTRAINT fk_competition_record_farm_id FOREIGN KEY (Farm_ID) REFERENCES farm (Farm_ID) ); CREATE TABLE city ( City_ID number, -- example: [1, 2] Status text, -- example: ['Town', 'Village'] PRIMARY KEY (City_ID) ); CREATE TABLE farm_competition ( Competition_ID number, -- example: [1, 2] Host_city_ID number, -- example: [1, 2] PRIMARY KEY (Competition_ID), CONSTRAINT fk_farm_competition_host_city_id FOREIGN KEY (Host_city_ID) REFERENCES city (City_ID) ); CREATE TABLE farm ( Farm_ID number, -- example: [1, 2] Total_Horses number, -- example: [5056.5, 5486.9] Total_Cattle number, -- example: [8374.5, 8604.8] PRIMARY KEY (Farm_ID) ); What is the total horses record for each farm, sorted ascending? SQL: SELECT SUM(Total_Horses) AS Total_Horses, Farm_ID FROM farm GROUP BY Farm_ID ORDER BY SUM(Total_Horses) ASC; Is the SQL correct?""" base_model = AutoModelForCausalLM.from_pretrained("seeklhy/OmniSQL-7B", torch_dtype="auto", device_map="auto") peft_model = PeftModel.from_pretrained(base_model, "sisinflab-ai/GradeSQL-7B-ORM-Spider") orm_model = peft_model.merge_and_unload() orm_tokenizer = AutoTokenizer.from_pretrained("seeklhy/OmniSQL-7B", use_fast=True) del base_model del peft_model inputs = orm_tokenizer(prompt, return_tensors="pt").to(orm_model.device) with torch.no_grad(): outputs = orm_model.generate(**inputs, max_new_tokens=1, return_dict_in_generate=True, output_scores=True, use_cache=False) generated_ids = outputs.sequences[0, len(inputs.input_ids[0]):] yes_token_id = orm_tokenizer.convert_tokens_to_ids("ĠYes") no_token_id = orm_tokenizer.convert_tokens_to_ids("ĠNo") yes_no_pos = None for i, token_id in enumerate(generated_ids): if token_id in [yes_token_id, no_token_id]: yes_no_pos = i break if yes_no_pos is None: print("[Warning]: No 'Yes' or 'No' token found in the generated output.") print("[Score]: 0.5") logits = outputs.scores[yes_no_pos] probs = torch.softmax(logits, dim=-1) yes_prob = probs[0, yes_token_id].item() generated_answer = "Yes" if generated_ids[yes_no_pos] == yes_token_id else "No" if generated_answer == "Yes": print("[Score]: ", yes_prob) elif generated_answer == "No": print("[Score]: ", 0) ``` If you use **GradeSQL** in your research, please cite the following paper: ```bibtex @misc{gradesqloutcomerewardmodels2025, title={GradeSQL: Outcome Reward Models for Ranking SQL Queries from Large Language Models}, author={Mattia Tritto and Giuseppe Farano and Dario Di Palma and Gaetano Rossiello and Fedelucio Narducci and Dharmashankar Subramanian and Tommaso Di Noia}, year={2025}, eprint={2509.01308}, archivePrefix={arXiv}, primaryClass={cs.AI}, url={https://arxiv.org/abs/2509.01308}, } ```