---
language: en
pipeline_tag: text2text-generation
tags:
- text-to-sql
- t5
- natural-language-processing
- sql
license: apache-2.0
datasets:
- gretelai/synthetic_text_to_sql
base_model:
- Salesforce/codet5-base
---

# Text-to-SQL T5 Model (`16pramodh/t2s_model`)

## Model Description

This is a **T5-based text-to-SQL model** that converts **natural language questions** into **SQL queries**. The card metadata lists `Salesforce/codet5-base` as the base model and `gretelai/synthetic_text_to_sql` as the training dataset.

It takes a single input string of the form:

`natural language query [SEP] table schema`

and produces a SQL statement based on the provided database schema.

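
The input format above can be assembled with a small helper. This is a minimal sketch, not part of the model's own code: `build_input` and `SEP` are hypothetical names, and collapsing whitespace so that multi-line DDL becomes a single line is an assumption based on the single-line examples in this card.

```python
SEP = " [SEP] "  # separator taken from the input format described in this card


def build_input(question: str, schema: str) -> str:
    """Join a question and a schema into one 'question [SEP] schema' string."""
    # Assumption: collapse runs of whitespace so multi-line DDL becomes one line,
    # mirroring the single-line inputs shown in this card.
    question = " ".join(question.split())
    schema = " ".join(schema.split())
    return f"{question}{SEP}{schema}"


input_text = build_input(
    "Get the names and emails of all customers who signed up after January 1, 2024",
    """CREATE TABLE customers (customer_id INT PRIMARY KEY, name VARCHAR(50),
    email VARCHAR(100), signup_date DATE);""",
)
print(input_text)
```

The resulting string can be passed as `inputs` to the Inference API or to the tokenizer.
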

The model uses `T5ForConditionalGeneration` and supports the **text2text-generation** pipeline through the Hugging Face Inference API.

---

## Intended Use

- **Input:** An English natural language question **plus** the database schema (for example, `CREATE TABLE` statements).
- **Output:** A SQL query that can be executed on the described database.

---

## Example

**Input:**

`Get the names and emails of all customers who signed up after January 1, 2024 [SEP] CREATE TABLE customers (customer_id INT PRIMARY KEY, name VARCHAR(50), email VARCHAR(100), signup_date DATE);`

**Output:**

`SELECT name, email FROM customers WHERE signup_date > '2024-01-01';`

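
One way to check that a generated query actually runs against the described schema is to execute it on a throwaway in-memory database. The sketch below uses Python's standard `sqlite3` module with the schema and output from the example above; `run_on_schema` is a hypothetical helper, the sample rows are made up for illustration, and SQLite's dialect may differ from your target database.

```python
import sqlite3

schema = (
    "CREATE TABLE customers (customer_id INT PRIMARY KEY, name VARCHAR(50), "
    "email VARCHAR(100), signup_date DATE);"
)
generated_sql = "SELECT name, email FROM customers WHERE signup_date > '2024-01-01';"


def run_on_schema(schema, sql, insert_sql="", rows=()):
    """Create the schema in an in-memory SQLite database and run one query.

    Raises sqlite3.Error if the schema or the query is not valid for SQLite.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        if insert_sql:
            conn.executemany(insert_sql, rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Made-up rows: one signup before the cutoff date, one after it.
sample_rows = [
    (1, "Ada", "ada@example.com", "2023-12-31"),
    (2, "Grace", "grace@example.com", "2024-03-15"),
]
result = run_on_schema(
    schema,
    generated_sql,
    insert_sql="INSERT INTO customers VALUES (?, ?, ?, ?)",
    rows=sample_rows,
)
print(result)  # [('Grace', 'grace@example.com')]
```
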

---

## How to Use

### Hugging Face Inference API

```bash
curl -X POST \
  -H "Authorization: Bearer YOUR_HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Get the names and emails of all customers who signed up after January 1, 2024 [SEP] CREATE TABLE customers (customer_id INT PRIMARY KEY, name VARCHAR(50), email VARCHAR(100), signup_date DATE);"}' \
  https://api-inference.huggingface.co/models/16pramodh/t2s_model
```
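
The same request can be sent from Python without extra dependencies. This is a rough sketch that mirrors the `curl` call above using only the standard library; `build_request` and `query` are hypothetical helpers, and the parsed JSON is returned as-is because this card does not document the response shape.

```python
import json
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/16pramodh/t2s_model"


def build_request(input_text: str, token: str) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    payload = json.dumps({"inputs": input_text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query(input_text: str, token: str, timeout: float = 30.0):
    """Send the request and return the decoded JSON body unchanged."""
    request = build_request(input_text, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `query(input_text, "YOUR_HF_TOKEN")` performs the network request; `build_request` alone is useful for inspecting the payload first.
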

### Python (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and model from the Hugging Face Hub.
model_name = "16pramodh/t2s_model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Input format: natural language query [SEP] table schema
input_text = "Get the names and emails of all customers who signed up after January 1, 2024 [SEP] CREATE TABLE customers (customer_id INT PRIMARY KEY, name VARCHAR(50), email VARCHAR(100), signup_date DATE);"
inputs = tokenizer(input_text, return_tensors="pt")

# Set an explicit generation cap, since the library's default length limit
# may cut off longer queries. 128 is an illustrative value, not a setting
# documented for this model.
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---