nielsr HF Staff committed on
Commit
9663c41
·
verified ·
1 Parent(s): 2572033

Improve model card with GitHub link and sample usage


This PR enhances the model card by:
- Adding a direct link to the GitHub repository in the "Important Links" section for easier access to the code.
- Clarifying "HuggingFace" and "ModelScope" links as "Collections" in the "Important Links" section.
- Refactoring the "Introduction" section by removing the blockquote formatting and eliminating the redundant sentence about the GitHub release, now that a dedicated link is available.
- Including a practical Python code snippet in a new "Sample Usage" section, demonstrating how to perform Text-to-SQL generation using the `transformers` library, `torch_dtype=torch.bfloat16`, and the model's specific chat template.

Files changed (1)
  1. README.md +69 -20
README.md CHANGED
@@ -1,20 +1,17 @@
1
  ---
2
- pipeline_tag: text-generation
3
  library_name: transformers
4
  license: cc-by-nc-4.0
 
5
  tags:
6
  - text-to-sql
7
  - reinforcement-learning
8
  ---
9
 
10
-
11
  # SLM-SQL: An Exploration of Small Language Models for Text-to-SQL
12
 
13
  ### Important Links
14
 
15
- 📖[Arxiv Paper](https://arxiv.org/abs/2507.22478) |
16
- 🤗[HuggingFace](https://huggingface.co/collections/cycloneboy/slm-sql-688b02f99f958d7a417658dc) |
17
- 🤖[ModelScope](https://modelscope.cn/collections/SLM-SQL-624bb6a60e9643) |
18
 
19
  ## News
20
 
@@ -23,20 +20,7 @@ tags:
23
 
24
  ## Introduction
25
 
26
- > Large language models (LLMs) have demonstrated strong performance in translating natural language questions into SQL
27
- > queries (Text-to-SQL). In contrast, small language models (SLMs) ranging from 0.5B to 1.5B parameters currently
28
- > underperform on Text-to-SQL tasks due to their limited logical reasoning capabilities. However, SLMs offer inherent
29
- > advantages in inference speed and suitability for edge deployment. To explore their potential in Text-to-SQL
30
- > applications, we leverage recent advancements in post-training techniques. Specifically, we used the open-source
31
- > SynSQL-2.5M dataset to construct two derived datasets: SynSQL-Think-916K for SQL generation and
32
- > SynSQL-Merge-Think-310K
33
- > for SQL merge revision. We then applied supervised fine-tuning and reinforcement learning-based post-training to the
34
- > SLM, followed by inference using a corrective self-consistency approach. Experimental results validate the
35
- > effectiveness
36
- > and generalizability of our method, SLM-SQL. On the BIRD development set, the five evaluated models achieved an
37
- > average
38
- > improvement of 31.4 points. Notably, the 0.5B model reached 56.87\% execution accuracy (EX), while the 1.5B model
39
- > achieved 67.08\% EX. We will release our dataset, model, and code to github: https://github.com/CycloneBoy/slm_sql.
40
 
41
  ### Framework
42
 
@@ -62,7 +46,7 @@ Performance Comparison of different Text-to-SQL methods on BIRD dev and test dat
62
  | SLM-SQL-Base-0.5B | Qwen2.5-Coder-0.5B-Instruct | SFT | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-Base-0.5B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-Base-0.5B) |
63
  | SLM-SQL-0.5B | Qwen2.5-Coder-0.5B-Instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-0.5B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-0.5B) |
64
  | CscSQL-Merge-Qwen2.5-Coder-0.5B-Instruct | Qwen2.5-Coder-0.5B-Instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/CscSQL-Merge-Qwen2.5-Coder-0.5B-Instruct) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/CscSQL-Merge-Qwen2.5-Coder-0.5B-Instruct) |
65
- | SLM-SQL-Base-1.5B | Qwen2.5-Coder-1.5B-Instruct | SFT | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-Base-1.5B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-Base-1.5B) |
66
  | SLM-SQL-1.5B | Qwen2.5-Coder-1.5B-Instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-1.5B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-1.5B) |
67
  | CscSQL-Merge-Qwen2.5-Coder-1.5B-Instruct | Qwen2.5-Coder-1.5B-Instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/CscSQL-Merge-Qwen2.5-Coder-1.5B-Instruct) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/CscSQL-Merge-Qwen2.5-Coder-1.5B-Instruct) |
68
  | SLM-SQL-Base-0.6B | Qwen3-0.6B | SFT | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-Base-0.6B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-Base-0.6B) |
@@ -71,6 +55,71 @@ Performance Comparison of different Text-to-SQL methods on BIRD dev and test dat
71
  | SLM-SQL-1.3B | deepseek-coder-1.3b-instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-1.3B ) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-1.3B ) |
72
  | SLM-SQL-Base-1B | Llama-3.2-1B-Instruct | SFT | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-Base-1B ) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-Base-1B ) |
73
 
74
  ## Dataset
75
 
76
  | **Dataset** | Modelscope | HuggingFace |
 
1
  ---
 
2
  library_name: transformers
3
  license: cc-by-nc-4.0
4
+ pipeline_tag: text-generation
5
  tags:
6
  - text-to-sql
7
  - reinforcement-learning
8
  ---
9
 
 
10
  # SLM-SQL: An Exploration of Small Language Models for Text-to-SQL
11
 
12
  ### Important Links
13
 
14
+ 📖[Paper](https://arxiv.org/abs/2507.22478) | 💻[GitHub](https://github.com/CycloneBoy/slm_sql) | 🤗[HuggingFace Collection](https://huggingface.co/collections/cycloneboy/slm-sql-688b02f99f958d7a417658dc) | 🤖[ModelScope Collection](https://modelscope.cn/collections/SLM-SQL-624bb6a60e9643)
 
 
15
 
16
  ## News
17
 
 
20
 
21
  ## Introduction
22
 
23
+ Large language models (LLMs) have demonstrated strong performance in translating natural language questions into SQL queries (Text-to-SQL). In contrast, small language models (SLMs) ranging from 0.5B to 1.5B parameters currently underperform on Text-to-SQL tasks due to their limited logical reasoning capabilities. However, SLMs offer inherent advantages in inference speed and suitability for edge deployment. To explore their potential in Text-to-SQL applications, we leverage recent advancements in post-training techniques. Specifically, we used the open-source SynSQL-2.5M dataset to construct two derived datasets: SynSQL-Think-916K for SQL generation and SynSQL-Merge-Think-310K for SQL merge revision. We then applied supervised fine-tuning and reinforcement learning-based post-training to the SLM, followed by inference using a corrective self-consistency approach. Experimental results validate the effectiveness and generalizability of our method, SLM-SQL. On the BIRD development set, the five evaluated models achieved an average improvement of 31.4 points. Notably, the 0.5B model reached 56.87% execution accuracy (EX), while the 1.5B model achieved 67.08% EX.
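+
+ The corrective self-consistency inference step is described in the paper rather than implemented in this card. As a rough illustration only, the sketch below assumes you already have a list of sampled candidate SQL queries and a SQLite database file; it shows the execution-based majority vote that self-consistency methods typically use (the separate merge-revision model used for the "corrective" part is not shown).
+
+ ```python
+ import sqlite3
+ from collections import Counter
+
+ def execute_sql(db_path, sql):
+     """Run a candidate (read-only) query and return its result set, or None if it fails."""
+     try:
+         conn = sqlite3.connect(db_path)
+         rows = conn.execute(sql).fetchall()
+         conn.close()
+         return frozenset(rows)
+     except Exception:
+         return None
+
+ def self_consistency_vote(candidate_sqls, db_path):
+     """Pick one query from the largest group of candidates that agree on their
+     execution result (majority vote over result sets)."""
+     results = {sql: execute_sql(db_path, sql) for sql in candidate_sqls}
+     executable = {sql: res for sql, res in results.items() if res is not None}
+     if not executable:
+         return candidate_sqls[0]  # nothing executed successfully; fall back to the first sample
+     winning_result, _ = Counter(executable.values()).most_common(1)[0]
+     return next(sql for sql, res in executable.items() if res == winning_result)
+ ```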
24
 
25
  ### Framework
26
 
 
46
  | SLM-SQL-Base-0.5B | Qwen2.5-Coder-0.5B-Instruct | SFT | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-Base-0.5B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-Base-0.5B) |
47
  | SLM-SQL-0.5B | Qwen2.5-Coder-0.5B-Instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-0.5B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-0.5B) |
48
  | CscSQL-Merge-Qwen2.5-Coder-0.5B-Instruct | Qwen2.5-Coder-0.5B-Instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/CscSQL-Merge-Qwen2.5-Coder-0.5B-Instruct) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/CscSQL-Merge-Qwen2.5-Coder-0.5B-Instruct) |
49
+ | SLM-SQL-Base-1.5B | Qwen2.5-Coder-1.5B-Instruct | SFT | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-Base-1.5B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-Base-1.5B) |
50
  | SLM-SQL-1.5B | Qwen2.5-Coder-1.5B-Instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-1.5B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-1.5B) |
51
  | CscSQL-Merge-Qwen2.5-Coder-1.5B-Instruct | Qwen2.5-Coder-1.5B-Instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/CscSQL-Merge-Qwen2.5-Coder-1.5B-Instruct) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/CscSQL-Merge-Qwen2.5-Coder-1.5B-Instruct) |
52
  | SLM-SQL-Base-0.6B | Qwen3-0.6B | SFT | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-Base-0.6B) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-Base-0.6B) |
 
55
  | SLM-SQL-1.3B | deepseek-coder-1.3b-instruct | SFT + GRPO | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-1.3B ) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-1.3B ) |
56
  | SLM-SQL-Base-1B | Llama-3.2-1B-Instruct | SFT | [🤖 Modelscope](https://modelscope.cn/models/cycloneboy/SLM-SQL-Base-1B ) | [🤗 HuggingFace](https://huggingface.co/cycloneboy/SLM-SQL-Base-1B ) |
57
 
58
+ ## Sample Usage
59
+
60
+ This model can be easily loaded and used with the `transformers` library. The following example demonstrates how to perform Text-to-SQL generation.
61
+
62
+ ```python
63
+ import torch
64
+ from transformers import AutoTokenizer, AutoModelForCausalLM
65
+
66
+ model_id = "cycloneboy/SLM-SQL-0.5B" # You can choose any of the models from the table above
67
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
68
+ model = AutoModelForCausalLM.from_pretrained(
69
+ model_id,
70
+ torch_dtype=torch.bfloat16, # Use torch.bfloat16 as specified in the model's config
71
+ device_map="auto" # Automatically maps the model to available devices (e.g., GPU)
72
+ )
73
+
74
+ # Example SQL schema (simplified for demonstration)
75
+ schema = """
76
+ CREATE TABLE employees (
77
+ employee_id INT,
78
+ first_name VARCHAR,
79
+ last_name VARCHAR,
80
+ department VARCHAR,
81
+ salary INT
82
+ );
83
+ """
84
+
85
+ # Natural language query
86
+ query = "Show me the first name and last name of employees in the 'Sales' department earning more than 50000."
87
+
88
+ # Construct the prompt using the model's chat template format
89
+ # The chat template automatically adds system/user tags if available.
90
+ messages = [
91
+     {"role": "user", "content": f"Translate the following natural language query into SQL:\nSchema: {schema}\nQuery: {query}"}
94
+ ]
95
+ prompt_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
96
+
97
+ inputs = tokenizer(prompt_text, return_tensors="pt").to(model.device)
98
+
99
+ # Generate the SQL query
100
+ outputs = model.generate(**inputs, max_new_tokens=256, pad_token_id=tokenizer.eos_token_id)
101
+ # Decode only the newly generated tokens (everything after the prompt); this
+ # avoids having to parse chat-template markers out of the decoded text.
+ generated_sql = tokenizer.decode(
+     outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
+ ).strip()
116
+
117
+ print(generated_sql)
118
+
119
+ # Expected output (may vary slightly based on model's exact generation):
120
+ # SELECT first_name, last_name FROM employees WHERE department = 'Sales' AND salary > 50000;
121
+ ```
122
+
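+ The schema above is hard-coded for demonstration. If you are prompting against a real SQLite database (e.g. a BIRD database file), one simple option, not part of the original card, is to read the CREATE statements straight out of `sqlite_master`; the path below is a placeholder.
+
+ ```python
+ import sqlite3
+
+ def load_schema(db_path: str) -> str:
+     """Concatenate the CREATE TABLE statements stored in sqlite_master."""
+     conn = sqlite3.connect(db_path)
+     rows = conn.execute(
+         "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
+     ).fetchall()
+     conn.close()
+     return "\n".join(sql for (sql,) in rows)
+
+ # schema = load_schema("path/to/your_database.sqlite")  # then build `messages` as above
+ ```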
123
  ## Dataset
124
 
125
  | **Dataset** | Modelscope | HuggingFace |