DuckDB-NSQL-7B-v0.1-mlx

This is an MLX-optimized version of motherduckdb/DuckDB-NSQL-7B-v0.1, converted for efficient inference on Apple Silicon (M1/M2/M3/M4).

Model Description

DuckDB-NSQL-7B is a 7-billion-parameter language model fine-tuned to generate DuckDB SQL queries from natural-language questions. This MLX conversion enables fast, memory-efficient local inference on Apple Silicon Macs (see Performance Comparison below).

Conversion Details

  • Base Model: motherduckdb/DuckDB-NSQL-7B-v0.1
  • Precision: Float16 (FP16)
  • Framework: MLX
  • Optimized for: Apple Silicon (M1/M2/M3/M4)
  • Model Size: ~13.5 GB
  • Converted by: aikhan1

Installation

pip install mlx-lm
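
To verify the installation, you can generate a quick completion from the command line. The mlx_lm.generate CLI ships with the package; the prompt below is only a throwaway sanity check, not the model's intended prompt format (see Prompt Format):

# Quick sanity check from the terminal
mlx_lm.generate --model aikhan1/DuckDB-NSQL-7B-v0.1-mlx \
  --prompt "### Question: How many rows are in table t?" \
  --max-tokens 50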

Usage

Basic Inference

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load the model
model, tokenizer = load("aikhan1/DuckDB-NSQL-7B-v0.1-mlx")

# Example schema
schema = """
CREATE TABLE hospitals (
    hospital_id BIGINT PRIMARY KEY,
    hospital_name VARCHAR,
    region VARCHAR,
    bed_capacity INTEGER
);

CREATE TABLE patients (
    patient_id BIGINT PRIMARY KEY,
    full_name VARCHAR,
    gender VARCHAR,
    date_of_birth DATE,
    region VARCHAR
);
"""

# Example question
question = "How many patients are there in each region?"

# Build prompt
prompt = f"""You are an assistant that writes valid DuckDB SQL queries.

### Schema:
{schema}

### Question:
{question}

### Response (DuckDB SQL only):"""

# Generate SQL with greedy decoding (temperature 0)
# Note: recent mlx-lm releases take a sampler object; older releases
# accepted temp=0.0 as a direct keyword argument instead.
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=200,
    sampler=make_sampler(temp=0.0),
)
print(response)
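
Because DuckDB runs in-process, you can sanity-check the generated SQL by executing it against an in-memory database built from the same schema. A minimal sketch, assuming pip install duckdb and that the model returned a single bare SQL statement (the sample rows are made up for illustration):

import duckdb

# In-memory database with the same schema the model saw
con = duckdb.connect()
con.execute(schema)

# Hypothetical sample rows, just to exercise the query
con.execute("INSERT INTO patients VALUES (1, 'Ada', 'F', DATE '1990-01-01', 'North')")
con.execute("INSERT INTO patients VALUES (2, 'Bob', 'M', DATE '1985-06-15', 'South')")

# Run the model's output; DuckDB raises if the SQL is invalid
print(con.execute(response).fetchall())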

Using MLX Server

# Start the server
mlx_lm.server --model aikhan1/DuckDB-NSQL-7B-v0.1-mlx --port 8080

# In another terminal, make requests
curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "CREATE TABLE patients(...)\n\nQuestion: Count patients by region\n\nSQL:",
    "max_tokens": 200,
    "temperature": 0
  }'
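
The server exposes an OpenAI-compatible completions endpoint, so you can also call it from Python. A minimal sketch using the requests package (the prompt here mirrors the curl example; in practice use the full prompt format shown below):

import requests

resp = requests.post(
    "http://localhost:8080/v1/completions",
    json={
        "prompt": "CREATE TABLE patients(...)\n\nQuestion: Count patients by region\n\nSQL:",
        "max_tokens": 200,
        "temperature": 0,
    },
)
# The completion text is under choices[0].text, following the OpenAI schema
print(resp.json()["choices"][0]["text"])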

Performance Comparison

On Apple Silicon M-series chips, MLX provides:

  • Faster inference than running the PyTorch model on CPU
  • Lower memory usage via lazy evaluation and Apple's unified memory architecture
  • Native GPU acceleration through the Metal framework

Typical generation speed: ~30-50 tokens/second on M1 Pro/Max, ~50-80 tokens/second on M2/M3 series.
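
To measure throughput on your own hardware, pass verbose=True to generate(), which makes mlx-lm print prompt and generation tokens-per-second after the run (exact output varies by mlx-lm version):

from mlx_lm import load, generate

model, tokenizer = load("aikhan1/DuckDB-NSQL-7B-v0.1-mlx")

# Prints the completion plus prompt/generation speed statistics
generate(model, tokenizer, prompt="SELECT", max_tokens=100, verbose=True)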

Prompt Format

The model expects prompts in this format:

You are an assistant that writes valid DuckDB SQL queries.

### Schema:
CREATE TABLE table_name (
  column1 TYPE,
  column2 TYPE
);

### Question:
[Your natural language question]

### Response (DuckDB SQL only):
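
If you build prompts programmatically, a small helper keeps the format consistent. build_prompt is a hypothetical name, not part of the model or mlx-lm:

def build_prompt(schema: str, question: str) -> str:
    # Mirrors the prompt format shown above
    return (
        "You are an assistant that writes valid DuckDB SQL queries.\n\n"
        f"### Schema:\n{schema.strip()}\n\n"
        f"### Question:\n{question.strip()}\n\n"
        "### Response (DuckDB SQL only):"
    )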

Limitations

  • The model is trained specifically for DuckDB SQL syntax and may not produce valid SQL for other dialects
  • Complex queries may require post-processing or validation (see the sketch below)
  • The model may occasionally generate invalid SQL for complex schemas
  • It performs best on well-defined schemas with clear column names
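
One lightweight way to catch invalid output before running it is to ask DuckDB to plan the query without executing it. A sketch, assuming the duckdb package; EXPLAIN parses, binds, and plans the statement, so syntax errors and unknown columns surface as exceptions:

import duckdb

def is_valid_sql(schema: str, query: str) -> bool:
    # Plan (but don't execute) the query against an in-memory
    # database loaded with the schema
    con = duckdb.connect()
    con.execute(schema)
    try:
        con.execute(f"EXPLAIN {query}")
        return True
    except duckdb.Error:
        return False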

License

This model inherits the Llama 2 Community License Agreement from the base model.

Citation

@misc{duckdb-nsql-mlx,
  title={DuckDB-NSQL-7B MLX Conversion},
  author={aikhan1},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/aikhan1/DuckDB-NSQL-7B-v0.1-mlx}}
}

Original model:

@misc{duckdb-nsql,
  title={DuckDB-NSQL-7B: Natural Language to SQL for DuckDB},
  author={MotherDuck},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/motherduckdb/DuckDB-NSQL-7B-v0.1}}
}

Acknowledgments

Thanks to MotherDuck for releasing the original DuckDB-NSQL-7B model and to the Apple MLX team for the framework that makes this conversion possible.
