File size: 6,963 Bytes

---
library_name: transformers
tags:
- text-to-SQL
- SQL
- code-generation
- NLQ-to-SQL
- text2SQL
inference:
  parameters:
    max_length: 200
widget:
- text: |-
    CREATE TABLE Loans 
    { 
        loan_id number, 
        client_id number, 
        budget real, 
        duration number, 
        interest real, 
        status varchar 
    } 
     CREATE TABLE Clients 
    { 
        client_id number, 
        first_name varchar, 
        last_name varchar, 
        email varchar, 
        city varchar, 
        year_of_birth number 
    } 
     CREATE TABLE Accounts 
    { 
        account_id number, 
        client_id number, 
        balance real, 
        type varchar 
    } 
     CREATE TABLE Deposits 
    { 
        deposit_id number, 
        account_id number, 
        source varchar, 
        amount real 
    } 
     -- Using valid SQLite, answer the following question for the tables provided above. 
     -- What is the duration and budget of the loan id 16342? 
     SELECT 
  example_title: Loan duration
- text: |-
    CREATE TABLE Transactions 
    { 
        transaction_id number, 
        timestamp_id number, 
        primary_contract_id number, 
        client_id number, 
        beneficiary_id number, 
        transaction_amount real, 
        is_fraudulent boolean, 
        product_family_code varchar, 
        amount_currency varchar 
    } 
     CREATE TABLE Beneficiary 
    { 
        beneficiary_id number, 
        bank_branch_id number, 
        country_name varchar, 
        country_code varchar 
    } 
     CREATE TABLE Source 
    { 
        primary_contract_id number, 
        client_id number, 
        counterparty_bank_branch_id number, 
        counterparty_donor_id number 
    } 
     CREATE TABLE Time 
    { 
        timestamp_id number, 
        week_number number, 
        day_number number, 
        hour_number number, 
        day_name varchar, 
        year number, 
        month_number number 
    } 
     -- Using valid SQLite, answer the following question for the tables provided above. 
     -- How many transactions for the client id 15482? 
     SELECT 
  example_title: Client Transactions
datasets:
- salmane11/BanQies
language:
- en
base_model:
- bigcode/starcoderbase-1b
---

# BanQL-1B

## Model Description

BanQL is a family of Code LLMs dedicated solely for the text-to-SQL task in the Financial domain. 

The checkpoint included in this repository is based on [bigcode/starcoderbase](https://huggingface.co/bigcode/starcoderbase) and further finetuned on [BanQies](https://huggingface.co/datasets/salmane11/BanQies), a dataset generated using [SelectCraft](https://github.com/ezzini/SelectCraft) compose of NLQ-SQL pairs in the financial domain.


## Finetuning Procedure

BanQL was fine-tuned using PEFT (Parameter-Efficient Fine-Tuning) techniques, specifically LoRA (Low-Rank Adaptation) adapters. 

## Intended Use and Limitations

The model was designed as a use case to prove the efficiency of SelectCraft in generating large-scale good quality domain-specific text-to-SQL datasets. The model is mainly finetuned on the database schemas displayed above. The prompt format is defined below.

## How to Use

Example 1: Loans_DB

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

device="cuda"
tokenizer = AutoTokenizer.from_pretrained("salmane11/BanQL-1b")
model = AutoModelForCausalLM.from_pretrained("salmane11/BanQL-1b").to(device)

input_text = """
      CREATE TABLE Loans {
            loan_id number, 
            client_id number, 
            budget real, 
            duration number, 
            interest real, 
            status varchar
        }

        CREATE TABLE Clients {
            client_id number, 
            first_name varchar, 
            last_name varchar, 
            email varchar, 
            city varchar, 
            year_of_birth number
        }

        CREATE TABLE Accounts {
            account_id number, 
            client_id number, 
            balance real, 
            type varchar
        }

        CREATE TABLE Deposits{
            deposit_id number, 
            account_id number, 
            source varchar, 
            amount real
        }

        -- Using valid SQLite, answer the following question for the tables provided above.

        -- What is the duration and budget of the loan id 16342?

        SELECT"""

encoding = tokenizer.encode_plus(input_text, return_tensors="pt").to(device)
input_ids, attention_masks = encoding["input_ids"].to(device), encoding["attention_mask"].to(device)
    

outputs = model.generate(
    input_ids=input_ids, attention_mask=attention_masks,
    max_length=512,
    do_sample=True,
    top_k=120,
    top_p=0.95,
    early_stopping=True,
)
line = tokenizer.decode(outputs[0], skip_special_tokens=True,clean_up_tokenization_spaces=True)
query_begining = line.find("SELECT")
print(line[query_begining:])
```

Example 2: Transactions_DB

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

device="cuda"
tokenizer = AutoTokenizer.from_pretrained("salmane11/BanQL-1b")
model = AutoModelForCausalLM.from_pretrained("salmane11/BanQL-1b").to(device)

input_text = """
        CREATE TABLE Transactions {
            transaction_id number, 
            timestamp_id number, 
            primary_contract_id number, 
            client_id number, 
            beneficiary_id number, 
            transaction_amount real, 
            is_fraudulent boolean, 
            product_family_code varchar, 
            amount_currency varchar
        }

        CREATE TABLE Beneficiary {
            beneficiary_id number, 
            bank_branch_id number, 
            country_name varchar, 
            country_code varchar, 
        }

        CREATE TABLE Source {
            primary_contract_id number, 
            client_id number, 
            counterparty_bank_branch_id number, 
            counterparty_donor_id number, 
        }

        CREATE TABLE Time{
            timestamp_id number, 
            week_number number, 
            day_number number, 
            hour_number number, 
            day_name varchar, 
            year number,
            month_number number
        }

        -- Using valid SQLite, answer the following question for the tables provided above.

        -- How many transactions for the client id 15482?

        SELECT"""


encoding = tokenizer.encode_plus(input_text, return_tensors="pt").to(device)
input_ids, attention_masks = encoding["input_ids"].to(device), encoding["attention_mask"].to(device)
    

outputs = model.generate(
    input_ids=input_ids, attention_mask=attention_masks,
    max_length=512,
    do_sample=True,
    top_k=120,
    top_p=0.95,
    early_stopping=True,
)
line = tokenizer.decode(outputs[0], skip_special_tokens=True,clean_up_tokenization_spaces=True)
query_begining = line.find("SELECT")
print(line[query_begining:])
```



## Cite our work

Citation