Text Generation
Transformers
PyTorch
Safetensors
GGUF
English
gpt_oss
tla-plus
formal-methods
formal-verification
code-generation
trl
sft
grpo
reinforcement-learning
Generated from Trainer
conversational
Instructions to use EricSpencer00/chattla-20b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use EricSpencer00/chattla-20b with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="EricSpencer00/chattla-20b") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("EricSpencer00/chattla-20b") model = AutoModelForCausalLM.from_pretrained("EricSpencer00/chattla-20b") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - llama-cpp-python
How to use EricSpencer00/chattla-20b with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="EricSpencer00/chattla-20b", filename="gguf/chattla-20b-v10-Q8_0.gguf", )
llm.create_chat_completion( messages = [ { "role": "user", "content": "What is the capital of France?" } ] ) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use EricSpencer00/chattla-20b with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf EricSpencer00/chattla-20b:Q8_0 # Run inference directly in the terminal: llama-cli -hf EricSpencer00/chattla-20b:Q8_0
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf EricSpencer00/chattla-20b:Q8_0 # Run inference directly in the terminal: llama-cli -hf EricSpencer00/chattla-20b:Q8_0
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf EricSpencer00/chattla-20b:Q8_0 # Run inference directly in the terminal: ./llama-cli -hf EricSpencer00/chattla-20b:Q8_0
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf EricSpencer00/chattla-20b:Q8_0 # Run inference directly in the terminal: ./build/bin/llama-cli -hf EricSpencer00/chattla-20b:Q8_0
Use Docker
docker model run hf.co/EricSpencer00/chattla-20b:Q8_0
- LM Studio
- Jan
- vLLM
How to use EricSpencer00/chattla-20b with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "EricSpencer00/chattla-20b" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "EricSpencer00/chattla-20b", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/EricSpencer00/chattla-20b:Q8_0
- SGLang
How to use EricSpencer00/chattla-20b with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "EricSpencer00/chattla-20b" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "EricSpencer00/chattla-20b", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "EricSpencer00/chattla-20b" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "EricSpencer00/chattla-20b", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Ollama
How to use EricSpencer00/chattla-20b with Ollama:
ollama run hf.co/EricSpencer00/chattla-20b:Q8_0
- Unsloth Studio
How to use EricSpencer00/chattla-20b with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for EricSpencer00/chattla-20b to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for EricSpencer00/chattla-20b to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for EricSpencer00/chattla-20b to start chatting
- Pi
How to use EricSpencer00/chattla-20b with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf EricSpencer00/chattla-20b:Q8_0
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "EricSpencer00/chattla-20b:Q8_0" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use EricSpencer00/chattla-20b with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf EricSpencer00/chattla-20b:Q8_0
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default EricSpencer00/chattla-20b:Q8_0
Run Hermes
hermes
- Docker Model Runner
How to use EricSpencer00/chattla-20b with Docker Model Runner:
docker model run hf.co/EricSpencer00/chattla-20b:Q8_0
- Lemonade
How to use EricSpencer00/chattla-20b with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull EricSpencer00/chattla-20b:Q8_0
Run and chat with the model
lemonade run user.chattla-20b-Q8_0
List all available models
lemonade list
v20: README (benchmarks from latest full CSV if present)
Browse files
README.md
CHANGED
|
@@ -20,7 +20,7 @@ datasets:
|
|
| 20 |
pipeline_tag: text-generation
|
| 21 |
---
|
| 22 |
|
| 23 |
-
# ChatTLA-20b (
|
| 24 |
|
| 25 |
ChatTLA is a fine-tuned version of [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) specialised in generating **TLA+ formal specifications** — the language used by AWS, Microsoft, and Intel to mathematically verify distributed systems.
|
| 26 |
|
|
@@ -28,87 +28,59 @@ Given a plain-English description of a concurrent or distributed system, ChatTLA
|
|
| 28 |
|
| 29 |
---
|
| 30 |
|
| 31 |
-
##
|
| 32 |
|
| 33 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 34 |
|
| 35 |
| Tier | Meaning |
|
| 36 |
|------|---------|
|
| 37 |
-
| 💎 Diamond | Gold **and** TLC explores ≥1 distinct state, has a non-trivial invariant, and the invariant catches a mutation |
|
| 38 |
| 🥇 Gold | SANY parses **and** TLC model-checks clean |
|
| 39 |
| 🥈 Silver | SANY parses, TLC finds violation or timeout |
|
| 40 |
| Bronze | SANY parse failure |
|
| 41 |
|
| 42 |
-
Diamond
|
| 43 |
-
|
| 44 |
-
### Per-
|
| 45 |
-
|
| 46 |
-
|
|
| 47 |
-
|---
|
| 48 |
-
|
|
| 49 |
-
|
|
| 50 |
-
|
|
| 51 |
-
|
|
| 52 |
-
|
|
| 53 |
-
|
|
| 54 |
-
|
|
| 55 |
-
|
|
| 56 |
-
|
|
| 57 |
-
|
|
| 58 |
-
|
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
| 14 | memory_caches | BuddyAllocator | Bronze | |
|
| 62 |
-
| 15 | memory_caches | CopyingGc | Bronze | |
|
| 63 |
-
| 16 | mutual_exclusion | AdaptiveMutex | 🥇 Gold | 💎 |
|
| 64 |
-
| 17 | mutual_exclusion | AndersonMutex | 🥇 Gold | 💎 |
|
| 65 |
-
| 18 | mutual_exclusion | AravindMutex | ⏱ Timeout | |
|
| 66 |
-
| 19 | puzzles_classical | BlocksWorld | Bronze | |
|
| 67 |
-
| 20 | puzzles_classical | ChessKingMoves | Bronze | |
|
| 68 |
-
| 21 | puzzles_classical | ColoredHats | Bronze | |
|
| 69 |
-
| 22 | scheduling_resources | AdmissionControl | 🥇 Gold | 💎 |
|
| 70 |
-
| 23 | scheduling_resources | BackpressureChannel | 🥇 Gold | 💎 |
|
| 71 |
-
| 24 | scheduling_resources | Bankers | ⏱ Timeout | |
|
| 72 |
-
| 25 | transactions_databases | ChainReplication | ⏱ Timeout | |
|
| 73 |
-
| 26 | transactions_databases | DistributedLock | Bronze | |
|
| 74 |
-
| 27 | transactions_databases | FencingToken | Bronze | |
|
| 75 |
-
| 28 | workflows_state_machines| ContentModeration | 🥇 Gold | 💎 |
|
| 76 |
-
| 29 | workflows_state_machines| DocumentApproval | 🥇 Gold | 💎 |
|
| 77 |
-
| 30 | workflows_state_machines| EmailVerification | Bronze | |
|
| 78 |
-
|
| 79 |
-
**Diamond: 9/30 (30%) · Gold: 9/30 (30%)**
|
| 80 |
-
|
| 81 |
-
### Per-domain breakdown
|
| 82 |
-
|
| 83 |
-
| Domain | Diamond |
|
| 84 |
-
|--------|:-------:|
|
| 85 |
-
| communication_protocols | 0/3 |
|
| 86 |
-
| concurrency_primitives | 0/3 |
|
| 87 |
-
| consensus_election | 1/3 |
|
| 88 |
-
| data_structures | 1/3 |
|
| 89 |
-
| memory_caches | 1/3 |
|
| 90 |
-
| mutual_exclusion | 2/3 |
|
| 91 |
-
| puzzles_classical | 0/3 |
|
| 92 |
-
| scheduling_resources | 2/3 |
|
| 93 |
-
| transactions_databases | 0/3 |
|
| 94 |
-
| workflows_state_machines | 2/3 |
|
| 95 |
|
| 96 |
### Version history
|
| 97 |
|
| 98 |
-
| Version | Suite | SANY | TLC |
|
| 99 |
-
|---------|-------|------|-----
|
| 100 |
-
| v6 | 20-problem handcraft | 4/20 (20%) | 1/20 (5%)
|
| 101 |
-
| v7 | 20-problem handcraft | 6/20 (30%) | 1/20 (5%)
|
| 102 |
-
| v8 | 20-problem handcraft | 8/20 (40%) | 1/20 (5%)
|
| 103 |
-
| v9 | 20-problem handcraft | 6/20 (30%) | 3/20 (15%)
|
| 104 |
| v9 best-of-5 + self-correct | 20-problem handcraft | 16/20 (80%) | 5/20 (25%) | — |
|
| 105 |
-
| v10 | 20-problem handcraft | 6/20 (30%) | 2/20 (10%)
|
| 106 |
-
| v11 | 20-problem handcraft | 6/20 (30%) | 2/20 (10%)
|
| 107 |
-
| v13 (SFT + DPO) | 20-problem handcraft | 9/20 (45%) | 5/20 (25%) |
|
| 108 |
-
| v14 (Diamond SFT) | 30-spec holdout (single-shot) | 16/30 (53%) | 5/30 (17%) | 4/30 (13%) |
|
| 109 |
-
|
|
|
|
|
| 110 |
|
| 111 |
-
|
| 112 |
|
| 113 |
---
|
| 114 |
|
|
@@ -117,10 +89,12 @@ Diamond is the headline metric: it's the only tier that proves the spec is *sema
|
|
| 117 |
### Ollama (recommended)
|
| 118 |
|
| 119 |
```bash
|
| 120 |
-
|
| 121 |
-
|
| 122 |
|
| 123 |
-
|
|
|
|
|
|
|
| 124 |
curl -L https://huggingface.co/EricSpencer00/chattla-20b/resolve/main/gguf/Modelfile -o Modelfile
|
| 125 |
ollama create chattla:20b -f Modelfile
|
| 126 |
ollama run chattla:20b "Write a TLA+ spec for a token ring with N nodes."
|
|
@@ -148,13 +122,11 @@ print(result[0]["generated_text"])
|
|
| 148 |
### llama.cpp / GGUF
|
| 149 |
|
| 150 |
```bash
|
| 151 |
-
# Download GGUF
|
| 152 |
huggingface-cli download EricSpencer00/chattla-20b \
|
| 153 |
-
gguf/chattla-20b-
|
| 154 |
--local-dir ./chattla
|
| 155 |
|
| 156 |
-
|
| 157 |
-
./llama-cli -m chattla/gguf/chattla-20b-v15-Q8_0.gguf \
|
| 158 |
-n 1024 --temp 0.4 \
|
| 159 |
-p "Write a TLA+ spec for mutual exclusion with N processes."
|
| 160 |
```
|
|
@@ -166,17 +138,15 @@ huggingface-cli download EricSpencer00/chattla-20b \
|
|
| 166 |
| Property | Value |
|
| 167 |
|----------|-------|
|
| 168 |
| Base model | openai/gpt-oss-20b |
|
| 169 |
-
| Parameters | 20.
|
| 170 |
| Architecture | GptOss (sliding + full attention) |
|
| 171 |
-
| Fine-tuning method | Diamond SFT (LoRA) → Repair GRPO (LoRA) → merged |
|
| 172 |
| Context length | 2048 (trained) / 131072 (base) |
|
| 173 |
| GGUF quantisation | Q8_0 (~22 GB) |
|
| 174 |
-
| Training date | April 2026 |
|
| 175 |
|
| 176 |
### System prompt
|
| 177 |
|
| 178 |
-
The model is prompted with:
|
| 179 |
-
|
| 180 |
```
|
| 181 |
You are ChatTLA, an expert at writing verified TLA+ formal specifications.
|
| 182 |
When asked to write a TLA+ spec, follow these rules exactly:
|
|
@@ -193,55 +163,58 @@ When asked to write a TLA+ spec, follow these rules exactly:
|
|
| 193 |
|
| 194 |
## Training
|
| 195 |
|
| 196 |
-
### Phase 1
|
| 197 |
|
| 198 |
-
v14 was produced by the **Diamond curation pipeline**: candidate TLA+ specs are generated by an earlier checkpoint, then graded by a tlc_validator that checks SANY parsing, TLC state-space exploration, non-trivial invariants, and mutation-test sensitivity. Specs that survive grading are LLM-judged for chain-of-thought quality, leaving a curated training pool (209 raw → 73 curated
|
| 199 |
|
| 200 |
-
### Phase 2
|
| 201 |
|
| 202 |
-
v15 applies **repair-based GRPO** (Group Relative Policy Optimization) on top of
|
| 203 |
|
| 204 |
-
**Pipeline:**
|
| 205 |
1. **Trajectory collection** — the v14 model generates specs for 398 problems with up to 6 repair iterations each, producing (broken, repaired) pairs scored by a multi-stage validator (SANY → TLC → Apalache → TLAPS).
|
| 206 |
-
2. **Dataset filtering** — pairs
|
| 207 |
-
3. **GRPO training** — 300 steps, 4 generations per prompt, max 384 completion tokens.
|
| 208 |
-
4. **LoRA merge** — best checkpoint (around step 140–160
|
| 209 |
|
| 210 |
-
|
| 211 |
|
| 212 |
-
|
| 213 |
|
| 214 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 215 |
|
| 216 |
-
|
|
|
|
|
|
|
| 217 |
|
| 218 |
| Setting | Value |
|
| 219 |
|---------|-------|
|
| 220 |
-
|
|
| 221 |
-
|
|
| 222 |
-
| GRPO
|
| 223 |
-
| GRPO
|
| 224 |
-
| GRPO
|
| 225 |
-
|
|
| 226 |
-
|
|
| 227 |
-
|
|
| 228 |
-
|
|
| 229 |
| Hardware | 2× Quadro RTX 8000 (48 GB each) |
|
| 230 |
|
|
|
|
|
|
|
| 231 |
---
|
| 232 |
|
| 233 |
## Files
|
| 234 |
|
| 235 |
```
|
| 236 |
EricSpencer00/chattla-20b
|
| 237 |
-
├──
|
| 238 |
-
├── tokenizer.json # Tokenizer
|
| 239 |
-
├── tokenizer_config.json
|
| 240 |
-
├── chat_template.jinja # Chat template
|
| 241 |
-
├── pytorch_model.bin # Full BF16 weights (39 GB)
|
| 242 |
-
├── generation_config.json
|
| 243 |
└── gguf/
|
| 244 |
-
├── chattla-20b-
|
| 245 |
└── Modelfile # Ollama Modelfile
|
| 246 |
```
|
| 247 |
|
|
|
|
| 20 |
pipeline_tag: text-generation
|
| 21 |
---
|
| 22 |
|
| 23 |
+
# ChatTLA-20b (v20)
|
| 24 |
|
| 25 |
ChatTLA is a fine-tuned version of [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) specialised in generating **TLA+ formal specifications** — the language used by AWS, Microsoft, and Intel to mathematically verify distributed systems.
|
| 26 |
|
|
|
|
| 28 |
|
| 29 |
---
|
| 30 |
|
| 31 |
+
## What's new in v20
|
| 32 |
|
| 33 |
+
v20 is the **autonomous self-repair flywheel** model — the result of 5 successful repair-GRPO cycles (c1 → c5) building on v15's Repair-GRPO weights. Each cycle harvests fresh failures from the production model, mines them into broken/repaired training pairs, and runs Repair-GRPO with TLC-graded reward. Promotion to production happens only when the 30-spec holdout score improves; cycles c6, c7, c8 trained on top of v20/c5 but failed to beat it (11, 9, 9 vs 12) and were not promoted.
|
| 34 |
+
|
| 35 |
+
---
|
| 36 |
+
|
| 37 |
+
## Benchmark Results (v20, 3-shot self-correct)
|
| 38 |
+
|
| 39 |
+
Evaluated on the same 30-spec held-out suite as v14/v15, spanning communication protocols, concurrency primitives, consensus, data structures, memory/caches, mutual exclusion, classical puzzles, scheduling, transactions, and workflow state machines. Each spec gets up to 3 self-correction attempts using TLC error feedback.
|
| 40 |
|
| 41 |
| Tier | Meaning |
|
| 42 |
|------|---------|
|
|
|
|
| 43 |
| 🥇 Gold | SANY parses **and** TLC model-checks clean |
|
| 44 |
| 🥈 Silver | SANY parses, TLC finds violation or timeout |
|
| 45 |
| Bronze | SANY parse failure |
|
| 46 |
|
| 47 |
+
> Diamond tier (mutation-test caught + non-trivial invariant) was not assessed in this round; v20's evaluation reports Gold-rate only.
|
| 48 |
+
|
| 49 |
+
### Per-domain breakdown (30-spec holdout, 3-shot)
|
| 50 |
+
|
| 51 |
+
| Domain | Gold |
|
| 52 |
+
|--------|:-----:|
|
| 53 |
+
| communication_protocols | 2/3 |
|
| 54 |
+
| concurrency_primitives | 2/3 |
|
| 55 |
+
| consensus_election | 2/3 |
|
| 56 |
+
| data_structures | 0/3 |
|
| 57 |
+
| memory_caches | 0/3 |
|
| 58 |
+
| mutual_exclusion | 2/3 |
|
| 59 |
+
| puzzles_classical | 1/3 |
|
| 60 |
+
| scheduling_resources | 1/3 |
|
| 61 |
+
| transactions_databases | 1/3 |
|
| 62 |
+
| workflows_state_machines | 1/3 |
|
| 63 |
+
| **Total** | **12 / 30 (40 %)** |
|
| 64 |
+
|
| 65 |
+
Domains where v20 reaches 2/3: communication, concurrency, consensus, mutual exclusion. Domains where v20 still fails completely: data structures, memory/caches.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 66 |
|
| 67 |
### Version history
|
| 68 |
|
| 69 |
+
| Version | Suite | SANY | TLC (Gold) | Notes |
|
| 70 |
+
|---------|-------|------|------------|-------|
|
| 71 |
+
| v6 | 20-problem handcraft | 4/20 (20%) | 1/20 (5%) | — |
|
| 72 |
+
| v7 | 20-problem handcraft | 6/20 (30%) | 1/20 (5%) | — |
|
| 73 |
+
| v8 | 20-problem handcraft | 8/20 (40%) | 1/20 (5%) | — |
|
| 74 |
+
| v9 | 20-problem handcraft | 6/20 (30%) | 3/20 (15%) | — |
|
| 75 |
| v9 best-of-5 + self-correct | 20-problem handcraft | 16/20 (80%) | 5/20 (25%) | — |
|
| 76 |
+
| v10 | 20-problem handcraft | 6/20 (30%) | 2/20 (10%) | — |
|
| 77 |
+
| v11 | 20-problem handcraft | 6/20 (30%) | 2/20 (10%) | — |
|
| 78 |
+
| v13 (SFT + DPO) | 20-problem handcraft | 9/20 (45%) | 5/20 (25%) | trivial invariants counted as Gold |
|
| 79 |
+
| v14 (Diamond SFT) | 30-spec holdout (single-shot) | 16/30 (53%) | 5/30 (17%) | Diamond 4/30 (13%) |
|
| 80 |
+
| v15 (Repair GRPO) | 30-spec holdout (3-shot) | 9/30 (30%) | 9/30 (30%) | Diamond 9/30 (30%) |
|
| 81 |
+
| **v20 (Flywheel c5)** | **30-spec holdout (3-shot)** | — | **12/30 (40%)** | first promoted-by-holdout-gain release |
|
| 82 |
|
| 83 |
+
Compared to v15, v20 adds 3 specs to the Gold pool (+33 % relative): the gains concentrate in **communication** (+2) and **mutual exclusion** (+2), with **transactions** and **workflows** holding ground. **Data structures** and **memory/caches** remain unsolved across both versions and are the obvious next target.
|
| 84 |
|
| 85 |
---
|
| 86 |
|
|
|
|
| 89 |
### Ollama (recommended)
|
| 90 |
|
| 91 |
```bash
|
| 92 |
+
ollama run hf.co/EricSpencer00/chattla-20b
|
| 93 |
+
```
|
| 94 |
|
| 95 |
+
Or use the bundled Modelfile:
|
| 96 |
+
|
| 97 |
+
```bash
|
| 98 |
curl -L https://huggingface.co/EricSpencer00/chattla-20b/resolve/main/gguf/Modelfile -o Modelfile
|
| 99 |
ollama create chattla:20b -f Modelfile
|
| 100 |
ollama run chattla:20b "Write a TLA+ spec for a token ring with N nodes."
|
|
|
|
| 122 |
### llama.cpp / GGUF
|
| 123 |
|
| 124 |
```bash
|
|
|
|
| 125 |
huggingface-cli download EricSpencer00/chattla-20b \
|
| 126 |
+
gguf/chattla-20b-v20-Q8_0.gguf \
|
| 127 |
--local-dir ./chattla
|
| 128 |
|
| 129 |
+
./llama-cli -m chattla/gguf/chattla-20b-v20-Q8_0.gguf \
|
|
|
|
| 130 |
-n 1024 --temp 0.4 \
|
| 131 |
-p "Write a TLA+ spec for mutual exclusion with N processes."
|
| 132 |
```
|
|
|
|
| 138 |
| Property | Value |
|
| 139 |
|----------|-------|
|
| 140 |
| Base model | openai/gpt-oss-20b |
|
| 141 |
+
| Parameters | 20.9 B |
|
| 142 |
| Architecture | GptOss (sliding + full attention) |
|
| 143 |
+
| Fine-tuning method | Diamond SFT (LoRA) → Repair GRPO (LoRA) → Self-Repair Flywheel (5× LoRA) → merged |
|
| 144 |
| Context length | 2048 (trained) / 131072 (base) |
|
| 145 |
| GGUF quantisation | Q8_0 (~22 GB) |
|
| 146 |
+
| Training date | April – May 2026 |
|
| 147 |
|
| 148 |
### System prompt
|
| 149 |
|
|
|
|
|
|
|
| 150 |
```
|
| 151 |
You are ChatTLA, an expert at writing verified TLA+ formal specifications.
|
| 152 |
When asked to write a TLA+ spec, follow these rules exactly:
|
|
|
|
| 163 |
|
| 164 |
## Training
|
| 165 |
|
| 166 |
+
### Phase 1 — Diamond SFT (v14)
|
| 167 |
|
| 168 |
+
v14 was produced by the **Diamond curation pipeline**: candidate TLA+ specs are generated by an earlier checkpoint, then graded by a tlc_validator that checks SANY parsing, TLC state-space exploration, non-trivial invariants, and mutation-test sensitivity. Specs that survive grading are LLM-judged for chain-of-thought quality, leaving a curated training pool (209 raw → 73 curated). The model is fine-tuned with LoRA on this pool and merged.
|
| 169 |
|
| 170 |
+
### Phase 2 — Repair GRPO (v15)
|
| 171 |
|
| 172 |
+
v15 applies **repair-based GRPO** (Group Relative Policy Optimization) on top of v14: instead of training on gold-standard specs alone, the model learns to *fix broken specs* using TLC error feedback as reward signal.
|
| 173 |
|
|
|
|
| 174 |
1. **Trajectory collection** — the v14 model generates specs for 398 problems with up to 6 repair iterations each, producing (broken, repaired) pairs scored by a multi-stage validator (SANY → TLC → Apalache → TLAPS).
|
| 175 |
+
2. **Dataset filtering** — pairs filtered to keep the "learnable middle" (`min_before_score=0.10`, `max_before_score=0.80`), yielding ~430 gradable pairs centered on score ≈ 0.45.
|
| 176 |
+
3. **GRPO training** — 300 steps, 4 generations per prompt, max 384 completion tokens. Reward is the score-improvement delta `after − before`, normalized by group. lr=3e-6, KL β=0.02, temp=0.5.
|
| 177 |
+
4. **LoRA merge** — best checkpoint (around step 140–160) merged back into full weights.
|
| 178 |
|
| 179 |
+
### Phase 3 — Self-Repair Flywheel (v20)
|
| 180 |
|
| 181 |
+
v20 wraps Phase 2 in an **autonomous outer loop** that keeps running on the production GPU pool. Each cycle:
|
| 182 |
|
| 183 |
+
1. **Failure harvest.** Sample 400 random NL prompts, call the *current* production model, classify the outputs (Gold / Silver / Bronze).
|
| 184 |
+
2. **Pair construction.** Bootstrap (broken ��� repaired) pairs from the bronze and silver outputs; the repaired side comes from the same model under a stricter retry budget.
|
| 185 |
+
3. **Repair-GRPO step.** 160 steps on the harvested pairs, LoRA r=8 / α=16, lr=3e-6, KL β=0.02, on the current best merged base.
|
| 186 |
+
4. **Merge → GGUF → Ollama** as `chattla:20b-c{N}`.
|
| 187 |
+
5. **Holdout eval.** 30-spec 3-shot benchmark against the same held-out suite.
|
| 188 |
+
6. **Promote-on-improvement.** If `score_cN > best_score`, update the production tag `chattla:20b-repair`. Otherwise keep prior; the failed candidate stays as `chattla:20b-c{N}` for analysis.
|
| 189 |
|
| 190 |
+
v20 is **cycle 5** of this flywheel: c1 → 5/30, c2 → 10/30, c3 → 8/30, c4 → 5/30, **c5 → 12/30 (promoted)**. Cycles c6 (11/30), c7 (9/30), c8 (9/30) did not promote.
|
| 191 |
+
|
| 192 |
+
### Training configuration (v20 incremental cycle)
|
| 193 |
|
| 194 |
| Setting | Value |
|
| 195 |
|---------|-------|
|
| 196 |
+
| Method | Repair GRPO with LoRA |
|
| 197 |
+
| LoRA rank / α / dropout | 8 / 16 / 0.0 |
|
| 198 |
+
| GRPO steps | 160 per cycle |
|
| 199 |
+
| GRPO generations / prompt | 4 |
|
| 200 |
+
| GRPO max completion length | 384 tokens |
|
| 201 |
+
| Learning rate | 3e-6 |
|
| 202 |
+
| KL β | 0.02 |
|
| 203 |
+
| Temperature | 0.5 |
|
| 204 |
+
| Failures harvested / cycle | 400 (filtered to ~150–250 gradable pairs) |
|
| 205 |
| Hardware | 2× Quadro RTX 8000 (48 GB each) |
|
| 206 |
|
| 207 |
+
DPO/KTO refinement was used in v11–v13 but was deprecated in the Diamond overhaul: 0/484 specs from those preference-trained checkpoints actually passed Diamond, indicating the model had learned TLA+ syntax without learning semantics.
|
| 208 |
+
|
| 209 |
---
|
| 210 |
|
| 211 |
## Files
|
| 212 |
|
| 213 |
```
|
| 214 |
EricSpencer00/chattla-20b
|
| 215 |
+
├── README.md
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 216 |
└── gguf/
|
| 217 |
+
├── chattla-20b-v20-Q8_0.gguf # Quantised GGUF for Ollama / llama.cpp (~22 GB)
|
| 218 |
└── Modelfile # Ollama Modelfile
|
| 219 |
```
|
| 220 |
|