EricSpencer00 committed on May 2

Commit c1a3e8b (verified) · 1 Parent(s): 29897c4

v20: README (benchmarks from latest full CSV if present)

Files changed (1)

README.md +82 -109

@@ -20,7 +20,7 @@ datasets:
 pipeline_tag: text-generation
 ---
-# ChatTLA-20b (v15)
 ChatTLA is a fine-tuned version of [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) specialised in generating **TLA+ formal specifications** — the language used by AWS, Microsoft, and Intel to mathematically verify distributed systems.
@@ -28,87 +28,59 @@ Given a plain-English description of a concurrent or distributed system, ChatTLA
 ---
-## Benchmark Results (v15, 3-shot self-correct)
-Evaluated on a 30-spec held-out suite spanning communication protocols, concurrency primitives, consensus, data structures, memory/caches, mutual exclusion, classical puzzles, scheduling, transactions, and workflow state machines. Each spec gets up to 3 self-correction attempts using TLC error feedback. Tiers are defined by what the spec actually does under SANY and TLC, not just whether it parses:
 | Tier | Meaning |
 |------|---------|
-| 💎 Diamond | Gold **and** TLC explores ≥1 distinct state, has a non-trivial invariant, and the invariant catches a mutation |
 | 🥇 Gold | SANY parses **and** TLC model-checks clean |
 | 🥈 Silver | SANY parses, TLC finds violation or timeout |
 | Bronze | SANY parse failure |
-Diamond is the headline metric: it's the only tier that proves the spec is *semantically* useful rather than just syntactically valid.
-### Per-spec results (30-spec holdout)
-| # | Batch | Module | Tier | Diamond |
-|---|-------|--------|------|:------:|
-|  1 | communication_protocols | AlternatingBit            | Bronze      |    |
-|  2 | communication_protocols | Arp                       | Bronze      |    |
-|  3 | communication_protocols | AtomicRegister            | Bronze      |    |
-|  4 | concurrency_primitives  | BinarySemaphore           | Bronze      |    |
-|  5 | concurrency_primitives  | Channel                   | Bronze      |    |
-|  6 | concurrency_primitives  | CountDownLatch            | Bronze      |    |
-|  7 | consensus_election      | AtomicCommit              | Bronze      |    |
-|  8 | consensus_election      | BullyElection             | 🥇 Gold     | 💎 |
-|  9 | consensus_election      | ByzantineQuorum           | Bronze      |    |
-| 10 | data_structures         | BinaryHeap                | Bronze      |    |
-| 11 | data_structures         | BloomCounter              | 🥇 Gold     | 💎 |
-| 12 | data_structures         | BloomFilter               | ⏱ Timeout   |    |
-| 13 | memory_caches           | ArenaAllocator            | 🥇 Gold     | 💎 |
-| 14 | memory_caches           | BuddyAllocator            | Bronze      |    |
-| 15 | memory_caches           | CopyingGc                 | Bronze      |    |
-| 16 | mutual_exclusion        | AdaptiveMutex             | 🥇 Gold     | 💎 |
-| 17 | mutual_exclusion        | AndersonMutex             | 🥇 Gold     | 💎 |
-| 18 | mutual_exclusion        | AravindMutex              | ⏱ Timeout   |    |
-| 19 | puzzles_classical       | BlocksWorld               | Bronze      |    |
-| 20 | puzzles_classical       | ChessKingMoves            | Bronze      |    |
-| 21 | puzzles_classical       | ColoredHats               | Bronze      |    |
-| 22 | scheduling_resources    | AdmissionControl          | 🥇 Gold     | 💎 |
-| 23 | scheduling_resources    | BackpressureChannel       | 🥇 Gold     | 💎 |
-| 24 | scheduling_resources    | Bankers                   | ⏱ Timeout   |    |
-| 25 | transactions_databases  | ChainReplication          | ⏱ Timeout   |    |
-| 26 | transactions_databases  | DistributedLock           | Bronze      |    |
-| 27 | transactions_databases  | FencingToken              | Bronze      |    |
-| 28 | workflows_state_machines| ContentModeration         | 🥇 Gold     | 💎 |
-| 29 | workflows_state_machines| DocumentApproval          | 🥇 Gold     | 💎 |
-| 30 | workflows_state_machines| EmailVerification         | Bronze      |    |
-**Diamond: 9/30 (30%) · Gold: 9/30 (30%)**
-### Per-domain breakdown
-| Domain | Diamond |
-|--------|:-------:|
-| communication_protocols | 0/3 |
-| concurrency_primitives | 0/3 |
-| consensus_election | 1/3 |
-| data_structures | 1/3 |
-| memory_caches | 1/3 |
-| mutual_exclusion | 2/3 |
-| puzzles_classical | 0/3 |
-| scheduling_resources | 2/3 |
-| transactions_databases | 0/3 |
-| workflows_state_machines | 2/3 |
 ### Version history
-| Version | Suite | SANY | TLC | Diamond / Notes |
-|---------|-------|------|-----|-----------------|
-| v6  | 20-problem handcraft     | 4/20 (20%)  | 1/20 (5%)  | — |
-| v7  | 20-problem handcraft     | 6/20 (30%)  | 1/20 (5%)  | — |
-| v8  | 20-problem handcraft     | 8/20 (40%)  | 1/20 (5%)  | — |
-| v9  | 20-problem handcraft     | 6/20 (30%)  | 3/20 (15%) | — |
 | v9 best-of-5 + self-correct | 20-problem handcraft | 16/20 (80%) | 5/20 (25%) | — |
-| v10 | 20-problem handcraft     | 6/20 (30%)  | 2/20 (10%) | — |
-| v11 | 20-problem handcraft     | 6/20 (30%)  | 2/20 (10%) | — |
-| v13 (SFT + DPO) | 20-problem handcraft | 9/20 (45%) | 5/20 (25%) | not measured (trivial invariants counted as Gold) |
-| v14 (Diamond SFT) | 30-spec holdout (single-shot) | 16/30 (53%) | 5/30 (17%) | 4/30 (13%) |
-| **v15 (Repair GRPO)** | **30-spec holdout (3-shot)** | 9/30 (30%) | 9/30 (30%) | **9/30 (30%)** |
-> v15 applies repair-based GRPO (Group Relative Policy Optimization) on top of v14's Diamond SFT weights. The model learns to fix its own broken specs by training on (broken → repaired) trajectory pairs with TLC-graded improvement reward. v15 eval uses 3-shot self-correction with TLC error feedback, matching realistic usage; v14 was evaluated single-shot, so SANY/TLC rates are not directly comparable. Diamond is the metric to track going forward.
 ---
@@ -117,10 +89,12 @@ Diamond is the headline metric: it's the only tier that proves the spec is *sema
 ### Ollama (recommended)
 ```bash
-# Pull and run directly
-ollama run EricSpencer00/chattla-20b
-# Or use the bundled Modelfile
 curl -L https://huggingface.co/EricSpencer00/chattla-20b/resolve/main/gguf/Modelfile -o Modelfile
 ollama create chattla:20b -f Modelfile
 ollama run chattla:20b "Write a TLA+ spec for a token ring with N nodes."
@@ -148,13 +122,11 @@ print(result[0]["generated_text"])
 ### llama.cpp / GGUF
 ```bash
-# Download GGUF
 huggingface-cli download EricSpencer00/chattla-20b \
-    gguf/chattla-20b-v15-Q8_0.gguf \
     --local-dir ./chattla
-# Run with llama.cpp
-./llama-cli -m chattla/gguf/chattla-20b-v15-Q8_0.gguf \
     -n 1024 --temp 0.4 \
     -p "Write a TLA+ spec for mutual exclusion with N processes."
 ```
@@ -166,17 +138,15 @@ huggingface-cli download EricSpencer00/chattla-20b \
 | Property | Value |
 |----------|-------|
 | Base model | openai/gpt-oss-20b |
-| Parameters | 20.9B |
 | Architecture | GptOss (sliding + full attention) |
-| Fine-tuning method | Diamond SFT (LoRA) → Repair GRPO (LoRA) → merged |
 | Context length | 2048 (trained) / 131072 (base) |
 | GGUF quantisation | Q8_0 (~22 GB) |
-| Training date | April 2026 |
 ### System prompt
-The model is prompted with:
 ```
 You are ChatTLA, an expert at writing verified TLA+ formal specifications.
 When asked to write a TLA+ spec, follow these rules exactly:
@@ -193,55 +163,58 @@ When asked to write a TLA+ spec, follow these rules exactly:
 ## Training
-### Phase 1: Diamond SFT (v14)
-v14 was produced by the **Diamond curation pipeline**: candidate TLA+ specs are generated by an earlier checkpoint, then graded by a tlc_validator that checks SANY parsing, TLC state-space exploration, non-trivial invariants, and mutation-test sensitivity. Specs that survive grading are LLM-judged for chain-of-thought quality, leaving a curated training pool (209 raw → 73 curated for the v14 SFT round). The model is fine-tuned with LoRA on this pool and merged.
-### Phase 2: Repair GRPO (v15)
-v15 applies **repair-based GRPO** (Group Relative Policy Optimization) on top of the v14 checkpoint. The key insight: instead of training on gold-standard specs alone, the model learns to *fix broken specs* using TLC error feedback as reward signal.
-**Pipeline:**
 1. **Trajectory collection** — the v14 model generates specs for 398 problems with up to 6 repair iterations each, producing (broken, repaired) pairs scored by a multi-stage validator (SANY → TLC → Apalache → TLAPS).
-2. **Dataset filtering** — pairs are filtered to keep the "learnable middle": `min_before_score=0.10` (drop unparseable) and `max_before_score=0.80` (drop already-good), yielding ~430 gradable pairs centered on score ≈ 0.45.
-3. **GRPO training** — 300 steps, 4 generations per prompt, max 384 completion tokens. The reward is the improvement delta: `after_score - before_score`, normalized by group. Learning rate 3e-6, KL penalty β=0.02, temperature 0.5.
-4. **LoRA merge** — best checkpoint (around step 140–160 where reward peaked) merged back into full weights.
-Reward peaked at steps 140–160 with `reward_std ≈ 0.25` (vs 0.0 in prior full-spec GRPO attempts that had zero variance). This was the first successful RL run on TLA+ spec generation.
-**R2 regression and R3 (in progress).** A second flywheel round (R2) continued GRPO from v15's merged weights on a freshly harvested dataset and regressed to 6/30 (20%). Post-mortem: the Phase 2 merge deduped pairs on `(nl[:80], round(before_score, 1))`, a score-bucket width of 0.1 that collapsed most of the learnable-middle band; combined with a raised `min_before_score = 0.10`, the usable training set fell from 433 → 179 pairs, shifted hard (mean before_score 0.26 → 0.42), and the model overtrained past its 150-step peak over 300 steps. Regressions concentrated in `mutual_exclusion` and `workflows_state_machines` (2/3 lost each). R3 pulls only the data and step-budget levers: dedup key widened to `(nl[:120], round(before_score, 2))`, score floor restored to 0.02, `--max-iters` raised 6 → 9 to grow the raw pool, and `--max-steps` cut to 175 with a checkpoint picker that selects the save closest to step 150. v15 remains the production checkpoint until R3 beats 9/30.
-DPO/KTO refinement was used in v11–v13 but was deprecated in the Diamond overhaul: 0/484 specs from those preference-trained checkpoints actually passed Diamond, indicating the model had learned TLA+ syntax without learning semantics.
-### Training configuration
 | Setting | Value |
 |---------|-------|
-| SFT method | LoRA (lora_dropout=0) |
-| GRPO method | LoRA, 4 generations, 384 max completion |
-| GRPO learning rate | 3e-6 |
-| GRPO KL β | 0.02 |
-| GRPO steps | 300 (best checkpoint ~150) |
-| Max sequence length | 2048 |
-| TRL | 0.28.0 |
-| Transformers | 5.2.0 |
-| PyTorch | 2.10.0 |
 | Hardware | 2× Quadro RTX 8000 (48 GB each) |
 ---
 ## Files
 ```
 EricSpencer00/chattla-20b
-├── config.json              # Model architecture
-├── tokenizer.json           # Tokenizer
-├── tokenizer_config.json
-├── chat_template.jinja      # Chat template
-├── pytorch_model.bin        # Full BF16 weights (39 GB)
-├── generation_config.json
 └── gguf/
-    ├── chattla-20b-v15-Q8_0.gguf   # Quantised GGUF for Ollama / llama.cpp
     └── Modelfile                    # Ollama Modelfile
 ```

 pipeline_tag: text-generation
 ---
+# ChatTLA-20b (v20)
 ChatTLA is a fine-tuned version of [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) specialised in generating **TLA+ formal specifications** — the language used by AWS, Microsoft, and Intel to mathematically verify distributed systems.
 ---
+## What's new in v20
+v20 is the **autonomous self-repair flywheel** model — the result of 5 successful repair-GRPO cycles (c1 → c5) building on v15's Repair-GRPO weights. Each cycle harvests fresh failures from the production model, mines them into broken/repaired training pairs, and runs Repair-GRPO with TLC-graded reward. Promotion to production happens only when the 30-spec holdout score improves; cycles c6, c7, c8 trained on top of v20/c5 but failed to beat it (11, 9, 9 vs 12) and were not promoted.
+---
+## Benchmark Results (v20, 3-shot self-correct)
+Evaluated on the same 30-spec held-out suite as v14/v15, spanning communication protocols, concurrency primitives, consensus, data structures, memory/caches, mutual exclusion, classical puzzles, scheduling, transactions, and workflow state machines. Each spec gets up to 3 self-correction attempts using TLC error feedback.
 | Tier | Meaning |
 |------|---------|
 | 🥇 Gold | SANY parses **and** TLC model-checks clean |
 | 🥈 Silver | SANY parses, TLC finds violation or timeout |
 | Bronze | SANY parse failure |
+> Diamond tier (mutation-test caught + non-trivial invariant) was not assessed in this round; v20's evaluation reports Gold-rate only.
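
The three tiers reported for v20 reduce to a two-flag decision. A minimal sketch, assuming the SANY and TLC outcomes are already available as booleans; `classify_tier` and its parameter names are hypothetical and not taken from the ChatTLA codebase:

```python
def classify_tier(sany_parses: bool, tlc_clean: bool) -> str:
    """Map SANY/TLC outcomes to the Gold/Silver/Bronze tiers in the table."""
    if not sany_parses:
        return "Bronze"  # SANY parse failure
    if tlc_clean:
        return "Gold"    # SANY parses and TLC model-checks clean
    return "Silver"      # SANY parses, but TLC finds a violation or times out


# Example: a spec that parses but fails model checking.
print(classify_tier(sany_parses=True, tlc_clean=False))
```

Diamond is left out on purpose, since this round reports Gold-rate only.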
+### Per-domain breakdown (30-spec holdout, 3-shot)
+| Domain | Gold |
+|--------|:-----:|
+| communication_protocols  | 2/3 |
+| concurrency_primitives   | 2/3 |
+| consensus_election       | 2/3 |
+| data_structures          | 0/3 |
+| memory_caches            | 0/3 |
+| mutual_exclusion         | 2/3 |
+| puzzles_classical        | 1/3 |
+| scheduling_resources     | 1/3 |
+| transactions_databases   | 1/3 |
+| workflows_state_machines | 1/3 |
+| **Total**                | **12 / 30 (40 %)** |
+Domains where v20 reaches 2/3: communication, concurrency, consensus, mutual exclusion. Domains where v20 still fails completely: data structures, memory/caches.
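
The total and the two domain lists can be re-derived from the table. A small sketch, with the counts copied from the per-domain breakdown; the variable names are illustrative only:

```python
# Gold counts per domain, copied from the v20 per-domain table (3 specs each).
gold_by_domain = {
    "communication_protocols": 2,
    "concurrency_primitives": 2,
    "consensus_election": 2,
    "data_structures": 0,
    "memory_caches": 0,
    "mutual_exclusion": 2,
    "puzzles_classical": 1,
    "scheduling_resources": 1,
    "transactions_databases": 1,
    "workflows_state_machines": 1,
}
SPECS_PER_DOMAIN = 3

total_gold = sum(gold_by_domain.values())
total_specs = SPECS_PER_DOMAIN * len(gold_by_domain)
gold_rate = total_gold / total_specs

# Domains at 2/3 and domains with no Gold spec at all.
strongest = sorted(d for d, g in gold_by_domain.items() if g == 2)
unsolved = sorted(d for d, g in gold_by_domain.items() if g == 0)

print(f"{total_gold}/{total_specs} = {gold_rate:.0%}")
print("2/3:", strongest)
print("0/3:", unsolved)
```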
 ### Version history
+| Version | Suite | SANY | TLC (Gold) | Notes |
+|---------|-------|------|------------|-------|
+| v6  | 20-problem handcraft     | 4/20 (20%)  | 1/20 (5%)   | — |
+| v7  | 20-problem handcraft     | 6/20 (30%)  | 1/20 (5%)   | — |
+| v8  | 20-problem handcraft     | 8/20 (40%)  | 1/20 (5%)   | — |
+| v9  | 20-problem handcraft     | 6/20 (30%)  | 3/20 (15%)  | — |
 | v9 best-of-5 + self-correct | 20-problem handcraft | 16/20 (80%) | 5/20 (25%) | — |
+| v10 | 20-problem handcraft     | 6/20 (30%)  | 2/20 (10%)  | — |
+| v11 | 20-problem handcraft     | 6/20 (30%)  | 2/20 (10%)  | — |
+| v13 (SFT + DPO) | 20-problem handcraft | 9/20 (45%) | 5/20 (25%) | trivial invariants counted as Gold |
+| v14 (Diamond SFT) | 30-spec holdout (single-shot) | 16/30 (53%) | 5/30 (17%) | Diamond 4/30 (13%) |
+| v15 (Repair GRPO) | 30-spec holdout (3-shot) | 9/30 (30%)  | 9/30 (30%)  | Diamond 9/30 (30%) |
+| **v20 (Flywheel c5)** | **30-spec holdout (3-shot)** | — | **12/30 (40%)** | first promoted-by-holdout-gain release |
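+Compared to v15, v20 adds a net 3 specs to the Gold pool (+33 % relative). Against v15's per-domain results, the gains concentrate in **communication** (+2) and **concurrency primitives** (+2), with **consensus**, **puzzles**, and **transactions** each up one; **mutual exclusion** holds at 2/3, while **data structures**, **memory/caches**, **scheduling**, and **workflows** each drop one. **Data structures** and **memory/caches** are now at 0/3 and are the obvious next target.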
 ---
 ### Ollama (recommended)
 ```bash
+ollama run hf.co/EricSpencer00/chattla-20b
+```
+Or use the bundled Modelfile:
+```bash
 curl -L https://huggingface.co/EricSpencer00/chattla-20b/resolve/main/gguf/Modelfile -o Modelfile
 ollama create chattla:20b -f Modelfile
 ollama run chattla:20b "Write a TLA+ spec for a token ring with N nodes."
 ### llama.cpp / GGUF
 ```bash
 huggingface-cli download EricSpencer00/chattla-20b \
+    gguf/chattla-20b-v20-Q8_0.gguf \
     --local-dir ./chattla
+./llama-cli -m chattla/gguf/chattla-20b-v20-Q8_0.gguf \
     -n 1024 --temp 0.4 \
     -p "Write a TLA+ spec for mutual exclusion with N processes."
 ```
 | Property | Value |
 |----------|-------|
 | Base model | openai/gpt-oss-20b |
+| Parameters | 20.9 B |
 | Architecture | GptOss (sliding + full attention) |
+| Fine-tuning method | Diamond SFT (LoRA) → Repair GRPO (LoRA) → Self-Repair Flywheel (5× LoRA) → merged |
 | Context length | 2048 (trained) / 131072 (base) |
 | GGUF quantisation | Q8_0 (~22 GB) |
+| Training date | April – May 2026 |
 ### System prompt
 ```
 You are ChatTLA, an expert at writing verified TLA+ formal specifications.
 When asked to write a TLA+ spec, follow these rules exactly:
 ## Training
+### Phase 1 — Diamond SFT (v14)
+v14 was produced by the **Diamond curation pipeline**: candidate TLA+ specs are generated by an earlier checkpoint, then graded by a tlc_validator that checks SANY parsing, TLC state-space exploration, non-trivial invariants, and mutation-test sensitivity. Specs that survive grading are LLM-judged for chain-of-thought quality, leaving a curated training pool (209 raw → 73 curated). The model is fine-tuned with LoRA on this pool and merged.
+### Phase 2 — Repair GRPO (v15)
+v15 applies **repair-based GRPO** (Group Relative Policy Optimization) on top of v14: instead of training on gold-standard specs alone, the model learns to *fix broken specs* using TLC error feedback as reward signal.
 1. **Trajectory collection** — the v14 model generates specs for 398 problems with up to 6 repair iterations each, producing (broken, repaired) pairs scored by a multi-stage validator (SANY → TLC → Apalache → TLAPS).
+2. **Dataset filtering** — pairs filtered to keep the "learnable middle" (`min_before_score=0.10`, `max_before_score=0.80`), yielding ~430 gradable pairs centered on score ≈ 0.45.
+3. **GRPO training** — 300 steps, 4 generations per prompt, max 384 completion tokens. Reward is the score-improvement delta `after − before`, normalized by group. lr=3e-6, KL β=0.02, temp=0.5.
+4. **LoRA merge** — best checkpoint (around step 140–160) merged back into full weights.
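
Steps 2 and 3 can be sketched in a few lines. This is an illustration under stated assumptions, not the project's training code: the function names are hypothetical, the filter bounds are treated as inclusive, and "normalized by group" is read as the usual GRPO convention of subtracting the group mean and dividing by the group standard deviation.

```python
from statistics import mean, pstdev

MIN_BEFORE_SCORE = 0.10  # below this the broken spec is dropped as unparseable
MAX_BEFORE_SCORE = 0.80  # above this the spec is dropped as already good


def in_learnable_middle(before_score: float) -> bool:
    """Keep only pairs whose starting score lies in the filter band."""
    return MIN_BEFORE_SCORE <= before_score <= MAX_BEFORE_SCORE


def group_advantages(before_score: float, after_scores: list[float],
                     eps: float = 1e-6) -> list[float]:
    """Reward each completion by its improvement delta, normalized per group."""
    rewards = [after - before_score for after in after_scores]
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every completion scores the same.
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, 4 sampled repairs (made-up scores for illustration).
print(in_learnable_middle(0.45))
print(group_advantages(0.40, [0.40, 0.60, 0.80, 0.40]))
```

When all completions in a group get the same score, every advantage is zero and the group contributes no learning signal.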
+### Phase 3 — Self-Repair Flywheel (v20)
+v20 wraps Phase 2 in an **autonomous outer loop** that keeps running on the production GPU pool. Each cycle:
+1. **Failure harvest.** Sample 400 random NL prompts, call the *current* production model, classify the outputs (Gold / Silver / Bronze).
+2. **Pair construction.** Bootstrap (broken → repaired) pairs from the bronze and silver outputs; the repaired side comes from the same model under a stricter retry budget.
+3. **Repair-GRPO step.** 160 steps on the harvested pairs, LoRA r=8 / α=16, lr=3e-6, KL β=0.02, on the current best merged base.
+4. **Merge → GGUF → Ollama** as `chattla:20b-c{N}`.
+5. **Holdout eval.** 30-spec 3-shot benchmark against the same held-out suite.
+6. **Promote-on-improvement.** If `score_cN > best_score`, update the production tag `chattla:20b-repair`. Otherwise the prior production model stays in place, and the failed candidate remains tagged `chattla:20b-c{N}` for analysis.
+v20 is **cycle 5** of this flywheel: c1 → 5/30, c2 → 10/30, c3 → 8/30, c4 → 5/30, **c5 → 12/30 (promoted)**. Cycles c6 (11/30), c7 (9/30), c8 (9/30) did not promote.
+### Training configuration (v20 incremental cycle)
 | Setting | Value |
 |---------|-------|
+| Method | Repair GRPO with LoRA |
+| LoRA rank / α / dropout | 8 / 16 / 0.0 |
+| GRPO steps | 160 per cycle |
+| GRPO generations / prompt | 4 |
+| GRPO max completion length | 384 tokens |
+| Learning rate | 3e-6 |
+| KL β | 0.02 |
+| Temperature | 0.5 |
+| Failures harvested / cycle | 400 (filtered to ~150–250 gradable pairs) |
 | Hardware | 2× Quadro RTX 8000 (48 GB each) |
+DPO/KTO refinement was used in v11–v13 but was deprecated in the Diamond overhaul: 0/484 specs from those preference-trained checkpoints actually passed Diamond, indicating the model had learned TLA+ syntax without learning semantics.
 ---
 ## Files
 ```
 EricSpencer00/chattla-20b
+├── README.md
 └── gguf/
+    ├── chattla-20b-v20-Q8_0.gguf   # Quantised GGUF for Ollama / llama.cpp (~22 GB)
     └── Modelfile                    # Ollama Modelfile
 ```