
GLM-4.7-Flash-PRISM

An unrestricted, propaganda-free version of Z.AI's GLM-4.7-Flash, with over-refusal and bias mechanisms completely removed using our Advanced PRISM Pipeline.

☕ Support Our Work

If you find this model useful, consider supporting us on Ko-fi!


| Option | Description |
|---|---|
| PRISM VIP Membership | Access to all PRISM models |
| One-Time Support | Support this model |

Model Highlights

  • PRISM Ablation — State-of-the-art technique that removes over-refusal behaviors while preserving model capabilities
  • 30B-A3B MoE Architecture — 30 billion total parameters with ~3 billion active per token for fast, efficient inference
  • 128K Context Window — Extended context for complex tasks and large codebases
  • Interleaved Thinking — Multi-turn reasoning that persists across conversations with per-turn thinking control

Benchmarks

| Benchmark | GLM-4.7-Flash | Qwen3-30B-A3B-Thinking-2507 | GPT-OSS-20B |
|---|---|---|---|
| AIME 2025 | 91.6 | 85.0 | 91.7 |
| GPQA | 75.2 | 73.4 | 71.5 |
| LCB v6 | 64.0 | 66.0 | 61.0 |
| HLE | 14.4 | 9.8 | 10.9 |
| SWE-bench Verified | 59.2 | 22.0 | 34.0 |
| τ²-Bench | 79.5 | 49.0 | 47.7 |
| BrowseComp | 42.8 | 2.29 | 28.3 |

Usage

Transformers

Install the latest transformers from source:

pip install git+https://github.com/huggingface/transformers.git

Run inference:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "Ex0bit/GLM-4.7-Flash-PRISM"

# Load the tokenizer and model; device_map="auto" shards across available GPUs.
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Format the conversation with the model's chat template.
messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate, then decode only the newly generated tokens (skip the prompt).
generated_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
output_text = tokenizer.decode(generated_ids[0][inputs.input_ids.shape[1]:])
print(output_text)
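
For interactive use, tokens can be streamed as they are generated. This variant is not part of the original card, but it uses only standard Transformers APIs along with the model, tokenizer, and inputs defined above:

from transformers import TextStreamer

# Print tokens to stdout as they are produced instead of waiting
# for the full sequence to finish.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=128, do_sample=False, streamer=streamer)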

vLLM

Install vLLM nightly:

pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
pip install git+https://github.com/huggingface/transformers.git

Serve the model:

vllm serve Ex0bit/GLM-4.7-Flash-PRISM \
     --tensor-parallel-size 4 \
     --speculative-config.method mtp \
     --speculative-config.num_speculative_tokens 1 \
     --tool-call-parser glm47 \
     --reasoning-parser glm45 \
     --enable-auto-tool-choice \
     --served-model-name glm-4.7-flash-prism
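
Once running, the server exposes an OpenAI-compatible API (port 8000 by default). A minimal client sketch; the localhost URL and placeholder key are assumptions for a local deployment:

from openai import OpenAI

# vLLM's OpenAI-compatible endpoint; the API key is unused locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="glm-4.7-flash-prism",  # matches --served-model-name above
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)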

SGLang

Install SGLang:

uv pip install sglang==0.3.2.dev9039+pr-17247.g90c446848 --extra-index-url https://sgl-project.github.io/whl/pr/
uv pip install git+https://github.com/huggingface/transformers.git@76732b4e7120808ff989edbd16401f61fa6a0afa

Launch the server:

python3 -m sglang.launch_server \
  --model-path Ex0bit/GLM-4.7-Flash-PRISM \
  --tp-size 4 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --mem-fraction-static 0.8 \
  --served-model-name glm-4.7-flash-prism \
  --host 0.0.0.0 \
  --port 8000

Note: For Blackwell GPUs, add --attention-backend triton --speculative-draft-attention-backend triton to your SGLang launch command.
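
The per-turn thinking control mentioned in the highlights can be exercised through the same OpenAI-compatible endpoint. A hedged sketch, assuming the bundled chat template honors an enable_thinking kwarg as in other GLM releases (inspect the template if this has no effect):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# chat_template_kwargs is forwarded to the chat template at render time;
# enable_thinking=False is assumed to suppress the thinking block for
# this turn, mirroring other GLM releases.
response = client.chat.completions.create(
    model="glm-4.7-flash-prism",
    messages=[{"role": "user", "content": "Summarize MoE routing in one sentence."}],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)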

Recommended Parameters

| Use Case | Temperature | Top-P | Max New Tokens |
|---|---|---|---|
| Default | 1.0 | 0.95 | 131072 |
| Code (SWE-bench) | 0.7 | 1.0 | 16384 |
| Agentic Tasks | 0.0 | — | 16384 |
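
As an illustration, the default row above maps onto the Transformers example like this (model, tokenizer, and inputs as defined in the Usage section):

# Default sampling settings from the table above.
generated_ids = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.0,
    top_p=0.95,
    max_new_tokens=131072,  # full-context budget; reduce for quick tests
)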

License

This model is released under the PRISM Research License.

Citation

@misc{elbaz2025glm47flashprism,
  author = {Elbaz, Eric},
  title = {Elbaz-GLM-4.7-Flash-PRISM: Unchained GLM-4.7-Flash-PRISM Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Ex0bit/Elbaz-GLM-4.7-Flash-PRISM}}
}

Acknowledgments

Based on GLM-4.7-Flash by Z.AI. See the technical report for more details on the base model.
