Part of the Outlier shipping lineup. Outlier is a free macOS app that runs this model locally, with one click. Apple Silicon only.

Outlier Nano 4B (Combined-fused) — MLX 4-bit

This is the multi-domain Combined-fused checkpoint — the base Qwen3-Next 4B (hybrid linear:full attention 3:1) with a fused LoRA adapter trained jointly on math reasoning, translation, and Q&A. The earlier "base Nano" checkpoint is still available at git tag pre-combined-v1.

Why fused?

Same 4B parameter budget, but every domain gets a meaningful lift:

Benchmark Base Nano Combined-fused Delta
GSM8K math (n=50) 5% 78% +73pp
MMLU-style (n=30) 74.4% 96.7% +22pp
Hard reasoning (n=15) 87–93% new
Q&A factual (n=30) 74% 100% +26pp
Code generation 100% new
Translation (n=30) 20% 90% +70pp
RAG 100% new
Variant robustness (5× phrasings) 100% new

Empirically validated on M1 Ultra over 70+ tests (v2_research session 2026-05-27). Quality holds across KV-bit quantization 5→3 for Q&A (–10pp at kv=3 for math only); ship with kv_bits=5 everywhere as the safe floor.

Performance on M1 Ultra

  • TTFT (with sysprompt cache): 305 ms mean, P99 329 ms
  • Decode: 89 tok/s (faster than base Nano on the same hardware)
  • End-to-end (100-token response): 728 ms wall
  • RAM peak: 3.17 GB constant across 100-query stress test (zero memory growth)

This stack delivers a "feels-snappy" response latency that's faster than the network round-trip alone of any cloud LLM service.

Architecture

  • Base: Qwen3-Next 4B (hybrid 3:1 linear:full attention)
  • Adapter: LoRA fused into base weights (rank-8, late-layer focus 17–30, MLP gates + self-attn q_proj most affected)
  • Quantization: MLX 4-bit
  • Context: 32K tokens

Usage

The Outlier macOS app ships this as the default Nano tier — no setup needed.

For direct mlx_lm use:

from mlx_lm import load, generate

model, tok = load("Outlier-Ai/Outlier-Nano-4B-MLX-4bit")
out = generate(model, tok, prompt="What's 17+23?",
               max_tokens=50, kv_bits=5, kv_group_size=64)
print(out)

License

Apache 2.0 — same as the base Qwen3-Next 4B model.

Citation

@misc{outlier-nano-combined-4b-mlx-4bit-2026,
  author = {Outlier-Ai},
  title = {Outlier Nano 4B (Combined-fused, MLX 4-bit)},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Outlier-Ai/Outlier-Nano-4B-MLX-4bit}
}
Downloads last month
81
Safetensors
Model size
0.7B params
Tensor type
BF16
·
U32
·
F32
·
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Outlier-Ai/Outlier-Nano-4B-MLX-4bit

Finetuned
Qwen/Qwen3.5-4B
Quantized
(217)
this model

Evaluation results