Instructions to use PXIN/Ouroboros-9B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use PXIN/Ouroboros-9B with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="PXIN/Ouroboros-9B", filename="mmproj-BF16.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use PXIN/Ouroboros-9B with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf PXIN/Ouroboros-9B:BF16 # Run inference directly in the terminal: llama-cli -hf PXIN/Ouroboros-9B:BF16
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf PXIN/Ouroboros-9B:BF16 # Run inference directly in the terminal: llama-cli -hf PXIN/Ouroboros-9B:BF16
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf PXIN/Ouroboros-9B:BF16 # Run inference directly in the terminal: ./llama-cli -hf PXIN/Ouroboros-9B:BF16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf PXIN/Ouroboros-9B:BF16 # Run inference directly in the terminal: ./build/bin/llama-cli -hf PXIN/Ouroboros-9B:BF16
Use Docker
docker model run hf.co/PXIN/Ouroboros-9B:BF16
- LM Studio
- Jan
- Ollama
How to use PXIN/Ouroboros-9B with Ollama:
ollama run hf.co/PXIN/Ouroboros-9B:BF16
- Unsloth Studio
How to use PXIN/Ouroboros-9B with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for PXIN/Ouroboros-9B to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for PXIN/Ouroboros-9B to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for PXIN/Ouroboros-9B to start chatting
- Pi
How to use PXIN/Ouroboros-9B with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf PXIN/Ouroboros-9B:BF16
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "PXIN/Ouroboros-9B:BF16" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use PXIN/Ouroboros-9B with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf PXIN/Ouroboros-9B:BF16
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default PXIN/Ouroboros-9B:BF16
Run Hermes
hermes
- Docker Model Runner
How to use PXIN/Ouroboros-9B with Docker Model Runner:
docker model run hf.co/PXIN/Ouroboros-9B:BF16
- Lemonade
How to use PXIN/Ouroboros-9B with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull PXIN/Ouroboros-9B:BF16
Run and chat with the model
lemonade run user.Ouroboros-9B-BF16
List all available models
lemonade list
🐉 Ouroboros-9B: The Recursive Reasoning Experiment
Ouroboros-9B is an independent research project focused on pushing the boundaries of recursive optimization and architectural efficiency. It represents a paradigm shift in how high-parameter models can be deployed and refined on consumer-grade hardware.
🚀 The Vision
Ouroboros is built on the principle of recursive refinement. By utilizing extreme 1.58-bit ternary compression as a foundation, the project aims to explore the intersection of large-scale reasoning and minimal-bit representations. Ouroboros doesn't just run on edge hardware; it is designed to evolve there.
🌳 Lineage & Architecture
Ouroboros-9B is built upon a high-performance logic foundation:
- Ouroboros-9B (Ternary Architectural Baseline)
- ↳ OmniCoder-9B (Advanced Coding & Reasoning Logic)
- ↳ Qwen 3.5 9B (Underlying Transformer Architecture)
🛠️ Technical Specifications
This initial baseline release utilizes Ternary (1.58-bit) Quantization via the TQ1_0 format.
- Quantization: TQ1_0 (BitNet 1.58-bit Ternary)
- Extreme Footprint: Weights are crushed down to
{-1, 0, 1}, reducing the model size from ~18GB to a compact 2.7GB. - Memory Efficiency: Over 85% reduction in VRAM/RAM requirements compared to BF16.
- Multimodal Engine: Integrated vision projectors enable visual reasoning and code-from-image analysis.
- Hardware Acceleration: Native optimization for the QVAC Fabric engine using specialized Vulkan and Metal kernels.
🖼️ Multimodal Capabilities
Ouroboros-9B includes high-fidelity vision projectors from the Unsloth collection, enabling it to process visual inputs such as code screenshots, diagrams, and UI layouts.
mmproj-BF16.gguf: Optimized for modern GPUs with native bfloat16 support.mmproj-F16.gguf: Universal high-precision projector for all backends.
🔬 Experimental Roadmap
Ouroboros is designed to "consume itself" to grow stronger through successive training phases.
- Phase 1 (Active): Establishing the Ternary Baseline. Deployment of the 1.58-bit architectural shift.
- Phase 2: Recovery Fine-tuning. Utilizing QVAC Fabric native low-bit training to restore logic and perplexity lost during the initial quantization.
- Phase 3: Recursive self-optimization and specialized forks for autonomous agentic workflows.
🛠️ Usage (QVAC Fabric)
To achieve the intended performance and use the ternary kernels, use the QVAC Fabric Engine.
Build:
git clone https://github.com/tetherto/qvac-fabric-llm.cpp.git
cd qvac-fabric-llm.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release
Run:
# Text Inference
./build/bin/llama-cli -m ouroboros-9b-TQ1.gguf -p "Write a recursive function in Rust to..."
# Multimodal Inference
./build/bin/llama-minicpmv-cli -m ouroboros-9b-TQ1.gguf --mmproj mmproj-BF16.gguf --image screen.png -p "Explain the logic flow in this diagram."
🔗 Credits
This project is made possible by the following foundational works:
- Coding Logic: Tesslate/OmniCoder-9B
- Base Architecture: Qwen/Qwen3.5-9B
- Vision Projectors: unsloth/Qwen3.5-9B-GGUF
- Quantization Engine: QVAC Fabric
Disclaimer: This is an experimental research artifact. Logic performance may vary compared to higher-bit versions until recovery fine-tuning is complete.
- Downloads last month
- 422
We're not able to determine the quantization variants.