Tiny Aya Global — Tool-Calling GGUF

A corrected, tool-calling-ready GGUF of CohereLabs/tiny-aya-global for Ollama and llama.cpp.

Part of the Tiny Facade collection — an open-source effort to bring reliable multilingual tool calling to on-device AI.

What This Fixes

The official Tiny Aya GGUFs on Ollama ship with the wrong chat template (Command-R's template instead of Tiny Aya's own). This causes:

End-token leakage — <|END_OF_TURN_TOKEN|> and <|END_RESPONSE|> printed as visible text in responses
No tool-calling support — the default template has no provisions for function calling
Broken conversation flow — responses don't terminate cleanly

This GGUF ships with a corrected Modelfile that uses Tiny Aya's actual template, adds proper stop tokens, and injects structured tool-calling support.
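
One way to confirm the leakage on a given model build is to scan response text for the two control tokens named above. The following is a minimal sketch; the helper name `find_leaked_tokens` is hypothetical and not part of this repo.

```python
# Control tokens that should never appear in visible output.
CONTROL_TOKENS = ("<|END_OF_TURN_TOKEN|>", "<|END_RESPONSE|>")


def find_leaked_tokens(text: str) -> list[str]:
    """Return the control tokens that appear verbatim in a response."""
    return [token for token in CONTROL_TOKENS if token in text]


# Hand-written sample strings, not real model output.
leaky = "It is sunny in Kampala.<|END_RESPONSE|><|END_OF_TURN_TOKEN|>"
clean = "It is sunny in Kampala."
print(find_leaked_tokens(leaky))
print(find_leaked_tokens(clean))
```

An empty list means the response text is free of both tokens.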

Quick Start (Ollama)

# Download the Modelfile
# Then create the model pointing to the GGUF
ollama create tiny-aya-global-tools -f tiny-aya-global-tools.Modelfile

Or if you've downloaded the GGUF directly, update the FROM line in the Modelfile to point to your local file:

FROM ./tiny-aya-global-tools.GGUF

Then:

ollama create tiny-aya-global-tools -f tiny-aya-global-tools.Modelfile
ollama run tiny-aya-global-tools
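
If you prefer to script the FROM-line change rather than edit the Modelfile by hand, something like the sketch below could work. The helper name `point_modelfile_at` and the sample Modelfile text are made up for illustration; the real Modelfile in this repo contains more than these two lines.

```python
def point_modelfile_at(modelfile_text: str, gguf_path: str) -> str:
    """Return Modelfile text whose first FROM line points at gguf_path."""
    lines = modelfile_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().upper().startswith("FROM "):
            lines[i] = f"FROM {gguf_path}"
            break
    else:
        # No FROM instruction found: add one at the top.
        lines.insert(0, f"FROM {gguf_path}")
    return "\n".join(lines) + "\n"


# Illustrative input only; not the actual contents of the shipped Modelfile.
example = 'FROM some/remote-model\nPARAMETER stop "<|END_RESPONSE|>"\n'
updated = point_modelfile_at(example, "./tiny-aya-global-tools.GGUF")
print(updated)
```

Write the returned text back to tiny-aya-global-tools.Modelfile before running `ollama create`.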

Tool Calling

The corrected template supports Ollama's native tool calling. Pass tool definitions in the tools argument of your API call, and the model responds with structured <tool_call> blocks when a tool is appropriate.

Example (Python + Ollama)

import ollama

response = ollama.chat(
    model='tiny-aya-global-tools',
    messages=[
        {'role': 'user', 'content': 'What is the weather in Kampala?'}
    ],
    tools=[
        {
            'type': 'function',
            'function': {
                'name': 'get_weather',
                'description': 'Get current weather for a location',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'location': {
                            'type': 'string',
                            'description': 'City name'
                        }
                    },
                    'required': ['location']
                }
            }
        }
    ]
)

# The assistant message; any tool call the model makes is part of it.
print(response['message'])
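
Once the model requests a tool, your code still has to run it. The sketch below shows one possible dispatch loop. It assumes the assistant message carries a `tool_calls` list whose entries hold `function.name` and `function.arguments`, which is the shape Ollama's chat API uses; the stand-in `get_weather` implementation and the hand-written `sample_message` are illustrative, so the block runs without a model.

```python
def get_weather(location: str) -> str:
    """Stand-in tool; a real implementation would query a weather service."""
    return f"Weather lookup requested for {location}"


# Registry mapping tool names (as declared in `tools`) to Python callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(message: dict) -> list[str]:
    """Execute each tool call in an assistant message and collect the results."""
    results = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        handler = TOOLS.get(function["name"])
        if handler is None:
            results.append(f"Unknown tool: {function['name']}")
            continue
        results.append(handler(**function["arguments"]))
    return results


# Hand-written message in the assumed shape, not real model output.
sample_message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {"function": {"name": "get_weather", "arguments": {"location": "Kampala"}}}
    ],
}
print(run_tool_calls(sample_message))
```

In a full loop, each result would be sent back to the model as a message with role `tool` so it can produce a final answer.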

Multilingual Tool Calling

The model handles tool calls from prompts in 70+ languages. Examples:

Language	Prompt	Expected Tool Call
English	"What's the weather in Nairobi?"	`get_weather(location="Nairobi")`
Swahili	"Hali ya hewa Dar es Salaam ikoje?"	`get_weather(location="Dar es Salaam")`
Luganda	"Embeera y'obudde mu Kampala eri etya?"	`get_weather(location="Kampala")`
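
To check rows like these against a running model, a small scoring harness can help. The sketch below takes the prompts and expected locations from the table and accepts any `ask` callable that returns a tool call as a dict. The names `score` and `fake_ask` are hypothetical, and `fake_ask` is a stub that stands in for a real model call so the example runs offline.

```python
from typing import Callable, Optional

# (language, prompt, expected location), taken from the table above.
CASES = [
    ("English", "What's the weather in Nairobi?", "Nairobi"),
    ("Swahili", "Hali ya hewa Dar es Salaam ikoje?", "Dar es Salaam"),
    ("Luganda", "Embeera y'obudde mu Kampala eri etya?", "Kampala"),
]


def score(ask: Callable[[str], Optional[dict]]) -> dict[str, bool]:
    """Map each language to whether the returned call matches the expected one."""
    outcome = {}
    for language, prompt, location in CASES:
        call = ask(prompt)
        outcome[language] = (
            call is not None
            and call.get("name") == "get_weather"
            and call.get("arguments", {}).get("location") == location
        )
    return outcome


def fake_ask(prompt: str) -> Optional[dict]:
    """Stub in place of a model; it deliberately recognizes only two cities."""
    for city in ("Nairobi", "Kampala"):
        if city in prompt:
            return {"name": "get_weather", "arguments": {"location": city}}
    return None


print(score(fake_ask))
```

Replacing `fake_ask` with a function that calls the model and extracts its first tool call turns this into a basic per-language accuracy check.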

Model Details

Property	Value
Base Model	CohereLabs/tiny-aya-global
Parameters	3.35B
Quantization	Q4_K_M
File Size	~2.0 GB
Languages	70+ (optimized for English, Swahili, Luganda)
License	CC-BY-NC-4.0 (inherited from Tiny Aya)
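
As a rough sanity check, the file size follows from the parameter count and the quantization level. The calculation below assumes an average of about 4.85 bits per weight for Q4_K_M; that figure is an approximation not stated in this card, and the true average varies by architecture because Q4_K_M mixes block types.

```python
PARAMETERS = 3.35e9      # parameter count from the table above
BITS_PER_WEIGHT = 4.85   # assumption: approximate average for Q4_K_M

size_bytes = PARAMETERS * BITS_PER_WEIGHT / 8
size_gb = size_bytes / 1e9
print(f"Estimated file size: {size_gb:.2f} GB")
```

Under that assumption the estimate lands close to the ~2.0 GB listed above.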

What's in This Repo

tiny-aya-global-tools.GGUF — The quantized model weights (Q4_K_M)
tiny-aya-global-tools.Modelfile — Corrected Ollama Modelfile with tool-calling template

The Corrected Template

The key fix is using Tiny Aya's native chat format with proper token boundaries:

<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>...system prompt...<|END_OF_TURN_TOKEN|>
<|START_OF_TURN_TOKEN|><|USER_TOKEN|>...user message...<|END_OF_TURN_TOKEN|>
<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_RESPONSE|>...response...<|END_RESPONSE|><|END_OF_TURN_TOKEN|>

Both <|END_OF_TURN_TOKEN|> and <|END_RESPONSE|> are registered as stop tokens, preventing leakage.
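
The layout above can be reproduced in a few lines, which is handy for inspecting exactly what the model sees. This is a sketch under two assumptions that the card does not confirm: turns are concatenated with no separator between them, and generation starts from an open assistant turn ending in <|START_RESPONSE|>. The function names are hypothetical; the shipped Modelfile does the real rendering.

```python
ROLE_TOKENS = {
    "system": "<|SYSTEM_TOKEN|>",
    "user": "<|USER_TOKEN|>",
    "assistant": "<|CHATBOT_TOKEN|>",
}


def render_turn(role: str, content: str) -> str:
    """Wrap one message in the turn tokens shown above."""
    if role == "assistant":
        # Assistant text is additionally wrapped in response markers.
        content = f"<|START_RESPONSE|>{content}<|END_RESPONSE|>"
    return f"<|START_OF_TURN_TOKEN|>{ROLE_TOKENS[role]}{content}<|END_OF_TURN_TOKEN|>"


def render_prompt(messages: list[dict]) -> str:
    """Render a conversation, then open a new assistant turn for generation."""
    history = "".join(render_turn(m["role"], m["content"]) for m in messages)
    return history + "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_RESPONSE|>"


prompt = render_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
])
print(prompt)
```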

Tool definitions are injected into the system prompt inside <tools>...</tools> tags, and the model is instructed to respond with <tool_call> blocks when appropriate.
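
If you work with raw model output (for example through llama.cpp without Ollama's parsing), the tags can be handled by hand. The sketch below is illustrative only: it assumes tool definitions are serialized as JSON inside <tools> and that each <tool_call> block holds a JSON object with `name` and `arguments` keys. The card does not specify the payload format, so check the Modelfile before relying on this.

```python
import json
import re

TOOL_CALL_PATTERN = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def tools_block(tools: list[dict]) -> str:
    """Serialize tool definitions for the system prompt (JSON is an assumption)."""
    return f"<tools>{json.dumps(tools)}</tools>"


def parse_tool_calls(text: str) -> list[dict]:
    """Extract every <tool_call> payload that parses as a JSON object."""
    calls = []
    for payload in TOOL_CALL_PATTERN.findall(text):
        try:
            parsed = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than raising
        if isinstance(parsed, dict):
            calls.append(parsed)
    return calls


# Hand-written sample in the assumed format, not real model output.
raw = '<tool_call>{"name": "get_weather", "arguments": {"location": "Kampala"}}</tool_call>'
print(parse_tool_calls(raw))
```

Skipping malformed blocks keeps the caller simple; a stricter client might log them or retry the request instead.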

Tiny Facade Project

Tiny Facade is an open-source research project investigating whether Tiny Aya can serve as a shared multilingual tool-calling service on Android devices. Instead of every app bundling its own 2GB language model, Facade loads the model once and exposes a shared interface through Android's AIDL system.

Research Focus:

Multilingual tool-calling accuracy (English, Swahili, Luganda)
Shared on-device inference architecture
LoRA fine-tuning for structured function-call generation

Authors: Bronson Bakunga, Kato Steven Mubiru
Affiliation: Crane AI Labs / Cohere Labs Community
Part of: Expedition Tiny Aya (Cohere Labs)

All Variants

Variant	Description	Repo
Global	Broadest language coverage	Bronsn/tiny-aya-global-tools-GGUF
Earth	Optimized for African languages	Bronsn/tiny-aya-earth-tools-GGUF
Fire	Optimized for South/Southeast Asian languages	Bronsn/tiny-aya-fire-tools-GGUF
Water	Optimized for European languages	Bronsn/tiny-aya-water-tools-GGUF

Citation

If you use these models, please cite the original Tiny Aya work:

@article{cohere2026tinyaya,
  title={Tiny Aya: Democratizing Multilingual AI for On-Device Use},
  author={Cohere Labs},
  year={2026}
}


