Tiny Aya Global: Tool-Calling GGUF

A corrected, tool-calling-ready GGUF of CohereLabs/tiny-aya-global for Ollama and llama.cpp.

Part of the Tiny Facade collection, an open-source effort to bring reliable multilingual tool calling to on-device AI.

What This Fixes

The official Tiny Aya GGUFs on Ollama ship with the wrong chat template (Command-R's template instead of Tiny Aya's own). This causes:

  • End-token leakage: <|END_OF_TURN_TOKEN|> and <|END_RESPONSE|> are printed as visible text in responses
  • No tool-calling support: the default template has no provisions for function calling
  • Broken conversation flow: responses don't terminate cleanly

This GGUF ships with a corrected Modelfile that uses Tiny Aya's actual template, adds proper stop tokens, and injects structured tool-calling support.
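One way to confirm that the leakage is gone is to scan responses for the two end tokens named above. The following is a minimal sketch; the helper names find_leaked_tokens and strip_leaked_tokens are hypothetical and not part of this repo. The stripping helper is only a client-side workaround for setups still running the uncorrected template.

```python
# Special tokens named in this README that should never appear in visible output.
END_TOKENS = ("<|END_OF_TURN_TOKEN|>", "<|END_RESPONSE|>")


def find_leaked_tokens(text: str) -> list[str]:
    """Return the end tokens that appear verbatim in a model response."""
    return [token for token in END_TOKENS if token in text]


def strip_leaked_tokens(text: str) -> str:
    """Cut the response at the first leaked end token (hypothetical workaround)."""
    cut = len(text)
    for token in END_TOKENS:
        index = text.find(token)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].rstrip()
```

With the corrected Modelfile, find_leaked_tokens should return an empty list for every response.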

Quick Start (Ollama)

# Download the Modelfile
# Then create the model pointing to the GGUF
ollama create tiny-aya-global-tools -f tiny-aya-global-tools.Modelfile

Or if you've downloaded the GGUF directly, update the FROM line in the Modelfile to point to your local file:

FROM ./tiny-aya-global-tools.GGUF

Then:

ollama create tiny-aya-global-tools -f tiny-aya-global-tools.Modelfile
ollama run tiny-aya-global-tools

Tool Calling

The corrected template supports Ollama's native tool calling. Define tools in your API call and the model will respond with structured <tool_call> blocks.

Example (Python + Ollama)

import ollama

response = ollama.chat(
    model='tiny-aya-global-tools',
    messages=[
        {'role': 'user', 'content': 'What is the weather in Kampala?'}
    ],
    tools=[
        {
            'type': 'function',
            'function': {
                'name': 'get_weather',
                'description': 'Get current weather for a location',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'location': {
                            'type': 'string',
                            'description': 'City name'
                        }
                    },
                    'required': ['location']
                }
            }
        }
    ]
)

print(response['message'])
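
The example above only prints the raw message. In practice the application has to run the requested tool and pass the result back to the model. The sketch below assumes the message is dict-like and carries a 'tool_calls' list whose entries have 'function' → 'name' and 'arguments'; get_weather, TOOL_REGISTRY, and run_tool_calls are hypothetical names introduced for illustration.

```python
import json


def get_weather(location: str) -> str:
    """Hypothetical stand-in; a real app would query a weather service here."""
    return json.dumps({"location": location, "forecast": "unknown"})


# Map tool names (as declared in `tools=`) to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(message, registry=TOOL_REGISTRY):
    """Execute each requested tool and return 'tool' role messages with the results."""
    results = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        name = function["name"]
        arguments = function["arguments"]
        if isinstance(arguments, str):  # assumption: some clients deliver a JSON string
            arguments = json.loads(arguments)
        handler = registry.get(name)
        if handler is None:
            # Report unknown tools back to the model instead of raising.
            content = json.dumps({"error": f"unknown tool: {name}"})
        else:
            content = handler(**arguments)
        results.append({"role": "tool", "content": content})
    return results
```

To continue the conversation, append response['message'] and the returned tool messages to the original messages list and call ollama.chat again with the same model and tools.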

Multilingual Tool Calling

The model handles tool calls from prompts in 70+ languages. Examples:

Language | Prompt | Expected Tool Call
-------- | ------ | ------------------
English | "What's the weather in Nairobi?" | get_weather(location="Nairobi")
Swahili | "Hali ya hewa Dar es Salaam ikoje?" | get_weather(location="Dar es Salaam")
Luganda | "Embeera y'obudde mu Kampala eri etya?" | get_weather(location="Kampala")
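
The rows above can double as a small regression check. The sketch below assumes a caller-supplied ask_model(prompt) function (hypothetical) that returns a list of tool calls in the 'function' → 'name' / 'arguments' shape; it reports the fraction of prompts that produced the expected call and makes no claim about the accuracy the model actually achieves.

```python
# Cases copied from the table above: (language, prompt, expected name, expected arguments).
CASES = [
    ("English", "What's the weather in Nairobi?", "get_weather", {"location": "Nairobi"}),
    ("Swahili", "Hali ya hewa Dar es Salaam ikoje?", "get_weather", {"location": "Dar es Salaam"}),
    ("Luganda", "Embeera y'obudde mu Kampala eri etya?", "get_weather", {"location": "Kampala"}),
]


def call_matches(tool_call, expected_name, expected_args):
    """True when the tool name and the full argument dict both match."""
    function = tool_call["function"]
    return function["name"] == expected_name and dict(function["arguments"]) == expected_args


def score(cases, ask_model):
    """Return the fraction of cases where any returned tool call matches the expectation."""
    hits = 0
    for _language, prompt, name, args in cases:
        if any(call_matches(call, name, args) for call in ask_model(prompt)):
            hits += 1
    return hits / len(cases)
```

In a real run, ask_model would wrap ollama.chat with the get_weather tool definition and return the message's tool calls.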

Model Details

Property | Value
-------- | -----
Base Model | CohereLabs/tiny-aya-global
Parameters | 3.35B
Quantization | Q4_K_M
File Size | ~2.0 GB
Languages | 70+ (optimized for English, Swahili, Luganda)
License | CC-BY-NC-4.0 (inherited from Tiny Aya)

What's in This Repo

  • tiny-aya-global-tools.GGUF: The quantized model weights (Q4_K_M)
  • tiny-aya-global-tools.Modelfile: Corrected Ollama Modelfile with tool-calling template

The Corrected Template

The key fix is using Tiny Aya's native chat format with proper token boundaries:

<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>...system prompt...<|END_OF_TURN_TOKEN|>
<|START_OF_TURN_TOKEN|><|USER_TOKEN|>...user message...<|END_OF_TURN_TOKEN|>
<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_RESPONSE|>...response...<|END_RESPONSE|><|END_OF_TURN_TOKEN|>

Both <|END_OF_TURN_TOKEN|> and <|END_RESPONSE|> are registered as stop tokens, preventing leakage.
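
To make the turn layout concrete, here is a minimal sketch of how a message list could be rendered into this format. The function name render_prompt and the role mapping are hypothetical. Joining turns without separators and ending with an open chatbot turn are assumptions; the Modelfile in this repo remains the authoritative template.

```python
# Role markers taken from the format shown above.
ROLE_TOKENS = {
    "system": "<|SYSTEM_TOKEN|>",
    "user": "<|USER_TOKEN|>",
    "assistant": "<|CHATBOT_TOKEN|>",
}


def render_prompt(messages):
    """Render chat messages into Tiny Aya's turn format, ready for generation."""
    parts = []
    for message in messages:
        role_token = ROLE_TOKENS[message["role"]]
        if message["role"] == "assistant":
            # Past assistant turns are wrapped in the response markers.
            body = f"<|START_RESPONSE|>{message['content']}<|END_RESPONSE|>"
        else:
            body = message["content"]
        parts.append(f"<|START_OF_TURN_TOKEN|>{role_token}{body}<|END_OF_TURN_TOKEN|>")
    # Assumption: leave a chatbot turn open so the model writes the next response.
    parts.append("<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|><|START_RESPONSE|>")
    return "".join(parts)
```

Because generation starts inside an open response, the model is expected to emit <|END_RESPONSE|> and <|END_OF_TURN_TOKEN|> when it finishes, which is why both need to be stop tokens.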

Tool definitions are injected into the system prompt inside <tools>...</tools> tags, and the model is instructed to respond with <tool_call> blocks when appropriate.
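
The sketch below illustrates both halves of that mechanism outside Ollama. It rests on assumptions that this README does not spell out: that tool definitions are serialized as JSON inside the <tools> tags, that each block is closed by </tool_call>, and that its body is a JSON object with 'name' and 'arguments' keys. The function names are hypothetical.

```python
import json
import re

# Assumption: each tool call is delimited by <tool_call> ... </tool_call>.
TOOL_CALL_PATTERN = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def system_prompt_with_tools(base_prompt, tools):
    """Append tool definitions inside <tools> tags (JSON encoding is an assumption)."""
    return f"{base_prompt}\n<tools>\n{json.dumps(tools, ensure_ascii=False)}\n</tools>"


def parse_tool_calls(text):
    """Extract tool calls from raw model output, skipping malformed blocks."""
    calls = []
    for raw in TOOL_CALL_PATTERN.findall(text):
        try:
            payload = json.loads(raw.strip())
        except json.JSONDecodeError:
            continue  # ignore blocks that are not valid JSON
        if isinstance(payload, dict) and "name" in payload:
            calls.append({"name": payload["name"], "arguments": payload.get("arguments", {})})
    return calls
```

When the model is served through Ollama's tool-calling API, this parsing is not needed in application code; the sketch is mainly useful for inspecting raw completions, for example from llama.cpp.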

Tiny Facade Project

Tiny Facade is an open-source research project investigating whether Tiny Aya can serve as a shared multilingual tool-calling service on Android devices. Instead of every app bundling its own 2 GB language model, Facade loads the model once and exposes a shared interface through Android's AIDL system.

Research Focus:

  • Multilingual tool-calling accuracy (English, Swahili, Luganda)
  • Shared on-device inference architecture
  • LoRA fine-tuning for structured function-call generation

Authors: Bronson Bakunga, Kato Steven Mubiru
Affiliation: Crane AI Labs / Cohere Labs Community
Part of: Expedition Tiny Aya (Cohere Labs)

All Variants

Variant | Description | Repo
------- | ----------- | ----
Global | Broadest language coverage | Bronsn/tiny-aya-global-tools-GGUF
Earth | Optimized for African languages | Bronsn/tiny-aya-earth-tools-GGUF
Fire | Optimized for South/Southeast Asian languages | Bronsn/tiny-aya-fire-tools-GGUF
Water | Optimized for European languages | Bronsn/tiny-aya-water-tools-GGUF

Citation

If you use these models, please cite the original Tiny Aya work:

@article{cohere2026tinyaya,
  title={Tiny Aya: Democratizing Multilingual AI for On-Device Use},
  author={Cohere Labs},
  year={2026}
}