Qwen3-VL-8B-Interleave-Thinking (v0.1)
Qwen3-VL-8B-Interleave-Thinking is a specialized agentic model fine-tuned on top of Qwen/Qwen3-VL-8B-Thinking. It is designed to provide an experience similar to the OpenAI Agent SDK, featuring interleaved thinking capabilities where the model generates internal thought processes before executing function calls.
Model Details
- Base Model: Qwen/Qwen3-VL-8B-Thinking
- Fine-tuning Dataset: hxssgaa/xlam-interleave-thinking-40k
- Methodology: Distilled from MiniMax M2.1, specifically targeting agentic behaviors and reasoning chains.
- Version: v0.1 (SFT Only). Future versions will incorporate large-scale Reinforcement Learning (RL) to further enhance agentic capabilities.
Key Features
- Interleaved Thinking: The model is trained to "think" before acting. It generates a reasoning trace (thought chain) before emitting a function call, allowing for better error correction and planning (see the example after this list).
- Long-Horizon Function Calling: Capable of handling complex, multi-step tasks by maintaining a coherent thought process throughout the interaction.
- Agentic Focus: Optimized for tool use and complex scenarios where the model needs to decide why and how to use a tool effectively.
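For illustration, a single assistant turn in this format interleaves a reasoning trace with one or more tool calls. The snippet below is a hypothetical example (the tool name and arguments are invented); the `<think>` and `<tool_call>` tags follow the Qwen3 chat template:

```
<think>
The user asked for the weather in Paris, so I should call the weather tool
before answering.
</think>
<tool_call>
{"name": "get_weather", "arguments": {"city": "Paris"}}
</tool_call>
```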
Usage
Please use vLLM to serve the model. You may need to write a custom agent loop that checks whether the model's output contains `<tool_call>...</tool_call>` blocks in order to decide whether the model has finished calling functions. Note that a single user query can produce multiple `<tool_call>...</tool_call>` blocks in one response.
```bash
vllm serve hxssgaa/Qwen3-VL-8B-Interleave-Thinking \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 8000
```
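A minimal sketch of such an agent loop is shown below. It assumes the server above exposes vLLM's OpenAI-compatible API on port 8000, that tool calls are emitted as Hermes-style `<tool_call>{...}</tool_call>` JSON blocks (as in the Qwen3 chat template), and that `run_tool` is a hypothetical dispatcher you would replace with real tool implementations:

```python
import json
import re

from openai import OpenAI  # pip install openai

# vLLM exposes an OpenAI-compatible endpoint; the API key is required but unused.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "hxssgaa/Qwen3-VL-8B-Interleave-Thinking"

# Matches every <tool_call>{...}</tool_call> block in an assistant message.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def run_tool(name: str, arguments: dict) -> str:
    """Hypothetical dispatcher: route each call to your real tool implementations."""
    return json.dumps({"error": f"no implementation for tool {name!r}",
                       "arguments": arguments})


def agent_loop(messages: list[dict], max_turns: int = 8) -> str:
    """Query the model until it stops emitting <tool_call> blocks."""
    text = ""
    for _ in range(max_turns):
        reply = client.chat.completions.create(model=MODEL, messages=messages)
        text = reply.choices[0].message.content or ""
        messages.append({"role": "assistant", "content": text})

        # A single assistant turn may contain several <tool_call> blocks.
        calls = TOOL_CALL_RE.findall(text)
        if not calls:
            return text  # no tool calls left: the model has finished

        for raw in calls:
            call = json.loads(raw)
            result = run_tool(call["name"], call.get("arguments", {}))
            # Feed the result back wrapped in <tool_response>, mirroring how
            # Qwen's chat template renders tool output in the next turn.
            messages.append(
                {"role": "user",
                 "content": f"<tool_response>\n{result}\n</tool_response>"}
            )
    return text


if __name__ == "__main__":
    # In practice, describe the available tools in the system prompt
    # (or via the chat template's tools field) so the model knows what to call.
    print(agent_loop([{"role": "user", "content": "What's the weather in Paris?"}]))
```

Depending on your vLLM configuration, the `<think>...</think>` reasoning trace may arrive inline in the same `content` field as the final answer; strip or log it as needed before showing output to users.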
Dataset & Training
The model was fine-tuned on xlam-interleave-thinking-40k, a dataset containing 40,000 high-quality examples of interleaved thinking and tool usage distilled from the MiniMax M2.1 model. This dataset ensures the model adopts a rigorous thinking pattern suitable for autonomous agents.
Future Work
This v0.1 release represents the initial Supervised Fine-Tuning (SFT) phase. Subsequent releases will focus on:
- Large-scale Reinforcement Learning (RL) to refine policy optimization.
- Enhanced robustness in edge-case handling.
Citation
If you use this model, please cite the original Qwen3-VL work and the xlam-interleave-thinking dataset.