Qwen3-VL-8B-Interleave-Thinking (v0.1)

Qwen3-VL-8B-Interleave-Thinking is a specialized agentic model fine-tuned on top of Qwen/Qwen3-VL-8B-Thinking. It is designed to provide an experience similar to the OpenAI Agent SDK, featuring interleaved thinking capabilities where the model generates internal thought processes before executing function calls.

Model Details

  • Base Model: Qwen/Qwen3-VL-8B-Thinking
  • Fine-tuning Dataset: hxssgaa/xlam-interleave-thinking-40k
  • Methodology: Distilled from MiniMax M2.1, specifically targeting agentic behaviors and reasoning chains.
  • Version: v0.1 (SFT Only). Future versions will incorporate large-scale Reinforcement Learning (RL) to further enhance agentic capabilities.

Key Features

  • Interleaved Thinking: The model is trained to "think" before acting. It generates a reasoning trace (thought chain) before emitting a function call, allowing for better error correction and planning.
  • Long-Horizon Function Calling: Capable of handling complex, multi-step tasks by maintaining a coherent thought process throughout the interaction.
  • Agentic Focus: Optimized for tool use and complex scenarios where the model needs to decide why and how to use a tool effectively.
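The interleaved pattern described above can be illustrated with a minimal parsing sketch. The <think> and <tool_call> tags follow the usual Qwen convention, and the response text here is a hypothetical example; verify the exact format against the model's chat template.

```python
import json
import re

# Hypothetical assistant turn: a reasoning trace, then a function call.
response = (
    "<think>The user wants the weather, so I should call the weather tool "
    "before answering.</think>\n"
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}'
    "</tool_call>"
)

# Separate the thought chain from the emitted tool calls.
thought = re.search(r"<think>(.*?)</think>", response, re.DOTALL).group(1)
calls = [json.loads(m) for m in
         re.findall(r"<tool_call>(.*?)</tool_call>", response, re.DOTALL)]

print(thought)  # the model's internal reasoning
print(calls[0]["name"], calls[0]["arguments"])  # get_weather {'city': 'Paris'}
```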

Usage

Please use vLLM to serve the model. Because the model emits function calls as <tool_call>xxx</tool_call> blocks, you may need a small custom agent framework that checks each response for such a block: if one is present, execute the call and feed the result back; if none is present, the model has finished function calling.

Note that a single user query can produce multiple <tool_call>xxx</tool_call> blocks within one response.
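A minimal sketch of such an agent loop is shown below, using a mocked model and mocked tools. All names here are illustrative, and the JSON payload inside the tags follows the usual Qwen tool-calling convention; verify the exact format against the model's chat template.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def run_agent(model, tools, messages):
    """Loop until a response arrives with no <tool_call> block;
    that response is treated as the final answer."""
    while True:
        output = model(messages)
        messages.append({"role": "assistant", "content": output})
        calls = TOOL_CALL_RE.findall(output)
        if not calls:  # no tool call left -> the model is done
            return output
        # One turn may contain several <tool_call> blocks; run them all.
        for raw in calls:
            call = json.loads(raw)
            result = tools[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "content": json.dumps(result)})

# Mocked model: the first turn issues two calls, the second turn answers.
def make_mock_model():
    state = {"turn": 0}
    def mock_model(messages):
        state["turn"] += 1
        if state["turn"] == 1:
            return (
                '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>\n'
                '<tool_call>{"name": "get_time", "arguments": {"city": "Paris"}}</tool_call>'
            )
        return "It is sunny and 14:00 in Paris."
    return mock_model

tools = {
    "get_weather": lambda city: {"weather": "sunny"},
    "get_time": lambda city: {"time": "14:00"},
}
answer = run_agent(make_mock_model(), tools,
                   [{"role": "user", "content": "Weather and time in Paris?"}])
print(answer)  # -> It is sunny and 14:00 in Paris.
```

In a real deployment the mocked model would be replaced by a chat-completions request against the vLLM server.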

```shell
vllm serve hxssgaa/Qwen3-VL-8B-Interleave-Thinking \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 8000
```

Dataset & Training

The model was fine-tuned on xlam-interleave-thinking-40k, a dataset containing 40,000 high-quality examples of interleaved thinking and tool usage distilled from the MiniMax M2.1 model. This dataset ensures the model adopts a rigorous thinking pattern suitable for autonomous agents.

Future Work

This v0.1 release represents the initial Supervised Fine-Tuning (SFT) phase. Subsequent releases will focus on:

  • Large-scale Reinforcement Learning (RL) to refine policy optimization.
  • Enhanced robustness in edge-case handling.

Citation

If you use this model, please cite the original Qwen3-VL work and the xlam-interleave-thinking dataset.

Model Size

9B parameters, stored as BF16 Safetensors.