Qwen3-VL-8B-Interleave-Thinking (v0.1)
Qwen3-VL-8B-Interleave-Thinking is a specialized agentic model fine-tuned on top of Qwen/Qwen3-VL-8B-Thinking. It is designed to provide an experience similar to the OpenAI Agent SDK, featuring interleaved thinking capabilities where the model generates internal thought processes before executing function calls.
Model Details
- Base Model: Qwen/Qwen3-VL-8B-Thinking
- Fine-tuning Dataset: hxssgaa/xlam-interleave-thinking-40k
- Methodology: Distilled from MiniMax M2.1, specifically targeting agentic behaviors and reasoning chains.
- Version: v0.1 (SFT Only). Future versions will incorporate large-scale Reinforcement Learning (RL) to further enhance agentic capabilities.
Key Features
- Interleaved Thinking: The model is trained to "think" before acting. It generates a reasoning trace (thought chain) before emitting a function call, allowing for better error correction and planning (see the example after this list).
- Long-Horizon Function Calling: Capable of handling complex, multi-step tasks by maintaining a coherent thought process throughout the interaction.
- Agentic Focus: Optimized for tool use and complex scenarios where the model needs to decide why and how to use a tool effectively.
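For illustration, a single assistant turn in this format interleaves a reasoning trace with one or more tool calls. The snippet below is a hypothetical example (the tool name and arguments are invented); the `<think>` and `<tool_call>` tags follow the Qwen3 chat template:

```
<think>
The user asked for the weather in Paris, so I should call the weather tool
before answering.
</think>
<tool_call>
{"name": "get_weather", "arguments": {"city": "Paris"}}
</tool_call>
```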
Usage
Please use vLLM to serve the model. You may need to write a custom agent loop that checks whether the model's output contains `<tool_call>...</tool_call>` blocks in order to decide whether the model has finished calling functions. Note that a single user query can produce multiple `<tool_call>...</tool_call>` blocks in one response.
```bash
vllm serve hxssgaa/Qwen3-VL-8B-Interleave-Thinking \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 8000
```
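A minimal sketch of such an agent loop is shown below. It assumes the server above exposes vLLM's OpenAI-compatible API on port 8000, that tool calls are emitted as Hermes-style `<tool_call>{...}</tool_call>` JSON blocks (as in the Qwen3 chat template), and that `run_tool` is a hypothetical dispatcher you would replace with real tool implementations:

```python
import json
import re

from openai import OpenAI  # pip install openai

# vLLM exposes an OpenAI-compatible endpoint; the API key is required but unused.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "hxssgaa/Qwen3-VL-8B-Interleave-Thinking"

# Matches every <tool_call>{...}</tool_call> block in an assistant message.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def run_tool(name: str, arguments: dict) -> str:
    """Hypothetical dispatcher: route each call to your real tool implementations."""
    return json.dumps({"error": f"no implementation for tool {name!r}",
                       "arguments": arguments})


def agent_loop(messages: list[dict], max_turns: int = 8) -> str:
    """Query the model until it stops emitting <tool_call> blocks."""
    text = ""
    for _ in range(max_turns):
        reply = client.chat.completions.create(model=MODEL, messages=messages)
        text = reply.choices[0].message.content or ""
        messages.append({"role": "assistant", "content": text})

        # A single assistant turn may contain several <tool_call> blocks.
        calls = TOOL_CALL_RE.findall(text)
        if not calls:
            return text  # no tool calls left: the model has finished

        for raw in calls:
            call = json.loads(raw)
            result = run_tool(call["name"], call.get("arguments", {}))
            # Feed the result back wrapped in <tool_response>, mirroring how
            # Qwen's chat template renders tool output in the next turn.
            messages.append(
                {"role": "user",
                 "content": f"<tool_response>\n{result}\n</tool_response>"}
            )
    return text


if __name__ == "__main__":
    # In practice, describe the available tools in the system prompt
    # (or via the chat template's tools field) so the model knows what to call.
    print(agent_loop([{"role": "user", "content": "What's the weather in Paris?"}]))
```

Depending on your vLLM configuration, the `<think>...</think>` reasoning trace may arrive inline in the same `content` field as the final answer; strip or log it as needed before showing output to users.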
Dataset & Training
The model was fine-tuned on xlam-interleave-thinking-40k, a dataset containing 40,000 high-quality examples of interleaved thinking and tool usage distilled from the MiniMax M2.1 model. This dataset ensures the model adopts a rigorous thinking pattern suitable for autonomous agents.
Future Work
This v0.1 release represents the initial Supervised Fine-Tuning (SFT) phase. Subsequent releases will focus on:
- Large-scale Reinforcement Learning (RL) to refine policy optimization.
- Enhanced robustness in edge-case handling.
Citation
If you use this model, please cite the original Qwen3-VL work and the xlam-interleave-thinking dataset.