| | --- |
| | license: apache-2.0 |
| | language: |
| | - en |
| | base_model: |
| | - Qwen/Qwen3-8B |
| | pipeline_tag: image-text-to-text |
| | tags: |
| | - Bee-8B |
| | - Fully-Open-MLLMs |
| | datasets: |
| | - Open-Bee/Honey-Data-15M |
| | library_name: transformers |
| | --- |
| | # Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs |
| |
|
| | [[🏠 Homepage](https://open-bee.github.io/)] [[📖 Arxiv Paper](https://arxiv.org/pdf/2510.13795)] [[🤗 Models & Datasets](https://huggingface.co/collections/Open-Bee/bee-8b-68ecbf10417810d90fbd9995)] [[💻 Code(coming soon)](https://github.com/Open-Bee)] |
| |
|
| | ## Introduction |
| |
|
| | We introduce **Bee-8B**, a new state-of-the-art, fully open 8B Multimodal Large Language Model (MLLM) designed to close the performance gap with proprietary models by focusing on data quality. |
| |
|
| | Bee-8B is trained on our new **Honey-Data-15M** corpus, a high-quality supervised fine-tuning (SFT) dataset of approximately 15 million samples. This dataset was meticulously created with our transparent, adaptable, and open-source data curation pipeline, **HoneyPipe**, which systematically cleans noisy data and enriches it with a novel dual-level (short and long) Chain-of-Thought (CoT) strategy. |
| |
|
| | This dataset enables Bee-8B to achieve exceptional performance, particularly in complex reasoning, establishing a new standard for fully open MLLMs. |
| |
|
| | ## Key Features |
| |
|
| | - **High-Quality, Large-Scale Dataset:** We release **Honey-Data-15M**, a new 15M-sample SFT corpus. It has undergone extensive cleaning to remove widespread noise and has been enriched with dual-level CoT reasoning to enhance advanced problem-solving capabilities. |
| | - **Fully Open-Source Data Curation Suite:** We provide not just the data, but the entire methodology. **HoneyPipe** and its underlying framework **DataStudio** offer the community a transparent and reproducible pipeline, moving beyond static dataset releases. |
| | - **State-of-the-Art Open Model:** Our model, **Bee-8B**, achieves state-of-the-art performance among fully open MLLMs and is highly competitive with recent semi-open models like InternVL3.5-8B, demonstrating the power of high-quality data. |
| |
|
| | ## News |
| |
|
| | - **[2025.12.17]** 🔥 We have released all data and model weights across different stages. For the final stage (RL data), you can directly merge [ViRL39K](https://huggingface.co/datasets/TIGER-Lab/ViRL39K) and [MMK12](https://huggingface.co/datasets/FanqingM/MMK12) and use the [VeRL](https://github.com/volcengine/verl) framework for training. |
| |
|
| | - **[2025.11.03]** 📊 **[Honey-Data-15M](https://huggingface.co/datasets/Open-Bee/Honey-Data-15M) & [Honey-Data-1M](https://huggingface.co/datasets/Open-Bee/Honey-Data-1M) is Released\!** You can download the 15M full version and the 1M efficient version from [HuggingFace]((https://huggingface.co/collections/Open-Bee/bee-8b-68ecbf10417810d90fbd9995)). |
| |
|
| | - **[2025.10.20]** 🚀 **vLLM Support is Here!** Bee-8B now supports high-performance inference with [vLLM](https://github.com/vllm-project/vllm), enabling faster and more efficient deployment for production use cases. |
| |
|
| | - **[2025.10.13]** 🐝 **Bee-8B is Released\!** Our model is now publicly available. You can download it from [Hugging Face](https://huggingface.co/collections/Open-Bee/bee-8b-68ecbf10417810d90fbd9995). |
| |
|
| | ## Quickstart |
| |
|
| | > [!NOTE] |
| | > Below, we provide simple examples to show how to use Bee-8B with 🤗 Transformers. |
| | > You can dynamically control the model's response by selecting one of two modes: set `enable_thinking=True` for `thinking` mode, or `enable_thinking=False` for `non-thinking` mode. The default is `thinking` mode. |
| |
|
| |
|
| | ### Using 🤗 Transformers to Chat |
| |
|
| | ```python |
| | import requests |
| | import torch |
| | from PIL import Image |
| | from transformers import AutoModel, AutoProcessor |
| | |
| | model_path = "Open-Bee/Bee-8B-Stage3" |
| | |
| | # Load model |
| | model = AutoModel.from_pretrained( |
| | model_path, |
| | torch_dtype=torch.bfloat16, |
| | trust_remote_code=True, |
| | ).to("cuda") |
| | |
| | # Load processor |
| | processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True) |
| | |
| | # Define conversation messages |
| | messages = [{ |
| | "role": |
| | "user", |
| | "content": [ |
| | { |
| | "type": "image", |
| | "image": "https://huggingface.co/Open-Bee/Bee-8B-Stage3/resolve/main/assets/logo.png", |
| | }, |
| | { |
| | "type": "text", |
| | "text": "Based on this picture, write an advertising slogan about Bee-8B (a Fully Open Multimodal Large Language Model)." |
| | }, |
| | ], |
| | }] |
| | |
| | # Apply chat template |
| | text = processor.apply_chat_template(messages, |
| | tokenize=False, |
| | add_generation_prompt=True, |
| | enable_thinking=True) |
| | |
| | # Load image |
| | image_url = "https://huggingface.co/Open-Bee/Bee-8B-Stage3/resolve/main/assets/logo.png" |
| | image = Image.open(requests.get(image_url, stream=True).raw) |
| | |
| | # Process inputs |
| | inputs = processor(images=image, text=text, return_tensors="pt").to("cuda") |
| | |
| | # Generate output |
| | generated_ids = model.generate(**inputs, max_new_tokens=16384, temperature=0.6) |
| | output_ids = generated_ids[0][len(inputs.input_ids[0]):] |
| | |
| | # Decode output |
| | output_text = processor.decode(output_ids, skip_special_tokens=True) |
| | |
| | # Print result |
| | print(output_text) |
| | ``` |
| |
|
| | ### Using vLLM for High-Performance Inference |
| |
|
| | #### Install vLLM |
| |
|
| | > [!IMPORTANT] |
| | > Bee-8B support will be officially available in vLLM **v0.11.1**. Until then, please install vLLM from source: |
| |
|
| | ```bash |
| | git clone https://github.com/vllm-project/vllm.git |
| | cd vllm |
| | VLLM_USE_PRECOMPILED=1 uv pip install --editable . |
| | ``` |
| |
|
| | Once vLLM v0.11.1 is released, you will be able to install it directly via pip: |
| | ```bash |
| | pip install vllm>=0.11.1 |
| | ``` |
| |
|
| |
|
| | #### Offline Inference |
| | ```python |
| | from transformers import AutoProcessor |
| | from vllm import LLM, SamplingParams |
| | from PIL import Image |
| | import requests |
| | |
| | |
| | def main(): |
| | |
| | model_path = "Open-Bee/Bee-8B-Stage3" |
| | |
| | llm = LLM( |
| | model=model_path, |
| | limit_mm_per_prompt={"image": 5}, |
| | trust_remote_code=True, |
| | tensor_parallel_size=1, |
| | gpu_memory_utilization=0.8, |
| | ) |
| | |
| | sampling_params = SamplingParams( |
| | temperature=0.6, |
| | max_tokens=16384, |
| | ) |
| | |
| | image_url = "https://huggingface.co/Open-Bee/Bee-8B-Stage3/resolve/main/assets/logo.png" |
| | image = Image.open(requests.get(image_url, stream=True).raw) |
| | |
| | messages = [ |
| | { |
| | "role": |
| | "user", |
| | "content": [ |
| | { |
| | "type": "image", |
| | "image": image |
| | }, |
| | { |
| | "type": |
| | "text", |
| | "text": |
| | "Based on this picture, write an advertising slogan about Bee-8B (a Fully Open Multimodal Large Language Model)." |
| | }, |
| | ], |
| | }, |
| | ] |
| | |
| | processor = AutoProcessor.from_pretrained(model_path, |
| | trust_remote_code=True) |
| | prompt = processor.apply_chat_template( |
| | messages, |
| | tokenize=False, |
| | add_generation_prompt=True, |
| | enable_thinking=True, |
| | ) |
| | |
| | mm_data = {"image": image} |
| | llm_inputs = { |
| | "prompt": prompt, |
| | "multi_modal_data": mm_data, |
| | } |
| | |
| | outputs = llm.generate([llm_inputs], sampling_params=sampling_params) |
| | generated_text = outputs[0].outputs[0].text |
| | |
| | print(generated_text) |
| | |
| | |
| | if __name__ == '__main__': |
| | main() |
| | ``` |
| |
|
| | #### Online Serving |
| | - Start the server |
| | ```bash |
| | vllm serve \ |
| | Open-Bee/Bee-8B-Stage3 \ |
| | --served-model-name Bee-8B-Stage3 \ |
| | --tensor-parallel-size 8 \ |
| | --gpu-memory-utilization 0.8 \ |
| | --host 0.0.0.0 \ |
| | --port 8000 \ |
| | --trust-remote-code |
| | ``` |
| |
|
| | - Using OpenAI Python Client to Query the server |
| | ```python |
| | from openai import OpenAI |
| | |
| | # Set OpenAI's API key and API base to use vLLM's API server. |
| | openai_api_key = "EMPTY" |
| | openai_api_base = "http://localhost:8000/v1" |
| | |
| | client = OpenAI( |
| | api_key=openai_api_key, |
| | base_url=openai_api_base, |
| | ) |
| | |
| | # image url |
| | image_messages = [ |
| | { |
| | "role": |
| | "user", |
| | "content": [ |
| | { |
| | "type": "image_url", |
| | "image_url": { |
| | "url": |
| | "https://huggingface.co/Open-Bee/Bee-8B-Stage3/resolve/main/assets/logo.png" |
| | }, |
| | }, |
| | { |
| | "type": |
| | "text", |
| | "text": |
| | "Based on this picture, write an advertising slogan about Bee-8B (a Fully Open Multimodal Large Language Model)." |
| | }, |
| | ], |
| | }, |
| | ] |
| | |
| | chat_response = client.chat.completions.create( |
| | model="Bee-8B-Stage3", |
| | messages=image_messages, |
| | max_tokens=16384, |
| | extra_body={ |
| | "chat_template_kwargs": { |
| | "enable_thinking": True |
| | }, |
| | }, |
| | ) |
| | print("Chat response:", chat_response.choices[0].message.content) |
| | ``` |
| |
|
| | ## Experimental Results |
| |
|
| | <figure align="center"> |
| | <img src="assets/results.png" alt="logo"/> |
| | <figcaption>Evaluation of Bee-8B against other MLLMs. We distinguish between fully open (*) and semi-open (†) models. The <strong>top</strong> and <strong>second-best</strong> scores for each benchmark are highlighted.</figcaption> |
| | </figure> |
| | |
| | 1. **New State-of-the-Art:** Bee-8B establishes a new performance standard for fully open MLLMs, proving highly competitive with recent semi-open models across a wide array of benchmarks. |
| | 2. **Excellence in Complex Reasoning:** Thanks to the CoT-enriched Honey-Data-15M, Bee-8B shows its most significant advancements in complex math and reasoning. It achieves top scores on challenging benchmarks like **MathVerse**, **LogicVista**, and **DynaMath**. |
| | 3. **Superior Document and Chart Understanding:** The model demonstrates powerful capabilities in analyzing structured visual data, securing the top rank on the **CharXiv** benchmark for both descriptive and reasoning questions. |
| | |
| | ## Acknowledgements |
| | |
| | Bee-8B is developed based on the architectures and codebases of the following projects: [R-4B](https://huggingface.co/YannQi/R-4B), [LLaVA-OneVision](https://github.com/LLaVA-VL/LLaVA-NeXT), [SigLIP2](https://huggingface.co/google/siglip2-so400m-patch14-384), [Qwen3](https://github.com/QwenLM/Qwen3), and evaluated using [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). We sincerely thank these projects for their outstanding contributions to the open-source community. |