Instructions for using nvidia/EGM-8B-SFT with libraries, inference providers, notebooks, and local apps. The sections below show how to get started.
- Libraries
- Transformers
How to use nvidia/EGM-8B-SFT with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="nvidia/EGM-8B-SFT")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("nvidia/EGM-8B-SFT")
model = AutoModelForImageTextToText.from_pretrained("nvidia/EGM-8B-SFT")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use nvidia/EGM-8B-SFT with vLLM:
Install from pip and serve the model:
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "nvidia/EGM-8B-SFT"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "nvidia/EGM-8B-SFT",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
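Once the server from the pip install path above is running, it can also be queried from Python with the OpenAI client instead of curl. This is a minimal sketch, not part of the official instructions; it assumes the `openai` package is installed and reuses the same model name, endpoint, and example image URL as the curl call above:

```python
# Minimal sketch: call the locally running vLLM server (OpenAI-compatible API)
# started with `vllm serve "nvidia/EGM-8B-SFT"` above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="nvidia/EGM-8B-SFT",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    },
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```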
Use Docker

```bash
docker model run hf.co/nvidia/EGM-8B-SFT
```
- SGLang
How to use nvidia/EGM-8B-SFT with SGLang:
Install from pip and serve the model:
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "nvidia/EGM-8B-SFT" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "nvidia/EGM-8B-SFT",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```

Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "nvidia/EGM-8B-SFT" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "nvidia/EGM-8B-SFT",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```

- Docker Model Runner
How to use nvidia/EGM-8B-SFT with Docker Model Runner:
```bash
docker model run hf.co/nvidia/EGM-8B-SFT
```
EGM-Qwen3-VL-8B-SFT
Model Summary
EGM-Qwen3-VL-8B-SFT is the supervised fine-tuning (SFT) checkpoint from the first stage of the EGM (Efficient Visual Grounding Language Models) training pipeline. It is built on top of Qwen3-VL-8B-Thinking.
This is an intermediate checkpoint intended for further reinforcement learning (RL) training. For the final model with the best performance, see nvidia/EGM-8B.
Training Details
SFT Stage
In the SFT stage, a proprietary VLM generates detailed chain-of-thought reasoning steps for visual grounding training data. The base Qwen3-VL-8B-Thinking model is then fine-tuned on this reasoning-augmented data to learn structured visual grounding with explicit reasoning.
This SFT checkpoint serves as the initialization for the subsequent RL stage (GRPO), which yields the final EGM-8B model.
How to Use for RL Training
First, download the SFT checkpoint:

```bash
pip install -U huggingface_hub
huggingface-cli download nvidia/EGM-8B-SFT --local-dir ./models/EGM-8B-SFT
```
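The same download can also be done from Python with `huggingface_hub`. This is a small sketch equivalent to the CLI command above; the target directory mirrors the one used there:

```python
# Download the SFT checkpoint with the huggingface_hub Python API
# (equivalent to the huggingface-cli command above).
from huggingface_hub import snapshot_download

snapshot_download(repo_id="nvidia/EGM-8B-SFT", local_dir="./models/EGM-8B-SFT")
```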
Then follow the installation instructions in the EGM repository, prepare the RL data, and start training:
```bash
export BASE_DIR=$(pwd)
export MODEL_PATH="${BASE_DIR}/models/EGM-8B-SFT"
export OUTPUT_DIR="${BASE_DIR}/checkpoint/"
export DATA_DIR="${BASE_DIR}/data/EGM_Datasets/processed_rl_data/"

cd verl
bash scripts/grounding_qwen.sh
```
See the EGM repository for full RL training instructions.
Model Architecture
| Component | Details |
|---|---|
| Architecture | Qwen3VLForConditionalGeneration |
| Precision | bfloat16 |
| Text Hidden Size | 4096 |
| Text Layers | 36 |
| Attention Heads | 32 (8 KV heads) |
| Text Intermediate Size | 12,288 |
| Vision Hidden Size | 1152 |
| Vision Layers | 27 |
| Patch Size | 16 x 16 |
| Max Position Embeddings | 262,144 |
| Vocabulary Size | 151,936 |
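These values can be checked against the files in the repository by loading the configuration with Transformers. A quick sketch, assuming a Transformers version with Qwen3-VL support is installed:

```python
# Print the released configuration; the text and vision sub-configs contain
# the hidden sizes, layer counts, and head counts listed above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("nvidia/EGM-8B-SFT")
print(config)
```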
Related Models
| Model | Description |
|---|---|
| nvidia/EGM-8B | Final RL-trained model (best performance) |
| nvidia/EGM-4B-SFT | SFT checkpoint for the 4B variant |
| nvidia/EGM-4B | Final RL-trained 4B model |
Citation
```bibtex
@article{zhan2026EGM,
  author  = {Zhan, Guanqi and Li, Changye and Liu, Zhijian and Lu, Yao and Wu, Yi and Han, Song and Zhu, Ligeng},
  title   = {EGM: Efficient Visual Grounding Language Models},
  journal = {arXiv},
  year    = {2026}
}
```
Acknowledgment
This repository benefits from Qwen3-VL, InternVL, verl and verl-internvl.