Instructions to use deepseek-ai/DeepSeek-R1-0528 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use deepseek-ai/DeepSeek-R1-0528 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="deepseek-ai/DeepSeek-R1-0528", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-0528", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-0528", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use deepseek-ai/DeepSeek-R1-0528 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "deepseek-ai/DeepSeek-R1-0528"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deepseek-ai/DeepSeek-R1-0528",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
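
The same request can be issued from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server from the command above is listening on localhost:8000, and the helper names `build_chat_request` and `chat` are hypothetical, introduced here for illustration only.

```python
import json
import urllib.request

# Assumed defaults, copied from the curl example: a vLLM server on localhost:8000.
BASE_URL = "http://localhost:8000/v1"
MODEL = "deepseek-ai/DeepSeek-R1-0528"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL):
    """Build (but do not send) a POST request for the chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, timeout=600):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_chat_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `chat("What is the capital of France?")` should return the parsed response. Building the request separately from sending it makes the payload easy to inspect without a live server.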

SGLang

How to use deepseek-ai/DeepSeek-R1-0528 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "deepseek-ai/DeepSeek-R1-0528" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deepseek-ai/DeepSeek-R1-0528",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
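
Once a response comes back, the reply text and a rough decode rate can be read from it. This is a sketch that assumes the response follows the OpenAI chat completion layout (`choices[0].message.content` and `usage.completion_tokens`), as the API is described as OpenAI-compatible. The helper names are hypothetical.

```python
import time


def extract_reply(response):
    """Return the assistant text from an OpenAI-style chat completion dict."""
    return response["choices"][0]["message"]["content"]


def tokens_per_second(response, elapsed_seconds):
    """Generated tokens divided by wall-clock time for a single request."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return response["usage"]["completion_tokens"] / elapsed_seconds


def timed(call):
    """Run call() and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start
```

The rate computed this way covers a single request end to end, including prompt processing, so it is not directly comparable to aggregate throughput measured across many concurrent requests.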

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "deepseek-ai/DeepSeek-R1-0528" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use deepseek-ai/DeepSeek-R1-0528 with Docker Model Runner:
```
docker model run hf.co/deepseek-ai/DeepSeek-R1-0528
```

Do you plan to release a DeepSeek-R1-0528 AWQ version?

#68

by oliver0102 - opened May 29, 2025

Discussion

oliver0102

May 29, 2025

edited May 29, 2025

The AWQ version runs smoothly on 8*H20, which is the most powerful graphics card I have.

tunglinwood

May 29, 2025

@v2ray from https://huggingface.co/cognitivecomputations will follow up with an AWQ version soon.

erichartford

May 29, 2025

Maybe? AWQ has moved to vllm-compressor, and it doesn't 100% work for MoE yet.
We will give it a try.

adamo1139

May 30, 2025

I am making DeepSeek-R1-Zero AWQ quants now. If that finishes and works, I'll try to make AWQ quants of this model.

oliver0102

May 30, 2025

> Maybe? AWQ has moved to vllm-compressor, and it doesn't 100% work for MoE yet.
> We will give it a try.

I use your DeepSeek-R1-AWQ every day. It reaches up to 220 tokens/s on 8*H20, and I haven't noticed any difference from the original model. AWQ is a really good quantization method, so I am hoping an AWQ version will be released soon after this LITTLE upgrade of R1. DeepSeek-R1 is good at programming, but it has a serious hallucination problem that prevents us from using it as the base model for one of our code-related agents. Hopefully this release resolves that issue.

erichartford

May 30, 2025

I will try

adamo1139

May 31, 2025

@oliver0102 @erichartford

Here's an AWQ quant for the new DeepSeek R1-0528 - https://huggingface.co/adamo1139/DeepSeek-R1-0528-AWQ

Eric, thanks for your notes in the discussion here, they were really helpful.
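
For readers sizing hardware for a quant like this, a back-of-the-envelope estimate of weight storage can help. The sketch below rests on assumptions not stated in the thread: roughly 671e9 total parameters (the commonly cited figure for DeepSeek-R1), 4-bit AWQ weights, and 96 GB of memory per H20. It ignores quantization scales, any layers kept at higher precision, the KV cache, and activations, so real usage will be higher.

```python
# Assumptions (not from the thread): adjust these to match your setup.
TOTAL_PARAMS = 671e9   # commonly cited total parameter count for DeepSeek-R1
GPU_MEMORY_GB = 96     # assumed memory per H20 card
NUM_GPUS = 8


def weight_footprint_gb(num_params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return num_params * bits_per_weight / 8 / 1e9


total_memory_gb = GPU_MEMORY_GB * NUM_GPUS
for label, bits in [("BF16", 16), ("FP8", 8), ("4-bit AWQ", 4)]:
    size_gb = weight_footprint_gb(TOTAL_PARAMS, bits)
    share = size_gb / total_memory_gb
    print(f"{label:>10}: {size_gb:7.1f} GB of weights, {share:.0%} of {total_memory_gb} GB")
```

Under these assumptions the 4-bit weights take well under half of the combined memory, which leaves headroom for the KV cache.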
