Instructions to use cerebras/btlm-3b-8k-base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use cerebras/btlm-3b-8k-base with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="cerebras/btlm-3b-8k-base", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("cerebras/btlm-3b-8k-base", trust_remote_code=True, dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use cerebras/btlm-3b-8k-base with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "cerebras/btlm-3b-8k-base"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "cerebras/btlm-3b-8k-base",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/cerebras/btlm-3b-8k-base
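The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes the server is already running locally and that the response follows the OpenAI completions shape (`choices[0].text`). The helper names `build_completion_request` and `complete` are hypothetical and chosen for illustration. Passing `base_url="http://localhost:30000"` would target the SGLang server on the port used in this guide.

```python
import json
import urllib.request

MODEL_ID = "cerebras/btlm-3b-8k-base"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    # Same JSON body as the curl example.
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion.

    Assumes an OpenAI-style response body with a "choices" list.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated continuation as a string. Building the request separately from sending it makes the payload easy to inspect without a live server.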
- SGLang
How to use cerebras/btlm-3b-8k-base with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "cerebras/btlm-3b-8k-base" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "cerebras/btlm-3b-8k-base",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "cerebras/btlm-3b-8k-base" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "cerebras/btlm-3b-8k-base",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use cerebras/btlm-3b-8k-base with Docker Model Runner:
docker model run hf.co/cerebras/btlm-3b-8k-base
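For the Transformers route described earlier in this section, the snippet stops once the pipeline is created. A minimal sketch of actually generating text might look like the following. The helper names `generation_kwargs` and `generate` are hypothetical, and the sampling values are illustrative assumptions rather than recommended settings. Loading the model downloads the weights and executes the repository's custom code because of `trust_remote_code=True`.

```python
MODEL_ID = "cerebras/btlm-3b-8k-base"


def generation_kwargs(max_new_tokens=64, temperature=0.5):
    """Sampling settings passed to the pipeline call (illustrative values)."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,  # temperature only takes effect when sampling
        "temperature": temperature,
    }


def generate(prompt, **overrides):
    """Run the text-generation pipeline once and return the generated string."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=MODEL_ID, trust_remote_code=True)
    outputs = pipe(prompt, **generation_kwargs(**overrides))
    # The pipeline returns a list of dicts, one per generated sequence.
    return outputs[0]["generated_text"]
```

Calling `generate("Once upon a time,", max_new_tokens=32)` would return the prompt followed by the sampled continuation. In real use the pipeline should be created once and reused, since rebuilding it on every call reloads the model.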
Adding `safetensors` variant of this model
#27 opened over 2 years ago by SFconvertbot
Do we have a plan on posting the evaluation results to `open_llm_leaderboard`
3
#26 opened over 2 years ago by mpsk
Context length schedule and performance
3
#25 opened over 2 years ago by baffo32
Adding `safetensors` variant of this model
1
#24 opened over 2 years ago by SFconvertbot
HF version
#23 opened almost 3 years ago by edmond
Pretraining hyperparameters?
#21 opened almost 3 years ago by PY007
How to run on colab's CPU?
1
#20 opened almost 3 years ago by deepakkaura26
Qlora finetuning
1
#19 opened almost 3 years ago by TinyPixel
Why need get_mup_param_groups instead of default one in Huggingface?
#18 opened almost 3 years ago by sanqiang
No Cuda Information / nvidia-smi / nvtop
1
#17 opened almost 3 years ago by nudelbrot
How to reproduce quantized memory usage?
6
#16 opened almost 3 years ago by tarasglek
What is the inference time? On my Apple M1 Max completions take > 6 min
9
#15 opened almost 3 years ago by vedtam
Fine-tuning on coding tasks
1
#14 opened almost 3 years ago by sgaseretto
Your 3b model is very exciting and proves that data improvement works!
#13 opened almost 3 years ago by win10
Any plans on releasing GPTQ or GGML versions of this?
👍 6
4
#12 opened almost 3 years ago by FriendlyVisage
why we can not make this fully HF ready?
8
#11 opened almost 3 years ago by CUIGuy
LoraConfig's target_modul with peft ?
8
#10 opened almost 3 years ago by Handgun1773
include fastchat-t5 in the benchmark which is also 3B parameter
👍 3
#9 opened almost 3 years ago by vasilee
Recommendations for additional pretraining?
4
#8 opened almost 3 years ago by ZQ-Dev