Are there advantages or disadvantages in changing the format for translation?
Translate to X: text text text.
vs.
Translate from Y: text text text to X.
etc...
Is the example in the card the best approach?
Also, if I want to do tuning, prompt tuning for example, what is the optimal format of the training data?
Translate from English: text text text to Spanish: spanishtext spanishtext spanishtext?
How should I format the training data?
Is the example in the card the best approach?
No, most likely not. You can very likely find a better approach via prompt engineering.
what is the optimal format of the training data? How should I format the training data?
I think using a variety of formats, not a single one, will likely yield the best model.
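To make the "variety of formats" idea concrete, here is a minimal sketch that renders each sentence pair with a randomly chosen prompt template. The templates, the `make_example` helper, and the `inputs`/`targets` field names are assumptions for illustration only; they are not the exact prompts used to train this model.

```python
import random

# Hypothetical templates. The first two mirror the formats asked about above;
# the third is an extra variation added for illustration.
TEMPLATES = [
    "Translate to {tgt}: {text}",
    "Translate from {src}: {text} to {tgt}.",
    "{text}\nWhat is the {tgt} translation of the {src} text above?",
]


def make_example(text, translation, src, tgt, rng):
    """Return one prompt/target pair rendered with a randomly chosen template."""
    template = rng.choice(TEMPLATES)
    prompt = template.format(src=src, tgt=tgt, text=text)
    return {"inputs": prompt, "targets": translation}


rng = random.Random(0)  # fixed seed so the template choice is reproducible
pairs = [("Good morning.", "Buenos días."), ("Thank you.", "Gracias.")]
dataset = [make_example(en, es, "English", "Spanish", rng) for en, es in pairs]

for row in dataset:
    print(row)
```

Sampling a template per example, rather than fixing one for the whole dataset, is one simple way to keep the tuned model from depending on a single phrasing.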
Do you have any insight into how training data should be structured for this specific model?
I started with the idea that the source language texts would be input, and the target language corresponding texts would be labels.
But looking at the model, maybe that's not possible.
I probably have to build a single text sample that contains both the source and the target translation.
Do you have any insight into how it needs to be structured?
Or if my original thought was correct, then where exactly do I stick the target translation tokens?
I recommend taking a look at the data that was used to train this model: https://huggingface.co/datasets/bigscience/xP3
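On the question of where the target tokens go: for a decoder-only (causal) model, a common convention is to concatenate prompt and target into one sequence and set the label to -100 at the prompt positions, so that only the translation contributes to the loss. The sketch below shows that layout with a toy whitespace tokenizer; `ToyTokenizer`, `build_example`, and the `EOS_ID` value are hypothetical stand-ins, and whether to mask the prompt at all is a choice you should check against your own training setup.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


class ToyTokenizer:
    """Stand-in tokenizer: assigns one integer id per whitespace-separated word."""

    def __init__(self):
        self.vocab = {}

    def encode(self, text):
        # New words get the next free id; known words reuse their id.
        return [self.vocab.setdefault(word, len(self.vocab)) for word in text.split()]


def build_example(prompt, target, tokenizer, eos_id):
    """Concatenate prompt and target; mask the prompt positions in the labels."""
    prompt_ids = tokenizer.encode(prompt)
    target_ids = tokenizer.encode(target) + [eos_id]  # EOS teaches the model to stop
    input_ids = prompt_ids + target_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + target_ids
    return {"input_ids": input_ids, "labels": labels}


tok = ToyTokenizer()
EOS_ID = 10_000  # hypothetical id; a real tokenizer provides its own EOS token id
example = build_example(
    "Translate to Spanish: Good morning.", "Buenos días.", tok, EOS_ID
)
print(example["input_ids"])
print(example["labels"])
```

With a real tokenizer you would replace `ToyTokenizer` and `EOS_ID` with the model's own tokenizer and its EOS token id. The labels are aligned one-to-one with `input_ids` here because the causal LM classes in Transformers shift the labels internally when computing the loss.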