Trapezoidal scheduler with cooldown phase

by maveriq - opened Jul 17, 2024

Discussion

maveriq · Jul 17, 2024 · edited Aug 21, 2024

Hi. Thanks for yet another insightful contribution. I am interested in extending this work with a couple of variations that I have in mind.

Can you say a bit more about the trapezoidal LR scheduling? In particular, how is it different from OneCycleLR? Secondly, is the cooldown phase the same as using the 'three_phase' option of OneCycleLR? And lastly, what is the warmup percentage/steps?

Would it be possible to open-source the training pipeline as well? Training from scratch at these sizes (135M/360M) is within the reach of many practitioners and researchers, and having access to the complete pipeline would help reduce confounding factors.

Thanks!

maveriq · Jul 17, 2024 · edited Jul 17, 2024

For anyone with the same questions: I found most of the answers in this paper, except for the warmup percentage/steps.

Here is a quick implementation of TrapezoidLRScheduler
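
The linked implementation is not reproduced here. As a rough illustration of what a trapezoidal (warmup, constant, cooldown) schedule computes, here is a minimal sketch assuming linear warmup and linear cooldown. The function name `trapezoidal_lr`, its parameters, and the example values are hypothetical and are not taken from the SmolLM training setup.

```python
def trapezoidal_lr(step, max_lr, total_steps, warmup_steps, cooldown_steps, min_lr=0.0):
    """Learning rate at a 0-indexed `step` for a warmup / constant / cooldown schedule.

    Assumption: both ramps are linear. Other cooldown shapes are possible.
    """
    if warmup_steps < 0 or cooldown_steps <= 0:
        raise ValueError("warmup_steps must be >= 0 and cooldown_steps must be > 0")
    if warmup_steps + cooldown_steps > total_steps:
        raise ValueError("warmup and cooldown must fit inside total_steps")

    cooldown_start = total_steps - cooldown_steps
    if step < warmup_steps:
        # Linear warmup: reaches max_lr on the last warmup step.
        return max_lr * (step + 1) / warmup_steps
    if step < cooldown_start:
        # Plateau: the rate stays at max_lr (the flat top of the trapezoid).
        return max_lr
    # Linear cooldown from max_lr to min_lr, clamped once training is over.
    progress = min(1.0, (step - cooldown_start) / cooldown_steps)
    return max_lr + (min_lr - max_lr) * progress


if __name__ == "__main__":
    # Hypothetical toy run: 100 steps, 10 warmup, 20 cooldown.
    for s in (0, 9, 50, 80, 90, 100):
        print(s, trapezoidal_lr(s, max_lr=1e-3, total_steps=100,
                                warmup_steps=10, cooldown_steps=20))
```

The visible difference from a one-cycle style schedule, in this sketch, is the plateau branch: the rate is held at `max_lr` for every step between the end of warmup and the start of cooldown instead of beginning to anneal right after the peak.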

eliebak · Aug 21, 2024 · edited Aug 21, 2024

Hey! For the warmup, we set it to 5000 steps. To be honest, we didn't do much ablation on it; I think it doesn't have much impact for very long training runs (I might be wrong). As for the training code, we will post it on GitHub this week! We also have an implementation of WSD (warmup-stable-decay) in nanotron LrSchedulerArgs.
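
To relate the 5000-step warmup to the "percentage" asked about earlier in the thread, the sketch below works out phase boundaries for a run. Only the 5000-step warmup comes from the reply above. The total step count, the cooldown fraction, and the names `PhasePlan` and `plan_phases` are hypothetical placeholders, since the thread does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhasePlan:
    warmup_end: int         # first step of the constant phase
    cooldown_start: int     # first step of the cooldown phase
    warmup_fraction: float  # share of the whole run spent warming up


def plan_phases(total_steps, warmup_steps=5000, cooldown_fraction=0.2):
    """Split a run into warmup / constant / cooldown phases.

    Assumption: cooldown length is given as a fraction of the run;
    0.2 is an illustrative default, not a value from the thread.
    """
    cooldown_steps = round(total_steps * cooldown_fraction)
    if warmup_steps + cooldown_steps > total_steps:
        raise ValueError("warmup and cooldown must fit inside total_steps")
    return PhasePlan(
        warmup_end=warmup_steps,
        cooldown_start=total_steps - cooldown_steps,
        warmup_fraction=warmup_steps / total_steps,
    )


if __name__ == "__main__":
    # Hypothetical run lengths, chosen only to show how the share changes.
    for total in (250_000, 500_000):
        plan = plan_phases(total)
        print(total, plan, f"{plan.warmup_fraction:.1%} warmup")
```

With a fixed step count, the warmup share shrinks as the run gets longer, which fits the remark that the exact value probably matters little for very long training.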

pietrolesci · Aug 29, 2024

Just landed on this discussion as I had the same question regarding the LR schedule. I found the original implementation useful: https://github.com/epfml/schedules-and-scaling/blob/6e8b7f952420c928cc09a0e4bda9678e2bf42e5f/src/optim/utils.py#L55
