How to use perplexity-ai/pplx-embed-v1-4b with sentence-transformers:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("perplexity-ai/pplx-embed-v1-4b", trust_remote_code=True)

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium."
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [3, 3]
Can't serve the model using TEI
I am trying to serve pplx-embed-v1-4B using Hugging Face TEI (Text Embeddings Inference) with the following command:
text-embeddings-router-80 --port 3114 --model-id perplexity-ai/pplx-embed-v1-4B --dtype float32 --max-batch-tokens 8096 --max-client-batch-size 2
inside a container created from the latest TEI Docker image (huggingface/text-embeddings-inference:cuda-1.9.2),
but I am getting the following error:
2026-03-03T20:18:15.360847Z INFO text_embeddings_router: router/src/main.rs:216: Args { model_id: "per*******-**/****-*****-*1-4B", revision: None, tokenization_workers: None, dtype: Some(Float32), served_model_name: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 8096, max_batch_requests: None, max_client_batch_size: 2, auto_truncate: true, default_prompt_name: None, default_prompt: None, dense_path: None, hf_api_token: None, hf_token: None, hostname: "run1067882-tei-pplx-embed-v1-4b-s1", port: 3114, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/group-volume/KR/cache/huggingface/hub"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
2026-03-03T20:18:15.361376Z INFO hf_hub: /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/hf-hub-0.4.2/src/lib.rs:72: Using token file found "/group-volume/KR/cache/huggingface/token"
2026-03-03T20:18:15.475098Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:42: Starting download
2026-03-03T20:18:15.475109Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `1_Pooling/config.json`
2026-03-03T20:18:15.477242Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_bert_config.json`
2026-03-03T20:18:15.963924Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_roberta_config.json`
2026-03-03T20:18:16.390802Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_distilbert_config.json`
2026-03-03T20:18:16.822889Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_camembert_config.json`
2026-03-03T20:18:17.267219Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_albert_config.json`
2026-03-03T20:18:17.727894Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlm-roberta_config.json`
2026-03-03T20:18:18.182772Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlnet_config.json`
2026-03-03T20:18:18.639727Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config_sentence_transformers.json`
2026-03-03T20:18:19.087385Z WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:65: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/perplexity-ai/pplx-embed-v1-4b/resolve/main/config_sentence_transformers.json)
2026-03-03T20:18:19.087400Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config.json`
2026-03-03T20:18:19.088214Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `tokenizer.json`
Error: Could not download model artifacts
Caused by:
0: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/perplexity-ai/pplx-embed-v1-4b/resolve/main/tokenizer.json)
1: HTTP status client error (404 Not Found) for url (https://huggingface.co/perplexity-ai/pplx-embed-v1-4b/resolve/main/tokenizer.json)
Summary: TEI expects a tokenizer.json file (the fast-tokenizer format), but pplx-embed-v1-4b does not contain one; it only contains a tokenizer_config.json file.
Should I use a different TEI version?
The README on the model card clearly states that the model should work with TEI.
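A minimal pre-flight sketch for the problem summarized above. It checks a repository's file list against the files TEI needed in the log above: `tokenizer.json` is the one whose 404 aborted startup, `config.json` is assumed to be required as well, and `config_sentence_transformers.json` only produced a WARN line. This split is inferred from these logs, not from TEI documentation. The `export_tokenizer_json` helper is a hypothetical workaround that assumes `transformers` can build a fast tokenizer for this repo, in which case `save_pretrained()` writes a `tokenizer.json`.

```python
"""Pre-flight check of a Hub repo's files before serving it with TEI (sketch)."""

# Assumption inferred from this thread's logs, not from TEI documentation:
# tokenizer.json missing aborted startup; config.json is assumed required too.
FATAL_IF_MISSING = ("config.json", "tokenizer.json")
# A 404 on this file only produced a WARN line and startup continued.
WARN_IF_MISSING = ("config_sentence_transformers.json",)


def missing_tei_artifacts(repo_files):
    """Split the missing files into (fatal, warning-only) lists."""
    present = set(repo_files)
    fatal = [name for name in FATAL_IF_MISSING if name not in present]
    warn_only = [name for name in WARN_IF_MISSING if name not in present]
    return fatal, warn_only


def export_tokenizer_json(repo_id, out_dir):
    """Hypothetical workaround: regenerate tokenizer.json locally with transformers.

    Assumes transformers can build a *fast* tokenizer for this repo; only a fast
    tokenizer is written out as tokenizer.json. Needs network access.
    """
    from transformers import AutoTokenizer  # lazy import: the check above does not need it

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    if not tokenizer.is_fast:
        raise RuntimeError(f"{repo_id}: no fast tokenizer available, cannot write tokenizer.json")
    return tokenizer.save_pretrained(out_dir)  # returns the paths of the saved files
```

The file list can come from `huggingface_hub.list_repo_files("perplexity-ai/pplx-embed-v1-4b")`. Whether TEI then serves a local copy of the repository with a regenerated `tokenizer.json` correctly is untested here.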
OK, it seems the TEI Docker image tagged 1.9.2 doesn't actually ship version 1.9.2:
docker run --entrypoint bash ghcr.io/huggingface/text-embeddings-inference:cpu-1.9.2 -c "text-embeddings-router --version"
text-embeddings-router 1.9.1
huh...
I got an error: called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered"),
so I checked the version of TEI, and it's 1.9.1 instead of 1.9.2.
Hey @TomaszZietkiewicz, it's indeed v1.9.2 despite the reported version saying v1.9.1. That's due to an issue when updating the crate version: the underlying code is v1.9.2, the version string just wasn't properly updated!
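A small sketch of the version check done above with `text-embeddings-router --version`, assuming the image tag ends in a MAJOR.MINOR.PATCH version (as in `cpu-1.9.2` or `120-1.9.3`). Per the maintainer's reply, the 1.9.2 images report 1.9.1 only because the crate version was not bumped, so that single pair is treated as a known mislabel rather than a real mismatch. The function names are hypothetical.

```python
import re

# From the maintainer's reply: the 1.9.2 images report 1.9.1 because the crate
# version was not bumped; the code inside is 1.9.2. Pairs are (tag, reported).
KNOWN_MISLABELS = {((1, 9, 2), (1, 9, 1))}


def parse_version(text):
    """Return the first MAJOR.MINOR.PATCH found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def check_tei_version(image_ref, version_output):
    """Compare an image tag with the output of `text-embeddings-router --version`."""
    tag = image_ref.rsplit(":", 1)[-1]  # e.g. "cpu-1.9.2" or "120-1.9.3"
    tag_version = parse_version(tag)
    reported = parse_version(version_output)
    if tag_version == reported:
        return "match"
    if (tag_version, reported) in KNOWN_MISLABELS:
        return "known-mislabel"
    return "mismatch"
```

For the case reported above, `check_tei_version("ghcr.io/huggingface/text-embeddings-inference:cpu-1.9.2", "text-embeddings-router 1.9.1")` returns `"known-mislabel"`.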
As for the missing tokenizer.json, you're right; cc @bowang0911 in case we can include it here too?
Finally, as for your error @juni3227 (see the clarification on the version above), it's likely due to OOM, but it would be great if you could share which hardware and command you are using (feel free to also report this at https://github.com/huggingface/text-embeddings-inference/issues/new if applicable).
Thanks and apologies for the inconvenience! 🤗
@alvarobartt Our hardware is a Blackwell Pro 6000, so it's definitely not OOM.
@mkrimmel-pplx will check if it works now.
Thanks for letting us know @juni3227, make sure to run with the latest Text Embeddings Inference v1.9.3 and feel free to report back!
Note that if you're running with Docker, we've released Blackwell-specific containers, e.g. ghcr.io/huggingface/text-embeddings-inference:120-1.9.3 (the 120 there stands for the compute capability of your Blackwell Pro 6000 instance, i.e. 12.0) 🤗
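Following the explanation that the `120` prefix encodes compute capability 12.0, here is a sketch that builds the image reference from a (major, minor) capability pair. Only the `120-1.9.3` tag is confirmed in this thread; whether TEI publishes a tag for any other capability is an assumption to check against the registry before pulling. The function name is hypothetical.

```python
REGISTRY = "ghcr.io/huggingface/text-embeddings-inference"


def tei_image_for_capability(major, minor, tei_version):
    """Build "<registry>:<major><minor>-<version>", e.g. 120-1.9.3 for capability 12.0.

    Assumption: the naming seen for the Blackwell image generalizes; only the
    120 tag is confirmed in this thread, so verify the tag exists before pulling.
    """
    if major < 1 or not 0 <= minor <= 9:
        raise ValueError(f"unexpected compute capability {major}.{minor}")
    return f"{REGISTRY}:{major}{minor}-{tei_version}"
```

With PyTorch installed, `torch.cuda.get_device_capability()` returns the (major, minor) pair, so `tei_image_for_capability(*torch.cuda.get_device_capability(), "1.9.3")` would yield the `120-1.9.3` reference on a Blackwell Pro 6000.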
Some good news and some bad news.
The good news is that there is no longer a version mismatch in TEI.
The bad news is that the Blackwell Pro 6000 with 96 GiB hits an OOM error with the 0.6B model.
Something is not right. We are hitting an OOM issue with this small model on Blackwell, so how is 96 GiB not enough for it? It's not an exact figure, but shouldn't this model take up around 16 GiB for a 32k context?
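A back-of-envelope sketch for the question above. It compares float32 weight memory with the size of one layer's full attention-score matrix for a single sequence, which grows with the square of the sequence length. The parameter count (0.6e9) and head count (16) are hypothetical illustrative values, not read from the model config, and whether TEI's `Pplx1` backend materializes the full score matrix is not stated in this thread.

```python
GIB = 1024 ** 3


def weight_bytes(num_params, bytes_per_value=4):
    """Memory for the model weights; float32 stores 4 bytes per value."""
    return num_params * bytes_per_value


def naive_attention_scores_bytes(seq_len, num_heads, bytes_per_value=4):
    """Memory for one layer's full seq_len x seq_len scores, all heads, one sequence.

    Only relevant if the attention kernel materializes the whole matrix
    (i.e. no flash / memory-efficient attention).
    """
    return seq_len * seq_len * num_heads * bytes_per_value


# Hypothetical values for illustration only; check the model's config.json.
NUM_PARAMS = 600_000_000
NUM_HEADS = 16
SEQ_LEN = 32_768  # same value as --max-batch-tokens in the command below

if __name__ == "__main__":
    print(f"weights: {weight_bytes(NUM_PARAMS) / GIB:.1f} GiB")
    print(f"one layer's scores: {naive_attention_scores_bytes(SEQ_LEN, NUM_HEADS) / GIB:.1f} GiB")
```

With these assumed values the weights need about 2.2 GiB, while one 32,768 x 32,768 score matrix across 16 heads in float32 needs 64 GiB. If the backend does materialize such a matrix (plus intermediates such as the softmax output), 96 GiB could plausibly run out during warm-up, which is where the panic occurs in the log. Halving the sequence length quarters this term, so a smaller `--max-batch-tokens` would be a natural first experiment.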
docker run --gpus 1 \
  -p 48233:80 \
  -v $HOME/.cache/huggingface:/data \
  --shm-size 1g \
  ghcr.io/huggingface/text-embeddings-inference:120-1.9.3 \
  --model-id perplexity-ai/pplx-embed-v1-0.6b \
  --max-batch-tokens 32768 \
  --dtype float32 \
  --hostname 0.0.0.0
Error Message
Unable to find image 'ghcr.io/huggingface/text-embeddings-inference:120-1.9.3' locally
120-1.9.3: Pulling from huggingface/text-embeddings-inference
32f112e3802c: Already exists
644e9b203583: Already exists
02559cd4bc8d: Already exists
2cd52cbb1ebe: Already exists
6e8af4fd0a07: Already exists
15a17189b2df: Already exists
02cb0e091e33: Already exists
9c3d619183d2: Already exists
7f7602a82106: Already exists
5d0fd49fa0be: Pull complete
accfe16c0f24: Pull complete
218b749184cd: Pull complete
Digest: sha256:aedf3b34836dc57289583142adcf2b93836cda0736ac8e6ce43691b9c2c67170
Status: Downloaded newer image for ghcr.io/huggingface/text-embeddings-inference:120-1.9.3
2026-05-27T03:57:20.464298Z INFO text_embeddings_router: router/src/main.rs:216: Args { model_id: "per*******-**/****-*****-**-0.6b", revision: None, tokenization_workers: None, dtype: Some(Float32), served_model_name: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 32768, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, dense_path: None, hf_api_token: None, hf_token: None, hostname: "0.0.0.0", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
2026-05-27T03:57:20.552955Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:42: Starting download
2026-05-27T03:57:20.552968Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `1_Pooling/config.json`
2026-05-27T03:57:20.553005Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_bert_config.json`
2026-05-27T03:57:20.793526Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_roberta_config.json`
2026-05-27T03:57:20.996777Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_distilbert_config.json`
2026-05-27T03:57:21.209453Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_camembert_config.json`
2026-05-27T03:57:21.414666Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_albert_config.json`
2026-05-27T03:57:21.629390Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlm-roberta_config.json`
2026-05-27T03:57:21.842890Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlnet_config.json`
2026-05-27T03:57:22.059903Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config_sentence_transformers.json`
2026-05-27T03:57:22.264187Z WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:65: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/perplexity-ai/pplx-embed-v1-0.6b/resolve/main/config_sentence_transformers.json)
2026-05-27T03:57:22.264219Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config.json`
2026-05-27T03:57:22.264317Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `tokenizer.json`
2026-05-27T03:57:22.264381Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:72: Model artifacts downloaded in 1.711427596s
2026-05-27T03:57:22.586544Z WARN text_embeddings_router: router/src/lib.rs:203: Could not find a Sentence Transformers config
2026-05-27T03:57:22.586560Z INFO text_embeddings_router: router/src/lib.rs:221: Maximum number of tokens per request: 32768
2026-05-27T03:57:22.586703Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 31 tokenization workers
2026-05-27T03:57:22.586826Z INFO text_embeddings_router: router/src/lib.rs:271: Starting model backend
2026-05-27T03:57:22.586832Z INFO text_embeddings_backend: backends/src/lib.rs:595: Downloading `model.safetensors`
2026-05-27T03:57:22.586860Z INFO text_embeddings_backend: backends/src/lib.rs:430: Model weights downloaded in 28.544µs
2026-05-27T03:57:22.586868Z INFO download_dense_modules: text_embeddings_backend: backends/src/lib.rs:766: Downloading `modules.json`
2026-05-27T03:57:22.586898Z WARN download_dense_modules: text_embeddings_backend: backends/src/lib.rs:851: `modules.json` could be downloaded but parsing the modules failed: unknown variant `st_quantize.FlexibleQuantizer`, expected one of `sentence_transformers.models.Dense`, `sentence_transformers.models.Normalize`, `sentence_transformers.models.Pooling`, `sentence_transformers.models.Transformer` at line 18 column 43; so no Dense modules will be downloaded.
2026-05-27T03:57:22.586906Z INFO text_embeddings_backend: backends/src/lib.rs:442: Dense modules downloaded in 41.317µs
2026-05-27T03:57:23.018144Z INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:544: Starting Pplx1 model on Cuda(CudaDevice(DeviceId(1)))
2026-05-27T03:57:37.426117Z INFO text_embeddings_router: router/src/lib.rs:289: Warming up model
thread '' (126) panicked at /root/.cargo/git/checkouts/cudarc-a338a6fe3117fe87/8b4f18b/src/driver/safe/core.rs:283:26:
called Result::unwrap() on an Err value: DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory")
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
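The `modules.json` warning in the log above means TEI could not parse the file because of an unrecognized module type, so it downloaded no Dense modules. A minimal sketch that lists which module types in a Sentence Transformers `modules.json` fall outside the four variants this TEI build names in its warning. The example file contents are illustrative, not the repository's actual file. What `st_quantize.FlexibleQuantizer` does, and whether skipping it changes the served embeddings relative to sentence-transformers, is not covered in this thread.

```python
import json

# The variants this TEI build (1.9.3) listed as accepted in its warning.
TEI_KNOWN_MODULE_TYPES = {
    "sentence_transformers.models.Dense",
    "sentence_transformers.models.Normalize",
    "sentence_transformers.models.Pooling",
    "sentence_transformers.models.Transformer",
}


def unsupported_module_types(modules_json_text):
    """Return, in file order, the `type` of each module entry TEI would not recognize."""
    modules = json.loads(modules_json_text)
    return [entry.get("type") for entry in modules
            if entry.get("type") not in TEI_KNOWN_MODULE_TYPES]


# Illustrative file contents (hypothetical, not the actual repository file).
EXAMPLE = """[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"},
  {"idx": 2, "name": "2", "path": "2_Quantizer", "type": "st_quantize.FlexibleQuantizer"}
]"""

if __name__ == "__main__":
    print(unsupported_module_types(EXAMPLE))
```

To check the real file, `huggingface_hub.hf_hub_download("perplexity-ai/pplx-embed-v1-0.6b", "modules.json")` returns a local path whose contents can be passed in.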