Instructions to use llmware/slim-tags-3b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use llmware/slim-tags-3b with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="llmware/slim-tags-3b", trust_remote_code=True)# Load model directly from transformers import AutoModelForCausalLM model = AutoModelForCausalLM.from_pretrained("llmware/slim-tags-3b", trust_remote_code=True, dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use llmware/slim-tags-3b with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "llmware/slim-tags-3b" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "llmware/slim-tags-3b", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/llmware/slim-tags-3b
- SGLang
How to use llmware/slim-tags-3b with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "llmware/slim-tags-3b" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "llmware/slim-tags-3b", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "llmware/slim-tags-3b" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "llmware/slim-tags-3b", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use llmware/slim-tags-3b with Docker Model Runner:
docker model run hf.co/llmware/slim-tags-3b
SLIM-TAGS-3B
slim-tags-3b is a small, specialized function-calling model fine-tuned to extract and generate meaningful tags from a chunk of text.
Tags generally correspond to named entities, but will also include key objects, entities and phrases that contribute meaningfully to the semantic meaning of the text.
The model is invoked as a specialized 'tags' classifier function that outputs a python dictionary in the form of:
{'tags': ['NASDAQ', 'S&P', 'Dow', 'Verizon', 'Netflix, ... ']}
with the value items in the list generally being extracted from the source text.
The intended use of the model is to auto-generate tags to text that can be used to enhance search retrieval, categorization, or to extract named entities that can be used programmatically in follow-up queries or prompts. It can also be used for fact-checking as a secondary validation on a longer (separate) LLM output.
This model is fine-tuned on top of llmware/bling-stable-lm-3b-4e1t-v0, which in turn, is a fine-tune of stabilityai/stablelm-3b-4elt.
Each slim model has a 'quantized tool' version, e.g., 'slim-tags-3b-tool'.
Prompt format:
function = "classify"params = "tags"prompt = "<human> " + {text} + "\n" +
"<{function}> " + {params} + "</{function}>" + "\n<bot>:"
Transformers Script
model = AutoModelForCausalLM.from_pretrained("llmware/slim-tags-3b")
tokenizer = AutoTokenizer.from_pretrained("llmware/slim-tags-3b")
function = "classify"
params = "tags"
text = "Citibank announced a reduction in its targets for economic growth in France and the UK last week in light of ongoing concerns about inflation and unemployment, especially in large employers such as Airbus."
prompt = "<human>: " + text + "\n" + f"<{function}> {params} </{function}>\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt")
start_of_input = len(inputs.input_ids[0])
outputs = model.generate(
inputs.input_ids.to('cpu'),
eos_token_id=tokenizer.eos_token_id,
pad_token_id=tokenizer.eos_token_id,
do_sample=True,
temperature=0.3,
max_new_tokens=100
)
output_only = tokenizer.decode(outputs[0][start_of_input:], skip_special_tokens=True)
print("output only: ", output_only)
# here's the fun part
try:
output_only = ast.literal_eval(llm_string_output)
print("success - converted to python dictionary automatically")
except:
print("fail - could not convert to python dictionary automatically - ", llm_string_output)
Using as Function Call in LLMWare
from llmware.models import ModelCatalog
slim_model = ModelCatalog().load_model("llmware/slim-tags-3b")
response = slim_model.function_call(text,params=["tags"], function="classify")
print("llmware - llm_response: ", response)
Model Card Contact
Darren Oberst & llmware team
- Downloads last month
- 12