Instructions to use Efficient-Large-Model/VILA-7b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Efficient-Large-Model/VILA-7b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Efficient-Large-Model/VILA-7b")
```

```python
# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Efficient-Large-Model/VILA-7b", dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Efficient-Large-Model/VILA-7b with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Efficient-Large-Model/VILA-7b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Efficient-Large-Model/VILA-7b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker
```bash
docker model run hf.co/Efficient-Large-Model/VILA-7b
```
- SGLang
How to use Efficient-Large-Model/VILA-7b with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Efficient-Large-Model/VILA-7b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Efficient-Large-Model/VILA-7b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Efficient-Large-Model/VILA-7b" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Efficient-Large-Model/VILA-7b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
- Docker Model Runner
How to use Efficient-Large-Model/VILA-7b with Docker Model Runner:
```bash
docker model run hf.co/Efficient-Large-Model/VILA-7b
```
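Both the vLLM and SGLang servers are called through an OpenAI-compatible `/v1/completions` endpoint, so the same curl request can also be sent from Python. The following is a minimal sketch using only the standard library. The helper names `build_completion_request`, `extract_text`, and `complete` are hypothetical, and it assumes a server is already running at the given base URL and returns the usual completions shape (`choices[0].text`).

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint
    (hypothetical helper; mirrors the JSON body of the curl examples)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_body):
    """Return the generated text of the first choice in a completions response
    (assumes the standard OpenAI completions response layout)."""
    data = json.loads(response_body)
    return data["choices"][0]["text"]


def complete(base_url, model, prompt, **kwargs):
    """Send the request and return the generated text.
    Requires a running server; raises urllib.error.URLError otherwise."""
    req = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=300) as resp:
        return extract_text(resp.read().decode("utf-8"))
```

For example, `complete("http://localhost:8000", "Efficient-Large-Model/VILA-7b", "Once upon a time,")` would target the vLLM server, while a base URL of `http://localhost:30000` would target the SGLang server started with the commands given earlier. Building the request separately from sending it keeps the payload easy to inspect without a live server.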
License Compatibility
Hi, thank you for releasing this model!
I have a question regarding the licensing structure of the model.
From the README, I understand that:
- The code is licensed under Apache-2.0
- The pretrained weights are licensed under CC-BY-NC-SA-4.0
- The model license tag indicates CC-BY-NC-4.0 as the overall model license
- Additionally, the model is subject to the LLaMA model license
However, I noticed that the weights are released under CC-BY-NC-SA-4.0, which includes a ShareAlike (SA) requirement, while the model-level license (CC-BY-NC-4.0) does not include this clause.
It seems that downstream use would need to comply with the most restrictive terms among the components. In particular, the SA clause typically requires downstream derivatives to adopt the same license terms.
So I’m wondering:
- Is the CC-BY-NC-4.0 tag intended to represent the license of the entire model, including the weights?
- How should the ShareAlike requirement from the weights license be interpreted and enforced for downstream use of the model?
- What is the role of the LLaMA license in this setup? Is it a binding upstream license that must be preserved or propagated in downstream distributions, and how does it interact with the CC licenses?
I would really appreciate any clarification. Thanks!