---
license: llama3
datasets:
- glaiveai/RAG-v1
language:
- en
tags:
- RAG
---

exllamav2-quantized version of Llama-3-8B-RAG-v1 from glaiveai: https://huggingface.co/glaiveai/Llama-3-8B-RAG-v1

Quantization settings:
- bpw: 6.0
- head-bpw: 8.0

Example usage with exllamav2:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2Sampler, ExLlamaV2DynamicGenerator

# Path to the local folder containing the quantized model files
model_path = "/path/to/model_folder"

config = ExLlamaV2Config(model_path)
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, max_seq_len = 4096, lazy = True)
model.load_autosplit(cache, progress = True)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(
    model = model,
    cache = cache,
    tokenizer = tokenizer,
)

gen_settings = ExLlamaV2Sampler.Settings(
    temperature = 1.0,
    top_p = 0.1,
    token_repetition_penalty = 1.0
)

outputs = generator.generate(
    prompt = ["first input", "second input"],  # string or list of strings
    max_new_tokens = 1024,
    stop_conditions = [tokenizer.eos_token_id],
    gen_settings = gen_settings,
    add_bos = True,
)

print(outputs)
```
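To obtain the model folder referenced above, one option is to download the quantized weights with huggingface_hub. This is a minimal sketch, assuming huggingface_hub is installed; the `repo_id` shown is a placeholder and should be replaced with this repository's actual id.

```python
from huggingface_hub import snapshot_download

# Download the quantized repo into a local folder and use it as model_path.
# repo_id below is hypothetical, replace it with this repository's id.
model_path = snapshot_download(
    repo_id = "your-user/Llama-3-8B-RAG-v1-exl2",
    local_dir = "./Llama-3-8B-RAG-v1-exl2",
)
```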