---
license: llama3
datasets:
  - glaiveai/RAG-v1
language:
  - en
tags:
  - RAG
---

An exllamav2-quantized version of [Llama-3-8B-RAG-v1](https://huggingface.co/glaiveai/Llama-3-8B-RAG-v1) by glaiveai, quantized at 6.0 bits per weight (8.0 bpw for the output head).

Example usage with exllamav2:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2Sampler, ExLlamaV2DynamicGenerator

model_path = "/path/to/model_folder"

# Load the model, allocating the cache lazily and splitting weights
# across available GPUs
config = ExLlamaV2Config(model_path)
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, max_seq_len = 4096, lazy = True)
model.load_autosplit(cache, progress = True)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(
    model = model,
    cache = cache,
    tokenizer = tokenizer,
)

# Near-greedy sampling: low top_p, no repetition penalty
gen_settings = ExLlamaV2Sampler.Settings(
    temperature = 1.0,
    top_p = 0.1,
    token_repetition_penalty = 1.0,
)

outputs = generator.generate(
    prompt = ["first input", "second input"],  # string or list of strings
    max_new_tokens = 1024,
    stop_conditions = [tokenizer.eos_token_id],
    gen_settings = gen_settings,
    add_bos = True,
)

print(outputs)
```
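Since this is a RAG fine-tune of an instruct-style Llama 3 model, the raw strings passed as `prompt` above would normally be wrapped in a chat template first. Below is a minimal sketch of building such a prompt, assuming the standard Llama 3 chat template; the exact template this fine-tune expects is not stated here, so check the base model's `tokenizer_config.json` before relying on it. Note that if the template already includes `<|begin_of_text|>`, you would pass `add_bos = False` to `generate` to avoid a duplicate BOS token.

```python
# Sketch of a Llama 3-style chat prompt for a RAG query. The template below
# is the standard Llama 3 instruct format, which is an assumption for this
# particular fine-tune.
def format_prompt(system: str, user: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = format_prompt(
    "Answer using only the provided documents.",
    "Document: ...\n\nQuestion: ...",
)
print(prompt)
```

The formatted string can then be passed directly as the `prompt` argument to `generator.generate`.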