---
license: llama3
datasets:
- glaiveai/RAG-v1
language:
- en
tags:
- RAG
---

exllamav2-quantized version of Llama-3-8B-RAG-v1 from glaiveai: https://huggingface.co/glaiveai/Llama-3-8B-RAG-v1

Quantization settings:
- bpw: 6.0
- head-bpw: 8.0

Example usage with exllamav2:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2Sampler, ExLlamaV2DynamicGenerator

# Path to the local folder containing the quantized model files
model_path = "/path/to/model_folder"

config = ExLlamaV2Config(model_path)
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, max_seq_len = 4096, lazy = True)
model.load_autosplit(cache, progress = True)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(
    model = model,
    cache = cache,
    tokenizer = tokenizer,
)

gen_settings = ExLlamaV2Sampler.Settings(
    temperature = 1.0,
    top_p = 0.1,
    token_repetition_penalty = 1.0
)

outputs = generator.generate(
    prompt = ["first input", "second input"],  # string or list of strings
    max_new_tokens = 1024,
    stop_conditions = [tokenizer.eos_token_id],
    gen_settings = gen_settings,
    add_bos = True,
)

print(outputs)
```
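To obtain the model folder referenced above, one option is to download the quantized weights with huggingface_hub. This is a minimal sketch, assuming huggingface_hub is installed; the `repo_id` shown is a placeholder and should be replaced with this repository's actual id.

```python
from huggingface_hub import snapshot_download

# Download the quantized repo into a local folder and use it as model_path.
# repo_id below is hypothetical, replace it with this repository's id.
model_path = snapshot_download(
    repo_id = "your-user/Llama-3-8B-RAG-v1-exl2",
    local_dir = "./Llama-3-8B-RAG-v1-exl2",
)
```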