An interesting yet useless consideration: is the FP16 out or not?
I noticed something interesting: Miqu-1-70b's Q2_K size is 25.5 GB.
That corresponds to the recent llama.cpp Q2_K quantizations, from January 2024, at barely 3 bpw.
The previous Q2_K GGUF quants from 2023 were almost the size of a Q3_K_S, at around 3.4 bpw.
So Miqu-1-70b's Q2_K must have been made in January 2024.
Either Miqudev requantized from an earlier Q5_K_M, or he quantized from a Q8_0... or an FP16.
I'm not an expert on the internals of the GGUF format, but is there a metadata field specifying that a quant is actually a requant?
If so, we could find out.
In any case, that would lead us nowhere, but still!
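As a rough check of the bits-per-weight figures above, here is a minimal sketch of the arithmetic. It assumes roughly 69 billion parameters (a Llama-2-70B-class model) and decimal gigabytes; neither number comes from this thread, and `bits_per_weight` is just an illustrative helper name.

```python
def bits_per_weight(file_size_bytes: float, n_params: float) -> float:
    """Average number of bits stored per model weight for a given file size."""
    return file_size_bytes * 8 / n_params


# Assumptions (not from the thread): about 69e9 parameters, and
# 1 GB = 1e9 bytes for the reported file size.
N_PARAMS = 69e9

# Bits per weight of the 25.5 GB Q2_K file.
q2k_bpw = bits_per_weight(25.5e9, N_PARAMS)
print(f"25.5 GB Q2_K: {q2k_bpw:.2f} bpw")

# File size a 3.4 bpw quant would have under the same assumptions.
old_q2k_size_gb = 3.4 * N_PARAMS / 8 / 1e9
print(f"3.4 bpw quant: {old_q2k_size_gb:.1f} GB")
```

Under these assumptions the 25.5 GB file lands just under 3 bpw, while a 3.4 bpw quant of the same model would be a few gigabytes larger, which is consistent with the size gap described above.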
Considering that this person was an employee of a company which had been given only the quantized versions, I don't think it's possible for it to be from FP16. Either it was a requantization of the Q5, or Mistral quantized it right before handing it over to the company.
At the time that early access was likely given, the Q2_K variant used in Miqudev's quant didn't exist yet (why present an already obsolete product to a customer while facing ferocious competition?).
Hence the question.
Yeah, makes sense. I didn't realize that it was given as early access a while ago and thought it might've been given recently. I believe it was a requantization though, as the Q5 was most likely the one given to them.
We could at least check whether the result of Q5 -> F16 -> Q2 is identical to the uploaded checkpoint. If it is, it's more than likely that it was requantized in that fashion.
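The comparison step of that check could be sketched as below, assuming the Q5 -> F16 -> Q2 file has already been produced with your quantization tool of choice. The helper names and the file names in the usage comment are hypothetical. Note that a byte-for-byte match is a strict test: any metadata difference would change the hash even if the tensor data were the same.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte quants need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(path_a, path_b):
    """True if both files are byte-for-byte identical (compared by SHA-256)."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)


# Usage (hypothetical file names):
# same_file_contents("miqu-1-70b.q2_K.gguf", "my-requant.q2_K.gguf")
```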
All three quants have a general.name of "D:\HF", which is strong evidence that all quants are made for hf upload from something else. Edit: and in fact, all metadata kv's other than the filetype are identical.
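For anyone who wants to repeat that metadata comparison, here is a minimal standard-library sketch. It assumes the little-endian GGUF v2/v3 header layout (magic, version, tensor count, key/value count, then the key/value pairs); the function names are illustrative and not part of any library.

```python
import struct

# Scalar GGUF value type ids mapped to little-endian struct formats.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one fixed-size value from the stream."""
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]


def _read_string(f):
    """Read a length-prefixed (uint64) UTF-8 string."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f, vtype):
    """Read a metadata value of the given GGUF type id."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f):
    """Return the metadata key/value pairs of a GGUF stream as a dict."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("this sketch assumes GGUF version 2 or later")
    _read(f, "<Q")  # tensor count, not needed here
    kv_count = _read(f, "<Q")
    meta = {}
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read(f, "<I")
        meta[key] = _read_value(f, vtype)
    return meta


def differing_keys(meta_a, meta_b):
    """Sorted list of keys whose values differ or exist on one side only."""
    keys = set(meta_a) | set(meta_b)
    return sorted(k for k in keys if meta_a.get(k) != meta_b.get(k))
```

To use it, open each file with `open(path, "rb")`, pass the handle to `read_gguf_metadata`, and then call `differing_keys` on two of the resulting dicts. Only the header is read, so the tensor data is never loaded.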
This is the first model that could answer all my test questions (including GPT4). I wish there were a GPTQ or AWQ version (4-bit) so the speed would be more practical...