Instructions to use miqudev/miqu-1-70b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use miqudev/miqu-1-70b with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="miqudev/miqu-1-70b",
	filename="miqu-1-70b.q2_K.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use miqudev/miqu-1-70b with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf miqudev/miqu-1-70b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf miqudev/miqu-1-70b:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf miqudev/miqu-1-70b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf miqudev/miqu-1-70b:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf miqudev/miqu-1-70b:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf miqudev/miqu-1-70b:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf miqudev/miqu-1-70b:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf miqudev/miqu-1-70b:Q4_K_M

Use Docker

docker model run hf.co/miqudev/miqu-1-70b:Q4_K_M

LM Studio
Jan
Ollama
How to use miqudev/miqu-1-70b with Ollama:
```
ollama run hf.co/miqudev/miqu-1-70b:Q4_K_M
```

Unsloth Studio

How to use miqudev/miqu-1-70b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for miqudev/miqu-1-70b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for miqudev/miqu-1-70b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for miqudev/miqu-1-70b to start chatting

Docker Model Runner
How to use miqudev/miqu-1-70b with Docker Model Runner:
```
docker model run hf.co/miqudev/miqu-1-70b:Q4_K_M
```

Lemonade

How to use miqudev/miqu-1-70b with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull miqudev/miqu-1-70b:Q4_K_M

Run and chat with the model

lemonade run user.miqu-1-70b-Q4_K_M

List all available models

lemonade list

An interesting yet useless consideration over the fp16 being out or not.

#21

by Nexesenex - opened Feb 2, 2024

Discussion

Nexesenex

Feb 2, 2024

I noticed something interesting : Miqu-1-70b's Q2_K size is 25.5GB.

That corresponds to recent LlamaCPP Q2_K quantizations, from january 2024, at barely 3bpw.
Previous GGUF quants Q2_K of 2023 were almost the size of a Q3_K_S, at around 3.4bpw.
So, Miqu-1-70b's Q2_K has been made in january 2024.

Either Miqudev requantized from an anterior Q5_K_M, either he quantized from a Q8_0.. or a FP16.

I'm not an expert on the internals of the GGUF format, but is there a meta-data specifying that a quant is actually a requant?
If yes, we can know.

In any case, that would lead us nowhere, but still!

Anthonyg5005

Feb 2, 2024

considering the fact that this person was an employee of a company which had been given only the quantized versions I don't think it's possible for it to be from fp16. Either it was a requantization of Q5 or Mistral quantized it right before handing them over to the company.

Nexesenex

Feb 2, 2024

When that early access was likely given, the Q2_K variant used in Miqudev's quant didn't exist yet (why to present an already obsolete product to a customer, this while you face a ferocious competition?).
Hence the interrogation.

Anthonyg5005

Feb 2, 2024

Yeah, makes sense. I didn't realize that it was given as early access a while ago and thought it might've been given recently. I believe it was a requantization though as the Q5 was most likely the one given to them.

152334H

Feb 3, 2024

we could at least check if the result of q5 -> f16 -> q2 is identical to the uploaded checkpoint. if it is, it should be more than likely that it was requantized in that fashion.

mradermacher

Feb 4, 2024

•

edited Feb 4, 2024

All three quants have a general.name of "D:\HF", which is strong evidence that all quants are made for hf upload from something else. Edit: and in fact, all metadata kv's other than the filetype are identical.

exito100

Feb 9, 2024

This is the first model that could answer all my test questions (including GPT4). I wished there was a gptq or awq version (4 bit) so the speed would be more practical...

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment