youri-2x7b_dev
This model is a Mixture of Experts (MoE) merge of the following two models:
- rinna/youri-7b-chat
- rinna/youri-7b-instruction
🏆 Evaluation
All benchmark scores were obtained with Stability-AI/lm-evaluation-harness. The results are stored in benchmark_scores. For details on the scores and the conditions under which they were obtained, please refer to this link.
| Model | JCommonsenseQA(3-shot,acc.) | JNLI(3-shot,balanced acc.) | MARC-ja(0-shot,balanced acc.) | JSQuAD(2-shot,F1) | 4-AVERAGE |
|---|---|---|---|---|---|
| youri-2x7b_dev | 91.15 | 71.03 | 95.90 | 91.30 | 87.34 |
| youri-7b-instruction *1 | 88.83 | 63.56 | 93.78 | 92.19 | 84.59 |
| youri-7b-chat *1 | 91.78 | 70.35 | 96.69 | 79.62 | 84.61 |
| Model | jaqket-v2(1-shot,F1) | xlsum(1-shot,ROUGE 2) *2 | 6-AVERAGE |
|---|---|---|---|
| youri-2x7b_dev | 84.59 | 25.62 | 76.59 |
| youri-7b-instruction *1 | 83.92 | 24.67 | 75.13 |
| youri-7b-chat *1 | 83.71 | 24.21 | 75.33 |
| Model | xwinograd(0-shot,acc.) *2 | mgsm(5-shot,acc.) *2 | JCoLA(2-shot,balanced acc.) *2 | 9-AVERAGE |
|---|---|---|---|---|
| youri-2x7b_dev | 81.43 | 24.80 | 59.09 | 69.43 |
| youri-7b-instruction *1 | 78.94 | 17.20 | 54.04 | 66.35 |
| youri-7b-chat *1 | 80.92 | 25.20 | 53.78 | 67.36 |
*1 From rinna's LM Benchmark.
*2 rinna's LM Benchmark does not state which template versions were used for these tasks, so these scores were calculated without specifying a template.
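The aggregate columns can be read as plain arithmetic over the per-task scores. The sketch below assumes each N-AVERAGE is the unweighted mean of the first N benchmarks in column order; that reading reproduces the youri-2x7b_dev row to within 0.01, but it is an interpretation of the tables, not a description of how the harness computes them. The helper name `average_first` is hypothetical.

```python
# Scores for youri-2x7b_dev, copied from the tables above in column order.
scores = {
    "JCommonsenseQA": 91.15,
    "JNLI": 71.03,
    "MARC-ja": 95.90,
    "JSQuAD": 91.30,
    "jaqket-v2": 84.59,
    "xlsum": 25.62,
    "xwinograd": 81.43,
    "mgsm": 24.80,
    "JCoLA": 59.09,
}


def average_first(n: int) -> float:
    """Unweighted mean of the first n benchmark scores (assumed definition)."""
    values = list(scores.values())[:n]
    return sum(values) / n


for n in (4, 6, 9):
    print(f"{n}-AVERAGE: {average_first(n):.3f}")
```

Small differences in the last digit against the table are expected, since the inputs here are already rounded to two decimals.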
🧩 Configuration
The model was built with a custom version of the mergekit library (mixtral branch) and the following configuration:
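base_model: rinna/youri-7b-chat
gate_mode: hidden # one of "hidden", "cheap_embed", or "random"
dtype: bfloat16 # output dtype (float32, float16, or bfloat16)
experts:
  - source_model: rinna/youri-7b-chat
    positive_prompts:
      # "Take a question and answer choices as input, and select an answer from the choices."
      - "質問と回答の選択肢を入力として受け取り、選択肢から回答を選択してください。"
      # "Answer the relationship between the premise and the hypothesis as entailment, contradiction, or neutral."
      - "前提と仮説の関係を含意、矛盾、中立の中から回答してください。"
      # "Classify the following text into either the positive or the negative sentiment class."
      - "以下のテキストを、ポジティブまたはネガティブの感情クラスのいずれかに分類してください。"
      # "Below is a combination of an instruction that describes a task and an input with context. Write a response that appropriately satisfies the request."
      - "以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。"
  - source_model: rinna/youri-7b-instruction
    positive_prompts:
      # "Extract the answer to the question from the title and passage in a few words. Answer with a noun."
      - "質問に対する回答を題名と文章から一言で抽出してください。回答は名詞で答えてください。"
      # "Summarize the given news article."
      - "与えられたニュース記事を要約してください。"
      # "Answer whether the given sentence is grammatical."
      - "与えられた文が文法的であるかを回答してください。"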
The positive_prompts in the configuration above are taken from the instructions of the benchmarks on which each source model scores best.
For the per-model benchmark results, see rinna's LM Benchmark.
Those results show where each individual model is strongest, and can guide which natural language processing tasks the merged model is best suited to.
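To make the role of positive_prompts more concrete, here is a toy sketch of prompt-based routing. It assumes, as a simplification and not as a description of mergekit's actual code, that each expert's gate vector is the mean of the hidden states of its positive prompts, and that a token is routed by a softmax over dot products with those gate vectors. The 3-dimensional vectors and the names `prompt_states`, `gate`, and `route` are all hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy "hidden states" for each expert's positive prompts.
# A real model would produce these by running the prompts through the network.
prompt_states = {
    "rinna/youri-7b-chat": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "rinna/youri-7b-instruction": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
}

# One gate vector per expert: the mean of its prompt hidden states (assumption).
gate = {name: mean_vector(states) for name, states in prompt_states.items()}


def route(token_state):
    """Return routing weights per expert for one token hidden state."""
    names = list(gate)
    logits = [sum(g * t for g, t in zip(gate[name], token_state)) for name in names]
    return dict(zip(names, softmax(logits)))


# A token whose hidden state resembles the chat expert's prompts.
weights = route([0.9, 0.1, 0.0])
print(weights)
```

In this toy setup a token that resembles one expert's prompts receives a larger weight for that expert, which is the intuition behind choosing prompts from the benchmarks each source model handles best.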
💻 Usage
!pip install -q --upgrade transformers einops accelerate bitsandbytes

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HachiML/youri-2x7b_dev"
torch.set_default_device("cuda")

# Load the model (4-bit quantized via bitsandbytes) and the tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    load_in_4bit=True,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)

# Create input
# Instruction: "Translate the following Japanese into English."
instruction = "次の日本語を英語に翻訳してください。"
# Input: a Japanese-language definition of "large language model (LLM)"
input_text = "大規模言語モデル（だいきぼげんごモデル、英: large language model、LLM）は、多数のパラメータ（数千万から数十億）を持つ人工ニューラルネットワークで構成されるコンピュータ言語モデルで、膨大なラベルなしテキストを使用して自己教師あり学習または半教師あり学習によって訓練が行われる。"
# Prompt layout: a fixed header, then the sections
# 指示 (instruction), 入力 (input), and 応答 (response)
prompt = f"""
以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。

### 指示:
{instruction}

### 入力:
{input_text}

### 応答:
"""

# Tokenize the input string
token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

# Generate text using the model
with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=200,
        do_sample=True,
        temperature=0.5,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode and print the output (the decoded string includes the prompt)
output = tokenizer.decode(output_ids.tolist()[0])
print(output)
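Because the decoded output contains the prompt as well as the generated text, it can help to separate the two. The following is a minimal sketch, not part of the original card: `build_prompt` and `extract_response` are hypothetical helpers, the blank lines between prompt sections are an assumption about the intended layout, and the `</s>` end-of-sequence string is assumed from Llama-style tokenizers and should be checked against `tokenizer.eos_token`.

```python
RESPONSE_MARKER = "### 応答:"  # "Response" section header used in the prompt


def build_prompt(instruction: str, input_text: str) -> str:
    """Assemble the instruction/input/response prompt (layout is an assumption)."""
    return (
        "\n以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。"
        "要求を適切に満たす応答を書きなさい。\n\n"
        f"### 指示:\n{instruction}\n\n"
        f"### 入力:\n{input_text}\n\n"
        f"{RESPONSE_MARKER}\n"
    )


def extract_response(decoded: str, eos_token: str = "</s>") -> str:
    """Return only the text after the last response marker, cut at eos_token."""
    # rpartition yields the whole string as the tail if the marker is absent.
    _, _, tail = decoded.rpartition(RESPONSE_MARKER)
    return tail.split(eos_token, 1)[0].strip()


# Simulated decoded output: the prompt followed by a generated answer.
decoded = build_prompt("次の日本語を英語に翻訳してください。", "こんにちは") + "Hello.</s>"
print(extract_response(decoded))
```

With a real generation, `extract_response(output)` would be applied to the string returned by `tokenizer.decode`.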