---
license: llama2
language:
- ja
tags:
- moe
---

# youri-2x7b_dev

This model is a Mixture of Experts (MoE) merge of the following two models:

- [rinna/youri-7b-instruction](https://huggingface.co/rinna/youri-7b-instruction)
- [rinna/youri-7b-chat](https://huggingface.co/rinna/youri-7b-chat)

## 🏆 Evaluation

All benchmark scores were computed with the [Stability-AI/lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness/tree/jp-stable).
The raw results, including the detailed scores and the conditions under which they were obtained, are stored in [benchmark_scores](https://huggingface.co/HachiML/youri-2x7b_dev/tree/main/benchmark_scores).

| Model |JCommonsenseQA(3-shot,acc.)|JNLI(3-shot,balanced acc.)|MARC-ja(0-shot,balanced acc.)|JSQuAD(2-shot,F1)|4-AVERAGE|
|----------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[**youri-2x7b_dev**](https://huggingface.co/HachiML/youri-2x7b_dev)| **91.15**| **71.03**| **95.90**| **91.30**| **87.34**|
|[youri-7b-instruction](https://huggingface.co/rinna/youri-7b-instruction) *1| 88.83| 63.56| 93.78| 92.19| 84.59|
|[youri-7b-chat](https://huggingface.co/rinna/youri-7b-chat) *1| 91.78| 70.35| 96.69| 79.62| 84.61|

| Model |jaqket-v2(1-shot,F1)|xlsum(1-shot,ROUGE 2) *2|6-AVERAGE|
|----------------------------------------------------------------|------:|------:|------:|
|[**youri-2x7b_dev**](https://huggingface.co/HachiML/youri-2x7b_dev)| **84.59**| **25.62**| **76.59**|
|[youri-7b-instruction](https://huggingface.co/rinna/youri-7b-instruction) *1| 83.92| 24.67| 75.13|
|[youri-7b-chat](https://huggingface.co/rinna/youri-7b-chat) *1| 83.71| 24.21| 75.33|

| Model |xwinograd(0-shot,acc.) *2|mgsm(5-shot,acc.) *2|JCoLA(2-shot,balanced acc.) *2|9-AVERAGE|
|----------------------------------------------------------------|------:|------:|---------:|------:|
|[**youri-2x7b_dev**](https://huggingface.co/HachiML/youri-2x7b_dev)| **81.43**| **24.80**| **59.09**| **69.43**|
|[youri-7b-instruction](https://huggingface.co/rinna/youri-7b-instruction) *1| 78.94| 17.20| 54.04| 66.35|
|[youri-7b-chat](https://huggingface.co/rinna/youri-7b-chat) *1| 80.92| 25.20| 53.78| 67.36|

*1 From [rinna's LM Benchmark](https://rinnakk.github.io/research/benchmarks/lm/index.html).

*2 rinna's LM Benchmark does not state which template versions were used for these tasks, so the scores were calculated without specifying a template.

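The N-AVERAGE columns are not defined in this card. A minimal sketch, assuming each one is the unweighted mean of the first N benchmarks in table order, recomputes them for the youri-2x7b_dev row. The helper name `average_of_first` is hypothetical and the scores are copied from the tables above.

```python
# Scores for youri-2x7b_dev, copied from the tables above (in table order).
scores = {
    "JCommonsenseQA": 91.15,
    "JNLI": 71.03,
    "MARC-ja": 95.90,
    "JSQuAD": 91.30,
    "jaqket-v2": 84.59,
    "xlsum": 25.62,
    "xwinograd": 81.43,
    "mgsm": 24.80,
    "JCoLA": 59.09,
}


def average_of_first(n, table=scores):
    """Unweighted mean of the first n benchmarks (assumed definition)."""
    values = list(table.values())[:n]
    return sum(values) / len(values)


avg4 = average_of_first(4)
avg6 = average_of_first(6)
avg9 = average_of_first(9)

for n, avg in ((4, avg4), (6, avg6), (9, avg9)):
    print(f"{n}-AVERAGE: {avg:.3f}")
```

Under that assumption, the recomputed means agree with the reported 87.34, 76.59, and 69.43 to within 0.01. The baseline rows are taken from rinna's page and are not recomputed here.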
## 🧩 Configuration

The model was built with a custom version of the [mergekit](https://github.com/cg123/mergekit) library (mixtral branch) using the following configuration:

```yaml
base_model: rinna/youri-7b-chat
gate_mode: hidden # one of "hidden", "cheap_embed", or "random"
dtype: bfloat16 # output dtype (float32, float16, or bfloat16)
experts:
  - source_model: rinna/youri-7b-chat
    positive_prompts:
      - "質問と回答の選択肢を入力として受け取り、選択肢から回答を選択してください。"
      - "前提と仮説の関係を含意、矛盾、中立の中から回答してください。"
      - "以下のテキストを、ポジティブまたはネガティブの感情クラスのいずれかに分類してください。"
      - "以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。"
  - source_model: rinna/youri-7b-instruction
    positive_prompts:
      - "質問に対する回答を題名と文章から一言で抽出してください。回答は名詞で答えてください。"
      - "与えられたニュース記事を要約してください。"
      - "与えられた文が文法的であるかを回答してください。"
```

The `positive_prompts` in this configuration are taken from the instructions of the benchmarks on which each source model scores best.
Per-model benchmark results are listed in [rinna's LM Benchmark](https://rinnakk.github.io/research/benchmarks/lm/index.html).
They show where each individual model performs particularly well, which can guide how the merged model is used across natural language processing tasks.

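To illustrate why the choice of `positive_prompts` matters with `gate_mode: hidden`, here is a toy sketch of the general idea: each expert's gate vector is derived from hidden-state representations of its prompts, and a token is weighted toward the expert whose gate vector it aligns with. This is not mergekit's implementation. The 3-dimensional vectors, the averaging step, and the function names are all assumptions made for illustration.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(hidden_state, gate_rows):
    """Score each expert by a dot product with its gate row, then normalize."""
    scores = [
        sum(h * g for h, g in zip(hidden_state, row)) for row in gate_rows
    ]
    return softmax(scores)


# Hypothetical hidden states for each expert's positive prompts (toy values).
chat_prompt_states = [[1.0, 0.0, 0.2], [0.8, 0.1, 0.0]]
instruction_prompt_states = [[0.0, 1.0, 0.1], [0.1, 0.9, 0.3]]

# One gate row per expert, in the same order as `experts` in the config.
gate_rows = [
    mean_vector(chat_prompt_states),
    mean_vector(instruction_prompt_states),
]

# A token whose hidden state resembles the chat expert's prompts.
token_state = [0.9, 0.1, 0.1]
weights = route(token_state, gate_rows)
print(weights)
```

In this toy setup the token receives a larger weight for the first (chat) expert, because its hidden state is closer to that expert's prompt representations.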
## 💻 Usage

```python
# In a notebook, install the dependencies first:
# !pip install -q --upgrade transformers einops accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HachiML/youri-2x7b_dev"
torch.set_default_device("cuda")

# Load the model (4-bit quantized via bitsandbytes) and the tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    load_in_4bit=True,
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

# Create input
# Instruction: "Translate the following Japanese into English."
instruction = "次の日本語を英語に翻訳してください。"
input_text = "大規模言語モデル(だいきぼげんごモデル、英: large language model、LLM)は、多数のパラメータ(数千万から数十億)を持つ人工ニューラルネットワークで構成されるコンピュータ言語モデルで、膨大なラベルなしテキストを使用して自己教師あり学習または半教師あり学習によって訓練が行われる。"
prompt = f"""
以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。要求を適切に満たす応答を書きなさい。
### 指示:
{instruction}
### 入力:
{input_text}
### 応答:
"""

# Tokenize the input string
token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

# Generate text using the model
with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        max_new_tokens=200,
        do_sample=True,
        temperature=0.5,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )

# Decode and print the output (the decoded text includes the prompt)
output = tokenizer.decode(output_ids.tolist()[0])
print(output)
```
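
Because the decoded output contains the prompt as well as the generated text, a small helper pair can be convenient. The following is a minimal sketch and not part of the original card: `build_prompt` and `extract_response` are hypothetical names, the template mirrors the line layout shown in the snippet above, and the `</s>` end-of-sequence string is an assumption about the tokenizer.

```python
PROMPT_HEADER = (
    "以下は、タスクを説明する指示と、文脈のある入力の組み合わせです。"
    "要求を適切に満たす応答を書きなさい。"
)
RESPONSE_MARKER = "### 応答:"


def build_prompt(instruction: str, input_text: str) -> str:
    """Fill the instruction/input/response template used in the usage example."""
    return (
        f"\n{PROMPT_HEADER}\n"
        f"### 指示:\n{instruction}\n"
        f"### 入力:\n{input_text}\n"
        f"{RESPONSE_MARKER}\n"
    )


def extract_response(decoded: str, eos_token: str = "</s>") -> str:
    """Return the text after the last response marker, without the EOS string.

    If the marker is missing, the whole decoded string is returned (stripped).
    The default eos_token is an assumption; pass tokenizer.eos_token if it differs.
    """
    _, _, tail = decoded.rpartition(RESPONSE_MARKER)
    return tail.replace(eos_token, "").strip()


# Example with a stand-in for real model output.
prompt = build_prompt("次の日本語を英語に翻訳してください。", "こんにちは")
fake_decoded = prompt + "Hello</s>"
print(extract_response(fake_decoded))
```

With a real model, `extract_response(tokenizer.decode(output_ids.tolist()[0]))` would be the corresponding call.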