Phi-3-mini-4K-instruct with CPO-SimPO
This repository contains the Phi-3-mini-4K-instruct model fine-tuned with the CPO-SimPO technique. CPO-SimPO combines Contrastive Preference Optimization (CPO) and Simple Preference Optimization (SimPO).
Introduction
Phi-3-mini-4K-instruct is a model optimized for instruction-following tasks. Fine-tuning it with CPO-SimPO has produced notable improvements on key benchmarks (see Key Improvements below).
What is CPO-SimPO?
CPO-SimPO is a novel technique that combines elements of CPO and SimPO:
- Contrastive Preference Optimization (CPO): Adds a behavior cloning regularizer to ensure the model remains close to the preferred data distribution.
- Simple Preference Optimization (SimPO): Incorporates length normalization and target reward margins to prevent the generation of long but low-quality sequences.
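The two ingredients above can be sketched as a single per-example loss. This is a minimal illustration, not the authors' training code: the function name `cpo_simpo_loss`, the parameter names `beta`, `gamma` and `alpha`, their default values, and the choice to average the behavior cloning term per token are all assumptions made for the example. The inputs are the summed token log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under the model being trained.

```python
import math


def cpo_simpo_loss(chosen_logp_sum, rejected_logp_sum, chosen_len, rejected_len,
                   beta=2.0, gamma=1.0, alpha=1.0):
    """Illustrative CPO-SimPO style loss for one preference pair (hypothetical helper)."""
    # SimPO: length-normalized rewards (average log-probability per token, scaled by beta).
    chosen_reward = beta * chosen_logp_sum / chosen_len
    rejected_reward = beta * rejected_logp_sum / rejected_len

    # SimPO: the chosen reward must beat the rejected reward by a target margin gamma.
    margin = chosen_reward - rejected_reward - gamma

    # Preference term: -log(sigmoid(margin)), written as a numerically stable softplus.
    preference_loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

    # CPO: behavior cloning regularizer, here the per-token negative log-likelihood
    # of the chosen response (assumption: averaged per token rather than summed).
    bc_loss = -chosen_logp_sum / chosen_len

    return preference_loss + alpha * bc_loss


# Example: a 10-token chosen response that is more likely than a 10-token rejected one.
loss = cpo_simpo_loss(chosen_logp_sum=-10.0, rejected_logp_sum=-20.0,
                      chosen_len=10, rejected_len=10)
print(loss)
```

Dividing by the response length keeps a long response from gaining an advantage merely by accumulating more tokens, while the `alpha`-weighted term keeps the model anchored to the preferred responses.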
Github
Model Performance
COMING SOON!
Key Improvements:
- Enhanced Model Performance: Significant score improvements, particularly in GSM8K (up by 8.49 points!) and TruthfulQA (up by 2.07 points).
- Quality Control: Improved generation of high-quality sequences through length normalization and reward margins.
- Balanced Optimization: The behavior cloning (BC) regularizer keeps the model close to the preferred data distribution while it learns from preference pairs.
Usage
Installation
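To use this model, install the transformers library from Hugging Face. The inference example below also imports torch, and loading a model with device_map requires the accelerate package.
pip install transformers torch accelerate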
Inference
Here's an example of how to perform inference with the model:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Fix the random seed for reproducibility.
torch.random.manual_seed(0)

# Load the model onto the GPU; trust_remote_code allows the repository's custom model code to run.
model = AutoModelForCausalLM.from_pretrained(
    "Syed-Hasan-8503/Phi-3-mini-4K-instruct-cpo-simpo",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Syed-Hasan-8503/Phi-3-mini-4K-instruct-cpo-simpo")

# A multi-turn conversation; the final user message is the one the model answers.
messages = [
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving a 2x + 3 = 7 equation?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Greedy decoding: with do_sample=False, temperature has no effect, so it is omitted.
generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]["generated_text"])