Phi-3-mini-4k-instruct – ONNX (INT4)

INT4-quantized ONNX export of Phi-3-mini-4k-instruct, a 3.8B-parameter lightweight language model from Microsoft. Optimized for CPU inference with int4 RTN block-32 quantization.

Mirrored for use with inference4j, an inference-only AI library for Java.

Original Source

microsoft/Phi-3-mini-4k-instruct

Usage with inference4j

try (TextGenerator gen = TextGenerator.builder().build()) {
    GenerationResult result = gen.generate("What is Java in one sentence?");
    System.out.println(result.text());
}
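inference4j handles prompt formatting internally, but if you drive the model through a lower-level runtime you need Phi-3's chat template, which wraps each turn in special tokens (`<|user|>`, `<|end|>`, `<|assistant|>`). The sketch below shows the single-turn format; the class and method names are illustrative, and the exact whitespace should be checked against the upstream tokenizer config:

```java
public class Phi3PromptSketch {
    // Builds a single-turn prompt in the Phi-3 chat format:
    // the user message is wrapped in <|user|> ... <|end|>, followed by
    // the <|assistant|> tag that the model then completes.
    static String phi3Prompt(String userMessage) {
        return "<|user|>\n" + userMessage + "<|end|>\n<|assistant|>\n";
    }

    public static void main(String[] args) {
        System.out.println(phi3Prompt("What is Java in one sentence?"));
    }
}
```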

Model Details

Property            Value
Architecture        Phi-3 (3.8B parameters, 32 layers, hidden size 3072)
Task                Text generation / chat
Context length      4096 tokens
Quantization        INT4 RTN, block size 32, acc-level-4
Original framework  PyTorch (transformers)
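In the RTN (round-to-nearest) scheme above, each block of 32 weights is stored as 4-bit integers plus one per-block scale, and the kernel dequantizes by multiplying back at inference time. A minimal sketch of that round trip, assuming a symmetric scheme (the actual exporter may also store per-block zero points; the class and method names are illustrative):

```java
public class Int4RtnSketch {
    // Round-trips weights through symmetric int4 block quantization:
    // each block of `blockSize` floats shares one scale; values are rounded
    // to the signed 4-bit range [-8, 7] and dequantized by multiplying back.
    static float[] roundTrip(float[] w, int blockSize) {
        float[] out = new float[w.length];
        for (int start = 0; start < w.length; start += blockSize) {
            int end = Math.min(start + blockSize, w.length);
            float maxAbs = 1e-8f;                     // avoid divide-by-zero for all-zero blocks
            for (int i = start; i < end; i++) {
                maxAbs = Math.max(maxAbs, Math.abs(w[i]));
            }
            float scale = maxAbs / 7f;                // map the largest magnitude onto +/-7
            for (int i = start; i < end; i++) {
                int q = Math.round(w[i] / scale);     // round-to-nearest: the "RTN" step
                q = Math.max(-8, Math.min(7, q));     // clamp to the int4 range
                out[i] = q * scale;                   // what the kernel sees at inference time
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] w = new float[64];
        for (int i = 0; i < w.length; i++) w[i] = (float) Math.sin(0.1 * i);
        float[] back = roundTrip(w, 32);
        float maxErr = 0f;
        for (int i = 0; i < w.length; i++) {
            maxErr = Math.max(maxErr, Math.abs(w[i] - back[i]));
        }
        // per-block error is bounded by half a quantization step (scale / 2)
        System.out.println("max round-trip error: " + maxErr);
    }
}
```

Larger blocks mean fewer scales to store but coarser quantization; block size 32 is the common trade-off for weight-only int4 on CPU.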

License

This model is licensed under the MIT License. Original model by Microsoft.
