smkrv committed (verified)
Commit 4d2c1c2 · Parent: 7f9702a

Upload folder using huggingface_hub
LICENSE ADDED
@@ -0,0 +1,24 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

Copyright 2025 SMKRV

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

---

This repository contains CoreML models derived from the Qwen3-0.6B model
by Alibaba Cloud (Qwen Team), which is also licensed under Apache License 2.0.

Original model: https://huggingface.co/Qwen/Qwen3-0.6B
Qwen3-0.6B-Decode-4bit.mlpackage/Data/com.apple.CoreML/model.mlmodel ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8e36d8a0c94dd424a09af56bf356bf7e5b349def7493c1a35c13828b379f7d48
size 908616
Qwen3-0.6B-Decode-4bit.mlpackage/Data/com.apple.CoreML/weights/weight.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d8cb17732e47fbd2abd50573cda7d68b82a44a40507e06a7290d67a4c93f5789
size 298484992
Qwen3-0.6B-Decode-4bit.mlpackage/Manifest.json ADDED
@@ -0,0 +1,18 @@
{
  "fileFormatVersion": "1.0.0",
  "itemInfoEntries": {
    "2DDAA7B7-6583-4DC6-9EEA-755C0F51E057": {
      "author": "com.apple.CoreML",
      "description": "CoreML Model Specification",
      "name": "model.mlmodel",
      "path": "com.apple.CoreML/model.mlmodel"
    },
    "85768F4F-C1C2-4BF7-BF5F-8D53C368C29C": {
      "author": "com.apple.CoreML",
      "description": "CoreML Model Weights",
      "name": "weights",
      "path": "com.apple.CoreML/weights"
    }
  },
  "rootModelIdentifier": "2DDAA7B7-6583-4DC6-9EEA-755C0F51E057"
}
Qwen3-0.6B-Prefill-4bit.mlpackage/Data/com.apple.CoreML/model.mlmodel ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a5c13d2b3a9ab9826106e6e73bec57a1d84882e12bc9a6730cf38da03b1695dd
size 906451
Qwen3-0.6B-Prefill-4bit.mlpackage/Data/com.apple.CoreML/weights/weight.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d8cb17732e47fbd2abd50573cda7d68b82a44a40507e06a7290d67a4c93f5789
size 298484992
Qwen3-0.6B-Prefill-4bit.mlpackage/Manifest.json ADDED
@@ -0,0 +1,18 @@
{
  "fileFormatVersion": "1.0.0",
  "itemInfoEntries": {
    "8F2B71E3-7AF8-487E-8959-B4DB881EEB26": {
      "author": "com.apple.CoreML",
      "description": "CoreML Model Specification",
      "name": "model.mlmodel",
      "path": "com.apple.CoreML/model.mlmodel"
    },
    "C7181871-D542-4B54-AB42-BAC5489A9FEC": {
      "author": "com.apple.CoreML",
      "description": "CoreML Model Weights",
      "name": "weights",
      "path": "com.apple.CoreML/weights"
    }
  },
  "rootModelIdentifier": "8F2B71E3-7AF8-487E-8959-B4DB881EEB26"
}
README.md ADDED
@@ -0,0 +1,257 @@
---
library_name: coreml
pipeline_tag: text-generation
license: apache-2.0
language:
- en
- zh
- multilingual
tags:
- coreml
- apple-silicon
- neural-engine
- ane
- llm
- quantized
- 4bit
- mobile
- ios
- macos
base_model: Qwen/Qwen3-0.6B
---

# Qwen3-0.6B CoreML 4-bit

CoreML version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) with 4-bit palettization, optimized for Apple Silicon and the Neural Engine.

## Model Summary

- **Base Model**: Qwen/Qwen3-0.6B
- **Model Type**: Causal Language Model
- **Format**: CoreML (.mlpackage)
- **Quantization**: 4-bit Palettization (K-means clustering)
- **Languages**: English, Chinese, Multilingual
- **License**: Apache 2.0

## Performance

| Device | Size | Tokens/sec | Latency (Prefill) | Latency (Decode) |
|--------|------|------------|-------------------|------------------|
| M4 MacBook Air | 572 MB | 12-15 | 25-30 ms | 8-10 ms |
| M3 Pro | 572 MB | 15-18 | 20-25 ms | 6-8 ms |
| iPhone 15 Pro | 572 MB | 10-12 | 35-40 ms | 12-15 ms |

## Technical Specifications

- **Parameters**: 0.6B
- **Layers**: 28
- **Attention Heads**: 16 (query), 8 (key/value) - Grouped Query Attention
- **Hidden Size**: 1024
- **Vocabulary Size**: 151,936
- **Context Length**: 1024 tokens (optimized for mobile RAM constraints)
- **Compression Ratio**: 5.2x (3GB FP16 → 572MB 4-bit)

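As a sanity check on the 4-bit figure, the weight payload can be predicted from the parameter count alone: 0.6B weights at 4 bits each is roughly 300 MB, which matches the ~298 MB `weight.bin` shipped in each `.mlpackage`. A back-of-envelope check:

```python
# Back-of-envelope check: 0.6B parameters stored as 4-bit palette indices.
params = 0.6e9                 # parameter count from the spec above
bits_per_weight = 4            # palettized index width
predicted_bytes = params * bits_per_weight / 8

observed_bytes = 298_484_992   # size of weight.bin in each .mlpackage

ratio = observed_bytes / predicted_bytes
print(f"predicted: {predicted_bytes/1e6:.0f} MB, "
      f"observed: {observed_bytes/1e6:.0f} MB, ratio: {ratio:.3f}")
```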
## Quantization Method

This model uses **4-bit palettization with K-means clustering**:

1. Weights are grouped into 2^4 = 16 clusters
2. Each cluster is represented by a centroid value
3. Each weight is replaced by its 4-bit cluster index
4. A lookup table stores the actual centroid values

This approach provides:
- ✅ 4x compression of the weights
- ✅ Minimal accuracy loss (~1-2%)
- ✅ Fast inference on the Apple Neural Engine
- ✅ Lower power consumption

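The steps above can be sketched in a few lines of Python. This is a toy 1-D Lloyd's k-means on fake weights, not the actual `coremltools` implementation, but it produces the same artifacts: a 16-entry lookup table and a 4-bit index per weight.

```python
import random

def palettize(weights, nbits=4, iters=25):
    """Toy 1-D k-means palettization: returns (lookup_table, indices)."""
    k = 2 ** nbits                                 # 2^4 = 16 clusters
    centroids = sorted(random.sample(weights, k))  # init centroids from the data

    def assign(cs):
        # Each weight is mapped to the index of its nearest centroid.
        return [min(range(k), key=lambda j: abs(w - cs[j])) for w in weights]

    for _ in range(iters):
        idx = assign(centroids)
        for j in range(k):  # move each centroid to the mean of its members
            members = [w for w, i in zip(weights, idx) if i == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, assign(centroids)

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(2000)]  # fake FP weights
lut, idx = palettize(weights)
restored = [lut[i] for i in idx]                          # dequantized weights
max_err = max(abs(w - r) for w, r in zip(weights, restored))
mean_err = sum(abs(w - r) for w, r in zip(weights, restored)) / len(weights)
print(f"clusters: {len(lut)}, max err: {max_err:.5f}, mean err: {mean_err:.5f}")
```

Storing the 4-bit indices plus the tiny lookup table is what yields the 4x weight compression relative to FP16.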
## Models Included

This repository contains two models for efficient inference:

1. **Qwen3-0.6B-Prefill-4bit.mlpackage** (286 MB)
   - Processes the initial prompt (prefill phase)
   - Inputs: `inputIds`, `causalMask`
   - Output: `logits`

2. **Qwen3-0.6B-Decode-4bit.mlpackage** (286 MB)
   - Generates tokens one at a time (decode phase)
   - Input: `inputIds`
   - Output: `logits`

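The split mirrors the standard two-phase generation loop: run the prefill model once over the whole prompt, then call the decode model once per generated token. A minimal Python sketch of that control flow, with a deterministic stub standing in for both CoreML models:

```python
def stub_model(token_ids):
    """Stand-in for the CoreML models: fake logits for the last position."""
    vocab = 8
    last = token_ids[-1]
    # Toy rule: the token after `last` (mod vocab) gets the top logit.
    return [1.0 if t == (last + 1) % vocab else 0.0 for t in range(vocab)]

def generate(prompt_ids, max_new_tokens=4):
    # Prefill phase: one pass over the full prompt (Prefill model).
    logits = stub_model(prompt_ids)
    out = list(prompt_ids)
    for _ in range(max_new_tokens):
        next_id = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
        out.append(next_id)
        # Decode phase: one pass per new token (Decode model).
        logits = stub_model(out)
    return out

print(generate([0, 1, 2]))  # -> [0, 1, 2, 3, 4, 5, 6]
```

A real wrapper would add tokenization, sampling, and a stop condition, but the prefill-once / decode-per-token structure is the same.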
## Usage

### Swift

```swift
import CoreML

// Configure for the Neural Engine before loading
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

// Load models (Xcode compiles bundled .mlpackage files to .mlmodelc at build time)
let prefillURL = Bundle.main.url(forResource: "Qwen3-0.6B-Prefill-4bit", withExtension: "mlmodelc")!
let decodeURL = Bundle.main.url(forResource: "Qwen3-0.6B-Decode-4bit", withExtension: "mlmodelc")!

let prefillModel = try MLModel(contentsOf: prefillURL, configuration: config)
let decodeModel = try MLModel(contentsOf: decodeURL, configuration: config)

// Inference (inputTokens and causalMask are MLMultiArrays prepared elsewhere)
let prefillInput = try MLDictionaryFeatureProvider(dictionary: [
    "inputIds": inputTokens,
    "causalMask": causalMask
])
let prefillOutput = try prefillModel.prediction(from: prefillInput)
```

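The prefill model's `causalMask` input is the usual lower-triangular attention mask: position i may attend only to positions ≤ i. The exact encoding depends on the conversion; assuming the common additive form (0 where attention is allowed, a large negative value where it is not), the mask can be built like this:

```python
def causal_mask(seq_len, neg_inf=-1e9):
    """Additive causal mask: 0.0 where attention is allowed, -inf elsewhere."""
    return [[0.0 if j <= i else neg_inf for j in range(seq_len)]
            for i in range(seq_len)]

for row in causal_mask(4):
    print(row)
```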
### Download from Hugging Face

```bash
# Using git-lfs
git lfs install
git clone https://huggingface.co/smkrv/Qwen3-0.6B-CoreML-4bit

# Or using huggingface-cli
pip install huggingface-hub
huggingface-cli download smkrv/Qwen3-0.6B-CoreML-4bit
```

## Usage Examples

The snippets below assume a small `generate` convenience wrapper built on top of the prefill/decode models (tokenize, prefill once, loop decode, detokenize); the wrapper itself is not included in this repository.

### Text Generation

```swift
let prompt = "Write a short story about a robot:"
let story = await model.generate(prompt)
print(story)
```

### Question Answering

```swift
let question = "What is the capital of France?"
let answer = await model.generate(question)
// Output: "The capital of France is Paris."
```

### Code Generation

```swift
let codePrompt = "Write a Python function to sort a list:"
let code = await model.generate(codePrompt)
```

### Text Correction

```swift
let text = "I has a dreem to becum a docter"
let corrected = await model.generate("Correct this text: \(text)")
// Output: "I have a dream to become a doctor"
```

### Translation

```swift
let translatePrompt = "Translate to Spanish: Good morning, how are you?"
let translation = await model.generate(translatePrompt)
// Output: "Buenos días, ¿cómo estás?"
```

### Summarization

```swift
let longText = """
<long article text>
"""
let summary = await model.generate("Summarize this text:\n\n\(longText)\n\nSummary:")
```

## System Requirements

- **iOS**: 16.0+
- **macOS**: 13.0+ (Apple Silicon required)
- **RAM**: 8GB+ recommended
- **Storage**: ~600MB

## Limitations

- Context limited to 1024 tokens (vs. 32K in the original model)
- ~1-2% accuracy degradation due to 4-bit quantization
- Requires Apple Silicon or an A-series chip for optimal performance
- The Python CoreML API has limited support for palettized models (use Swift)

## Benchmark Results

Tested on an M4 MacBook Air (16GB RAM):

```
Model: Qwen3-0.6B-CoreML-4bit
Device: M4 Air, 16GB RAM, macOS 15
Context: 512 tokens

Prefill Time: 27ms avg
Decode Time: 9ms avg
Throughput: 13 tokens/sec
Memory Peak: 820MB
Power Consumption: Low (ANE active)
```

## Conversion Details

This model was converted from PyTorch to CoreML using the following process:

1. **Loading**: Original Qwen3-0.6B model loaded in FP32
2. **Tracing**: Model traced using `torch.jit.trace` for CoreML compatibility
3. **Conversion**: Converted to CoreML using `coremltools 8.1` with:
   - Target: iOS 18+ / macOS 15+
   - Compute precision: FP16
   - Compute units: CPU + GPU + Neural Engine
4. **Compression**: Applied 4-bit palettization using `cto.palettize_weights()`:
   - Mode: K-means clustering
   - N-bits: 4 (16 clusters)
   - Weight threshold: 512 elements
   - Granularity: per-tensor

**Tools used:**
- `coremltools`: 8.1
- `PyTorch`: 2.4.1
- `transformers`: 4.45.0

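For reference, the compression parameters from step 4 can be collected in one place. The original conversion script is not included in this repository, so the `cto` call shown in the comment is only a sketch following the `coremltools.optimize.coreml` API, not the exact code that was used:

```python
# Compression settings as listed in step 4 above.
palettize_config = {
    "mode": "kmeans",            # K-means clustering
    "nbits": 4,                  # 2^4 = 16 clusters
    "weight_threshold": 512,     # tensors smaller than this stay uncompressed
    "granularity": "per_tensor",
}

# Sketch of the corresponding coremltools 8.x call (not executed here):
#   import coremltools.optimize.coreml as cto
#   op_cfg = cto.OpPalettizerConfig(**palettize_config)
#   mlmodel = cto.palettize_weights(mlmodel,
#                                   cto.OptimizationConfig(global_config=op_cfg))

print(palettize_config)
```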
The conversion reduces model size from 3GB to 572MB while maintaining ~98-99% of original quality.

## Citation

If you use this model, please cite both the original Qwen3 model and this CoreML conversion:

```bibtex
@misc{qwen3-coreml-4bit,
  title={Qwen3-0.6B Core ML 4-bit},
  author={SMKRV},
  year={2025},
  howpublished={\url{https://huggingface.co/smkrv/Qwen3-0.6B-CoreML-4bit}},
  note={4-bit palettized CoreML version of Qwen3-0.6B}
}

@article{qwen3,
  title={Qwen Technical Report},
  author={Qwen Team},
  journal={arXiv preprint},
  year={2024}
}
```

## License

Apache License 2.0 - same as the base model [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)

## Acknowledgments

- **Qwen Team** at Alibaba Cloud for the base model
- **Apple** for CoreML Tools and the Neural Engine

## Links

- **Base Model**: https://huggingface.co/Qwen/Qwen3-0.6B
- **CoreML Tools**: https://apple.github.io/coremltools/