Qwen3-8B-LaCo-Pruned
This model is a layer-pruned version of Qwen3-8B-Base using the LaCo (Layer Collapse) structured pruning method.
Model Summary
| Attribute | Value |
|---|---|
| Base Model | Qwen/Qwen3-8B-Base |
| Pruning Method | LaCo (Layer Collapse) |
| Original Layers | 36 |
| Pruned Layers | 30 |
| Layers Removed | 6 |
| Compression | 16.7% |
Key Results
This model achieves 16.7% compression while retaining:
- ~90% of physical reasoning (PIQA)
- ~94% of commonsense reasoning (WinoGrande)
- ~79% of commonsense completion (HellaSwag)
- ~41% of factual knowledge (MMLU)
This is a raw pruned model with no post-training; fine-tuning can recover part of the lost capability (see Recovery Recommendations below).
Benchmark Results (Before Post-Training)
Note: All benchmarks below are evaluated on the pruned model without any post-training or fine-tuning. These results represent the raw performance after pruning only. Post-training is expected to improve these scores, particularly on knowledge-intensive tasks like MMLU.
Comparison with Original Qwen3-8B-Base
| Benchmark | Original | Pruned | Retention |
|---|---|---|---|
| PIQA (acc_norm) | 79.54% | 71.38% | 89.7% |
| WinoGrande | 67.0% | 62.83% | 93.8% |
| ARC-Challenge (acc_norm) | 42.0% | 36.09% | 85.9% |
| ARC-Easy (acc_norm) | 72.0% | 58.04% | 80.6% |
| HellaSwag (acc_norm) | 78.55% | 61.98% | 78.9% |
| BoolQ | 83.09% | 64.95% | 78.2% |
| MMLU (5-shot) | 76.89% | 31.30% | 40.7% |
Original scores from Qwen3 Technical Report
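The pruned-model scores above should be reproducible with EleutherAI's lm-evaluation-harness. The sketch below shows one way to run the same task suite; the task names and settings are assumptions based on the harness's standard configurations, not the exact evaluation script used for this card.

```python
# Sketch: reproducing the benchmark suite with lm-evaluation-harness (pip install lm-eval).
# Task names and settings are assumed defaults, not the exact script used for this card.
from lm_eval import simple_evaluate

model_args = "pretrained=Mercity/Qwen3-8B-LaCo-Pruned,trust_remote_code=True,dtype=bfloat16"

# Zero-shot multiple-choice tasks
results = simple_evaluate(
    model="hf",
    model_args=model_args,
    tasks=["piqa", "winogrande", "arc_challenge", "arc_easy", "hellaswag", "boolq"],
    batch_size=8,
)

# MMLU is reported 5-shot in the table above
mmlu = simple_evaluate(
    model="hf",
    model_args=model_args,
    tasks=["mmlu"],
    num_fewshot=5,
    batch_size=8,
)

print(results["results"])
print(mmlu["results"])
```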
Benchmark Interpretation
| Capability | Benchmarks | Retention | Status |
|---|---|---|---|
| Physical Reasoning | PIQA | 89.7% | Excellent |
| Commonsense Reasoning | WinoGrande | 93.8% | Excellent |
| Basic Reasoning | ARC-Challenge | 85.9% | Good |
| Reading Comprehension | BoolQ | 78.2% | Good |
| Common Sense | HellaSwag | 78.9% | Good |
| Factual Knowledge | MMLU | 40.7% | Degraded |
The "Knowledge Cliff"
Our experiments reveal a critical finding: factual knowledge collapses catastrophically between roughly 16.7% and 22.2% compression (30 vs. 28 remaining layers).
| Compression | Layers | MMLU | Status |
|---|---|---|---|
| 16.7% | 30 | 31.30% | Partial retention |
| 22.2% | 28 | 25.89% | Random chance |
| 27.8% | 26 | 25.12% | Random chance |
While reasoning capabilities degrade gradually with compression, factual knowledge encoded in specific layers is lost abruptly when those layers are removed.
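For reference, the compression figures above are simply the fraction of the original 36 decoder layers that were removed:

```python
# Compression = removed layers / original layers (36 in Qwen3-8B-Base)
ORIGINAL_LAYERS = 36
for remaining in (30, 28, 26):
    removed = ORIGINAL_LAYERS - remaining
    print(f"{remaining} layers -> {removed / ORIGINAL_LAYERS:.1%} compression")
# 30 layers -> 16.7% compression
# 28 layers -> 22.2% compression
# 26 layers -> 27.8% compression
```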
Intended Use
This model is suitable for:
- Research on model compression and efficiency
- Fine-tuning base for domain-specific applications
- Inference optimization where speed/memory matters
- Applications prioritizing reasoning over factual recall
Limitations
Important: This is a raw pruned model without post-training.
| Use Case | Recommendation |
|---|---|
| Physical/commonsense reasoning | Recommended |
| Reading comprehension | Recommended |
| General text understanding | Recommended |
| Factual question answering | Fine-tune first |
| Knowledge-intensive tasks | Fine-tune first |
Pruning Details
LaCo Hyperparameters
| Parameter | Value | Description |
|---|---|---|
| MERGE_LAYERS (C) | 3 | Layers merged per operation |
| LOWEST_LAY (L) | 4 | Minimum layer index for merging |
| HIGHEST_LAY (H) | 28 | Maximum layer index for merging |
| INTERVAL (I) | 2 | Minimum gap between merge points |
| THRESHOLD (T) | 0.85 | Cosine similarity threshold |
| MAX_COMPRESSION | 20% | Maximum allowed compression |
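As a rough illustration of how these hyperparameters are used: each LaCo merge collapses C consecutive decoder layers into one by adding their parameter differences onto the lowest layer of the group (the RDSC merge from the LaCo paper), and a merge is kept only if the pruned model's representations on a small calibration set stay above the cosine-similarity threshold T. The sketch below shows this control flow only; `hidden_similarity` is a hypothetical helper, and this is not the exact implementation used to produce this checkpoint.

```python
import copy
import torch

def rdsc_merge(layers, l, C=3):
    """Collapse layers l .. l+C-1 into one: theta* = theta_l + sum_k (theta_{l+k} - theta_l)."""
    merged = copy.deepcopy(layers[l])
    with torch.no_grad():
        for params in zip(merged.parameters(),
                          layers[l].parameters(),
                          *[layers[l + k].parameters() for k in range(1, C)]):
            p_merged, p_base, *p_higher = params
            for p_k in p_higher:
                p_merged.add_(p_k - p_base)   # accumulate (theta_{l+k} - theta_l)
    return merged

def try_merge(model_layers, l, calib_batch, C=3, T=0.85):
    """Tentatively merge at position l; keep the merge only if representations stay similar."""
    candidate = model_layers[:l] + [rdsc_merge(model_layers, l, C)] + model_layers[l + C:]
    # hidden_similarity() is a hypothetical helper: run calib_batch through the original
    # and candidate layer stacks and return the cosine similarity of the final hidden states.
    if hidden_similarity(model_layers, candidate, calib_batch) >= T:
        return candidate      # accept: C layers collapsed into 1
    return model_layers       # reject: keep the original stack
```

In this run, merges were only attempted between layer indices L=4 and H=28, at least I=2 positions apart, and the process stopped once compression approached the 20% cap.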
Pruning Statistics
| Metric | Value |
|---|---|
| Successful Merges | 3 |
| Rejected Merges | 0 |
| Total Iterations | 4 |
| Final Compression | 16.7% |

Each accepted merge collapses 3 consecutive layers into 1, so the 3 successful merges removed 6 of the original 36 layers, i.e. 16.7% compression.
Usage
Basic Inference
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "Mercity/Qwen3-8B-LaCo-Pruned"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)
# Text generation
prompt = "The process of photosynthesis"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
With 4-bit Quantization (Further Compression)
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
model_name = "Mercity/Qwen3-8B-LaCo-Pruned"
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_quant_type="nf4",
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True
)
Recovery Recommendations
To improve factual knowledge after pruning:
LoRA Fine-tuning (Recommended)
from peft import LoraConfig, get_peft_model
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
# Fine-tune on OpenOrca, Alpaca, or domain-specific data
Expected recovery: MMLU could reach 45-55% with fine-tuning.
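A minimal recovery fine-tuning loop on top of the LoRA-wrapped model might look like the sketch below; the dataset, sequence length, and hyperparameters are illustrative placeholders, not the settings behind the recovery estimate above. It assumes `model` and `tokenizer` from the earlier snippets.

```python
# Illustrative LoRA recovery fine-tuning (assumes `model` and `tokenizer` from above;
# dataset and hyperparameters are examples only).
from datasets import load_dataset
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token

dataset = load_dataset("tatsu-lab/alpaca", split="train[:1%]")  # small slice for illustration

def to_features(example):
    text = example["instruction"] + "\n" + example["input"] + "\n" + example["output"]
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = dataset.map(to_features, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen3-laco-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=2e-4,
        num_train_epochs=1,
        bf16=True,
        logging_steps=20,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```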
Technical Specifications
| Attribute | Value |
|---|---|
| Architecture | Transformer decoder-only |
| Layers | 30 |
| Hidden Size | 4096 |
| Attention Heads (Q) | 32 |
| Attention Heads (KV) | 8 (GQA) |
| Intermediate Size | 12288 |
| Vocabulary Size | 151,669 |
| Max Context Length | 32,768 tokens |
| Precision | bfloat16 |
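These values can be sanity-checked directly from the released configuration:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Mercity/Qwen3-8B-LaCo-Pruned", trust_remote_code=True)
print(config.num_hidden_layers)        # 30 after pruning
print(config.hidden_size)              # 4096
print(config.num_attention_heads)      # 32 query heads
print(config.num_key_value_heads)      # 8 (GQA)
print(config.intermediate_size)        # 12288
print(config.max_position_embeddings)  # max context length
```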
Citation
If you use this model, please cite the original LaCo paper and Qwen3:
@article{yang2024laco,
  title={LaCo: Large Language Model Pruning via Layer Collapse},
  author={Yang, Yifei and Cao, Zouying and Zhao, Hai},
  journal={arXiv preprint arXiv:2402.11187},
  year={2024}
}
@misc{qwen3technicalreport,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  year={2025},
  eprint={2505.09388},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.09388}
}
License
Apache 2.0 (same as base Qwen3 model)
Acknowledgments
- Qwen Team for the excellent Qwen3-8B-Base model
- LaCo authors for the pruning methodology
- Hugging Face for model hosting