Qwen3-8B-LaCo-Pruned

This model is a layer-pruned version of Qwen3-8B-Base using the LaCo (Layer Collapse) structured pruning method.

Model Summary

| Attribute | Value |
|---|---|
| Base Model | Qwen/Qwen3-8B-Base |
| Pruning Method | LaCo (Layer Collapse) |
| Original Layers | 36 |
| Pruned Layers | 30 |
| Layers Removed | 6 |
| Compression | 16.7% |

Key Results

This model achieves 16.7% compression while retaining:

  • ~90% of physical reasoning (PIQA)
  • ~94% of commonsense reasoning (WinoGrande)
  • ~79% of commonsense sentence completion (HellaSwag)
  • ~41% of factual knowledge (MMLU)

This is a raw pruned model without post-training. Fine-tuning can further recover lost capabilities.


Benchmark Results (Without Post-Training)

Note: All benchmarks below are evaluated on the pruned model without any post-training or fine-tuning. These results represent the raw performance after pruning only. Post-training is expected to improve these scores, particularly on knowledge-intensive tasks like MMLU.

Comparison with Original Qwen3-8B-Base

| Benchmark | Original | Pruned | Retention |
|---|---|---|---|
| PIQA (acc_norm) | 79.54% | 71.38% | 89.7% |
| WinoGrande | 67.0% | 62.83% | 93.8% |
| ARC-Challenge (acc_norm) | 42.0% | 36.09% | 85.9% |
| ARC-Easy (acc_norm) | 72.0% | 58.04% | 80.6% |
| HellaSwag (acc_norm) | 78.55% | 61.98% | 78.9% |
| BoolQ | 83.09% | 64.95% | 78.2% |
| MMLU (5-shot) | 76.89% | 31.30% | 40.7% |

Original scores are taken from the Qwen3 Technical Report.
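
The Retention column is simply the pruned score divided by the original score. As a quick sanity check, it can be recomputed from the reported numbers (the scores below are copied from the table, not re-measured):

```python
# Recompute the retention column from the reported scores.
scores = {
    # benchmark: (original %, pruned %)
    "PIQA (acc_norm)": (79.54, 71.38),
    "WinoGrande": (67.0, 62.83),
    "ARC-Challenge (acc_norm)": (42.0, 36.09),
    "ARC-Easy (acc_norm)": (72.0, 58.04),
    "HellaSwag (acc_norm)": (78.55, 61.98),
    "BoolQ": (83.09, 64.95),
    "MMLU (5-shot)": (76.89, 31.30),
}

for name, (original, pruned) in scores.items():
    retention = pruned / original * 100
    print(f"{name}: {retention:.1f}% retained")
```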

Benchmark Interpretation

| Capability | Benchmarks | Retention | Status |
|---|---|---|---|
| Physical Reasoning | PIQA | 89.7% | Excellent |
| Commonsense Reasoning | WinoGrande | 93.8% | Excellent |
| Basic Reasoning | ARC-Challenge | 85.9% | Good |
| Reading Comprehension | BoolQ | 78.2% | Good |
| Common Sense | HellaSwag | 78.9% | Good |
| Factual Knowledge | MMLU | 40.7% | Degraded |

The "Knowledge Cliff"

Our experiments reveal a critical finding: factual knowledge collapses catastrophically between 16% and 22% compression.

| Compression | Layers | MMLU | Status |
|---|---|---|---|
| 16.7% | 30 | 31.30% | Partial retention |
| 22.2% | 28 | 25.89% | Random chance |
| 27.8% | 26 | 25.12% | Random chance |

While reasoning capabilities degrade gradually with compression, factual knowledge encoded in specific layers is lost abruptly when those layers are removed. (MMLU is four-choice, so scores near 25% are at chance level.)


Intended Use

This model is suitable for:

  • Research on model compression and efficiency
  • Fine-tuning base for domain-specific applications
  • Inference optimization where speed/memory matters
  • Applications prioritizing reasoning over factual recall

Limitations

Important: This is a raw pruned model without post-training.

| Use Case | Recommendation |
|---|---|
| Physical/commonsense reasoning | Recommended |
| Reading comprehension | Recommended |
| General text understanding | Recommended |
| Factual question answering | Fine-tune first |
| Knowledge-intensive tasks | Fine-tune first |

Pruning Details

LaCo Hyperparameters

| Parameter | Value | Description |
|---|---|---|
| MERGE_LAYERS (C) | 3 | Layers merged per operation |
| LOWEST_LAY (L) | 4 | Minimum layer index for merging |
| HIGHEST_LAY (H) | 28 | Maximum layer index for merging |
| INTERVAL (I) | 2 | Minimum gap between merge points |
| THRESHOLD (T) | 0.85 | Cosine similarity threshold |
| MAX_COMPRESSION | 20% | Maximum allowed compression |

Pruning Statistics

| Metric | Value |
|---|---|
| Successful Merges | 3 |
| Rejected Merges | 0 |
| Total Iterations | 4 |
| Final Compression | 16.7% |
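
For reference, the core LaCo operation (as described in the LaCo paper) merges a window of C consecutive layers into the first layer of the window by adding the later layers' parameter differences to it, then keeps the merge only if the merged model's representations on a small calibration set stay above the cosine-similarity threshold T. The sketch below is our illustration of that step, not the exact script used to produce this checkpoint; the function names, the calibration helper, and the hidden-state comparison are simplifications.

```python
import copy
import torch
import torch.nn.functional as F

# Hyperparameters from the table above
C, LOWEST_LAY, HIGHEST_LAY, INTERVAL, T = 3, 4, 28, 2, 0.85

def rdsc_merge(layers, start, c=C):
    """Collapse layers[start : start + c] into one layer via the RDSC merge:
        theta* = theta_start + sum_{k=1..c-1} (theta_{start+k} - theta_start)
    Returns a new layer list with the window replaced by the merged layer."""
    merged = copy.deepcopy(layers[start])
    with torch.no_grad():
        for offset in range(1, c):
            for p_merged, p_anchor, p_follower in zip(
                merged.parameters(),
                layers[start].parameters(),
                layers[start + offset].parameters(),
            ):
                p_merged.add_(p_follower - p_anchor)  # accumulate the difference
    return list(layers[:start]) + [merged] + list(layers[start + c:])

def merge_is_acceptable(original_model, merged_model, calibration_batches):
    """Accept the merge only if the merged model's last hidden states stay
    close to the original's on a few calibration inputs (simplified check)."""
    sims = []
    with torch.no_grad():
        for batch in calibration_batches:
            h_orig = original_model(**batch, output_hidden_states=True).hidden_states[-1]
            h_new = merged_model(**batch, output_hidden_states=True).hidden_states[-1]
            sims.append(F.cosine_similarity(h_orig.flatten(1), h_new.flatten(1)).mean())
    return (sum(sims) / len(sims)) >= T
```

With C = 3, each accepted merge removes two layers, so the three successful merges reported above account for the six removed layers (36 → 30).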

Usage

Basic Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Mercity/Qwen3-8B-LaCo-Pruned"

# Load the tokenizer and the pruned model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)

# Text generation
prompt = "The process of photosynthesis"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With 4-bit Quantization (Further Compression)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "Mercity/Qwen3-8B-LaCo-Pruned",
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True
)
```
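
To see what the 4-bit load saves on top of pruning, the in-memory size of the weights can be inspected with the standard `get_memory_footprint()` helper from transformers; the exact number depends on your environment:

```python
# Rough check of weight memory after 4-bit loading (bytes -> GiB)
footprint_gib = model.get_memory_footprint() / 1024**3
print(f"Approximate weight memory: {footprint_gib:.1f} GiB")
```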

Recovery Recommendations

To improve factual knowledge after pruning:

LoRA Fine-tuning (Recommended)

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Fine-tune on OpenOrca, Alpaca, or domain-specific data
```

Expected recovery: MMLU could reach 45-55% with fine-tuning.
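
A minimal training-loop sketch for the LoRA setup above, using the standard transformers Trainer with causal-LM collation. The dataset (`tatsu-lab/alpaca` and its `text` column), sequence length, and hyperparameters are illustrative placeholders rather than a tested recovery recipe; `model` and `tokenizer` are the objects from the snippets above:

```python
from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

# Illustrative dataset choice; any instruction/text dataset works
dataset = load_dataset("tatsu-lab/alpaca", split="train")

# Qwen tokenizers usually ship a pad token; fall back to EOS just in case
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,  # the PEFT-wrapped model from the LoRA snippet
    args=TrainingArguments(
        output_dir="qwen3-laco-lora",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        bf16=True,
        logging_steps=50,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```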


Technical Specifications

| Attribute | Value |
|---|---|
| Architecture | Transformer decoder-only |
| Layers | 30 |
| Hidden Size | 4096 |
| Attention Heads (Q) | 32 |
| Attention Heads (KV) | 8 (GQA) |
| Intermediate Size | 12288 |
| Vocabulary Size | 151,669 |
| Max Context Length | 32,768 tokens |
| Precision | bfloat16 |
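
These values can be verified from the model config alone, without downloading the full weights (attribute names follow the standard Qwen3 config in transformers):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Mercity/Qwen3-8B-LaCo-Pruned", trust_remote_code=True)
print(config.num_hidden_layers)    # expected: 30
print(config.hidden_size)          # expected: 4096
print(config.num_attention_heads)  # expected: 32
print(config.num_key_value_heads)  # expected: 8
print(config.intermediate_size)    # expected: 12288
```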

Citation

If you use this model, please cite the original LaCo paper and Qwen3:

```bibtex
@article{yang2024laco,
  title={LaCo: Large Language Model Pruning via Layer Collapse},
  author={Yang, Yifei and Cao, Zouying and Zhao, Hai},
  journal={arXiv preprint arXiv:2402.11187},
  year={2024}
}

@misc{qwen3technicalreport,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  year={2025},
  eprint={2505.09388},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.09388}
}
```

License

Apache 2.0 (same as base Qwen3 model)

Acknowledgments

  • Qwen Team for the excellent Qwen3-8B-Base model
  • LaCo authors for the pruning methodology
  • Hugging Face for model hosting