# Model Overview
- Model Architecture: Qwen3_5MoeForConditionalGeneration
- Input: Text
- Output: Text
- Supported Hardware Microarchitecture: AMD Instinct MI300/MI350/MI355
- ROCm: 7.0
- PyTorch: 2.8.0
- Transformers: 5.2.0
- Operating System(s): Linux
- Inference Engine: SGLang/vLLM
- Model Optimizer: AMD-Quark (v0.11.1)
- Weight quantization: OCP MXFP4, Static
- Activation quantization: OCP MXFP4, Dynamic
# Model Quantization
The model was quantized from Qwen/Qwen3.5-35B-A3B-FP8 using AMD-Quark. Both weights and activations are quantized to OCP MXFP4 (weights statically, activations dynamically).
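To give a feel for what MXFP4 quantization does, here is a minimal, illustrative sketch (NOT AMD-Quark's actual implementation): in the OCP microscaling format, each block of up to 32 values shares one power-of-two (E8M0) scale, and each element is rounded to the nearest FP4 E2M1 value.

```python
import math

# Representable FP4 E2M1 magnitudes (sign is stored separately).
FP4_E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def fake_quantize_mxfp4(block):
    """Quantize a block of floats to MXFP4 and return the dequantized result.
    Illustrative only; real kernels operate on packed 4-bit storage."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Shared scale exponent: floor(log2(amax)) minus the E2M1 max exponent (2),
    # so the largest magnitude lands in the representable range (clipping at 6).
    scale = 2.0 ** (math.floor(math.log2(amax)) - 2)
    result = []
    for x in block:
        mag = abs(x) / scale
        q = min(FP4_E2M1_MAGNITUDES, key=lambda v: abs(v - mag))  # round to nearest code
        result.append(math.copysign(q * scale, x))
    return result

print(fake_quantize_mxfp4([1.0, 0.5, -2.0, 3.0]))  # -> [1.0, 0.5, -2.0, 3.0]
```

Values that happen to be FP4-representable at the shared scale round-trip exactly; everything else is rounded to the nearest code, which is where the small accuracy loss comes from.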
Quantization scripts:

```shell
cd Quark/examples/torch/language_modeling/llm_ptq/
export exclude_layers="lm_head model.visual.* mtp.* *mlp.gate *shared_expert_gate* *.linear_attn.* *.self_attn.* *.shared_expert.*"
python3 quantize_quark.py --model_dir Qwen/Qwen3.5-35B-A3B-FP8 \
    --quant_scheme mxfp4 \
    --file2file_quantization \
    --exclude_layers $exclude_layers \
    --output_dir amd/Qwen3.5-35B-A3B-MXFP4
```
For further details or issues, please refer to the AMD-Quark documentation or contact the respective developers.
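The quantized checkpoint can then be served with vLLM. A minimal sketch follows; the tensor-parallel size and context length are illustrative defaults, not values taken from this card:

```shell
# Illustrative vLLM serving command for the quantized checkpoint;
# adjust parallelism and context length to your hardware.
vllm serve amd/Qwen3.5-35B-A3B-MXFP4 \
    --tensor-parallel-size 1 \
    --max-model-len 32768 \
    --trust-remote-code
```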
# Evaluation
The model was evaluated on the GSM8K benchmark using the vLLM framework.
## Accuracy
| Benchmark | Qwen/Qwen3.5-35B-A3B | amd/Qwen3.5-35B-A3B-MXFP4 (this model) | Recovery |
|---|---|---|---|
| gsm8k (flexible-extract) | 90.52 | 89.23 | 98.57% |
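The Recovery column is simply the quantized score as a percentage of the baseline score:

```python
# Recovery = quantized accuracy / baseline accuracy, as a percentage.
baseline = 90.52   # Qwen/Qwen3.5-35B-A3B, gsm8k flexible-extract
quantized = 89.23  # amd/Qwen3.5-35B-A3B-MXFP4 (this model)
recovery = quantized / baseline * 100
print(f"{recovery:.2f}%")  # -> 98.57%
```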
# Reproduction
The GSM8K results were obtained with the vLLM framework using the Docker image rocm/vllm-dev:nightly; vLLM was installed inside the container with fixes applied for model support.
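A typical way to launch that container on a ROCm system is sketched below; the device mappings and mount path are common ROCm Docker defaults, not values taken from this card:

```shell
# Hypothetical container launch; adjust mounts and image tag as needed.
docker run -it --rm \
    --device=/dev/kfd --device=/dev/dri \
    --group-add video \
    --ipc=host --shm-size 16G \
    --security-opt seccomp=unconfined \
    -v "$PWD":/workspace -w /workspace \
    rocm/vllm-dev:nightly
```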
Evaluate the model in a new terminal:

```shell
lm_eval \
    --model vllm \
    --model_args pretrained=$MODEL,tensor_parallel_size=1,max_model_len=262144,gpu_memory_utilization=0.90,max_gen_toks=2048,trust_remote_code=True,reasoning_parser=qwen3 \
    --tasks gsm8k --num_fewshot 5 \
    --batch_size auto
```
# License
Modifications Copyright (c) 2026 Advanced Micro Devices, Inc. All rights reserved.