Paper: Qwen-Image Technical Report (arXiv:2508.02324)
4-bit NF4 quantized version of Qwen-Image-Edit-2511 using BitsAndBytes.
Quantization reduces VRAM requirements from ~40GB to ~17GB (see the table below), making the model usable on consumer GPUs like the RTX 3090/4080/4090.
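For reference, a checkpoint like this can in principle be produced from the base model with diffusers' built-in bitsandbytes support. A minimal sketch, assuming the standard `transformer` subfolder layout and typical NF4 settings (the exact configuration used for this repo is an assumption, not documented here):

```python
import torch
from diffusers import BitsAndBytesConfig, QwenImageTransformer2DModel

# Assumed NF4 settings; these mirror common bitsandbytes 4-bit setups.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Quantize the transformer (the memory-heavy component) while loading
# the base model weights.
transformer = QwenImageTransformer2DModel.from_pretrained(
    "Qwen/Qwen-Image-Edit-2511",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)
```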
Edit a single image with a text prompt:

| Input | Prompt | Output |
|---|---|---|
| ![]() | "A cat wearing stylish sunglasses" | ![]() |
| ![]() | "Cyberpunk style with neon lights" | ![]() |
Combine multiple images into one coherent scene:
| Input 1 | Input 2 | Prompt | Output |
|---|---|---|---|
| ![]() | ![]() | "A cat sitting on a mountain cliff" | ![]() |
| ![]() | ![]() | "Person hiking in the mountains" | ![]() |
Apply a style transfer to an image:

| Input | Style | Output |
|---|---|---|
| ![]() | Studio Ghibli | ![]() |
| ![]() | Winter + Northern Lights | ![]() |
Install the dependencies:

```bash
pip install torch diffusers transformers accelerate bitsandbytes
```
```python
import torch
from PIL import Image
from diffusers import QwenImageEditPlusPipeline

# Load the pre-quantized NF4 pipeline
pipe = QwenImageEditPlusPipeline.from_pretrained(
    "seochan99/Qwen-Image-Edit-2511-bnb-nf4",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Single image editing
image = Image.open("input.png")
result = pipe(
    image=[image],                        # the pipeline expects a list of images
    prompt="Turn this into anime style",
    true_cfg_scale=4.0,                   # classifier-free guidance strength
    negative_prompt=" ",
    num_inference_steps=50,
).images[0]
result.save("output.png")
```
Combine two images into one coherent scene:

```python
img1 = Image.open("person.png")
img2 = Image.open("background.png")

result = pipe(
    image=[img1, img2],  # pass multiple images to compose them
    prompt="Person standing in the forest, natural lighting",
    true_cfg_scale=4.0,
    negative_prompt=" ",
    num_inference_steps=50,
).images[0]
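result.save("combined.png")
```

For reproducible outputs, diffusers pipelines accept a `generator` argument; a minimal sketch (the seed value is arbitrary):

```python
# Fix the seed so repeated runs produce identical output.
generator = torch.Generator(device="cuda").manual_seed(42)
result = pipe(
    image=[img1, img2],
    prompt="Person standing in the forest, natural lighting",
    generator=generator,
    true_cfg_scale=4.0,
    negative_prompt=" ",
    num_inference_steps=50,
).images[0]
```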
| Version | VRAM Usage |
|---|---|
| Original (BF16) | ~40GB |
| This (NF4 4-bit) | ~17GB |
Tested on NVIDIA RTX 4090.
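If ~17GB is still too much (e.g., on a 12GB card), diffusers' model CPU offload can trade inference speed for peak VRAM; a minimal sketch:

```python
# Call this instead of pipe.to("cuda"): submodules are moved to the GPU
# only while they are running, lowering peak VRAM at some speed cost.
pipe.enable_model_cpu_offload()
```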
License: Apache 2.0 (same as the base model)
Base model: Qwen/Qwen-Image-Edit-2511