Wan2.1-Fun-V1.1-1.3B-InP (Diffusers)

This is a diffusers-format conversion of alibaba-pai/Wan2.1-Fun-V1.1-1.3B-InP (Wan-Fun Inpaint V1.1 1.3B) from VideoX-Fun format.

Model Details

Architecture: WanTransformer3DModel with in_channels=36 (16 noise + 4 mask + 16 image latent)
Parameters: 1.3B
Pipeline: WanImageToVideoPipeline (standard diffusers, no patching required)
Resolution: 480x832 (480p) recommended
Frames: 49 frames at 16fps (~3 seconds)

This model has the same I2V architecture as the official Wan2.1-I2V-14B-480P (in_channels=36), but at 1.3B scale.
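The channel count and the latent tensor size at the recommended settings can be checked with simple arithmetic. The sketch below assumes the Wan 2.1 VAE compresses 8x spatially and 4x temporally, which this card does not state; the `latent_shape` helper and the constant names are illustrative, not part of any library API.

```python
# Assumed Wan 2.1 VAE compression factors (not stated in this card).
VAE_SPATIAL = 8
VAE_TEMPORAL = 4

# Channel split listed under Model Details.
NOISE_CH, MASK_CH, IMAGE_CH = 16, 4, 16


def latent_shape(height, width, num_frames):
    """Return (latent_frames, latent_height, latent_width) for a video."""
    latent_frames = (num_frames - 1) // VAE_TEMPORAL + 1
    return latent_frames, height // VAE_SPATIAL, width // VAE_SPATIAL


in_channels = NOISE_CH + MASK_CH + IMAGE_CH
print(in_channels)                 # 36
print(latent_shape(480, 832, 49))  # (13, 60, 104)
```

Under these assumptions, a 49-frame 480x832 clip corresponds to 13 latent frames of 60x104, each with 36 input channels.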

Usage

import torch
from diffusers import WanImageToVideoPipeline
from diffusers.utils import export_to_video
from PIL import Image

# Load the converted pipeline in bfloat16.
pipe = WanImageToVideoPipeline.from_pretrained(
    "engineerA314/Wan2.1-Fun-V1.1-1.3B-InP-Diffusers",
    torch_dtype=torch.bfloat16,
)
# Offload submodules to CPU between uses to lower peak GPU memory (slower).
pipe.enable_sequential_cpu_offload()

# The conditioning image is used as the first frame of the video.
image = Image.open("first_frame.png").convert("RGB")

output = pipe(
    image=image,
    prompt="A person is talking naturally",
    negative_prompt="static, blurred, low quality",
    height=480,
    width=832,
    num_frames=49,
    num_inference_steps=50,
    guidance_scale=5.0,
)

# output.frames[0] holds the frames of the first (and only) generated video.
export_to_video(output.frames[0], "output.mp4", fps=16)
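When targeting a different clip length, a valid `num_frames` value is needed. The sketch below assumes the pipeline expects frame counts of the form 4k + 1 (49 fits this pattern) and rounds other values; `frames_for_duration` is a hypothetical helper, not a Diffusers function.

```python
def frames_for_duration(seconds, fps=16, temporal_factor=4):
    """Pick the frame count of the form temporal_factor * k + 1 nearest to seconds * fps.

    temporal_factor=4 is an assumption about the VAE's temporal compression.
    """
    target = max(1, round(seconds * fps))
    k = round((target - 1) / temporal_factor)
    return temporal_factor * k + 1


print(frames_for_duration(3.0))  # 49
print(frames_for_duration(5.0))  # 81
```

The result can be passed as `num_frames` in the call above, keeping `fps=16` in `export_to_video` so playback speed matches.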

Conversion Details

Converted from VideoX-Fun format using a 1:1 weight key mapping (983 keys). No architectural modifications were needed: the standard WanImageToVideoPipeline handles in_channels=36 natively.
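A 1:1 key mapping of this kind can be sketched as a rename pass over the state dict that fails loudly if two source keys collapse to the same target. The rename rules and key names below are illustrative only; the actual conversion table is not given in this card, and `remap_keys` is a hypothetical helper.

```python
def remap_keys(state_dict, rules):
    """Rename keys via ordered substring rules, enforcing a 1:1 mapping."""
    remapped = {}
    for old_key, tensor in state_dict.items():
        new_key = old_key
        for src, dst in rules:
            new_key = new_key.replace(src, dst)
        if new_key in remapped:
            raise ValueError(f"key collision: {old_key!r} -> {new_key!r}")
        remapped[new_key] = tensor
    return remapped


# Illustrative rules only; not the real VideoX-Fun to Diffusers table.
EXAMPLE_RULES = [
    ("self_attn.q.", "attn1.to_q."),
    ("cross_attn.q.", "attn2.to_q."),
]

# Placeholder strings stand in for weight tensors.
source = {
    "blocks.0.self_attn.q.weight": "w0",
    "blocks.0.cross_attn.q.weight": "w1",
    "blocks.0.norm3.weight": "w2",
}
converted = remap_keys(source, EXAMPLE_RULES)

# A 1:1 mapping preserves the number of keys (983 for the real checkpoint).
assert len(converted) == len(source)
```

Because values are passed through untouched, the same function works whether the dict holds real tensors or placeholders.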

Components

Component	Source
Transformer	Converted from `alibaba-pai/Wan2.1-Fun-V1.1-1.3B-InP`
VAE	`Wan-AI/Wan2.1-T2V-1.3B-Diffusers`
Text Encoder	`Wan-AI/Wan2.1-T2V-1.3B-Diffusers` (UMT5-XXL)
Image Encoder	`Wan-AI/Wan2.1-I2V-14B-480P-Diffusers` (CLIP ViT-H-14)
Scheduler	UniPCMultistepScheduler (`flow_shift=3.0`)
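The components can also be loaded individually, for example to keep the VAE and image encoder in float32 while the rest runs in bfloat16, or to rebuild the scheduler with a different `flow_shift`. This is a minimal sketch that assumes the repository follows the usual Diffusers subfolder layout (`vae`, `image_encoder`); `load_pipeline` is a hypothetical wrapper, and the imports are placed inside it so that defining the function does not require the heavy dependencies.

```python
def load_pipeline(
    repo_id="engineerA314/Wan2.1-Fun-V1.1-1.3B-InP-Diffusers",
    flow_shift=3.0,
):
    """Load the pipeline with float32 VAE/image encoder and a UniPC scheduler."""
    import torch
    from diffusers import (
        AutoencoderKLWan,
        UniPCMultistepScheduler,
        WanImageToVideoPipeline,
    )
    from transformers import CLIPVisionModel

    # Assumed subfolder names, following the standard Diffusers layout.
    image_encoder = CLIPVisionModel.from_pretrained(
        repo_id, subfolder="image_encoder", torch_dtype=torch.float32
    )
    vae = AutoencoderKLWan.from_pretrained(
        repo_id, subfolder="vae", torch_dtype=torch.float32
    )
    pipe = WanImageToVideoPipeline.from_pretrained(
        repo_id,
        vae=vae,
        image_encoder=image_encoder,
        torch_dtype=torch.bfloat16,
    )
    # Rebuild the scheduler from its saved config, overriding only flow_shift.
    pipe.scheduler = UniPCMultistepScheduler.from_config(
        pipe.scheduler.config, flow_shift=flow_shift
    )
    return pipe
```

Calling `load_pipeline()` downloads the weights on first use; the returned object can be used in place of `pipe` in the Usage section.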

Comparison with TI2V variant

Property	This model (InP)	TI2V
`in_channels`	36 (noise + mask + image)	32 (noise + image)
Pipeline patches	None needed	`prepare_latents` override required
Origin	Wan-Fun Inpaint	Wan-Fun Camera Control (adapter removed)

Acknowledgements

Alibaba PAI / VideoX-Fun for the original Wan-Fun models
Wan-Video for the Wan 2.1 architecture
