Since it is a 2B model, why is it slower than SDXL with 3.5B?

#158
by ArranEye - opened

Is this due to the architecture? Will there be technological breakthroughs in the future to solve this problem?

Anima generates more slowly than SDXL because it is based on the DiT (Diffusion Transformer) architecture. The speed difference comes from the architecture itself rather than from the parameter count, since the number of parameters does not directly determine how much computation each step needs.

Specifically, SDXL uses a UNet that does much of its work with efficient local convolutions and applies attention only on downsampled feature maps, while DiT treats the image latents as a sequence of patches and applies global self-attention across all of them in every block. The cost of self-attention scales quadratically with the sequence length (the number of image patches). This means that, especially at higher resolutions, computing attention across all patches takes significantly more computation and time per generation step than a UNet does.
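To make the quadratic scaling concrete, here is a rough sketch that counts patch tokens and attention pairs at a few resolutions. The 8x VAE downsampling factor and the patch size of 2 are assumptions chosen for illustration (they are common in DiT-style models, but I have not confirmed them for Anima), and the function names are made up for this example.

```python
def token_count(height, width, vae_factor=8, patch_size=2):
    """Number of patch tokens for an image of the given pixel size.

    vae_factor and patch_size are assumed values, not confirmed for Anima.
    """
    latent_h = height // vae_factor
    latent_w = width // vae_factor
    return (latent_h // patch_size) * (latent_w // patch_size)


def attention_pairs(tokens):
    """Entries in one attention score matrix (tokens x tokens)."""
    return tokens * tokens


for side in (512, 1024, 2048):
    n = token_count(side, side)
    print(f"{side}x{side}: {n} tokens, {attention_pairs(n):,} attention pairs")
```

Under these assumptions, doubling the image side length gives 4x as many tokens and 16x as many attention pairs per head per layer, which is why the gap to a UNet grows with resolution.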

That said, you can look forward to the release of the official Turbo model, which is expected to speed up generation considerably.