Since it is a 2B model, why is it slower than SDXL with 3.5B?

#158

by ArranEye - opened 14 days ago

Discussion

ArranEye

14 days ago

Is this due to the architecture? Will there be technological breakthroughs in the future to solve this problem?

kongbai-84

14 days ago

Generation is slower than SDXL because Anima is based on the DiT (Diffusion Transformer) architecture. The speed difference comes from the architecture itself, not from the parameter count.

Specifically, SDXL uses a UNet that does most of its work with efficient local convolutions and applies attention only on downsampled feature maps, whereas DiT treats the image latent as a sequence of patches and applies global self-attention across all of them. The cost of self-attention scales quadratically with the sequence length (the number of image patches). As a result, especially at higher resolutions, each generation step needs significantly more computation and time than it does with a UNet.
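
To make the quadratic scaling concrete, here is a rough back-of-the-envelope sketch. The 8x VAE downsampling, 2x2 patch size, and hidden size of 2048 are assumptions chosen for illustration, not confirmed values for Anima, and the FLOP count covers only the two attention matmuls in a single layer.

```python
def num_tokens(height, width, vae_factor=8, patch_size=2):
    """Patch tokens for one image (vae_factor and patch_size are assumed values)."""
    latent_h = height // vae_factor
    latent_w = width // vae_factor
    return (latent_h // patch_size) * (latent_w // patch_size)


def attention_matmul_flops(n_tokens, hidden_dim=2048):
    """Approximate FLOPs for Q @ K^T plus weights @ V in one attention layer.

    hidden_dim is a hypothetical width used only for illustration.
    Each matmul costs roughly 2 * n^2 * d FLOPs, so the pair costs 4 * n^2 * d.
    """
    return 4 * n_tokens ** 2 * hidden_dim


if __name__ == "__main__":
    base = attention_matmul_flops(num_tokens(512, 512))
    for side in (512, 1024, 2048):
        n = num_tokens(side, side)
        ratio = attention_matmul_flops(n) / base
        print(f"{side}x{side}: {n} tokens, attention cost x{ratio:.0f} vs 512x512")
```

Under these assumptions, doubling the image side quadruples the token count and multiplies the attention matmul cost by about 16, while a convolution's cost grows only in proportion to the pixel count (about 4x).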

Despite this, you can look forward to the release of the official Turbo model, which is expected to greatly optimize and accelerate this process.
