Anima v1.0 4-step Distillation LoRAs (PCM + DMDX + DMD2) Released

#157
by darask0 - opened

I've released two distillation LoRAs for the Anima v1.0 base model, exploring different
distillation methods at 4-step inference with CFG=1.0. Both are publicly available
for research and study purposes, though neither matches or surpasses the official
Civitai Anima Turbo LoRA in practical quality.

TL;DR

  • Use the official Civitai Anima Turbo for production
  • The LoRAs in this repository are for research/educational purposes only
  • PCM yielded poor visual quality (not recommended)
  • DMDX was visually indistinguishable from its Civitai Turbo warm-start (no value over Turbo)

Repository

Released LoRAs (neither is recommended for production)

LoRA       | Method                                               | Init                           | Result
PCM        | Phased Consistency Model (Wang et al. 2024)          | Cold-start                     | ⚠ Poor quality, not recommended
DMDX (ADM) | Adversarial Distribution Matching (arxiv 2507.18569) | Civitai Anima Turbo warm-start | ⚠ Visually identical to Turbo (no added value)

DMD2 + TrigFlow training is underway and will be added when complete.

Technical Details

  • Base: anima-base-v1.0.safetensors
  • LoRA target: Wide LoRA covering AdaLN + attention + MLP linear layers (980 keys, rank 32)
  • Dataset: 5,000 Anima self-distillation samples (teacher 20-step CFG=4.5)
  • Hardware: NVIDIA B200 on Modal
  • Training cost: PCM ~$22 / DMDX ~$27

PCM (Not Recommended)

  • Phased Consistency Model with N=50 Euler timesteps split into K=4 phases
  • Pseudo-Huber loss with CFG-augmentation (w∈[4.0, 5.0]) embedded during training
  • Cold-start: converges numerically without a warm-start
  • 5,000 steps, ~3.4 hours, ~$22
  • Result: Visual quality fell short of the Anima Turbo baseline despite stable loss curves.
    The cold-start approach failed to develop coherent style features expected from Anima distillation.
    Numerical convergence ≠ generation quality.
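
As a rough illustration of the two ingredients named above (the phase split and the Pseudo-Huber loss), here is a minimal sketch. It is not the repository's training code: the near-equal integer split of 50 steps into 4 phases, the function names, and the constant c = 1e-3 are all assumptions made for the example.

```python
import math

def phase_edges(num_steps=50, num_phases=4):
    """Step indices bounding each phase (near-equal integer split, an assumption)."""
    return [(k * num_steps) // num_phases for k in range(num_phases + 1)]

def phase_of(step_index, edges):
    """Return the 0-based phase whose half-open range [lo, hi) contains step_index."""
    for k in range(len(edges) - 1):
        if edges[k] <= step_index < edges[k + 1]:
            return k
    raise ValueError("step_index lies outside the discretization")

def pseudo_huber(pred, target, c=1e-3):
    """Pseudo-Huber loss sqrt(||pred - target||^2 + c^2) - c over flat lists.

    Behaves like a squared error near zero and like an L2 norm for large errors.
    The value of c here is illustrative, not taken from the post.
    """
    sq = sum((p - t) ** 2 for p, t in zip(pred, target))
    return math.sqrt(sq + c * c) - c

edges = phase_edges()          # bounds for N=50 steps, K=4 phases
print(edges, phase_of(30, edges))
```

With 50 steps and 4 phases the split cannot be exactly even, so some phases get 12 steps and others 13; a real implementation may place the boundaries differently.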

DMDX (ADM, Equivalent to Turbo)

  • Implementation of the ADM portion from arxiv 2507.18569
    (Lu et al., ByteDance Seed Vision)
  • Replaces DMD2's reverse-KL gradient trick with a learnable discriminator
    + hinge GAN (TVD minimization)
  • LADD-style discriminator: teacher MiniTrainDIT (frozen) backbone + 5 spectral-norm heads
  • Cubic time schedule + teacher Δt evolution (Δt = T/64)
  • 5,000 outer steps, ~4.3 hours, ~$27
  • Result: The student LoRA barely diverged from the Civitai Anima Turbo warm-start state
    even after the full 5,000 outer training steps. Output was visually indistinguishable
    from Turbo. GAN dynamics were healthy (D vs G oscillation, no mode collapse), but weight
    updates were too conservative to depart from the warm-start trajectory.
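
For readers unfamiliar with the hinge GAN objective mentioned above, the standard form of the two losses can be sketched as follows. This is the textbook hinge formulation on plain lists of logits, not the author's implementation; the function names are hypothetical, and the sketch does not model the five discriminator heads or say which samples count as "real" and "fake" in this pipeline.

```python
def hinge_d_loss(real_logits, fake_logits):
    """Discriminator hinge loss: push real logits above +1 and fake logits below -1."""
    real_term = sum(max(0.0, 1.0 - r) for r in real_logits) / len(real_logits)
    fake_term = sum(max(0.0, 1.0 + f) for f in fake_logits) / len(fake_logits)
    return real_term + fake_term

def hinge_g_loss(fake_logits):
    """Generator hinge loss: raise the discriminator's score on generated samples."""
    return -sum(fake_logits) / len(fake_logits)

# Toy logits: one real sample is inside the margin, one fake sample is not yet rejected.
d_loss = hinge_d_loss([2.0, 0.5], [-2.0, 0.0])
g_loss = hinge_g_loss([1.0, -3.0])
print(d_loss, g_loss)
```

Samples that already clear the margin contribute zero to the discriminator loss, so only borderline samples drive its updates.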

Key Lessons Learned

Lesson 1: PCM Cold-Start is Risky on Anima

PCM trained with stable loss but produced degraded visual quality. Numerical metrics
(loss curves, no NaN, no divergence) gave a false sense of success. Visual inspection
at intermediate checkpoints would have caught this earlier.

→ Always verify with visual generation tests at intermediate checkpoints, not just loss curves.
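
One way to build that habit into a training script is to render previews on a fixed schedule. The sketch below only shows the scheduling; `train_step` and `render_preview` are hypothetical caller-supplied callables, and nothing here comes from the repository's Modal pipeline.

```python
def train_with_previews(total_steps, preview_every, train_step, render_preview):
    """Run training and render a preview at regular checkpoints.

    train_step(step) and render_preview(step) are hypothetical callables supplied
    by the caller; render_preview would typically generate images from a fixed
    prompt and seed so that checkpoints are directly comparable.
    """
    previews = []
    for step in range(1, total_steps + 1):
        train_step(step)
        # Preview on the schedule, and always at the final step.
        if step % preview_every == 0 or step == total_steps:
            previews.append((step, render_preview(step)))
    return previews

# Toy run with stand-in callables.
previews = train_with_previews(10, 4, lambda s: 0.0, lambda s: f"preview_{s}.png")
print(previews)
```

Using the same prompts and seeds at every checkpoint makes visual regressions easier to spot than comparing unrelated samples.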

Lesson 2: Warm-Start Dominance in DMDX

DMDX with strong warm-start (Civitai Turbo) failed to develop independent distillation
behavior. Possible causes:

  • Hinge GAN updates are conservative when D approaches perfect classification
    (G's gradients saturate)
  • Warm-start already worked well at the target 4-step CFG=1.0, leaving little room
    for G to improve
  • recon_weight=0 (pure ADM) provided no soft guidance signal

β†’ Future attempts might try cold-start, higher lr_gen, or recon_weight > 0 (Smooth-L1
anchor borrowed from LADD).
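
To make the recon_weight idea concrete, here is a minimal sketch of adding a Smooth-L1 term to an adversarial generator loss. The Smooth-L1 formula is the standard one; the function names, the choice of teacher output as the anchor target, and the way the two terms are combined are assumptions for illustration, not the repository's code.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth-L1: quadratic for |error| < beta, linear beyond it."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)

def generator_loss(adv_loss, student_out, teacher_out, recon_weight=0.0):
    """Adversarial loss plus an optional Smooth-L1 anchor toward the teacher output.

    recon_weight=0 reproduces the pure-adversarial setting described in the post.
    """
    return adv_loss + recon_weight * smooth_l1(student_out, teacher_out)

pure = generator_loss(1.0, [0.5, 3.0], [0.0, 0.0])                      # recon_weight = 0
anchored = generator_loss(1.0, [0.5, 3.0], [0.0, 0.0], recon_weight=0.5)
print(pure, anchored)
```

With recon_weight > 0 the generator still receives a gradient from the anchor term even when the adversarial signal is weak.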

Lesson 3: Beating the Official Distill LoRA Is Hard

CircleStone Labs trained the official Anima Turbo with vastly more compute and iteration
than is feasible on a hobby budget (roughly $50-300 per attempt here, versus a likely
$10k+ for the official run). Beating the official Turbo on a hobby budget is not
realistic; aim for understanding and method comparison instead.

ComfyUI Usage (if you still want to try)

ModelSamplingAuraFlow: sigma_shift = 3.0
LoraLoaderModelOnly: strength_model = 1.0
KSampler: steps = 4, cfg = 1.0, sampler = er_sde, scheduler = simple

PCM was tested with both er_sde + simple and res_multistep + beta. DMDX was tested with
er_sde + simple. The official Civitai Anima Turbo remains the recommended option.

Conclusion

While these LoRAs are technically functional, practical users should stick with the
official Civitai Anima Turbo. This repository's value is in:

  • Documented distillation method comparisons (PCM, DMDX, DMD2)
  • Source code + Modal pipeline for experimentation
  • Honest failure analysis (docs/dmdx.md)

Feedback welcome.

License

  • LoRA weights: Apache-2.0
  • Base model (Anima v1.0): CircleStone Labs Non-Commercial License + NVIDIA Open Model License
  • Non-commercial use only
