CNS sampler for Anima and more

#182

by iskanderius - opened 3 days ago

I came across these Reddit threads:
https://www.reddit.com/r/StableDiffusion/comments/1tray25/colored_noise_diffusion_sampling_plugandplay/
https://www.reddit.com/r/comfyui/comments/1tr581g/i_ported_the_cns_paper_colored_noise_diffusion/

About this recent paper:
https://arxiv.org/pdf/2605.30332

A recent paper, "Colored Noise Diffusion Sampling" (CNS), promises improved detail and structure in generations by changing how noise is injected during sampling. A ComfyUI implementation was quickly thrown together, but it had several serious issues (like crashing on APU or AMD cards). Plus, it was primarily targeted at Flux1 and Flux2 Klein.

Why will this sampler work not just for Flux?
Because "spectral bias" (where the model learns overall shapes first and details later) is a fundamental property of all diffusion models. Anima, SDXL, and Flux all denoise along the same coarse-to-fine path, and CNS simply steers that shared process. Also, the original node hardcoded 4D tensor dimensions, which made video models like Anima throw an error immediately. I fixed that, so the node now handles 5D video latents as well.
So, I vibe-coded a rework based on the previous interpretation using a chatbot.
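
To illustrate the 4D vs. 5D point, here is a minimal sketch in NumPy (not the node's actual code, which works on torch tensors). The helper names `spatial_dims`, `radial_frequency`, and `filter_spatial` are made up for this example. The idea is to touch only the last two axes (height, width) so any number of leading axes passes through unchanged.

```python
import numpy as np

def spatial_dims(latent):
    """Split a latent's shape into (leading axes, height, width).

    A hardcoded `b, c, h, w = latent.shape` raises ValueError on a 5D
    (B, C, T, H, W) latent; star-unpacking accepts any rank >= 2.
    """
    *leading, h, w = latent.shape
    return tuple(leading), h, w

def radial_frequency(h, w):
    """Distance of each FFT bin from the zero frequency, shape (h, w)."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    return np.sqrt(fy ** 2 + fx ** 2)

def filter_spatial(latent, gain):
    """Scale the spatial spectrum by `gain` (shape (h, w)).

    The FFT runs over the last two axes only, and `gain` broadcasts
    across every leading axis, so 4D and 5D latents both work.
    """
    spectrum = np.fft.fft2(latent, axes=(-2, -1))
    return np.fft.ifft2(spectrum * gain, axes=(-2, -1)).real

rng = np.random.default_rng(0)
image_latent = rng.standard_normal((1, 16, 32, 32))     # B, C, H, W
video_latent = rng.standard_normal((1, 16, 1, 32, 32))  # B, C, T, H, W
flat_gain = np.ones((32, 32))                           # identity filter

print(filter_spatial(image_latent, flat_gain).shape)
print(filter_spatial(video_latent, flat_gain).shape)
```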

Here is why it's interesting and how it works:

🧠 The Core: Why do we need "Colored" noise?
Standard SDE samplers (like Euler Ancestral) inject white noise—equal energy across all frequencies. But diffusion models have "spectral bias": they learn low frequencies (overall structure/shape) early, and high frequencies (fine details/textures) late.
CNS replaces white noise with colored noise. It looks at the current step and dynamically routes noise energy into the frequency bands the model hasn't resolved yet.
Early steps? Fills in structural gaps.
Late steps? Pushes energy into high frequencies for detail sharpness.
The result is often richer textures and sharper details compared to standard Euler, without the chaotic smudging typical of high-churn samplers.
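
As a rough illustration of what "coloring" noise means, the sketch below shapes white Gaussian noise in the frequency domain with a simple power law and then rescales it back to unit variance. This is an assumption-heavy toy: the power-law form, the function names, and especially the linear `exponent_for_progress` schedule are invented for the example and are not the paper's formula or the node's code.

```python
import numpy as np

def colored_noise(shape, exponent, rng):
    """Gaussian noise whose spatial spectrum is scaled by |f| ** exponent.

    exponent < 0 pushes energy toward low frequencies (structure),
    exponent > 0 pushes it toward high frequencies (detail),
    exponent == 0 leaves the spectrum flat (white).
    """
    h, w = shape[-2], shape[-1]
    white = rng.standard_normal(shape)
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    radius[0, 0] = radius[0, 1]  # avoid 0 ** negative at the DC bin
    gain = radius ** exponent
    spectrum = np.fft.fft2(white, axes=(-2, -1)) * gain
    colored = np.fft.ifft2(spectrum, axes=(-2, -1)).real
    # Rescale so the overall noise level matches unit-variance white noise.
    return colored / colored.std()

def exponent_for_progress(progress):
    """HYPOTHETICAL schedule: low-frequency bias early (progress=0),
    high-frequency bias late (progress=1). Not taken from the paper."""
    return -1.0 + 2.0 * progress

def high_freq_fraction(x, cutoff=0.25):
    """Share of spectral power above `cutoff` cycles per pixel."""
    h, w = x.shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    mask = np.sqrt(fy ** 2 + fx ** 2) > cutoff
    power = np.abs(np.fft.fft2(x, axes=(-2, -1))) ** 2
    return power[..., mask].sum() / power.sum()

rng = np.random.default_rng(0)
for progress in (0.0, 0.5, 1.0):
    noise = colored_noise((1, 4, 64, 64), exponent_for_progress(progress), rng)
    print(progress, round(float(high_freq_fraction(noise)), 3))
```

The printed fraction should rise as `progress` goes from 0 to 1, which is the "route energy into the bands not resolved yet" idea in miniature.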

⚙️ Interface: Presets vs. Advanced settings
The original node I based this on dumped raw math variables (alpha_exp_sharpness, gamma_divider, etc.). I reduced it to a more understandable interface:

Presets: The paper provides two fixed sets of hyperparameters that should cover most cases:
Unguided (low CFG 1-5): For anime/stylization. Actively boosts high frequencies for expressive details.
Guided (high CFG 7+): For realism. Very gentle coloring to avoid noise artifacts.
Three main settings:
s_churn: How much noise to inject (0 = ODE, 0.5 = standard; lower = cleaner).
energy_scale: Global volume knob for the effect.
Advanced overrides: If you enable override_preset, you get 3 sliders translated into human language:
gamma_divider: Coloring strength (Higher = closer to pure white noise).
high_freq_boost: The "Dirt/Detail" slider. Positive values = more detail but also grain. Zero or negative = smoother/cleaner.
power_gamma: Sharpness of the noise curve.
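
For anyone writing their own node, the preset-plus-override logic described above could be sketched like this. The parameter names come from the list above, but `ColoringParams`, `PRESETS`, and `resolve_params` are hypothetical, and the numbers are placeholders chosen only so the example runs; they are not the paper's or the node's actual values.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class ColoringParams:
    gamma_divider: float    # higher = closer to pure white noise
    high_freq_boost: float  # positive = more detail and grain
    power_gamma: float      # sharpness of the noise curve

# PLACEHOLDER numbers for illustration only, not real preset values.
PRESETS = {
    "unguided": ColoringParams(gamma_divider=2.0, high_freq_boost=0.1, power_gamma=1.0),
    "guided": ColoringParams(gamma_divider=5.0, high_freq_boost=0.0, power_gamma=1.0),
}

def resolve_params(
    preset: str,
    override_preset: bool = False,
    gamma_divider: Optional[float] = None,
    high_freq_boost: Optional[float] = None,
    power_gamma: Optional[float] = None,
) -> ColoringParams:
    """Return the preset, replacing only the sliders the user actually set.

    Slider values are ignored unless override_preset is enabled.
    """
    base = PRESETS[preset]
    if not override_preset:
        return base
    sliders = {
        "gamma_divider": gamma_divider,
        "high_freq_boost": high_freq_boost,
        "power_gamma": power_gamma,
    }
    overrides = {name: value for name, value in sliders.items() if value is not None}
    return replace(base, **overrides)

print(resolve_params("unguided"))
print(resolve_params("unguided", override_preset=True, high_freq_boost=-0.1))
```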

🟢 The AMD/Zluda Fix (and why Nvidia users shouldn't worry)
The original node used torch.fft.fft2 directly on the GPU. On AMD cards running via Zluda, for example, this instantly crashed with a CUFFT_INTERNAL_ERROR.

Why? Zluda is a translation layer (a workaround). It tries to translate Nvidia's proprietary cuFFT calls into AMD ROCm on the fly, and often chokes on the complex math of Fourier transforms with complex tensors.
Solution: I routed the FFT math to the CPU (noise.cpu() -> do FFT -> colored.to(device)). The CPU uses standard, stable math libraries (like MKL) that completely bypass Zluda bugs.
Does this hurt Nvidia users? No. The latent is tiny, and moving it to RAM/CPU and back adds a fraction of a millisecond per step. You won't notice the speed difference, but AMD users get 100% stability.
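
The CPU round trip described above looks roughly like the sketch below. It uses real PyTorch calls (`torch.fft.fft2`, `torch.fft.ifft2`, `Tensor.to`), but the function name `fft_filter_on_cpu` and the `gain` argument are assumptions for this example, not the node's actual code. Casting to float32 on the CPU is also my own choice here, made to sidestep any half-precision FFT limitations.

```python
import torch

def fft_filter_on_cpu(noise: torch.Tensor, gain: torch.Tensor) -> torch.Tensor:
    """Apply a spectral gain with the FFT done on the CPU.

    noise: latent-shaped tensor (..., H, W) on any device.
    gain:  real tensor of shape (H, W), broadcast over leading axes.
    The result is moved back to the input's device and dtype.
    """
    device, dtype = noise.device, noise.dtype
    # Move to CPU so the GPU FFT backend is never called.
    cpu_noise = noise.detach().to("cpu", torch.float32)
    cpu_gain = gain.detach().to("cpu", torch.float32)
    spectrum = torch.fft.fft2(cpu_noise, dim=(-2, -1))
    colored = torch.fft.ifft2(spectrum * cpu_gain, dim=(-2, -1)).real
    return colored.to(device=device, dtype=dtype)

noise = torch.randn(1, 16, 1, 32, 32)  # 5D latent; works the same for 4D
gain = torch.ones(32, 32)              # identity filter for the demo
out = fft_filter_on_cpu(noise, gain)
print(out.shape, out.device, out.dtype)
```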

💡 Quick start tips
If you want to try it:
Start with the unguided preset for Anima, for example, and s_churn: 0.25.
If the image looks too "dirty" or noisy, lower high_freq_boost (enable override and set to 0.0 or -0.1) OR raise gamma_divider to 5.0+.
If it's too smooth, slowly turn s_churn up or add a little high_freq_boost.

Below are a few examples. (Apologies that this isn't a perfectly clean comparison, as I used my favorite LoRA stack).
In any case, you can test it yourself; at the very least, it's an interesting sampler.
Please go easy on me if you don't like something; it's better to have an implementation for something like Anima than to not have one at all, even like this.
And you can always write your own node based on the paper.

Good luck and all the best to everyone.

1)er_sde+normal

2)euler+normal

3)cns+normal

4)cns+kl_optimal

5)cns+karras (really like Chroma vibe)

I don't have a GitHub account, so here's a Dropbox link. Just copy it into your custom_nodes folder as usual.
https://www.dropbox.com/scl/fi/xq4iawvkam27nr10ki8d1/CNSSampler_Z.7z?rlkey=j583i7dzqi6z4902bwltshdr4&st=axis7bk8&dl=0

iskanderius

3 days ago

edited about 10 hours ago

cns+bong_tangent

cns+sgm_uniform

sorryhyun

3 days ago

Actually, this works nicely and is almost a free-lunch gain. I described how it works and how I implemented it in the anima_lora GitHub repo I'm currently working on.
