Plans to upscale the parameter count?
Hi! I really like the model, but I wonder whether the parameter count might be a limitation, since it is even smaller than Z-Image.
In the LLM community there is an established practice of so-called RYS or "depth upscale" models: duplicating layers to raise the parameter count without retraining from scratch, while preserving most of the model's knowledge. These upscaled models can later be trained further to get an even better grasp of the subject matter.
Example depth upscales: https://dnhkng.github.io/posts/rys/, DavidAU's fine-tunes like https://huggingface.co/DavidAU/Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-NEO-CODE-Di-IMatrix-MAX-GGUF
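Mechanically, depth upscaling usually means duplicating a contiguous span of transformer blocks and re-inserting the copies next to the originals. A minimal sketch of that idea (layer names and the 12-block layout are illustrative, not any specific model's):

```python
# Hypothetical sketch of depth upscaling ("layer stacking"): duplicate a
# contiguous span of transformer blocks to grow the model while keeping
# all existing weights. Not taken from any real merge recipe.

def depth_upscale(layers, start, end):
    """Return a new layer list with layers[start:end] duplicated.

    `layers` is any list of per-layer weight objects; the duplicated span
    is inserted right after the original span, so each copied block still
    sees activations similar to the ones it was trained on.
    """
    span = layers[start:end]
    return layers[:end] + list(span) + layers[end:]

# Toy example: a 12-"layer" model upscaled by repeating blocks 6..11.
base = [f"block_{i}" for i in range(12)]
bigger = depth_upscale(base, 6, 12)
print(len(bigger))  # 18 blocks after the upscale
```

Tools like mergekit implement this as a "passthrough" merge over layer ranges; the duplicated blocks initially add little capability, which is why the upscaled model is usually trained further afterwards.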
Would you like to do this, or have you already tried it and the model broke?
Great idea, as much as applying the DeepSeek-V4-Pro 1.6T A49B Text Encoder.
So the thing is that even the most minuscule changes to the text encoder (such as very low-KLD abliteration, or even "high quality" quants like Q8/FP8) degrade output quality, because the model was trained on the exact outputs the current text encoder produces. So even if we ended up with a significantly smarter text encoder, it would take very extensive and expensive training to take advantage of it.
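To see why even "high quality" quants matter here, a toy illustration (not the actual pipeline, just simulated embeddings and uniform rounding) shows that quantization perturbs every value of the text-encoder output slightly, and a downstream model conditioned on the exact full-precision values sees all of those perturbations at once:

```python
import numpy as np

# Toy illustration: fake "token embeddings" plus a crude uniform quantizer.
# The point is only that rounding error is small per value but nonzero
# everywhere, shifting the entire conditioning signal at inference time.

rng = np.random.default_rng(0)
emb = rng.standard_normal((77, 4096)).astype(np.float32)  # fake text-encoder output

def fake_quantize(x, bits=8):
    """Uniformly quantize x to 2**bits levels over its value range."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (2**bits - 1)
    return (np.round((x - lo) / scale) * scale + lo).astype(np.float32)

q = fake_quantize(emb, bits=8)
err = float(np.abs(q - emb).mean())
print(f"mean abs error at 8 bits: {err:.5f}")  # small, but applied to every value
```

Real Q8/FP8 schemes are smarter than this (per-block scales, non-uniform formats), but the qualitative effect is the same: the conditioning distribution shifts away from what the model was trained against.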
Furthermore, tdrussell has said in the past that he believes the current text encoder is good enough and that the model is bottlenecked in other ways. (I don't think he elaborated further, but it sounds plausible to me.)
And lastly, Anima is also a weird Frankenstein model: qwen 0.6 outputs are mapped onto the underlying T5 embedding space by a tiny LLM adapter that duct-tapes the two together, which further complicates any such architecture-change proposals.
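For intuition, that "duct tape" is essentially a small learned projection from one encoder's hidden-state space into another's. A minimal sketch, with made-up dimensions (these are illustrative guesses, not Anima's real sizes or its actual adapter architecture):

```python
import numpy as np

# Hedged sketch of a cross-encoder adapter: a single linear projection
# mapping hidden states from a "qwen-like" space into a "t5-like" space.
# Dimensions and initialization are assumptions for illustration only.

QWEN_DIM, T5_DIM = 1024, 4096

class TinyAdapter:
    """One linear layer (weight + bias) bridging the two embedding spaces."""
    def __init__(self, rng):
        self.w = (rng.standard_normal((QWEN_DIM, T5_DIM)) * 0.02).astype(np.float32)
        self.b = np.zeros(T5_DIM, dtype=np.float32)

    def __call__(self, hidden):          # hidden: (tokens, QWEN_DIM)
        return hidden @ self.w + self.b  # -> (tokens, T5_DIM)

rng = np.random.default_rng(0)
adapter = TinyAdapter(rng)
out = adapter(rng.standard_normal((10, QWEN_DIM)).astype(np.float32))
print(out.shape)  # (10, 4096)
```

The consequence for upscaling proposals is that any change to either side of this bridge (depth-upscaling the LLM, or swapping the text encoder) invalidates the mapping the adapter was trained to produce.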