Gap between 3.0 and 4.0 bpw?

#1
by McUH - opened

It would be nice to see some quants between 3.0 and 4.0 bpw (3.5, maybe 3.75). EXL3 quality improves most noticeably below 4 bpw, so finer steps there would be great, as each one brings a visible improvement. Above 4.0 bpw the improvement becomes less obvious, so I am not sure there is a need for so many quants there (4.0, 4.25, 4.5). At the very least I would replace the 4.25 or 5.65 with 3.5, in case you do not want to increase the number of quants.

Crucible Labs org

They do take a long time to make (about 5 hours each).
4.25 was requested, and 5.65 is optimized for 3x3090, leaving room for decent context while keeping the cache at FP16.
I can do one more, but it's going to take a while; I have many quants queued up right now.

Thank you for the reply. I am aware EXL3 quants are slow to make. 3.5 or 3.75 would be ideal for me (40 GB VRAM), and 2.5-4 bpw is the range where EXL3 quality truly improves with each increment, so it makes sense to have steps there. There is likely a much smaller difference between 5.0, 5.65 and 6.0, or between 4.0, 4.25 and 4.5. And 7.0 bpw seems the most redundant, sitting between 6.0 and 8.0, so having 3.5 instead of 7.0 in the future would probably be more beneficial.
Still, it is quite a large range of EXL3 sizes compared to other available models, so thanks for that. If you make one more, then 3.5 would probably be best (between 3.0 and 4.0). It is also a common size for 70B models from other people making EXL3 quants, so it is possible someone else will make it, though not with your calibration dataset.
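
As a rough sanity check on the sizes discussed above, here is a minimal sketch of the weights-only arithmetic (parameters × bpw / 8). It assumes a 70B-parameter model, which the thread only implies, and it ignores the KV cache, activations and any layers stored at higher precision. The helper names and the `reserve_gb` headroom value are hypothetical, chosen only for illustration.

```python
def weight_size_gb(params_billion: float, bpw: float) -> float:
    """Approximate size of the quantized weights in decimal GB (weights only)."""
    total_bits = params_billion * 1e9 * bpw
    return total_bits / 8 / 1e9


def fits(params_billion: float, bpw: float, vram_gb: float, reserve_gb: float) -> bool:
    """True if the weights plus a reserved headroom fit in the given VRAM.

    reserve_gb is an assumed allowance for cache and overhead, not a measured value.
    """
    return weight_size_gb(params_billion, bpw) + reserve_gb <= vram_gb


if __name__ == "__main__":
    # Assumption: 70B parameters, 40 GB of VRAM, 5 GB kept free for context.
    for bpw in (3.0, 3.5, 3.75, 4.0, 4.25):
        size = weight_size_gb(70, bpw)
        verdict = "fits" if fits(70, bpw, vram_gb=40, reserve_gb=5) else "too large"
        print(f"{bpw:>4} bpw: {size:6.2f} GB weights ({verdict})")
```

Under these assumptions each 0.25 bpw step on a 70B model changes the weights by roughly 2.2 GB, which is why intermediate sizes matter when VRAM is tight.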

Darkhn changed discussion status to closed

I made a 4.0bpw some days ago. Is there a way that I can upload it to you?

Darkhn changed discussion status to open
Crucible Labs org

You can upload it via a pull request.
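
For anyone wanting to do this from a script, a minimal sketch using the `huggingface_hub` Python client is below. It relies on `HfApi.upload_folder` with `create_pr=True`, which opens a pull request instead of committing directly. The repository ID, folder path and the `pr_title` helper are hypothetical placeholders, and the sketch assumes you are already logged in with a token that can open pull requests.

```python
def pr_title(bpw: float) -> str:
    """Build a commit / pull request title for a quant at the given bpw (hypothetical helper)."""
    return f"Add {bpw}bpw EXL3 quant"


def upload_quant_as_pr(repo_id: str, folder_path: str, bpw: float):
    """Upload a local quant folder to repo_id as a pull request.

    Requires `pip install huggingface_hub` and a prior `huggingface-cli login`.
    """
    # Imported lazily so the helper above can be used without the package installed.
    from huggingface_hub import HfApi

    api = HfApi()
    return api.upload_folder(
        repo_id=repo_id,
        folder_path=folder_path,
        repo_type="model",
        commit_message=pr_title(bpw),
        create_pr=True,  # open a pull request rather than pushing to the main branch
    )


# Example call (placeholders, not real paths or repositories):
# upload_quant_as_pr("some-org/some-model-exl3", "./my-4.0bpw-quant", 4.0)
```

Repositories that keep each bpw on its own branch may need a different target revision; that layout is not described in this thread, so the sketch leaves it out.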

I just saw that you already have a 4.0 online. Maybe next time, I guess!
