Interesting model

#1
by FlameF0X - opened

So... I tried it and it feels promising. It's fast on my laptop with an Intel i3-6006U CPU and 12GB of RAM.

You, I, or someone should run a benchmark on this model.

I did test it on code. It's interesting: it hallucinates on complex tasks, but it's good at simple assisting.

I ran a benchmark.

kinda sad

I'm getting a ton of hallucination

Well, it is a crazy small model

I'm getting a ton of hallucination

Really?

Yes. It's a 0.5-billion-parameter model. 500M. That's crazy small.

I know that it's small. Qwen2-0.5B-Instruct behaves the same.

But not all SLMs hallucinate the same. Take LFM2.5 350M: it's smaller than Qwen3, but it hallucinates less in use than the previous version did. (This sounds kinda out of context 😐 )

Guess the maker of this didn't do RL for hallucinations

do*

But yes. I think RL for preventing hallucinations wasn't on the model's roadmap.

Also, hallucination can be addressed not only with RL, but also with proper training. If the model is under-trained, it tends to hallucinate more often. (Speaking from my own experience.)

I do suspect that this is just Qwen3 0.5B but with a different name.

It would be nice if they provided their own inference script so we could actually see it evolve, since neither llama.cpp nor vLLM has this feature.

Edit: Oops, guess it does

this?

```rust
use ruvllm::sona::SonaConfig;

let config = SonaConfig {
    micro_lora_rank: 2,
    base_lora_rank: 8,
    learning_rate: 0.001,
    ewc_lambda: 0.5,  // Memory protection strength
    pattern_threshold: 0.75,
    ..Default::default()
};
```
