Diwank Tomer PRO

diwank

https://diwank.name

AI & ML interests

None yet

Recent Activity

liked a model 38 minutes ago

JetBrains/Mellum2-12B-A2.5B-Thinking

reacted to sergiopaniego's post with 👍 39 minutes ago

most multi-turn RL loops have a silent bug: you decode the model's output to detect tool calls, then re-tokenize the conversation for the next turn. BPE isn't invertible, so decode then re-encode can land on different ids. the gradient ends up on tokens the model never sampled. no crash, just quietly wrong math and broken training. @qgallouedec wrote a super educational blog on MITO (message-in, token-out) vs TITO (token-in, token-out) and how you might fix the problem above. go read it 🤓 https://qgallouedec-tito.hf.space/
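The round-trip problem described in the post can be reproduced with a toy tokenizer. The sketch below is only an illustration: the three-token vocabulary and the `encode`/`decode` helpers are hypothetical, and a greedy longest-match encoder stands in for a real BPE merge procedure. It is not taken from the linked blog or from any library.

```python
# Hypothetical toy vocabulary, for illustration only.
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_TOKEN = {token_id: token for token, token_id in VOCAB.items()}


def decode(ids):
    """Concatenate the string pieces for a sequence of token ids."""
    return "".join(ID_TO_TOKEN[i] for i in ids)


def encode(text):
    """Greedy longest-match encoding (a stand-in for real BPE merges)."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest remaining piece first, then shrink.
        for length in range(len(text) - pos, 0, -1):
            piece = text[pos:pos + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos += length
                break
        else:
            raise ValueError(f"cannot encode text at position {pos}")
    return ids


# Suppose the model sampled "a" then "b" as two separate tokens.
sampled_ids = [0, 1]

# Message-in, token-out style round trip: decode, then re-tokenize.
text = decode(sampled_ids)
reencoded_ids = encode(text)

# The text is identical, but the ids are not the ones the model sampled.
print(sampled_ids, "->", repr(text), "->", reencoded_ids)
```

Both id sequences decode to the same string, so nothing crashes, but a loss computed on `reencoded_ids` would be assigned to a token the model never sampled. Carrying `sampled_ids` forward unchanged, which is the token-in, token-out idea the post mentions, avoids the mismatch in this toy case.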

liked a model about 6 hours ago

google/gemma-4-12B-it

Organizations

diwank's datasets (69)

diwank/IBMDebaterArgQ

Viewer • Updated Aug 9, 2023 • 23.1k • 5

diwank/good_joke-dataset

Viewer • Updated Aug 3, 2023 • 20k • 13 • 2

diwank/imaginary-nlp-dataset

Viewer • Updated Aug 2, 2023 • 1.04M • 74 • 1

diwank/orca_minis_uncensored-chatml

Viewer • Updated Jul 30, 2023 • 83.1k • 108 • 2

diwank/michelleyun-therapydata

Viewer • Updated Jul 2, 2023 • 5.19k • 5 • 3

diwank/scenario_instructor

Viewer • Updated Mar 5, 2023 • 16.4k • 8

diwank/lld

Updated Aug 9, 2022 • 22 • 1

diwank/silicone-merged

Updated Mar 6, 2022 • 73 • 1

diwank/hinglish-dump

Updated Mar 5, 2022 • 61 • 1