Kakugo datasets Collection Synthetic datasets in low-resource languages created using the Kakugo pipeline (https://github.com/Peter-Devine/kakugo) • 55 items • Updated 16 days ago • 1
Kakugo models Collection Small language models trained to interact in various low-resource languages created using the Kakugo pipeline (https://github.com/Peter-Devine/kakugo) • 56 items • Updated 16 days ago • 1
Falcon-H1-Tiny Collection A series of extremely small, yet powerful language models redefining capabilities at small scale • 22 items • Updated 28 days ago • 35
Article Emergent Semantics Beyond Token Embeddings: A GPT-like Transformer Learns with Frozen 16‑D Binary Token-ID Embeddings (n_embed=16) Jan 6 • 1
Jamba2 Collection Jamba2 is a highly efficient open-source family of language models built for maximum reliability and steerability in the enterprise. • 3 items • Updated Jan 8 • 5
HyperCLOVA X SEED Collection HyperCLOVA X SEED is NAVER's lightweight open-source lineup with a strong focus on Korean language performance • 6 items • Updated Dec 24, 2025 • 41
Openhands Trajectories Collection Dataset of 67,074 OpenHands trajectories collected with Qwen3-Coder-480B-A35B-Instruct and two RFT checkpoints trained on the data • 3 items • Updated Dec 23, 2025 • 6