Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention Paper • 2605.22791 • Published 9 days ago • 30
Rethinking State Tracking in Recurrent Models Through Error Control Dynamics Paper • 2605.07755 • Published 22 days ago • 23
Article Adding Benchmaxxer Repellant to the Open ASR Leaderboard bezzam, Steveeeeeeen, eustlb, SBruccoleriAppen, jmss-appen, c-e-ford-appen, wgb14, YukaiHuang, like2026, logicbean, ally-lxl • 24 days ago • 17
Investigating Efficiently Extending Transformers for Long Input Summarization Paper • 2208.04347 • Published Aug 8, 2022 • 1
Article EMO: Pretraining mixture of experts for emergent modularity allenai • 21 days ago • 38
Article Multimodal Embedding & Reranker Models with Sentence Transformers tomaarsen • Apr 9 • 60
OlmPool Collection Collection of models from the paper "Cracks in the Foundation: Seemingly Minor Architectural Choices Impact Long Context Extension". • 26 items • Updated 29 days ago • 5
Efficient Training on Multiple Consumer GPUs with RoundPipe Paper • 2604.27085 • Published about 1 month ago • 40
Why Fine-Tuning Encourages Hallucinations and How to Fix It Paper • 2604.15574 • Published Apr 16 • 25
Olmo 3.1 Collection The latest members of the Olmo 3 family: another 3 weeks of RL for 32B Think, the 32B Instruct model, large post-training research datasets... • 9 items • Updated Dec 23, 2025 • 52
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora Paper • 2604.24819 • Published Apr 27 • 89
Laguna XS.2 Collection Designed for agentic coding and long-horizon work on a local machine. Apache 2.0. • 5 items • Updated 22 days ago • 23
Parakeet ASR Collection NeMo Parakeet ASR Models attain strong speech recognition accuracy while being efficient for inference. Available in CTC and RNN-Transducer variants. • 16 items • Updated 10 days ago • 75