All HF Hub posts

SeanLee97
posted an update 1 day ago
Our lab recently released a paper where we introduce ShadowPEFT, a new Parameter-Efficient Fine-Tuning (PEFT) paradigm tailored for edge computing scenarios.

Traditional approaches such as LoRA and its variants inject trainable parameters directly into the Transformer's weights, which requires tight coupling with the backbone.

ShadowPEFT instead enhances the frozen large base model by adding a lightweight, centralized, pretrainable, and detachable Shadow network.
This shadow network operates in parallel with the base model, delivering learned corrections to each decoder layer. Because the shadow module is architecturally decoupled from the backbone, it can be independently trained, stored, and deployed, which benefits edge computing scenarios and edge-cloud collaborative computing.

- HF Paper: ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning (2604.19254)
- GitHub: https://github.com/ShadowLLM/shadow-peft
- HF Collection: https://huggingface.co/collections/shadow-llm/shadow-peft-models
dealermatt72
posted an update 2 days ago
Hey Hugging Face community 👋

My name is M. I'm a solo founder and self-taught developer based in Houston, TX. I build AI-powered apps: an iOS app called DeFilter, currently in App Store review; a security scanning platform called Sentinel; and a job marketplace called HireHuman.fyi that connects humans with companies that prefer non-AI workers.

I'm also a poker dealer by night, which means I think a lot about reading situations in real time, and that's exactly what sparked this idea.

I'm not the most technical person in the room. But I have a vision, I have drive, and I believe the best projects get built when people with different skills come together around a shared idea.

That's why I'm posting here. I want to build this with the community.

- M (@dealermatt)

anakin87
posted an update 2 days ago
How does LLM training with RL environments work?

It all starts with Reinforcement Learning with Verifiable Rewards
- question asked
- model generates reasoning + answer
- answer checked against ground truth
- reward drives RL training


In this setup, the environment is simple: fixed questions and answers, rollout logic, reward(s)
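
A minimal sketch of the verifiable-reward step described above. It assumes, purely for illustration, that the model ends its completion with a line of the form `Answer: ...`; the function names `extract_answer` and `verifiable_reward` are hypothetical and not taken from any library or from the course.

```python
import re
from typing import Optional


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent.

    The 'Answer:' format is an assumption made for this sketch.
    """
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Deterministic reward: 1.0 on an exact match with the ground truth, else 0.0."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer == ground_truth.strip() else 0.0


# One fixed question/answer pair plus a sampled completion.
completion = "6 times 7 is 42, so the result is 42.\nAnswer: 42"
reward = verifiable_reward(completion, "42")
```

Real environments often combine several such checks (for example, a format reward next to a correctness reward).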

Consider a more complex tic-tac-toe env ❌⭕
It adds:
- dynamic game generation/handling
- tunable opponent skill
- multi-turn interactions

(envs can also include tools)

---

What happens at training?

We use Group Relative Policy Optimization with a tic-tac-toe env

No critic model needed, the group is the baseline
Simpler than PPO

1️⃣ Rollout generation: from the same board, model plays N games via sampling
2️⃣ Each game scored with deterministic rewards (win, format, ...)
3️⃣ Mean score computed across the group
4️⃣ Each rollout's advantage = its score minus the group mean
5️⃣ Model updated to favor trajectories above baseline

🔁 Repeat
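
Steps 3 and 4 can be sketched in a few lines. This is only an illustration of the mean-subtraction baseline as described in the post; the function name and the example scores are made up, and many GRPO implementations additionally divide by the group's standard deviation, which is omitted here.

```python
from statistics import mean
from typing import List


def group_relative_advantages(scores: List[float]) -> List[float]:
    """Advantage of each rollout = its score minus the mean score of its group."""
    baseline = mean(scores)  # the group itself acts as the baseline, no critic
    return [score - baseline for score in scores]


# Hypothetical scores for N = 4 games played from the same starting board.
scores = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(scores)
# Rollouts with a positive advantage are reinforced; negative ones are discouraged.
```

Because the baseline is the group mean, the advantages within a group always sum to zero, so a group in which every rollout gets the same score contributes no learning signal.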


For a deep dive, check out
🌱 https://github.com/anakin87/llm-rl-environments-lil-course
a free hands-on course on RL environments for LLMs
Ujjwal-Tyagi
posted an update 2 days ago
We are hiring at Shirova AI, a research lab in India. We need AI researchers and engineers to work in our lab. We can help researchers move to nearby workspaces, or they can work from home without ever coming to the lab. We're building our founding team, so the pay will be good, and you'll have room to learn. Don't hesitate to email us at: careers@shirova.com
sequelbox
posted an update 3 days ago
NEW RELEASE: Esper 3.1 for Qwen 3.6!

- Your dedicated DevOps expert: Esper 3.1 maximizes DevOps and architecture helpfulness, powered by high-difficulty DevOps and architecture data generated with DeepSeek-V3.1-Terminus!
- Improved coding performance: challenging code-reasoning datasets stretch DeepSeek-V3.1-Terminus and DeepSeek-V3.2 to the limits, allowing Esper 3.1 to tackle harder coding tasks!
- AI to build AI: our high-difficulty AI expertise data boosts Esper 3.1's MLOps, AI architecture, AI research, and general reasoning skills.

Get it now: ValiantLabs/Qwen3.6-35B-A3B-Esper3.1

We're working on more finetunes for the newest Qwen and Gemma models, and we've also started working on the agentic-first datasets for Esper 4 :) we're going to make open source better and better for your work!

Please note that real life financial and family concerns have popped up and have imposed unfortunate limitations on our ability to devote time to our open-source work :( If you would like to see Esper 4 and our other releases speed up instead of slowing down, this is the best way you can help us: sequelbox/SupportOpenSource

No matter what, we'll keep fighting and we won't give up!

with love,
allegra
consome2
posted an update 3 days ago
Built a small site for tracking speech-to-speech, full-duplex, and audio foundation model work.
It covers models, benchmarks, datasets, and some blog posts to organize the landscape in one place.

Still early, but sharing in case it is useful:
https://www.fullduplex.ai/

If you spot missing entries or mistakes, I would really appreciate corrections.
ajibawa-2023
posted an update 3 days ago
Ruby-Code-Large
Dataset : ajibawa-2023/Ruby-Code-Large

Ruby-Code-Large is a large-scale corpus of Ruby programming language source code comprising 331,743 code samples stored in .jsonl format. The dataset is designed to support research and development in large language model (LLM) pretraining, static analysis, web application development, and software engineering automation within the Ruby ecosystem.

By offering a substantial, language-focused dataset, Ruby-Code-Large enables targeted experimentation in dynamic programming, object-oriented design, and rapid application development, areas where Ruby is widely used, particularly in web frameworks and scripting.

Ruby-Code-Large addresses the lack of large, curated, Ruby-specific datasets, enabling focused research on expressive syntax, metaprogramming, and high-level abstractions.
eaddario
posted an update 4 days ago
Experimental global target bits-per-weight quantization of google/gemma-4-E2B-it, google/gemma-4-E4B-it and google/gemma-4-26B-A4B-it

Unlike standard llama.cpp quantizations that rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach optimizes per-tensor precision where it matters the most, and produces high quality models that meet a precise global file size target.

Key Advantages:
- VRAM Maximization: Can generate high quality models sized exactly to fit hardware constraints (e.g., fitting the model into exactly 24GB VRAM).
- Data-Driven Precision: Quantization mix is determined by actual weight error sensitivity rather than hardcoded rules, often yielding better PPL/KLD size trade-offs.
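
To make the idea of a global bits-per-weight (BPW) budget concrete, here is a toy sketch. It is not the author's method and not llama.cpp code: the greedy strategy, the function name `allocate_bits`, the integer bit widths, and the per-tensor error numbers are all assumptions chosen for illustration. Every tensor starts at the lowest precision, and the upgrade with the best error reduction per extra bit is applied repeatedly while the budget allows.

```python
from typing import Dict, Tuple

# name -> (parameter count, {bit width: hypothetical quantization error})
TensorInfo = Dict[str, Tuple[int, Dict[int, float]]]


def allocate_bits(tensors: TensorInfo, target_bpw: float,
                  choices=(2, 4, 8)) -> Tuple[Dict[str, int], float]:
    """Greedy per-tensor bit allocation under a global BPW budget (toy model)."""
    total_params = sum(n for n, _ in tensors.values())
    budget_bits = target_bpw * total_params
    assignment = {name: choices[0] for name in tensors}
    used_bits = sum(n * choices[0] for n, _ in tensors.values())

    while True:
        best = None  # (error reduction per extra bit, name, new width, extra bits)
        for name, (n, errors) in tensors.items():
            current = assignment[name]
            idx = choices.index(current)
            if idx + 1 == len(choices):
                continue  # already at the highest precision
            nxt = choices[idx + 1]
            extra = n * (nxt - current)
            if used_bits + extra > budget_bits:
                continue  # this upgrade would exceed the size target
            gain = (errors[current] - errors[nxt]) / extra
            if best is None or gain > best[0]:
                best = (gain, name, nxt, extra)
        if best is None:
            break  # no affordable upgrade left
        _, name, nxt, extra = best
        assignment[name] = nxt
        used_bits += extra

    return assignment, used_bits / total_params


# Made-up sensitivities: "attn" is small but error-sensitive, "ffn" is large.
tensors = {
    "attn": (1000, {2: 8.0, 4: 2.0, 8: 0.1}),
    "ffn": (3000, {2: 9.0, 4: 1.5, 8: 0.2}),
}
assignment, achieved_bpw = allocate_bits(tensors, target_bpw=5.0)
```

In this toy setup the smaller, more sensitive tensor ends up at higher precision than the large one while the overall average stays within the target. Real quantization types have fractional effective bit widths and error is measured on actual weights, so this only conveys the budgeting idea.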

Full benchmarks (PPL, KLD, ARC, MMLU, etc.) and methodology are in the model cards.

eaddario/gemma-4-E2B-it-GGUF
eaddario/gemma-4-E4B-it-GGUF
eaddario/gemma-4-26B-A4B-it-GGUF
sergiopaniego
posted an update 8 days ago
Earlier this month, Apple introduced Simple Self-Distillation: a fine-tuning method that improves models on coding tasks just by sampling from the model and training on its own outputs with plain cross-entropy.

And… it's already supported in TRL, built by Kashif Rasul. You can really feel the pace of development in the team.

Paper by Ruixiang ZHANG, He Bai, Huangjie Zheng, Navdeep Jaitly, Ronan Collobert, Yizhe Zhang at Apple 🍎

How it works: the model generates completions at a training-time temperature (T_train) with top_k/top_p truncation, then fine-tunes on them with plain cross-entropy. No labels or verifier are needed.
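
For readers unfamiliar with the sampling side, here is a small pure-Python sketch of temperature scaling followed by top-k and top-p truncation over a single next-token distribution. It is not the TRL implementation; the function names and the example logits are made up, and the fine-tuning step (cross-entropy on the sampled completions) is not shown.

```python
import math
import random
from typing import Dict, List


def truncated_distribution(logits: List[float], temperature: float = 1.0,
                           top_k: int = 0, top_p: float = 1.0) -> Dict[int, float]:
    """Return {token_id: probability} after temperature, top-k, then top-p."""
    # Temperature scaling; subtract the max before exp() for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    ranked = sorted(enumerate(weights), key=lambda pair: pair[1], reverse=True)

    # top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]
    total = sum(w for _, w in ranked)

    # top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, w in ranked:
        p = w / total
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    norm = sum(p for _, p in kept)
    return {token: p / norm for token, p in kept}


def sample_token(logits: List[float], rng: random.Random, **kwargs) -> int:
    """Draw one token id from the truncated distribution."""
    dist = truncated_distribution(logits, **kwargs)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


# Hypothetical logits over a 4-token vocabulary, sampled at a training temperature.
example_logits = [2.0, 1.0, 0.0, -1.0]
token = sample_token(example_logits, random.Random(0),
                     temperature=1.5, top_k=3, top_p=0.95)
```

A higher temperature flattens the distribution and produces noisier samples, while top_k and top_p cut off the low-probability tail before sampling.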

You can try it right away with this ready-to-run example (Qwen3-4B on rStar-Coder):
https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd.py
or benchmark a checkpoint with the eval script:
https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd_eval.py

One neat insight from the paper: T_train and T_eval compose into an effective T_eff = T_train × T_eval, so a broad band of configs works well. Even very noisy samples still help.

Want to dig deeper?

Paper: Embarrassingly Simple Self-Distillation Improves Code Generation (2604.01193)
Trainer docs: https://huggingface.co/docs/trl/main/en/ssd_trainer
victor
posted an update 9 days ago
Want to share my enthusiasm for zai-org/GLM-5.1 here too 🔥

I think we have it: our open source Claude Code = GLM-5.1 + Pi (https://pi.dev/) - Built a Three.js racing game to eval and it's extremely impressive. Thoughts:

- One-shot car physics with real drift mechanics (this is hard)

- My fav part: awesome at self-iterating (with no vision!). It created 20+ Bun.WebView debugging tools to drive the car programmatically and read game state, and proved a winding bug with vector math without ever seeing the screen

- 531-line racing AI in a single write: 4 personalities, curvature map, racing lines, tactical drifting. Built telemetry tools to compare player vs AI speed curves and data-tuned parameters

- All assets from scratch: 3D models, procedural textures, sky shader, engine sounds, spatial AI audio!

- Can do hard math: proved road normals pointed DOWN via vector cross products, computed track curvature normalized by arc length to tune AI cornering speed

You are going to hear about this model a lot in the next months - open source let's go - and thanks z-ai 🚀🚀