Ego2Robot – Convert egocentric video to robot training data


I built Ego2Robot over the past two weeks.

What it is: An open-source pipeline that converts egocentric human video (such as factory work or warehouse operations) into robot-compatible training datasets. Think: 10,000 hours of existing video → robot foundation model pretraining data.

The problem I’m solving: Robot foundation models (like Physical Intelligence’s π₀) need diverse training data, but collecting robot demonstrations costs $100-500/hour. Meanwhile, there are thousands of hours of human work already captured on video (Egocentric-10K has 10,000 hours from 85 factories). The gap is tooling to convert it into usable formats.

What’s different: Most robotics datasets are manually collected and annotated. This pipeline is fully automated:

  • Quality filtering (motion + hand detection) cuts a 433 s video down to 60 s of useful manipulation footage (see the sketch right after this list)
  • Semantic extraction using VideoMAE + CLIP (zero-shot, no fine-tuning; sketched further below)
  • Unsupervised skill discovery (found 10 distinct manipulation patterns)
  • Exports to LeRobot v3 format (the standard for robot learning datasets)
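
To make the filtering step concrete, here is a minimal sketch of how motion plus hand detection can gate frames, using OpenCV frame differencing and MediaPipe Hands. The thresholds, the 1-frame-per-second sampling, and the function name are my own illustrative assumptions, not the pipeline's actual values:

```python
# Minimal sketch of a motion + hand-presence filter (NOT the project's actual code).
# MOTION_THRESH and MIN_HAND_CONF are assumed values for illustration.
import cv2
import mediapipe as mp
import numpy as np

MOTION_THRESH = 12.0   # mean absolute pixel difference between sampled frames (assumed)
MIN_HAND_CONF = 0.5    # MediaPipe hand-detection confidence (assumed)

hands = mp.solutions.hands.Hands(static_image_mode=True,
                                 max_num_hands=2,
                                 min_detection_confidence=MIN_HAND_CONF)

def useful_timestamps(video_path, fps_sample=1):
    """Yield timestamps (seconds) where the frame shows motion AND a visible hand."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    step = int(fps // fps_sample) or 1
    prev_gray, idx = None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # Cheap motion check: mean absolute difference against the previous sample
            moving = prev_gray is not None and \
                np.abs(gray.astype(np.int16) - prev_gray.astype(np.int16)).mean() > MOTION_THRESH
            prev_gray = gray
            # MediaPipe expects RGB input
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            has_hand = hands.process(rgb).multi_hand_landmarks is not None
            if moving and has_hand:
                yield idx / fps
        idx += 1
    cap.release()
```

Frames that pass both checks can then be grouped into contiguous segments; everything else is dropped before the heavier VideoMAE/CLIP models ever see it.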

Technical stack: PyTorch, Transformers (VideoMAE, CLIP), MediaPipe, OpenCV, scikit-learn. Everything runs on pretrained models; nothing is trained from scratch.
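
For the semantic-extraction and skill-discovery steps, here is a hedged sketch: zero-shot CLIP labeling of frames followed by k-means clustering of the image embeddings. The candidate label list and the choice of 10 clusters mirror the "10 distinct manipulation patterns" above, but this is an illustration under those assumptions, not the project's implementation:

```python
# Sketch of zero-shot CLIP labeling + unsupervised skill discovery via k-means.
# The label list and n_skills=10 are assumptions for illustration.
import torch
from transformers import CLIPModel, CLIPProcessor
from sklearn.cluster import KMeans

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical candidate labels for industrial manipulation footage
labels = ["picking up a part", "placing a part", "using a hand tool",
          "operating a machine", "inspecting an object", "idle hands"]

def label_and_embed(frames):
    """frames: list of PIL.Image. Returns zero-shot labels and CLIP image embeddings."""
    inputs = processor(text=labels, images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    probs = out.logits_per_image.softmax(dim=-1)   # (n_frames, n_labels)
    picked = [labels[int(i)] for i in probs.argmax(dim=-1)]
    return picked, out.image_embeds                # embeddings: (n_frames, 512)

def discover_skills(embeds, n_skills=10):
    """Cluster frame embeddings into candidate 'skills' with no supervision."""
    km = KMeans(n_clusters=n_skills, n_init=10, random_state=0)
    return km.fit_predict(embeds.cpu().numpy())
```

In practice you would likely pool embeddings per segment (and fold in VideoMAE features) before clustering, but frame-level clustering is enough to show the shape of the approach.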

You can try it here:

What I learned: Foundation models (VideoMAE, CLIP) transfer surprisingly well to industrial footage despite being trained on general video/images. Automated quality filtering is critical - you can’t scale if you’re manually reviewing every frame.

Happy to answer questions about the technical approach, design decisions, or where this goes next!
