VLA models (Post-Training Isaac GR00T N1.5)

Hello everyone,

With post-training for Isaac GR00T N1.5, is it possible to train on a custom robot with a custom real-world dataset?

Thanks in advance


Seems possible?


You can post-train Isaac GR00T N1.5 on a custom robot using your own real-world dataset. NVIDIA’s public model card states N1.5 is adaptable via post-training; the Hugging Face tutorial shows a complete run on a new embodiment; LeRobot’s docs explain the dataset format, processors, and the GR00T policy integration. (Hugging Face)

What this actually means

  • Custom robot = new embodiment. You describe your robot’s observations and actions, then fine-tune with embodiment_tag="new_embodiment" so GR00T learns that interface. This is the documented path when your hardware wasn’t in pretraining. (GitHub)
  • Custom dataset = LeRobot format. Record or convert your demos to LeRobotDataset v3 (Parquet + MP4, plus meta/ JSON). You can stream from the Hub or load locally. (Hugging Face)
  • Policy I/O. Inputs are camera frames + proprio + a text instruction; outputs are continuous-valued action vectors you scale and send to your controller. (Hugging Face)
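
To make the I/O concrete, here is an illustrative sketch of one observation and one predicted action chunk. The key names, image size, and dimensions are placeholders (your own keys and dims come from your dataset and modality.json), not GR00T's fixed schema.

# Illustrative I/O sketch; key names, image size, and dims are placeholders.
import numpy as np

observation = {
    "observation.images.front": np.zeros((480, 640, 3), dtype=np.uint8),  # camera frame
    "observation.state": np.zeros(6, dtype=np.float32),                   # proprio, e.g. 5 joints + gripper
    "task": "pick up the red block and place it in the bin",              # language instruction
}

# The policy returns a short chunk of future actions in a normalized range;
# you rescale each step to your controller's units and limits before sending it.
action_chunk = np.zeros((16, 6), dtype=np.float32)  # (horizon, action_dim)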

Background, fast

  • GR00T N1.5 is a vision-language-action policy: VLM encoders plus a flow-matching transformer that predicts action chunks conditioned on vision, language, and state (conceptual sketch after this list). It is designed for cross-embodiment adaptation via post-training; the public 3B checkpoint is licensed for non-commercial use. (Hugging Face)
  • LeRobot is the training/runtime scaffold: unified dataset API, processors that map robots ↔ datasets, and a maintained GR00T N1.5 policy integration. (Hugging Face)
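
As rough intuition for the flow-matching part (a conceptual sketch only, not GR00T's actual code): the model starts an action chunk from noise and integrates a learned velocity field, conditioned on the vision/language/state features, toward the final actions.

# Conceptual flow-matching sketch (not GR00T's implementation).
import numpy as np

def generate_action_chunk(velocity_model, context, horizon=16, action_dim=6, steps=10):
    """Integrate a learned velocity field from noise toward an action chunk."""
    x = np.random.randn(horizon, action_dim)      # start from Gaussian noise
    for i in range(steps):
        t = i / steps
        v = velocity_model(x, t, context)         # velocity conditioned on vision/language/state
        x = x + v / steps                         # one Euler integration step
    return x                                      # denoised action chunk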

End-to-end recipe (clear and explicit)

1) Choose a base and verify terms

  • Model: nvidia/GR00T-N1.5-3B. Confirm “ready for non-commercial use.” (Hugging Face)

2) Record or port your data

  • Use LeRobot v3 tools to record directly to Hub/local, or convert an existing set to v3. v3 uses Parquet for state/action and MP4 for video, with meta/ describing schema, FPS, and episode offsets. Supports StreamingLeRobotDataset to train without downloading. (Hugging Face)
  • Example starter datasets you can imitate for structure: SO-101 pick-place, SO-100 pick-place, DROID v1.0.1 (LeRobot ports). (Hugging Face)
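
Before recording your own data, it helps to load one of these starter datasets and print its schema as a template for how your recordings should look. A minimal sketch, assuming lerobot>=0.4.0 and the attribute names shown in the current LeRobot docs:

# Peek at a known-good v3 dataset to see the schema your recordings should mirror.
# Assumes `pip install "lerobot>=0.4.0"`; exact keys depend on the dataset you load.
from lerobot.datasets.lerobot_dataset import LeRobotDataset

ref = LeRobotDataset("lerobot/svla_so101_pickplace")   # downloads and caches from the Hub
print(ref.fps, ref.num_episodes)                       # recording rate, episode count
frame = ref[0]
for key, value in frame.items():
    shape = getattr(value, "shape", None)
    print(key, shape if shape is not None else type(value).__name__)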

3) Describe your embodiment

  • Add meta/modality.json to your dataset. Copy an example, then edit camera names, state keys, and action dims (a sketch follows this list). In the official tutorial this is Step 1.2. For new robots, set embodiment_tag to new_embodiment. (Hugging Face)
  • GitHub issues and docs confirm this tag is required when your embodiment wasn't part of pretraining. (GitHub)
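
A minimal modality.json sketch, patterned on the SO-100/SO-101 example from the tutorial; the group names, index ranges, camera key, and dataset path are placeholders you must replace with your robot's actual layout.

# Minimal modality.json sketch; names, index ranges, and paths are placeholders.
import json, pathlib

modality = {
    "state": {
        "single_arm": {"start": 0, "end": 5},   # slices into observation.state
        "gripper":    {"start": 5, "end": 6},
    },
    "action": {
        "single_arm": {"start": 0, "end": 5},   # slices into action
        "gripper":    {"start": 5, "end": 6},
    },
    "video": {
        "front": {"original_key": "observation.images.front"},  # one entry per camera
    },
    "annotation": {
        "human.task_description": {"original_key": "task_index"},
    },
}

meta_dir = pathlib.Path("/data/my_robot_v3_dataset/meta")
meta_dir.mkdir(parents=True, exist_ok=True)
(meta_dir / "modality.json").write_text(json.dumps(modality, indent=2))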

4) Fine-tune

  • The HF tutorial provides a runnable command (scripts/gr00t_finetune.py). It notes ~25 GB VRAM for defaults and shows flags to reduce memory if needed. (Hugging Face)
# Fine-tune GR00T N1.5 on your LeRobot v3 dataset
# refs:
#  blog: https://huggingface.co/blog/nvidia/gr00t-n1-5-so101-tuning
#  repo: https://github.com/NVIDIA/Isaac-GR00T
python scripts/gr00t_finetune.py \
  --dataset-path /data/my_robot_v3_dataset \
  --num-gpus 1 \
  --output-dir ./checkpoints/my_robot_n1p5 \
  --max-steps 10000 \
  --data-config so100_dualcam \
  --video-backend torchvision_av

5) Evaluate and deploy

  • Use the tutorial’s open-loop eval and inference server + client scripts. Map the model’s action vector to your controller API (ROS2 or vendor SDK). Keep units and bounds consistent. (Hugging Face)
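
A sketch of the controller-side mapping under explicit assumptions: send_joint_positions is a placeholder for your controller call (ROS2 publisher or vendor SDK), and the joint limits are made-up values you must replace with your robot's.

# Deployment sketch. `send_joint_positions` is a placeholder for your controller API
# (ROS2 publisher, vendor SDK call); the joint limits below are made-up values.
import numpy as np

JOINT_LOW  = np.array([-1.57, -1.57, -1.57, -1.57, -1.57, 0.0])  # replace with your robot's limits
JOINT_HIGH = np.array([ 1.57,  1.57,  1.57,  1.57,  1.57, 1.0])

def execute_chunk(action_chunk, send_joint_positions):
    """Clip each predicted step to the robot's limits, then hand it to the controller."""
    for step in np.asarray(action_chunk):         # shape (horizon, action_dim)
        target = np.clip(step, JOINT_LOW, JOINT_HIGH)
        send_joint_positions(target)              # called at your control rate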

Data and format details you must get right

  • LeRobot v3 layout: meta/info.json (schema, fps), meta/stats.json (norm stats), meta/episodes/ (episode offsets), data/ Parquet shards, videos/ per-camera MP4 shards. Episode views are reconstructed from metadata. (Hugging Face)
  • Loading/streaming: LeRobotDataset(...) for local, cached data; StreamingLeRobotDataset(...) for on-the-fly streaming during training. Both return dicts with keys like observation.images.front, observation.state, and action. (Hugging Face)
  • Processors: LeRobot processors define the glue between your hardware and dataset keys; start from the official “Processors for Robots and Teleoperators.” (Hugging Face)

Known pitfalls and fixes

  • Parquet/MP4 vs expected layout: Some users hit loader errors if the dataset layout doesn’t match the pipeline’s expectations. Align your modality.json keys and verify the v3 loader version. (GitHub)
  • Large action spaces: Reports of overflow/instability at the start of training when action_dim is large; restarts or stabilization recipes help. Monitor loss/grad norms. (GitHub)
  • Camera name mismatches: If your data uses tip but the example uses wrist, update modality.json and processors accordingly. Community guides show concrete edits. (Zenn)
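
A quick sanity check for the camera-name pitfall, assuming your dataset lives at the path shown and that meta/info.json lists the dataset's features as described in the v3 layout above: compare the camera keys declared in modality.json against the declared features.

# Check that camera keys declared in modality.json are also declared in meta/info.json.
# The dataset path is illustrative; adjust to your local copy.
import json, pathlib

root = pathlib.Path("/data/my_robot_v3_dataset/meta")
modality = json.loads((root / "modality.json").read_text())
features = json.loads((root / "info.json").read_text()).get("features", {})

for cam, spec in modality.get("video", {}).items():
    key = spec["original_key"]                    # e.g. observation.images.front
    print(f"{cam}: {key} -> {'ok' if key in features else 'MISSING from info.json'}")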

Evidence that this path works

  • Official tutorial post-trains N1.5 on SO-101 from teleop demos, including dataset prep, finetune, eval, deploy. (Hugging Face)
  • Public fine-tunes: Dozens of community N1.5 checkpoints on Hugging Face confirm the workflow is repeatable on varied tasks and rigs. (Hugging Face)

Minimal working example (data load → fine-tune)

# pip install "lerobot>=0.4.0"  # docs: https://huggingface.co/docs/lerobot
# refs:
#   dataset: https://huggingface.co/datasets/lerobot/svla_so101_pickplace
#   tutorial: https://huggingface.co/blog/nvidia/gr00t-n1-5-so101-tuning
from lerobot.datasets.streaming_dataset import StreamingLeRobotDataset

repo_id = "lerobot/svla_so101_pickplace"  # small, clean, GR00T-ready example
ds = StreamingLeRobotDataset(repo_id)     # streams frames from the Hub, no full download
sample = next(iter(ds))                   # streaming datasets are iterated, not indexed
print(sample.keys())                      # expect observation.*, action, plus metadata keys

Run the tutorial’s gr00t_finetune.py with your dataset path after you validate keys and shapes. (Hugging Face)


Starter picks on Hugging Face

Models

  • nvidia/GR00T-N1.5-3B — base policy for post-training; model card explicitly mentions post-training support and shows I/O. (Hugging Face)
  • Community finetunes (reference configs, tasks, dual-cam) — browse the gr00t_n1_5 filter on the Hub. (Hugging Face)

Datasets

  • lerobot/svla_so101_pickplace and lerobot/svla_so100_pickplace — small, proven with the official tutorial and LeRobot loaders. Good for smoke tests. (Hugging Face)
  • lerobot/droid_1.0.1 — large in-the-wild manipulation demos in LeRobot format. Useful for diversity or pretraining. (Hugging Face)

Docs you will use repeatedly

  • LeRobot Dataset v3 design, directory layout, streaming API, and migration notes. (Hugging Face)
  • GR00T N1.5 policy integration page in LeRobot. (Hugging Face)

Synthetic-data booster (optional)

If you are data-limited, NVIDIA’s GR00T-Dreams blueprint shows a pipeline to generate large synthetic trajectory sets and mix them with real demos for post-training. (NVIDIA Developer)


Quick checklist

  • Base checkpoint chosen and license verified. (Hugging Face)
  • Real demos recorded or ported to LeRobot v3. (Hugging Face)
  • meta/modality.json matches your sensors, action dims, and camera names; embodiment_tag="new_embodiment". (Hugging Face)
  • Finetune with the official script; monitor VRAM and training stability. (Hugging Face)
  • Evaluate and deploy via the server/client example; map action vectors to your controller. (Hugging Face)

Short, curated references (grouped)

Official + model cards

  • GR00T N1.5 model card: post-training support, I/O, license. (Hugging Face)
  • GR00T GitHub repo (scripts, examples, “deeper understanding”). (GitHub)

Step-by-step guides

  • HF tutorial: Post-Training N1.5 on SO-101 (dataset prep, modality.json, training, eval, deploy, VRAM). (Hugging Face)
  • LeRobot: GR00T N1.5 Policy integration. (Hugging Face)
  • LeRobot: Dataset v3 docs (record, stream, migrate, directory layout). (Hugging Face)

Datasets

  • lerobot/svla_so101_pickplace, lerobot/svla_so100_pickplace, lerobot/droid_1.0.1 — ready to load. (Hugging Face)

Troubleshooting and pitfalls

  • New embodiment and tag usage clarifications. (GitHub)
  • Layout/loader mismatch with Parquet/MP4 vs expectations. (GitHub)
  • Stability issues for large action dims. (GitHub)

Thank you very much for your detailed explanation :slight_smile:
