Abstract
NE-Dreamer uses a temporal transformer to predict next-step encoder embeddings for model-based reinforcement learning without requiring decoders or auxiliary supervision.
Capturing temporal dependencies is critical for model-based reinforcement learning (MBRL) in partially observable, high-dimensional domains. We introduce NE-Dreamer, a decoder-free MBRL agent that leverages a temporal transformer to predict next-step encoder embeddings from latent state sequences, directly optimizing temporal predictive alignment in representation space. This approach enables NE-Dreamer to learn coherent, predictive state representations without reconstruction losses or auxiliary supervision. On the DeepMind Control Suite, NE-Dreamer matches or exceeds the performance of DreamerV3 and leading decoder-free agents. On a challenging subset of DMLab tasks involving memory and spatial reasoning, NE-Dreamer achieves substantial gains. These results establish next-embedding prediction with temporal transformers as an effective, scalable framework for MBRL in complex, partially observable environments.
Community
Most world models learn representations by reconstructing pixels. But reconstruction isn’t necessarily aligned with control.
In this paper we explore a different idea:
➡️ predict the next encoder embedding instead of reconstructing the observation.
Using a next-embedding prediction objective and a temporal transformer over latents, NE-Dreamer learns temporally predictive latent states and substantially improves performance on hard DMLab navigation tasks that require memory and spatial reasoning.
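To make the idea concrete, here is a minimal sketch of next-embedding prediction in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the "temporal transformer" is reduced to one causal self-attention layer with identity projections (no learned weights), and cosine distance stands in for whatever alignment loss NE-Dreamer actually optimizes.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(latents):
    """Single-head self-attention with identity projections and a causal mask.

    Assumption: a stand-in for the temporal transformer; a real model would
    use learned query/key/value projections, several layers, and an MLP.
    """
    dim = len(latents[0])
    outputs = []
    for t in range(len(latents)):
        # Step t attends only to steps 0..t, never to the future.
        scores = [dot(latents[t], latents[s]) / math.sqrt(dim) for s in range(t + 1)]
        weights = softmax(scores)
        outputs.append(
            [sum(w * latents[s][d] for s, w in enumerate(weights)) for d in range(dim)]
        )
    return outputs


def cosine_distance(a, b):
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return 1.0 - dot(a, b) / (norm + 1e-8)


def next_embedding_loss(predictions, embeddings):
    """Mean distance between the prediction at step t and the embedding at t+1.

    Assumption: cosine distance is an illustrative choice of alignment loss.
    Requires sequences of at least two steps.
    """
    pairs = list(zip(predictions[:-1], embeddings[1:]))
    return sum(cosine_distance(p, e) for p, e in pairs) / len(pairs)


# Toy sequence: three 2-dimensional latent states and encoder embeddings.
latents = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

predictions = causal_attention(latents)
loss = next_embedding_loss(predictions, embeddings)
```

The key point is the one-step shift in `next_embedding_loss`: the prediction built from latents up to step t is compared with the encoder embedding of step t+1, so no pixel decoder is involved. In a trainable version the target embedding would typically be treated as a fixed target (for example via a stop-gradient), which is also an assumption here rather than a detail taken from the paper.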
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Learning Invariant Visual Representations for Planning with Joint-Embedding Predictive World Models (2026)
- Mixture-of-World Models: Scaling Multi-Task Reinforcement Learning with Modular Latent Dynamics (2026)
- Olaf-World: Orienting Latent Actions for Video World Modeling (2026)
- Recursive Belief Vision Language Action Models (2026)
- A Lightweight Library for Energy-Based Joint-Embedding Predictive Architectures (2026)
- HanoiWorld: A Joint Embedding Predictive Architecture Based World Model for Autonomous Vehicle Controller (2026)
- Semantic Belief-State World Model for 3D Human Motion Prediction (2026)