PRInTS: Reward Modeling for Long-Horizon Information Seeking
Paper: arXiv:2511.19314
This repository hosts the Qwen3-4B version of PRInTS (Process Reward via Information gain scoring and Trajectory Summarization), a generative process reward model (PRM) for long-horizon information-seeking tasks. PRInTS is jointly trained with two abilities that provide fine-grained guidance even as context accumulates over long agent trajectories: information-gain scoring of individual steps and trajectory summarization.
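As a rough illustration of why trajectory summarization helps with context accumulation, the sketch below scores each new step against a bounded running summary instead of the full raw history. It is a minimal sketch under stated assumptions, not the PRInTS implementation: `SummarizedTrajectory`, `toy_summarize`, and `toy_score` are hypothetical names, the two callbacks stand in for calls to the model, the score-then-summarize order is assumed, and the word-overlap score is a crude placeholder rather than the paper's definition of information gain.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-ins for the two PRM abilities; in a real setup each
# would be a generation call to the PRInTS model.
Summarizer = Callable[[str, str], str]  # (previous summary, new step) -> new summary
Scorer = Callable[[str, str], float]    # (summary, candidate step) -> step score


@dataclass
class SummarizedTrajectory:
    summarize: Summarizer
    score: Scorer
    summary: str = ""
    scores: List[float] = field(default_factory=list)

    def evaluate_and_append(self, step: str) -> float:
        # Assumption: the step is scored against the compressed history only,
        # so the scorer's input does not grow with the number of steps.
        s = self.score(self.summary, step)
        self.scores.append(s)
        # Fold the step into the running summary for the next evaluation.
        self.summary = self.summarize(self.summary, step)
        return s


def toy_summarize(summary: str, step: str, max_words: int = 8) -> str:
    # Toy compression: keep only the most recent max_words words.
    words = (summary + " " + step).split()
    return " ".join(words[-max_words:])


def toy_score(summary: str, step: str) -> float:
    # Toy proxy for information gain: share of the step's words not yet in the summary.
    seen = set(summary.split())
    words = step.split()
    if not words:
        return 0.0
    return sum(w not in seen for w in words) / len(words)


if __name__ == "__main__":
    traj = SummarizedTrajectory(toy_summarize, toy_score)
    for step in ["search capital of France",
                 "result Paris is the capital",
                 "search capital of France"]:
        print(round(traj.evaluate_and_append(step), 2), "|", traj.summary)
```

In this toy run the repeated search in the third step scores 0.25 (versus 1.0 for the first step) because most of its words are already in the summary, and the summary never exceeds eight words however long the trajectory grows.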
Key Highlights:
Architecture: Qwen3ForCausalLM, a fine-tuned large language model.
Test-time guidance: PRInTS (Qwen3-4B) provides fine-grained guidance for information-seeking agents at test time by estimating step-level information-gain scores across n rollouts of an agent.
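The card does not spell out the search procedure, so the following is only one plausible reading of "step-level scores across n rollouts": step-level best-of-n selection, where the agent proposes n candidate next steps and the PRM's score decides which one to keep. `select_best_step`, `guided_rollout`, the `propose`/`score` callbacks, and the `ANSWER` stopping prefix are all hypothetical; in practice `score` would wrap a call to the PRInTS model.

```python
from typing import Callable, List, Sequence, Tuple

Trajectory = List[str]


def select_best_step(
    candidates: Sequence[str], score: Callable[[str], float]
) -> Tuple[int, str, List[float]]:
    """Score every candidate next step and return (index, step, all scores)."""
    if not candidates:
        raise ValueError("need at least one candidate step")
    scores = [score(c) for c in candidates]
    # max() returns the first index on ties, so earlier candidates win ties.
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return best, candidates[best], scores


def guided_rollout(
    propose: Callable[[Trajectory, int], List[str]],  # hypothetical agent sampler
    score: Callable[[Trajectory, str], float],        # hypothetical PRM call
    n: int,
    max_steps: int,
) -> Trajectory:
    trajectory: Trajectory = []
    for _ in range(max_steps):
        # Sample n candidate next steps, then keep the one the PRM scores highest.
        candidates = propose(trajectory, n)
        _, step, _ = select_best_step(candidates, lambda c: score(trajectory, c))
        trajectory.append(step)
        if step.startswith("ANSWER"):  # assumed stopping convention
            break
    return trajectory


if __name__ == "__main__":
    def propose(traj: Trajectory, n: int) -> List[str]:
        # Toy agent: offers a final answer only after one search step.
        pool = ["search A", "search B for details"] if not traj else ["search A", "ANSWER: 42"]
        return pool[:n]

    # Toy scorer: longer steps score higher (a placeholder for the PRM).
    print(guided_rollout(propose, lambda traj, c: float(len(c)), n=2, max_steps=5))
```

Keeping selection (`select_best_step`) separate from the loop makes it easy to swap in other uses of the same scores, such as reranking complete trajectories instead of individual steps.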
If you find this work useful, please consider citing us:
@article{lee2025prints,
title={PRInTS: Reward Modeling for Long-Horizon Information Seeking},
author={Jaewoo Lee and Archiki Prasad and Justin Chih-Yao Chen and Zaid Khan and Elias Stengel-Eskin and Mohit Bansal},
year={2025},
journal={arXiv preprint arXiv:2511.19314},
url={https://arxiv.org/abs/2511.19314},
}