LatentRM

The Latent Reward Model (LatentRM) is a learned scorer designed for latent reasoning models that reason in continuous hidden space. LatentRM provides the missing aggregation signal for parallel test-time scaling in latent models, enabling techniques such as best-of-N and beam search without explicit token-level probabilities.

Paper Link👁️

GitHub Repo🐙

Citation

@misc{you2025paralleltesttimescalinglatent,
      title={Parallel Test-Time Scaling for Latent Reasoning Models}, 
      author={Runyang You and Yongqi Li and Meng Liu and Wenjie Wang and Liqiang Nie and Wenjie Li},
      year={2025},
      eprint={2510.07745},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.07745}, 
}
Downloads last month
11
Safetensors
Model size
0.1B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for dd101bb/latentRM

Finetuned
(2074)
this model

Dataset used to train dd101bb/latentRM