ThinkPRM: A Generative Process Reward Model for Scalable Reasoning Verification

Reasoning with LLMs can benefit from using more test-time compute, a gain that depends on high-quality process reward models (PRMs) to select promising paths for search or ranking. PRMs score problem-solution pairs to indicate whether the…
