Recent advances in reasoning-centric large language models (LLMs) have expanded the scope of reinforcement learning (RL) beyond narrow, task-specific applications, enabling broader generalization and reasoning capabilities. However,…
From Exploration Collapse to Predictable Limits: Shanghai AI Lab Proposes Entropy-Based Scaling Laws for Reinforcement Learning in LLMs
