OMEGA: A Structured Math Benchmark to Probe the Reasoning Limits of LLMs

Introduction to Generalization in Mathematical Reasoning

Large-scale language models with long CoT reasoning, such as DeepSeek-R1, have shown good results on Olympiad-level mathematics. However, models trained through Supervised…

Continue Reading