Large Reasoning Models (LRMs)
Large Reasoning Models (LRMs) mark a new phase beyond traditional LLMs, emphasizing explicit reasoning,
multi-step search, and tool integration. This roundup highlights key recent papers that define the emerging
LRM paradigm, from agentic reasoning and retrieval-augmented generation to structured problem-solving frameworks.
Foundations & Core Concepts
Introduces Search-o1, an agentic framework that fuses internal reasoning steps with web-scale retrieval and planning, demonstrating significant gains on reasoning benchmarks and knowledge-intensive tasks.
Extends RAG into a multi-hop reasoning pipeline where retrieval and generation interleave dynamically: each reasoning step guides the next retrieval query, forming a self-improving reasoning chain.
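The interleaving pattern can be sketched as a loop in which each hop's evidence seeds the next query. This is a minimal sketch with a toy keyword retriever and a stubbed next-query step standing in for a real LLM and search API; all names and the two-sentence corpus are illustrative, not from any specific paper.

```python
# Sketch of interleaved retrieval and reasoning. The retriever and the
# next-query "model" below are toy stand-ins for a search API and an LLM.
from typing import List

CORPUS = {
    "einstein": "Albert Einstein was born in Ulm.",
    "ulm": "Ulm lies on the river Danube.",
}

def retrieve(query: str) -> str:
    """Toy keyword retriever standing in for a web-scale search call."""
    for key, passage in CORPUS.items():
        if key in query.lower():
            return passage
    return ""

def next_query(question: str, evidence: List[str]) -> str:
    """Stub for the model proposing the next hop's query."""
    if not evidence:
        return question                          # first hop: the question itself
    return evidence[-1].split()[-1].strip(".")   # naive pivot on the last entity

def multi_hop(question: str, hops: int = 2) -> List[str]:
    evidence: List[str] = []
    for _ in range(hops):
        query = next_query(question, evidence)   # reasoning guides retrieval...
        passage = retrieve(query)                # ...and retrieval feeds reasoning
        if passage:
            evidence.append(passage)
    return evidence
```

Here `multi_hop("Where was Einstein born?")` first retrieves the Einstein sentence, then pivots on "Ulm" to pull in the Danube fact, mirroring how each generation step conditions the next retrieval.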
Presents the DeepSeek-R1 model, showing that reasoning models trained largely via reinforcement learning can outperform supervised LLMs on symbolic and analytical reasoning tasks.
A milestone release illustrating the transition from large language models to large reasoning models (LRMs), with emphasis on test-time reasoning, search, and verification loops.
Reasoning Frameworks & Cognitive Modeling
Explores how LLMs can internally simulate multi-step reasoning via Chain-of-Thought prompting, serving as an early conceptual foundation for LRMs.
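In practice, Chain-of-Thought prompting amounts to prepending a worked-out rationale and a trigger phrase to the question. A minimal sketch, with an invented exemplar rather than any paper's actual prompt:

```python
# Minimal Chain-of-Thought prompt builder. The exemplar and trigger phrase
# are illustrative assumptions, not taken from a specific paper's prompts.
def cot_prompt(question: str) -> str:
    exemplar = (
        "Q: A pen costs 2 dollars and a notebook costs 3 dollars. "
        "What do both cost together?\n"
        "A: Let's think step by step. The pen is 2 and the notebook is 3. "
        "2 + 3 = 5. The answer is 5.\n"
    )
    # The worked rationale nudges the model to emit its own intermediate
    # steps before committing to a final answer.
    return exemplar + f"Q: {question}\nA: Let's think step by step."

prompt = cot_prompt("If I buy 4 pens at 2 dollars each, what do I pay?")
```

The point is that the "reasoning" is elicited purely at the prompt level, which is why CoT is a conceptual precursor rather than an architectural change.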
Proposes a structured search over reasoning paths, allowing models to explore and evaluate multiple possible thoughts before converging — a precursor to modern LRM search strategies.
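The search-over-thoughts idea can be reduced to a beam search where a value function ranks partial "thoughts". This sketch uses toy arithmetic partial sums as thoughts and a distance-to-target scorer as a stand-in for a model-based evaluator; none of it is the paper's actual algorithmic detail.

```python
# Tree-of-Thoughts-style beam search sketch. States are partial sums;
# score() is a toy stand-in for an LLM value judgment.
def expand(state, choices=(1, 2, 3)):
    """Propose successor thoughts (here: extend a partial sum)."""
    return [state + [c] for c in choices]

def score(state, target=6):
    """Stand-in for a learned evaluator: closer to the target is better."""
    return -abs(target - sum(state))

def tree_search(target=6, depth=3, beam=2):
    frontier = [[]]
    for _ in range(depth):
        candidates = [s for state in frontier for s in expand(state)]
        # Keep only the most promising thoughts, as a value model would.
        frontier = sorted(candidates, key=lambda s: score(s, target))[-beam:]
    return max(frontier, key=lambda s: score(s, target))

best = tree_search()   # a length-3 path of choices summing to the target 6
```

Swapping the toy scorer for an LLM-evaluated value, and the successor set for sampled continuations, recovers the deliberate explore-and-prune behavior the paper describes.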
Combines reasoning traces with tool usage, enabling LLMs to interleave thought and action — a foundation for agentic LRM architectures.
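The thought/action/observation interleaving is essentially a loop: the model proposes an action, a tool executes it, and the observation is fed back into the next step. A minimal sketch with a stubbed policy and a toy calculator tool (in a real agent, both the decision and the action string would be sampled from an LLM and parsed):

```python
# ReAct-style loop sketch: policy() stubs the LLM's Thought/Action choice,
# calculator() is the only tool. Both are illustrative assumptions.
def calculator(expr: str) -> str:
    # Toy tool; eval with empty builtins, and never on untrusted input.
    return str(eval(expr, {"__builtins__": {}}))

def policy(question: str, trace: list) -> tuple:
    """Stub deciding the next step: act once, then finish with the result."""
    if not trace:
        return ("act", "12 * 7")
    return ("finish", trace[-1][1])

def react(question: str) -> str:
    trace = []                                     # (action, observation) pairs
    while True:
        kind, payload = policy(question, trace)    # Thought -> Action
        if kind == "finish":
            return payload
        observation = calculator(payload)          # Action -> Observation
        trace.append((payload, observation))       # fed back into next Thought

answer = react("What is 12 times 7?")              # -> "84"
```

The growing `trace` is what makes the pattern agentic: every observation becomes context for the next reasoning step.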
Shows that models can teach themselves tool use in a self-supervised way, learning when to call APIs and calculators to improve reasoning quality.
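The self-supervision signal can be sketched as: insert a candidate tool call into the text, execute it, and keep the call only if its output matches (and would therefore help predict) the continuation. The exact-match filter and `[Calc(...)]` markup below are simplifying assumptions standing in for the paper's loss-reduction criterion.

```python
# Toolformer-style self-annotation sketch: keep a calculator call only if
# executing it reproduces the text that follows the "=" sign.
import re

def calculator(expr: str) -> str:
    return str(eval(expr, {"__builtins__": {}}))   # toy tool, trusted input only

def annotate(text: str) -> str:
    """Find 'a + b = c' spans; keep a [Calc(...)] call iff it checks out."""
    def maybe_insert(m):
        expr, answer = m.group(1), m.group(2)
        if calculator(expr) == answer:             # tool output matches continuation
            return f"[Calc({expr}) -> {answer}] {answer}"
        return m.group(0)                          # unhelpful call: filtered out
    return re.sub(r"(\d+ [+*-] \d+) = (\d+)", maybe_insert, text)

annotated = annotate("The total is 12 + 30 = 42 dollars.")
# -> "The total is [Calc(12 + 30) -> 42] 42 dollars."
```

A span like `2 + 2 = 5` would be left unannotated, which is the filtering step: only calls that demonstrably help survive into the training data.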
Retrieval-Augmented & Knowledge-Guided Reasoning
The seminal RAG model integrates retrieval and generation, serving as a foundation for later LRM retrieval chains like CRAG and Search-o1.
Introduces a self-improving RAG pipeline where the model evaluates and refines its own retrievals — bridging the gap toward agentic, self-correcting reasoning systems.
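The evaluate-and-refine step can be sketched as a retrieval loop gated by a critic: try a retriever, score the passage, and fall back if the critic rejects it. The lexical-overlap critic and the retriever list here are toy stand-ins for the learned reflection/critique signals these systems actually use.

```python
# Self-correcting retrieval sketch in the spirit of Self-RAG/CRAG.
# critique() is a toy lexical-overlap score, not a learned critic.
def critique(question: str, passage: str) -> float:
    """Toy relevance score: fraction of question words found in the passage."""
    q_words = set(question.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / len(q_words)

def self_correcting_retrieve(question, retrievers, threshold=0.5):
    """Try retrievers in order; keep the first passage the critic accepts."""
    for retrieve in retrievers:
        passage = retrieve(question)
        if critique(question, passage) >= threshold:
            return passage                      # accepted by the critic
    return ""                                   # fall back: answer without context

primary  = lambda q: "paris hosts many museums"         # weak, gets rejected
fallback = lambda q: "the capital of france is paris"   # accepted
context = self_correcting_retrieve("what is the capital of france",
                                   [primary, fallback])
```

The design point is that the critic sits between retrieval and generation, so bad evidence triggers another retrieval attempt instead of contaminating the answer.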
Benchmarks & Evaluation of Reasoning
Defines a suite of tasks designed to push the limits of LLM reasoning and abstraction, widely used for evaluating emerging LRMs.
Mathematical and multi-step logic benchmarks now serve as standard indicators for LRM test-time reasoning quality and planning ability.
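Scoring on such math benchmarks typically reduces to extracting the final number from a model's chain-of-thought and comparing it to the reference answer. A toy sketch of that convention (the regex and exact-match rule are illustrative, not any benchmark's official harness):

```python
# Toy math-benchmark scorer: pull the last number out of each completion
# and exact-match it against the reference answer string.
import re

def extract_final_number(completion: str) -> str:
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else ""

def accuracy(completions, references):
    correct = sum(
        extract_final_number(c) == r for c, r in zip(completions, references)
    )
    return correct / len(references)

acc = accuracy(
    ["2 + 3 = 5, so the answer is 5.", "Half of 10 is 4."],
    ["5", "5"],
)   # -> 0.5: the first completion ends on the right number, the second doesn't
```

Because only the final extracted answer is checked, a model can reach it via any reasoning path, which is exactly what makes these benchmarks usable as test-time-reasoning indicators.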
Quick Takeaways
- Agentic reasoning: LRMs like Search-o1 and o1 blend model reasoning with active search and tool use.
- Retrieval integration: CRAG and Self-RAG evolve traditional RAG into adaptive, iterative retrieval systems.
- Structured cognition: Tree of Thoughts and ReAct give LRMs deliberation and action patterns akin to planning agents.
- Benchmarks matter: GSM8K and BIG-Bench Hard drive measurable progress in reasoning fidelity.
- From LLMs → LRMs: The paradigm shift centers on test-time reasoning, search, and self-correction loops, not just scale.