Large Reasoning Models (LRMs)

Large Reasoning Models (LRMs) represent a new phase beyond traditional LLMs, emphasizing explicit reasoning, multi-step search, and tool integration. This roundup highlights recent key papers that define the emerging LRM paradigm — from agentic reasoning and retrieval-augmented generation to structured problem-solving frameworks.


Foundations & Core Concepts

Search-o1: Agentic Search-Enhanced Large Reasoning Models — arXiv (2025)
Introduces Search-o1, an agentic framework that fuses internal reasoning steps with web-scale retrieval and planning, demonstrating significant gains on reasoning benchmarks and knowledge-intensive tasks.
Chain-of-Retrieval Augmented Generation (CoRAG) — arXiv (2025)
Extends RAG into a multi-hop reasoning pipeline where retrieval and generation interleave dynamically — each step guiding the next retrieval query in a self-improving reasoning chain.
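The interleaved retrieval idea can be sketched as a loop in which each generated step seeds the next query. Note that `retrieve`, `generate`, and the `ANSWER:` stop convention below are hypothetical stand-ins for illustration, not the paper's actual interface:

```python
# Sketch of interleaved retrieve/generate hops. The retriever and
# generator are toy stubs; the real system trains these components.

def multi_hop(question: str, retrieve, generate, hops: int = 3) -> str:
    query, chain = question, []
    for _ in range(hops):
        passage = retrieve(query)                  # fetch evidence for the current query
        step = generate(question, chain, passage)  # produce the next reasoning step
        chain.append(step)
        if step.startswith("ANSWER:"):             # generator signals completion
            return step.removeprefix("ANSWER:").strip()
        query = step                               # the new step seeds the next retrieval
    return chain[-1]

# Toy demo wiring: a two-hop lookup over a tiny fact table.
facts = {"capital of France": "Paris", "Paris": "Paris hosts the Louvre"}
demo = multi_hop(
    "Which museum is in the capital of France?",
    retrieve=lambda q: facts.get("capital of France" if "capital" in q else "Paris", ""),
    generate=lambda q, chain, p: ("ANSWER: the Louvre" if "Louvre" in p else p),
)
print(demo)  # the Louvre
```

The key structural difference from single-shot RAG is that the retrieval query evolves with the reasoning chain rather than being fixed to the original question.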
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning — arXiv (2025)
Presents the DeepSeek-R1 model, showing that large-scale reinforcement learning can elicit reasoning behavior that outperforms purely supervised LLMs on symbolic and analytical tasks.
OpenAI o1: Towards Large Reasoning Models — OpenAI (2024)
A milestone release illustrating the transition from large language models to large reasoning models (LRMs), with emphasis on test-time reasoning, search, and verification loops.

Reasoning Frameworks & Cognitive Modeling

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models — arXiv (2022)
Demonstrates that prompting LLMs to produce intermediate reasoning steps markedly improves multi-step problem solving, serving as an early conceptual foundation for LRMs.
Tree of Thoughts: Deliberate Problem Solving with Large Language Models — arXiv (2023)
Proposes a structured search over reasoning paths, allowing models to explore and evaluate multiple possible thoughts before converging — a precursor to modern LRM search strategies.
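The search-over-thoughts idea can be sketched as a beam search over partial reasoning paths. Here `expand` and `score` are illustrative stand-ins for the paper's LLM-based thought generator and state evaluator, not its API:

```python
from typing import Callable, List

def tree_of_thoughts(
    root: str,
    expand: Callable[[str], List[str]],  # propose candidate next thoughts
    score: Callable[[str], float],       # heuristic value of a partial path
    beam_width: int = 2,
    depth: int = 3,
) -> str:
    """Breadth-first search over reasoning paths, keeping the top-k
    candidates at each depth (a simple beam-search variant of ToT)."""
    frontier = [root]
    for _ in range(depth):
        candidates = [t for node in frontier for t in expand(node)]
        if not candidates:
            break
        # Keep only the most promising partial thoughts.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)

# Toy demo: "thoughts" are digit strings; expand appends a digit,
# score prefers numerically larger strings.
result = tree_of_thoughts(
    root="",
    expand=lambda t: [t + d for d in "012"],
    score=lambda t: int(t) if t else 0,
)
print(result)  # 222
```

In the paper, both expansion and scoring are LLM calls, and the search policy can be BFS or DFS; the beam here is just the simplest instance of that family.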
ReAct: Synergizing Reasoning and Acting in Language Models — arXiv (2022)
Combines reasoning traces with tool usage, enabling LLMs to interleave thought and action — a foundation for agentic LRM architectures.
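A minimal sketch of the interleaved thought/action loop. The `mock_llm`, the `TOOLS` table, and the `Action: tool[arg]` trace format below are simplified stand-ins for the prompted model described in the paper:

```python
# Toy tool registry; eval() is for the demo only — never eval untrusted input.
TOOLS = {
    "calculator": lambda expr: str(eval(expr)),
}

def mock_llm(history: str) -> str:
    """Stand-in policy: asks the calculator once, then answers."""
    if "Observation:" not in history:
        return "Thought: I need arithmetic.\nAction: calculator[2*21]"
    answer = history.rsplit("Observation: ", 1)[1].splitlines()[0]
    return f"Thought: I have the result.\nFinal Answer: {answer}"

def react(question: str, llm=mock_llm, max_steps: int = 5) -> str:
    """Alternate model steps with tool calls until a final answer appears."""
    history = f"Question: {question}"
    for _ in range(max_steps):
        step = llm(history)
        history += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action:" in step:  # parse "Action: tool[arg]" and run the tool
            call = step.split("Action:", 1)[1].strip()
            name, arg = call.split("[", 1)
            history += f"\nObservation: {TOOLS[name](arg.rstrip(']'))}"
    return history

print(react("What is 2 * 21?"))  # 42
```

The essential point is that tool observations are appended to the model's context, so each new "thought" can condition on real feedback from the environment.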
Toolformer: Language Models Can Teach Themselves to Use Tools — Meta AI (2023)
Shows that models can self-supervise their own tool usage patterns, effectively learning how to call APIs and calculators to improve reasoning quality.

Retrieval-Augmented & Knowledge-Guided Reasoning

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (RAG) — arXiv (2020)
Introduces the seminal RAG architecture, pairing a neural retriever with a seq2seq generator — the foundation for later LRM retrieval chains such as CoRAG and Search-o1.
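The retrieve-then-generate pattern can be illustrated with toy stand-ins; a word-overlap retriever and a template "generator" replace the trained retriever and seq2seq model of the actual system:

```python
# Minimal retrieve-then-generate sketch (illustration only).

CORPUS = [
    "The Eiffel Tower is in Paris.",
    "Mount Everest is the tallest mountain.",
    "Python was created by Guido van Rossum.",
]

def retrieve(query: str, docs, k: int = 1):
    """Rank documents by bag-of-words overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def generate(query: str, context):
    # A real RAG model conditions a generator on the retrieved passages;
    # here we just surface the evidence alongside the question.
    return f"Q: {query}\nEvidence: {' '.join(context)}"

answer = generate("Who created Python?", retrieve("Who created Python?", CORPUS))
print(answer)
```

Even in this toy form, the division of labor is visible: retrieval narrows the corpus to relevant evidence, and generation conditions on that evidence rather than on parametric memory alone.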
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection — arXiv (2023)
Introduces a self-improving RAG pipeline where the model evaluates and refines its own retrievals — bridging the gap toward agentic, self-correcting reasoning systems.

Benchmarks & Evaluation of Reasoning

BIG-Bench Hard: Challenging LLMs on Reasoning and Generalization — Google (2022)
Defines a suite of tasks designed to push the limits of LLM reasoning and abstraction, widely used for evaluating emerging LRMs.
MATH-500 and GSM8K: Reasoning Benchmarks for LRMs — arXiv
Grade-school math word problems (GSM8K) and the 500-problem MATH subset (MATH-500) now serve as standard indicators of LRM test-time reasoning quality and planning ability.

Quick Takeaways