Large Reasoning Models (LRMs)

Large Reasoning Models (LRMs) represent a new phase beyond traditional LLMs, emphasizing explicit reasoning, multi-step search, and tool integration. This roundup highlights recent key papers that define the emerging LRM paradigm — from agentic reasoning and retrieval-augmented generation to structured problem-solving frameworks.


Foundations & Core Concepts

Search-o1: Agentic Search-Enhanced Large Reasoning Models — arXiv (2025)
Introduces Search-o1, an agentic framework that fuses internal reasoning steps with web-scale retrieval and planning, demonstrating significant gains on reasoning benchmarks and knowledge-intensive tasks.
Chain-of-Retrieval Augmented Generation (CoRAG) — arXiv (2025)
Extends RAG into a multi-hop reasoning pipeline where retrieval and generation interleave dynamically — each step guiding the next retrieval query in a self-improving reasoning chain.
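The interleaved retrieval idea can be sketched as a loop in which each generated step seeds the next query. Note that `retrieve`, `generate`, and the `ANSWER:` stop convention below are hypothetical stand-ins for illustration, not the paper's actual interface:

```python
# Sketch of interleaved retrieve/generate hops. The retriever and
# generator are toy stubs; the real system trains these components.

def multi_hop(question: str, retrieve, generate, hops: int = 3) -> str:
    query, chain = question, []
    for _ in range(hops):
        passage = retrieve(query)                  # fetch evidence for the current query
        step = generate(question, chain, passage)  # produce the next reasoning step
        chain.append(step)
        if step.startswith("ANSWER:"):             # generator signals completion
            return step.removeprefix("ANSWER:").strip()
        query = step                               # the new step seeds the next retrieval
    return chain[-1]

# Toy demo wiring: a two-hop lookup over a tiny fact table.
facts = {"capital of France": "Paris", "Paris": "Paris hosts the Louvre"}
demo = multi_hop(
    "Which museum is in the capital of France?",
    retrieve=lambda q: facts.get("capital of France" if "capital" in q else "Paris", ""),
    generate=lambda q, chain, p: ("ANSWER: the Louvre" if "Louvre" in p else p),
)
print(demo)  # the Louvre
```

The key structural difference from single-shot RAG is that the retrieval query evolves with the reasoning chain rather than being fixed to the original question.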
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning — arXiv (2025)
Presents the DeepSeek-R1 model, showing that large-scale reinforcement learning can elicit reasoning behavior that outperforms purely supervised LLMs on symbolic and analytical tasks.
OpenAI o1: Towards Large Reasoning Models — OpenAI (2024)
A milestone release illustrating the transition from large language models to large reasoning models (LRMs), with emphasis on test-time reasoning, search, and verification loops.

Reasoning Frameworks & Cognitive Modeling

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models — arXiv (2022)
Demonstrates that prompting LLMs to produce intermediate reasoning steps markedly improves multi-step problem solving, serving as an early conceptual foundation for LRMs.
Tree of Thoughts: Deliberate Problem Solving with Large Language Models — arXiv (2023)
Proposes a structured search over reasoning paths, allowing models to explore and evaluate multiple possible thoughts before converging — a precursor to modern LRM search strategies.
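The search-over-thoughts idea can be sketched as a beam search over partial reasoning paths. Here `expand` and `score` are illustrative stand-ins for the paper's LLM-based thought generator and state evaluator, not its API:

```python
from typing import Callable, List

def tree_of_thoughts(
    root: str,
    expand: Callable[[str], List[str]],  # propose candidate next thoughts
    score: Callable[[str], float],       # heuristic value of a partial path
    beam_width: int = 2,
    depth: int = 3,
) -> str:
    """Breadth-first search over reasoning paths, keeping the top-k
    candidates at each depth (a simple beam-search variant of ToT)."""
    frontier = [root]
    for _ in range(depth):
        candidates = [t for node in frontier for t in expand(node)]
        if not candidates:
            break
        # Keep only the most promising partial thoughts.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)

# Toy demo: "thoughts" are digit strings; expand appends a digit,
# score prefers numerically larger strings.
result = tree_of_thoughts(
    root="",
    expand=lambda t: [t + d for d in "012"],
    score=lambda t: int(t) if t else 0,
)
print(result)  # 222
```

In the paper, both expansion and scoring are LLM calls, and the search policy can be BFS or DFS; the beam here is just the simplest instance of that family.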
ReAct: Synergizing Reasoning and Acting in Language Models — arXiv (2022)
Combines reasoning traces with tool usage, enabling LLMs to interleave thought and action — a foundation for agentic LRM architectures.
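A minimal sketch of the interleaved thought/action loop. The `mock_llm`, the `TOOLS` table, and the `Action: tool[arg]` trace format below are simplified stand-ins for the prompted model described in the paper:

```python
# Toy tool registry; eval() is for the demo only — never eval untrusted input.
TOOLS = {
    "calculator": lambda expr: str(eval(expr)),
}

def mock_llm(history: str) -> str:
    """Stand-in policy: asks the calculator once, then answers."""
    if "Observation:" not in history:
        return "Thought: I need arithmetic.\nAction: calculator[2*21]"
    answer = history.rsplit("Observation: ", 1)[1].splitlines()[0]
    return f"Thought: I have the result.\nFinal Answer: {answer}"

def react(question: str, llm=mock_llm, max_steps: int = 5) -> str:
    """Alternate model steps with tool calls until a final answer appears."""
    history = f"Question: {question}"
    for _ in range(max_steps):
        step = llm(history)
        history += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action:" in step:  # parse "Action: tool[arg]" and run the tool
            call = step.split("Action:", 1)[1].strip()
            name, arg = call.split("[", 1)
            history += f"\nObservation: {TOOLS[name](arg.rstrip(']'))}"
    return history

print(react("What is 2 * 21?"))  # 42
```

The essential point is that tool observations are appended to the model's context, so each new "thought" can condition on real feedback from the environment.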
Toolformer: Language Models Can Teach Themselves to Use Tools — Meta AI (2023)
Shows that models can self-supervise their own tool usage patterns, effectively learning how to call APIs and calculators to improve reasoning quality.

Retrieval-Augmented & Knowledge-Guided Reasoning

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (RAG) — arXiv (2020)
Introduces the seminal RAG architecture, pairing a neural retriever with a seq2seq generator — the foundation for later LRM retrieval chains such as CoRAG and Search-o1.
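The retrieve-then-generate pattern can be illustrated with toy stand-ins; a word-overlap retriever and a template "generator" replace the trained retriever and seq2seq model of the actual system:

```python
# Minimal retrieve-then-generate sketch (illustration only).

CORPUS = [
    "The Eiffel Tower is in Paris.",
    "Mount Everest is the tallest mountain.",
    "Python was created by Guido van Rossum.",
]

def retrieve(query: str, docs, k: int = 1):
    """Rank documents by bag-of-words overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def generate(query: str, context):
    # A real RAG model conditions a generator on the retrieved passages;
    # here we just surface the evidence alongside the question.
    return f"Q: {query}\nEvidence: {' '.join(context)}"

answer = generate("Who created Python?", retrieve("Who created Python?", CORPUS))
print(answer)
```

Even in this toy form, the division of labor is visible: retrieval narrows the corpus to relevant evidence, and generation conditions on that evidence rather than on parametric memory alone.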
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection — arXiv (2023)
Introduces a self-improving RAG pipeline where the model evaluates and refines its own retrievals — bridging the gap toward agentic, self-correcting reasoning systems.

Benchmarks & Evaluation of Reasoning

BIG-Bench Hard: Challenging LLMs on Reasoning and Generalization — Google (2022)
Defines a suite of tasks designed to push the limits of LLM reasoning and abstraction, widely used for evaluating emerging LRMs.
MATH-500 and GSM8K: Reasoning Benchmarks for LRMs — arXiv
Grade-school math word problems (GSM8K) and the 500-problem MATH subset (MATH-500) now serve as standard indicators of LRM test-time reasoning quality and planning ability.

Quick Takeaways