Neural Architecture Search — a compact, categorized walkthrough

Neural Architecture Search (NAS) automates the design of neural network architectures, trading manual trial-and-error for systematic search. Below is a curated, categorized list of influential papers, frameworks, benchmarks, and practical notes, each linked to a stable online source where possible. Use it as a jumping-off point for survey reading, trying out code, or building hardware-aware NAS flows.

Surveys & Reviews

A Survey on Neural Architecture Search — arXiv
A broad overview of search spaces, search strategies (RL, evolution, gradient, Bayesian), and performance estimation techniques.
NAS: Past, Present and Future — arXiv
Discusses historical development, practical challenges (compute cost, reproducibility), and open directions such as hardware-aware NAS.
Benchmarks and Best Practices for NAS — arXiv
Explores evaluation methodology, reproducibility issues, and the role of standardized benchmarks (NAS-Bench family).

Core NAS Methods & Milestones

Neural Architecture Search with Reinforcement Learning — arXiv (Zoph & Le)
One of the first influential papers to apply reinforcement learning to architecture search: an RNN controller samples network descriptions and is trained with validation accuracy as the reward.
ENAS: Efficient Neural Architecture Search via Parameter Sharing — arXiv
Introduced parameter sharing to drastically lower NAS compute cost by reusing weights across sampled architectures.
DARTS: Differentiable Architecture Search — arXiv
Formulated NAS as a differentiable optimization problem by relaxing the categorical choice of operations into a softmax mixture, enabling gradient-based search over continuous architecture parameters.
ProxylessNAS — arXiv
Performs NAS directly on the target task and hardware (no proxy tasks), focusing on latency-aware architectures for mobile devices.
Once-for-All (OFA) — arXiv
Trains a single, elastic supernet that supports many subnetworks; enables instant specialization for hardware/accuracy trade-offs at deployment time.
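Of the methods above, DARTS is the easiest to sketch without a framework. Its continuous relaxation replaces each edge's categorical choice among candidate operations with a softmax-weighted mixture, so the architecture parameters (the alphas) become differentiable. The operation names and alpha values below are illustrative toys, not taken from the paper:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Candidate operations on one edge (toy stand-ins for conv/pool/skip).
ops = {
    "skip":   lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero":   lambda x: 0.0,
}

def mixed_op(x, alphas):
    """DARTS mixed operation: softmax(alphas)-weighted sum of all candidates."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops.values()))

# Equal alphas give a uniform mixture; training sharpens the alpha vector
# until one operation dominates and can be kept at discretization time.
print(mixed_op(3.0, [0.0, 0.0, 0.0]))  # (3 + 6 + 0) / 3 = 3.0
```

Because `mixed_op` is an ordinary differentiable function of the alphas, gradient descent can update architecture and weight parameters in alternation, which is the bilevel scheme DARTS uses.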

Benchmarks & Reproducibility

NAS-Bench-101: Towards Reproducible NAS — arXiv
A tabular benchmark that exhaustively precomputes training and evaluation metrics for every architecture in a fixed cell-based search space, enabling fair comparison of NAS algorithms without retraining.
NAS-Bench-201 — arXiv
A compact, reproducible benchmark across multiple datasets (CIFAR-10/100, ImageNet-16-120) designed for fast algorithm evaluation and ablation studies.
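The key property of these tabular benchmarks is that "evaluating an architecture" collapses into a table lookup. The sketch below mimics that interface with a hypothetical in-memory table; the encodings and accuracy numbers are made up for illustration and are not from NAS-Bench-101 or 201:

```python
# Hypothetical tabular benchmark: architecture encoding -> precomputed metrics.
# The real NAS-Bench releases ship such tables as files; here it is inlined.
TABLE = {
    ("conv3x3", "conv3x3", "skip"): {"val_acc": 0.921, "params_m": 1.1},
    ("conv3x3", "conv1x1", "skip"): {"val_acc": 0.913, "params_m": 0.8},
    ("skip",    "skip",    "skip"): {"val_acc": 0.847, "params_m": 0.1},
}

def query(arch):
    """Constant-time 'training' of an architecture: just a dict lookup."""
    return TABLE[arch]["val_acc"]

def search_best(archs):
    """Any search strategy can now be benchmarked cheaply and reproducibly."""
    return max(archs, key=query)

best = search_best(list(TABLE))
print(best, query(best))
```

This is why the NAS-Bench family matters for reproducibility: two papers querying the same table are guaranteed to see identical accuracies, removing training variance from algorithm comparisons.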

Hardware-Aware & Resource-Constrained NAS

FBNet: Hardware-Aware NAS — arXiv
Jointly optimizes accuracy and latency using a differentiable search guided by measured hardware cost; targets mobile inference.
MnasNet: Platform-Aware NAS for Mobile — arXiv
One of the early works to incorporate measured on-device latency directly into the search reward, producing directly deployable mobile models.
HA-NAS and Related Works — arXiv
Representative set of methods that combine search with explicit hardware models, multi-objective optimization, or latency tables for fast evaluation.
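A common way these methods fold latency into a single objective is a weighted product of accuracy and a latency ratio, as in the MnasNet reward ACC(m) * (LAT(m)/T)^w, where T is the target latency and w < 0 penalizes slow models. A minimal sketch with illustrative numbers; the latency table here is made up, standing in for the precomputed lookup tables some methods use instead of per-step on-device measurement:

```python
def mnas_reward(acc, latency_ms, target_ms=80.0, w=-0.07):
    """MnasNet-style multi-objective reward: ACC * (LAT / T)^w.
    With w < 0, latency above the target smoothly lowers the reward."""
    return acc * (latency_ms / target_ms) ** w

# Hypothetical latency table (model -> measured ms on a target phone).
latency_ms = {"small": 40.0, "medium": 80.0, "large": 160.0}
accuracy   = {"small": 0.72, "medium": 0.75, "large": 0.76}

scores = {m: mnas_reward(accuracy[m], latency_ms[m]) for m in latency_ms}
best = max(scores, key=scores.get)
print(best)  # the fast model wins despite lower raw accuracy
```

Tuning w (or T) moves the search along the accuracy/latency Pareto front, which is exactly the trade-off knob these hardware-aware methods expose.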

Frameworks & Tooling

AutoKeras — Project
An accessible AutoML library that includes NAS primitives and high-level APIs for tabular, image, and text tasks.
NNI (Neural Network Intelligence) — Microsoft
Offers a broad suite of NAS algorithms, hyperparameter tuning, and built-in support for hardware-aware experiments and distributed search.
Once-for-All / PyTorch Implementations — Various repos
Community implementations of OFA, DARTS, and ProxylessNAS that make experimentation and deployment easier on modern toolchains.
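The OFA implementations above all revolve around carving many subnetworks out of one set of trained supernet weights. The sketch below shows the core indexing idea in plain Python, under the simplifying assumption (a rough sketch, not the full OFA transform) that a smaller kernel reuses the center of the largest kernel and a narrower layer keeps the leading channels:

```python
# Full (supernet) 5x5 kernel, stored as a list of rows with dummy values.
full_kernel = [[r * 5 + c for c in range(5)] for r in range(5)]

def sub_kernel(kernel, k):
    """Elastic kernel size: a k x k subnet kernel is a centered slice of the
    full kernel, so small and large subnets share the same parameters."""
    n = len(kernel)
    off = (n - k) // 2
    return [row[off:off + k] for row in kernel[off:off + k]]

def sub_channels(weights, width):
    """Elastic width: a narrow layer keeps the first `width` channels."""
    return weights[:width]

k3 = sub_kernel(full_kernel, 3)
print(k3[0])  # center 3x3 starts at row 1, col 1 -> [6, 7, 8]
```

Because every subnet indexes into the same tensors, specializing for a new hardware target is selection rather than retraining, which is what makes OFA-style deployment cheap.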

Applications & Case Studies

AutoML for Efficient Vision Models — arXiv
Case studies showing NAS-designed models for mobile vision, object detection, and few-shot learning, demonstrating accuracy/latency trade-offs.
NAS for NLP and Transformers — arXiv
Adapts NAS ideas to transformer architectures and sequence tasks, focusing on block-level search and pruning for efficiency.

Quick Takeaways