State Space Models & Mamba
Structured state space models (SSMs) and the recent family of Mamba variants have accelerated research into long-range sequence modeling, efficient visual and multimodal representation, and alternatives to transformers. Below, the publications you provided are organized by topic, with a short summary of each source.
Core SSM theory & foundations
Introduces the HiPPO framework for principled recurrent memory: projection operators that preserve function approximation under online updates. HiPPO underpins many later SSM developments.
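To make the "projection operators" concrete, here is a minimal NumPy sketch of the HiPPO-LegS operator, assuming the standard scaled-Legendre transition matrix described in the HiPPO paper; the function names and the forward-Euler step are illustrative, not the authors' reference code.

```python
import numpy as np

def hippo_legs(N):
    # HiPPO-LegS transition matrix and input vector (assumed form, per the HiPPO paper):
    # A[n, k] = sqrt(2n+1) * sqrt(2k+1) if n > k,  n + 1 if n == k,  0 otherwise;  B[n] = sqrt(2n+1).
    A = np.zeros((N, N))
    for n in range(N):
        for k in range(N):
            if n > k:
                A[n, k] = np.sqrt(2 * n + 1) * np.sqrt(2 * k + 1)
            elif n == k:
                A[n, k] = n + 1
    B = np.sqrt(2 * np.arange(N) + 1.0)
    return A, B

def hippo_step(c, f_t, t, A, B, dt=1.0):
    # One Euler step of the LegS ODE  c'(t) = -(1/t) A c(t) + (1/t) B f(t);
    # c holds Legendre coefficients of an online approximation of the input history.
    return c + (dt / t) * (-(A @ c) + B * f_t)
```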
The S4 family: shows how structured state space layers can be implemented efficiently to model very long-range dependencies, with strong empirical results on long-sequence benchmarks.
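The mechanism S4 builds on can be sketched briefly: a discretized linear SSM applied over a sequence is equivalent to one long causal convolution. The snippet below assumes a plain diagonal A for simplicity (S4 itself uses a structured diagonal-plus-low-rank parameterization and a specialized kernel algorithm), so it illustrates the idea rather than the paper's method.

```python
import numpy as np

def ssm_conv_kernel(A_diag, B, C, dt, L):
    # Zero-order-hold discretization of a diagonal SSM  x' = A x + B u,  y = C x:
    # A_bar = exp(dt * A),  B_bar = (A_bar - 1) / A * B  (element-wise; A_diag assumed nonzero).
    A_bar = np.exp(dt * A_diag)                          # (N,)
    B_bar = (A_bar - 1.0) / A_diag * B                   # (N,)
    # Convolution kernel K[l] = C . (A_bar**l * B_bar); S4's contribution is computing
    # an equivalent kernel stably and cheaply for its structured (non-diagonal) A.
    powers = A_bar[None, :] ** np.arange(L)[:, None]     # (L, N)
    return (powers * B_bar[None, :]) @ C                 # (L,)

def ssm_apply(u, K):
    # y = causal convolution of the input with K; in practice this is done with FFTs in O(L log L).
    return np.convolve(u, K)[: len(u)]
```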
Proposes simplifications to SSM-layer implementations to reduce complexity while retaining modeling power — practical guidance for lighter-weight SSMs.
Mamba family — linear-time and selective state spaces
Presents Mamba, a selective-state-space design that attains linear-time complexity while preserving the SSM inductive biases; focuses on efficiency and scalability for long sequences.
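As a rough sketch of what "selective" means here: the step size and the B/C projections become functions of the current input, so the recurrence can decide per token what to write into and read from the state. The toy scan below assumes a diagonal A and the simplified Euler-style discretization commonly described for Mamba; real implementations fuse this loop into a hardware-aware parallel scan.

```python
import numpy as np

def selective_scan(x, A_diag, B_t, C_t, delta_t):
    # x: (L, D) inputs; A_diag: (D, N) diagonal state matrix (shared over time);
    # B_t, C_t: (L, N) input-dependent projections; delta_t: (L, D) input-dependent step sizes.
    L, D = x.shape
    N = A_diag.shape[1]
    h = np.zeros((D, N))
    y = np.zeros((L, D))
    for t in range(L):
        # Discretize with the per-token step size, then advance the hidden state.
        A_bar = np.exp(delta_t[t][:, None] * A_diag)      # (D, N)
        B_bar = delta_t[t][:, None] * B_t[t][None, :]     # (D, N)
        h = A_bar * h + B_bar * x[t][:, None]             # (D, N)
        y[t] = h @ C_t[t]                                 # (D,)
    return y
```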
Explores applying SSMs to large-scale language modeling, detailing practical training recipes, scaling behavior, and challenges for language tasks.
Analyzes how attention-like computations emerge in Mamba architectures and examines interpretability/behavioral parallels with attention-based models.
Vision & multimodal SSM adaptations
Adapts bidirectional SSM layers for visual representation learning, trading extra compute for a larger, both-direction receptive field over image patch tokens.
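A schematic of that bidirectional trade-off, assuming flattened patch tokens scanned once in raster order and once reversed, with the two outputs summed; the non-selective scan and the combine-by-sum choice are simplifications for illustration, not the exact published design.

```python
import numpy as np

def scan_1d(tokens, A_bar, B_bar, C):
    # Simple diagonal SSM scan over a token sequence: tokens (L, D), A_bar/B_bar (D, N), C (N,).
    h = np.zeros_like(B_bar)
    out = np.zeros_like(tokens, dtype=float)
    for t, x_t in enumerate(tokens):
        h = A_bar * h + B_bar * x_t[:, None]
        out[t] = h @ C
    return out

def bidirectional_ssm(patch_tokens, A_bar, B_bar, C):
    # Forward scan + backward scan over flattened image patches, summed so every
    # position sees context from both directions (two passes of compute for a full receptive field).
    fwd = scan_1d(patch_tokens, A_bar, B_bar, C)
    bwd = scan_1d(patch_tokens[::-1], A_bar, B_bar, C)[::-1]
    return fwd + bwd
```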
NeurIPS presentation of a visual SSM variant; emphasizes architectural choices that make SSMs effective on image and patch-based inputs.
Examines extensions of Mamba-style SSMs to multimodal inputs (vision + language), discussing fusion strategies and scaling considerations.
Presents a windowed selective-scan variant tailored for visual data — balancing local processing with selective long-range aggregation.
Applications & domain-specific SSMs
Applies an SSM/Mamba-inspired architecture to remote sensing imagery classification, highlighting robustness to multi-scale patterns in aerial data.
Broader surveys & theoretical connections
Survey covering SSMs as a class of transformer alternatives — architectures, algorithms, empirical comparisons, and open problems.
Explores a formal duality between transformers and SSMs, deriving generalized models and algorithmic implications for efficient implementations.
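One way to see the duality, sketched under the assumption of per-step scalar decays: unrolling a linear recurrence writes the whole output as y = M x, where M is a lower-triangular, attention-like matrix. Materializing M costs O(L^2), exactly like attention, while the recurrence computes the same map in O(L). The variable names below are illustrative.

```python
import numpy as np

def ssm_as_masked_attention(x, a, B, C):
    # x: (L,) scalar-channel inputs; a: (L,) per-step decays; B, C: (L, N) per-step projections.
    # M[i, j] = (C[i] . B[j]) * prod_{k=j+1..i} a[k]  for j <= i  (lower-triangular, "masked attention").
    L = len(x)
    M = np.zeros((L, L))
    for i in range(L):
        for j in range(i + 1):
            M[i, j] = (C[i] @ B[j]) * np.prod(a[j + 1:i + 1])
    # The recurrence  h_i = a[i] * h_{i-1} + B[i] * x[i],  y_i = C[i] . h_i
    # produces exactly the same y without ever building M.
    return M @ x
```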
Quick takeaways
- HiPPO is foundational: many SSM designs (including S4 and Mamba) build on the HiPPO projection framework.
- S4 → Mamba progression: S4 showed that structured state spaces can handle very long contexts; Mamba and variants push further on efficiency and selective computation.
- Vision & multimodal: multiple papers adapt SSMs to images and multimodal inputs (Vision Mamba, VMamba, VL-Mamba) — promising alternatives to some transformer-heavy pipelines.
- Surveys & theory: recent surveys and the "Transformers are SSMs" paper highlight deep connections and point to algorithmic cross-fertilization.
- Applied variants: work like RSMamba shows that domain-specific adaptations (remote sensing) are already emerging.