Beyond HBM: The Rise of HBF and the Second Wave of AI Memory
February 19, 2026
Executive Summary
While the market remains fixated on High Bandwidth Memory (HBM) to power AI model training, a massive bottleneck is forming just over the horizon. As AI shifts from training to inference — specifically video generation, long-context agents, and RAG (Retrieval-Augmented Generation) — HBM alone becomes too expensive and capacity-constrained. The industry's answer: High Bandwidth Flash (HBF).
This is no longer speculative. In February 2026, SK Hynix published its H3 architecture paper in IEEE Computer Architecture Letters, demonstrating a hybrid HBM+HBF system that achieves 2.69x higher throughput per watt compared to HBM-only configurations. SanDisk and SK Hynix signed an MoU in August 2025 to jointly standardize HBF, with samples expected in H2 2026 and the first AI inference systems deploying HBF by early 2027.
This memo explores the technical necessity of HBF, the emerging hybrid architecture, competitive landscape, and investment implications across the semiconductor supply chain.
1. The Core Problem: The "Memory Wall" in the Age of Inference AI
Professor Joungho Kim (KAIST), known as the "Father of HBM," recently predicted that in the mature AI era, every individual will require 100TB of storage capacity. The current memory hierarchy simply cannot deliver this economically.
The HBM Limit
Top-tier GPUs max out around 141GB of HBM in total (e.g. NVIDIA's H200; individual HBM3E stacks hold only ~24GB each). This is fundamentally insufficient for:
- World Models (Sora, Runway) requiring massive temporal consistency and hundreds of GB of context per inference pass
- Long-context LLMs (1M+ token windows) where KV-cache alone can consume 50-100GB
- RAG systems that need rapid access to terabytes of vector embeddings
- Multi-modal agents processing video, audio, and text simultaneously
The cost problem is equally severe: at roughly $20-30 per GB, the memory alone for a 1TB inference node runs $20,000-30,000 — economically prohibitive at scale.
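These figures can be sanity-checked with simple arithmetic. The model shape below (80 layers, 8 grouped-query KV heads, head dimension 128, fp16 cache) is an illustrative assumption, not taken from this memo; real deployments vary widely:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
    """Per token, each layer stores K and V: 2 * heads * head_dim * dtype bytes."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens

# Hypothetical 70B-class model with grouped-query attention (GQA).
gb = kv_cache_bytes(80, 8, 128, 1_000_000) / 1e9
# ~328 GB in fp16 for a 1M-token window; fp8 KV quantization halves this,
# and shorter windows shrink it proportionally — which is how configurations
# land in the 50-100GB range quoted above.
print(f"KV cache for a 1M-token context: {gb:.0f} GB")

def memory_cost(capacity_gb, usd_per_gb):
    return capacity_gb * usd_per_gb

hbm_node = memory_cost(1024, 25)  # midpoint of the $20-30/GB range
hbf_node = memory_cost(1024, 4)   # midpoint of the $3-5/GB target range
print(f"1TB node memory cost — HBM: ${hbm_node:,.0f}, HBF: ${hbf_node:,.0f}")
```

The gap between the two totals is the economic case for a second memory tier.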
What HBF Solves
HBF is not a replacement for HBM, but a complementary tier in the memory hierarchy:
| Metric | HBM3E | HBF (Target) | Ratio |
|---|---|---|---|
| Capacity per stack | ~24GB | ~192-384GB | 8-16x |
| Bandwidth | 1.2 TB/s | ~1.6 TB/s | Comparable |
| Cost per GB | $20-30 | $3-5 | 5-8x cheaper |
| Latency | ~10ns | ~1-10μs | Higher, but sufficient for inference |
| Power per GB | High | ~60% lower | Significant savings |
If HBM is the "exclusive bookshelf" next to the GPU, HBF is the "high-speed library" in the next room — large enough to hold everything, fast enough to prevent GPU starvation.
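The "fast enough" claim can be sanity-checked against the table's numbers. For large sequential KV-cache reads, transfer time dominates access latency, so microsecond latency is largely amortized. The access count below is an illustrative assumption:

```python
def step_time_ms(kv_gb, bandwidth_tbps, access_latency_us=10, n_accesses=1000):
    """Rough per-decode-step cost of streaming a KV cache from a memory tier.

    Transfer time scales with capacity / bandwidth; latency is paid once per
    discrete access (n_accesses=1000 is an illustrative assumption).
    """
    transfer_ms = kv_gb / (bandwidth_tbps * 1000) * 1000  # GB / (GB/ms)
    latency_ms = access_latency_us * n_accesses / 1000
    return transfer_ms + latency_ms

# 100GB of KV cache at HBF's ~1.6 TB/s target:
total = step_time_ms(100, 1.6)
# ~62.5 ms of transfer vs ~10 ms of accumulated latency — the microsecond
# access penalty is a minor fraction of the step, not the bottleneck.
print(f"Per-step time: {total:.1f} ms")
```

This is why the latency row in the table reads "higher, but sufficient for inference": throughput, not access time, governs large streaming reads.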
The Paradigm Shift
| Era | Period | Focus | Primary Metric |
|---|---|---|---|
| Training Era | 2024-2026 | HBM | Absolute Speed |
| Inference Era | 2027-2029 | HBM + HBF | Total Cost of Ownership (TCO) & Capacity |
The market is pricing in model creation (HBM demand). It has not yet fully priced in model operation at scale (HBF demand).
2. The H3 Architecture: HBM + HBF Hybrid (Validated)
In February 2026, SK Hynix published a landmark paper in IEEE Computer Architecture Letters: "H3: Hybrid Architecture using High Bandwidth Memory and High Bandwidth Flash for Cost-Efficient LLM Inference" (DOI: 10.1109/LCA.2026.3660969).
Architecture Design
H3 integrates both HBM and HBF within a single GPU system, leveraging their respective strengths:
- HBM handles model weights, activations, and frequently-accessed KV-cache entries (latency-sensitive, write-heavy)
- HBF stores read-only data such as large embedding tables, extended KV-cache, and pre-computed attention matrices (capacity-sensitive, read-heavy)
The key insight: in LLM inference, the majority of data accesses are reads (token generation reads from KV-cache far more than it writes). HBF's read bandwidth matches HBM, making it ideal for this workload.
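A minimal sketch of how a runtime might route tensors between the two tiers along the read/write split described above. This is an illustration of the placement logic, not the H3 design itself; all names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    size_gb: float
    read_only: bool
    hot: bool  # accessed on every decode step

def place(tensor: Tensor) -> str:
    # Latency-sensitive or write-heavy data stays in HBM;
    # large, cold, read-only data spills to HBF.
    if tensor.hot or not tensor.read_only:
        return "HBM"
    return "HBF"

workload = [
    Tensor("model_weights", 140, read_only=True, hot=True),      # -> HBM
    Tensor("active_kv_cache", 20, read_only=False, hot=True),    # -> HBM
    Tensor("embedding_table", 300, read_only=True, hot=False),   # -> HBF
    Tensor("archived_kv_cache", 200, read_only=True, hot=False), # -> HBF
]
for t in workload:
    print(f"{t.name}: {place(t)}")
```

Even this two-rule policy reproduces the H3 split: weights and active KV-cache land in HBM, while the bulky read-only tables spill to HBF.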
Simulation Results
Using eight HBM3E stacks and eight HBF stacks alongside NVIDIA's Blackwell B200 GPU:
- 2.69x higher throughput per watt compared to HBM-only
- 18.8x larger batch sizes — dramatically more concurrent queries per GPU
- Near-linear capacity scaling without proportional cost increase
These results validate the thesis: inference-era AI needs capacity more than it needs nanosecond latency.
Three Technology Pillars
Vertical Stacking: HBF targets 300+ NAND layer stacks with TSVs (through-silicon vias) connecting them to logic layers — far beyond standard enterprise SSDs at 128-176 layers.
CXL Interface: Moving from NVMe (microsecond latency) to Compute Express Link (CXL), allowing the GPU to access HBF directly via the memory bus. CXL 3.0 (ratified in 2022, with the device ecosystem expected to mature through 2026) enables memory pooling and sharing across multiple devices.
Smart Controllers: In-storage compute for pre-processing — filtering, vector search, decompression — before data reaches the GPU. This "near-data processing" reduces data movement by up to 10x.
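A toy illustration of the near-data-processing idea: score candidate vectors inside the storage controller and ship only the survivors to the GPU. The function name and the brute-force scoring are illustrative stand-ins for whatever a real smart controller would run in hardware:

```python
import random

random.seed(0)
# A hypothetical embedding table resident in HBF: 10,000 vectors of dim 8.
vectors = [[random.random() for _ in range(8)] for _ in range(10_000)]

def in_storage_filter(table, query, top_k=1000):
    """Stand-in for controller-side pre-filtering: coarse dot-product
    scoring, returning only the top_k candidates instead of the full table."""
    scored = sorted(table, key=lambda v: -sum(a * b for a, b in zip(v, query)))
    return scored[:top_k]

candidates = in_storage_filter(vectors, query=[1.0] * 8)
reduction = len(vectors) / len(candidates)
# Only 1,000 of 10,000 vectors cross the bus — the 10x data-movement
# reduction cited above, achieved before the GPU touches anything.
print(f"Data movement reduced {reduction:.0f}x")
```

The GPU then runs exact scoring only on the shortlist, which is the essence of pushing the filter to where the data lives.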
3. Industry Mobilization: From Concept to Commercialization
The HBF ecosystem has rapidly organized since mid-2025.
SanDisk + SK Hynix MoU (August 2025)
The two companies signed a memorandum of understanding to jointly define HBF technical specifications and promote open standardization. Target timeline:
- H2 2026: First HBF memory samples
- Early 2027: First AI inference systems using HBF
SanDisk HBF Technical Advisory Board (July 2025)
SanDisk formed an HBF Technical Advisory Board with heavyweight members:
- David Patterson — UC Berkeley Professor Emeritus, 2017 Turing Award winner, co-creator of RISC and RAID. Leads the board.
- Raja Koduri — Former Intel GPU chief, now CEO of Oxmiq Labs. Deep expertise in GPU-memory architectures.
- Alper Ilkbahar — SanDisk EVP & CTO
Patterson's involvement signals that HBF is being positioned as a foundational computing paradigm, not just a storage product.
Samsung Enters the Race (October 2025)
Samsung began early concept design work on HBF, leveraging its position as the world's largest NAND manufacturer. While Samsung's product specifications remain undisclosed, their entry validates the market and ensures competitive pressure will accelerate development.
Kioxia XL-FLASH with CXL
Kioxia demonstrated a CXL-attached flash memory expander at FMS 2025:
- 32-die BiCS FLASH Gen 8 QLC stack in a compact BGA package
- Average read latency under 10 microseconds
- Positions NAND as a direct CXL memory tier
China Angle: YMTC
China's YMTC is preparing to enter the DRAM market through a partnership with CXMT, targeting HBM manufacturing through advanced packaging. YMTC's Xtacking architecture supports integration with AI accelerators. A third fab in Wuhan (online ~2027) will dedicate roughly half its capacity to DRAM. While export controls may limit their HBF impact outside China, they represent a significant domestic demand catalyst.
4. Competitive Landscape: Who Wins the HBF War?
Tier 1: The Innovators
SK Hynix (000660.KS) — The Architect
As the current HBM leader, SK Hynix is best positioned to apply HBM packaging technologies (TSV, hybrid bonding) to NAND. Their H3 paper demonstrates they are 12-18 months ahead of competitors in system-level understanding. Their roadmap explicitly targets CXL-attached flash memory, and the MoU with SanDisk ensures ecosystem alignment.
SanDisk (SNDK) — The Pure Play
Post Western Digital spin-off, SanDisk is the most direct equity play for HBF. Key advantages:
- Unmatched NAND controller and firmware expertise
- HBF trademark (HBF™ is their registered mark)
- Technical Advisory Board with Patterson and Koduri
- Pure-play valuation re-rating from "commodity storage" to "AI infrastructure"
- Active standardization leadership via MoU with SK Hynix
Tier 2: The Scalers
Samsung Electronics (005930.KS) — The Integrated Giant
The only player with in-house Foundry, Logic, HBM, and NAND capabilities. While not first to market, Samsung's vertical integration enables lowest-cost HBF at massive scale once standards are set.
Micron (MU) — The Specialist
Leveraging legacy research from 3D XPoint and deep CXL investment. Micron's LPDDR-class power efficiency expertise is critical for inference-optimized HBF modules.
Kioxia — The Dark Horse
XL-FLASH with CXL demonstrates early product readiness. Kioxia's entire 2026 NAND production is already sold out, suggesting strong demand for their advanced flash products. If they partner with a CXL IP provider, they could emerge as a Tier 1 competitor.
5. The Supply Chain: Picks and Shovels
HBF is extraordinarily complex to build. The following supply chain layers are critical.
A. Connectivity — The Highway
Astera Labs (ALAB) — The Critical Link
HBF relies on CXL to communicate with GPUs. Astera Labs dominates the market for Retimers — chips that maintain signal integrity over high-speed connections. Without Astera, the "High Bandwidth" in HBF is physically impossible over server distances. They are the toll road for every HBF packet.
As CXL 3.0 enables memory pooling (sharing HBF across multiple GPUs in a rack), Astera's TAM expands from per-socket to per-rack.
B. Controllers — The Brain
Silicon Motion (SIMO) & Marvell (MRVL)
HBF requires complex error correction (ECC), wear leveling, and logic management far beyond standard SSD controllers.
- SIMO: Primary beneficiary if HBF adoption spreads to mid-range enterprise. Controller IP already in 70%+ of client SSDs.
- Marvell: Go-to for custom ASIC controllers for hyperscalers (Google/AWS) building custom HBF rack architectures.
C. Inspection & Packaging — The Safety Net
Camtek (CAMT) — Zero Tolerance
Stacking 300+ NAND layers with TSVs requires rigorous inspection at every level. If one layer fails, the entire stack is waste. Camtek's 2D/3D inspection equipment is mandatory for yield management. The harder HBF is to build, the more indispensable Camtek becomes.
BE Semiconductor (BESI) — The Bonder
Leader in hybrid bonding equipment — the technology required to connect ultra-dense NAND layers to logic dies at sub-micron pitch.
D. Systems — The Integrator
Pure Storage (PSTG) — The Enterprise Bridge
Using "DirectFlash" technology that eliminates the traditional SSD form factor, Pure Storage is positioned to integrate HBF components into turnkey "AI Data Lakes" for enterprise clients. They bridge the gap between raw components and usable infrastructure.
6. Investment Framework
The Thesis in Three Sentences
- SK Hynix's H3 paper validates that HBM+HBF hybrid achieves 2.69x better throughput-per-watt for LLM inference.
- SanDisk and SK Hynix are standardizing HBF with samples in H2 2026, making this a 12-18 month investment horizon.
- The supply chain winners are identifiable today, before commercialization.
Strategy Matrix
| Approach | Ticker(s) | Rationale |
|---|---|---|
| Aggressive | SK Hynix (000660.KS), SanDisk (SNDK) | Direct HBF innovators with first-mover advantage |
| Infrastructure | Astera Labs (ALAB) | Indispensable CXL connectivity — toll road for HBF |
| Safety | Camtek (CAMT) | Complexity tax — the harder HBF is to build, the more they earn |
| Scale | Samsung (005930.KS), Micron (MU) | Vertical integration and cost advantages once standards set |
| Integration | Pure Storage (PSTG) | Enterprise adoption layer for HBF-based AI data lakes |
Key Catalysts to Watch
- Broad CXL 3.x device availability (switches and controllers shipping through late 2026) — the interface foundation for HBF
- SK Hynix / SanDisk HBF samples (H2 2026) — first physical validation
- First AI inference systems with HBF (early 2027) — proof of commercial viability
- Hyperscaler procurement signals — watch for CXL memory references in MSFT/GOOG/AMZN/META CapEx calls
- Samsung HBF product announcement — validates market size and competitive intensity
Risk Factors
- Hardware roadmaps could slip — 2026-2027 timeline is aggressive
- CXL adoption could fragment if competing standards emerge
- HBM cost reductions (HBM4 at lower price points) could narrow the TCO gap
- NAND oversupply cycles could compress margins for flash-heavy players
- China export controls may create fragmented standards
Disclaimer: This analysis implies a technological forecast for 2026-2027. Hardware roadmaps are subject to change. CrazyRich Agents provides AI-generated research for informational purposes only — not investment advice.