Beyond HBM: The Rise of HBF and the Second Wave of AI Memory
February 19, 2026
Executive Summary
While the market remains fixated on High Bandwidth Memory (HBM) to power AI model training, a massive bottleneck is forming just over the horizon. As AI shifts from training to inference — specifically video generation, long-context agents, and RAG (Retrieval-Augmented Generation) — HBM alone becomes too expensive and capacity-constrained. The industry's answer: High Bandwidth Flash (HBF).
This is no longer speculative. In February 2026, SK Hynix published its H3 architecture paper in IEEE Computer Architecture Letters, demonstrating a hybrid HBM+HBF system that achieves 2.69x higher throughput per watt compared to HBM-only configurations. SanDisk and SK Hynix signed an MoU in August 2025 to jointly standardize HBF, with samples expected in H2 2026 and the first AI inference systems deploying HBF by early 2027.
This memo explores the technical necessity of HBF, the emerging hybrid architecture, competitive landscape, and investment implications across the semiconductor supply chain.
1. The Core Problem: The "Memory Wall" in the Age of Inference AI
Professor Joungho Kim (KAIST), known as the "Father of HBM," recently predicted that in the mature AI era, every individual will require 100TB of storage capacity. The current memory hierarchy simply cannot deliver this economically.
The HBM Limit
Top-tier GPUs max out around 141GB of HBM in total (e.g. NVIDIA's H200; individual HBM3E stacks hold only ~24GB each). This is fundamentally insufficient for:
- World Models (Sora, Runway) requiring massive temporal consistency and hundreds of GB of context per inference pass
- Long-context LLMs (1M+ token windows) where KV-cache alone can consume 50-100GB
- RAG systems that need rapid access to terabytes of vector embeddings
- Multi-modal agents processing video, audio, and text simultaneously
The cost problem is equally severe: at roughly $20-30 per GB, the memory alone for a 1TB inference node runs $20,000-30,000 — economically prohibitive at scale.
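These figures can be sanity-checked with simple arithmetic. The model shape below (80 layers, 8 grouped-query KV heads, head dimension 128, fp16 cache) is an illustrative assumption, not taken from this memo; real deployments vary widely:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
    """Per token, each layer stores K and V: 2 * heads * head_dim * dtype bytes."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens

# Hypothetical 70B-class model with grouped-query attention (GQA).
gb = kv_cache_bytes(80, 8, 128, 1_000_000) / 1e9
# ~328 GB in fp16 for a 1M-token window; fp8 KV quantization halves this,
# and shorter windows shrink it proportionally — which is how configurations
# land in the 50-100GB range quoted above.
print(f"KV cache for a 1M-token context: {gb:.0f} GB")

def memory_cost(capacity_gb, usd_per_gb):
    return capacity_gb * usd_per_gb

hbm_node = memory_cost(1024, 25)  # midpoint of the $20-30/GB range
hbf_node = memory_cost(1024, 4)   # midpoint of the $3-5/GB target range
print(f"1TB node memory cost — HBM: ${hbm_node:,.0f}, HBF: ${hbf_node:,.0f}")
```

The gap between the two totals is the economic case for a second memory tier.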
What HBF Solves
HBF is not a replacement for HBM, but a complementary tier in the memory hierarchy:
| Metric | HBM3E | HBF (Target) | Ratio |
|---|---|---|---|
| Capacity per stack | ~24GB | ~192-384GB | 8-16x |
| Bandwidth | 1.2 TB/s | ~1.6 TB/s | Comparable |
| Cost per GB | $20-30 | $3-5 | 5-8x cheaper |
| Latency | ~10ns | ~1-10μs | Higher, but sufficient for inference |
| Power per GB | High | ~60% lower | Significant savings |
If HBM is the "exclusive bookshelf" next to the GPU, HBF is the "high-speed library" in the next room — large enough to hold everything, fast enough to prevent GPU starvation.
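The "fast enough" claim can be sanity-checked against the table's numbers. For large sequential KV-cache reads, transfer time dominates access latency, so microsecond latency is largely amortized. The access count below is an illustrative assumption:

```python
def step_time_ms(kv_gb, bandwidth_tbps, access_latency_us=10, n_accesses=1000):
    """Rough per-decode-step cost of streaming a KV cache from a memory tier.

    Transfer time scales with capacity / bandwidth; latency is paid once per
    discrete access (n_accesses=1000 is an illustrative assumption).
    """
    transfer_ms = kv_gb / (bandwidth_tbps * 1000) * 1000  # GB / (GB/ms)
    latency_ms = access_latency_us * n_accesses / 1000
    return transfer_ms + latency_ms

# 100GB of KV cache at HBF's ~1.6 TB/s target:
total = step_time_ms(100, 1.6)
# ~62.5 ms of transfer vs ~10 ms of accumulated latency — the microsecond
# access penalty is a minor fraction of the step, not the bottleneck.
print(f"Per-step time: {total:.1f} ms")
```

This is why the latency row in the table reads "higher, but sufficient for inference": throughput, not access time, governs large streaming reads.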
The Paradigm Shift
| Era | Period | Focus | Primary Metric |
|---|---|---|---|
| Training Era | 2024-2026 | HBM | Absolute Speed |
| Inference Era | 2027-2029 | HBM + HBF | Total Cost of Ownership (TCO) & Capacity |
The market is pricing in model creation (HBM demand). It has not yet fully priced in model operation at scale (HBF demand).
2. The H3 Architecture: HBM + HBF Hybrid (Validated)
In February 2026, SK Hynix published a landmark paper in IEEE Computer Architecture Letters: "H3: Hybrid Architecture using High Bandwidth Memory and High Bandwidth Flash for Cost-Efficient LLM Inference" (DOI: 10.1109/LCA.2026.3660969).
Architecture Design
H3 integrates both HBM and HBF within a single GPU system, leveraging their respective strengths:
- HBM handles model weights, activations, and frequently-accessed KV-cache entries (latency-sensitive, write-heavy)
- HBF stores read-only data such as large embedding tables, extended KV-cache, and pre-computed attention matrices (capacity-sensitive, read-heavy)
The key insight: in LLM inference, the majority of data accesses are reads (token generation reads from KV-cache far more than it writes). HBF's read bandwidth matches HBM, making it ideal for this workload.
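A minimal sketch of how a runtime might route tensors between the two tiers along the read/write split described above. This is an illustration of the placement logic, not the H3 design itself; all names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    size_gb: float
    read_only: bool
    hot: bool  # accessed on every decode step

def place(tensor: Tensor) -> str:
    # Latency-sensitive or write-heavy data stays in HBM;
    # large, cold, read-only data spills to HBF.
    if tensor.hot or not tensor.read_only:
        return "HBM"
    return "HBF"

workload = [
    Tensor("model_weights", 140, read_only=True, hot=True),      # -> HBM
    Tensor("active_kv_cache", 20, read_only=False, hot=True),    # -> HBM
    Tensor("embedding_table", 300, read_only=True, hot=False),   # -> HBF
    Tensor("archived_kv_cache", 200, read_only=True, hot=False), # -> HBF
]
for t in workload:
    print(f"{t.name}: {place(t)}")
```

Even this two-rule policy reproduces the H3 split: weights and active KV-cache land in HBM, while the bulky read-only tables spill to HBF.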
Simulation Results
Using eight HBM3E stacks and eight HBF stacks alongside NVIDIA's Blackwell B200 GPU:
- 2.69x higher throughput per watt compared to HBM-only
- 18.8x larger batch sizes — dramatically more concurrent queries per GPU
- Near-linear capacity scaling without proportional cost increase
These results validate the thesis: inference-era AI needs capacity more than it needs nanosecond latency.
Three Technology Pillars
Vertical Stacking: HBF targets 300+ NAND layer stacks with TSVs (through-silicon vias) connecting them to logic layers — far beyond standard enterprise SSDs at 128-176 layers.
CXL Interface: Moving from NVMe (microsecond latency) to Compute Express Link (CXL), allowing the GPU to access HBF directly via the memory bus. CXL 3.0 (ratified in 2022, with the device ecosystem expected to mature through 2026) enables memory pooling and sharing across multiple devices.
Smart Controllers: In-storage compute for pre-processing — filtering, vector search, decompression — before data reaches the GPU. This "near-data processing" reduces data movement by up to 10x.
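A toy illustration of the near-data-processing idea: score candidate vectors inside the storage controller and ship only the survivors to the GPU. The function name and the brute-force scoring are illustrative stand-ins for whatever a real smart controller would run in hardware:

```python
import random

random.seed(0)
# A hypothetical embedding table resident in HBF: 10,000 vectors of dim 8.
vectors = [[random.random() for _ in range(8)] for _ in range(10_000)]

def in_storage_filter(table, query, top_k=1000):
    """Stand-in for controller-side pre-filtering: coarse dot-product
    scoring, returning only the top_k candidates instead of the full table."""
    scored = sorted(table, key=lambda v: -sum(a * b for a, b in zip(v, query)))
    return scored[:top_k]

candidates = in_storage_filter(vectors, query=[1.0] * 8)
reduction = len(vectors) / len(candidates)
# Only 1,000 of 10,000 vectors cross the bus — the 10x data-movement
# reduction cited above, achieved before the GPU touches anything.
print(f"Data movement reduced {reduction:.0f}x")
```

The GPU then runs exact scoring only on the shortlist, which is the essence of pushing the filter to where the data lives.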
3. Industry Mobilization: From Concept to Commercialization
The HBF ecosystem has rapidly organized since mid-2025.
SanDisk + SK Hynix MoU (August 2025)
The two companies signed a memorandum of understanding to jointly define HBF technical specifications and promote open standardization. Target timeline:
- H2 2026: First HBF memory samples
- Early 2027: First AI inference systems using HBF
SanDisk HBF Technical Advisory Board (July 2025)
SanDisk formed an HBF Technical Advisory Board with heavyweight members:
- David Patterson — UC Berkeley Professor Emeritus, 2017 Turing Award winner, co-creator of RISC and RAID. Leads the board.
- Raja Koduri — Former Intel GPU chief, now CEO of Oxmiq Labs. Deep expertise in GPU-memory architectures.
- Alper Ilkbahar — SanDisk EVP & CTO
Patterson's involvement signals that HBF is being positioned as a foundational computing paradigm, not just a storage product.
Samsung Enters the Race (October 2025)
Samsung began early concept design work on HBF, leveraging its position as the world's largest NAND manufacturer. While Samsung's product specifications remain undisclosed, their entry validates the market and ensures competitive pressure will accelerate development.
Kioxia XL-FLASH with CXL
Kioxia demonstrated a CXL-attached flash memory expander at FMS 2025:
- 32-die BiCS FLASH Gen 8 QLC stack in a compact BGA package
- Average read latency under 10 microseconds
- Positions NAND as a direct CXL memory tier
China Angle: YMTC
China's YMTC is preparing to enter the DRAM market through a partnership with CXMT, targeting HBM manufacturing through advanced packaging. YMTC's Xtacking architecture supports integration with AI accelerators. A third fab in Wuhan (online ~2027) will dedicate roughly half its capacity to DRAM. While export controls may limit their HBF impact outside China, they represent a significant domestic demand catalyst.
4. Competitive Landscape: Who Wins the HBF War?
Tier 1: The Innovators
SK Hynix (000660.KS) — The Architect
As the current HBM leader, SK Hynix is best positioned to apply HBM packaging technologies (TSV, hybrid bonding) to NAND. Their H3 paper demonstrates they are 12-18 months ahead of competitors in system-level understanding. Their roadmap explicitly targets CXL-attached flash memory, and the MoU with SanDisk ensures ecosystem alignment.
SanDisk (SNDK) — The Pure Play
Post Western Digital spin-off, SanDisk is the most direct equity play for HBF. Key advantages:
- Unmatched NAND controller and firmware expertise
- HBF trademark (HBF™ is their registered mark)
- Technical Advisory Board with Patterson and Koduri
- Pure-play valuation re-rating from "commodity storage" to "AI infrastructure"
- Active standardization leadership via MoU with SK Hynix
Tier 2: The Scalers
Samsung Electronics (005930.KS) — The Integrated Giant
The only player with in-house Foundry, Logic, HBM, and NAND capabilities. While not first to market, Samsung's vertical integration enables lowest-cost HBF at massive scale once standards are set.
Micron (MU) — The Specialist
Leveraging legacy research from 3D XPoint and deep CXL investment. Micron's LPDDR-class power efficiency expertise is critical for inference-optimized HBF modules.
Kioxia — The Dark Horse
XL-FLASH with CXL demonstrates early product readiness. Kioxia's entire 2026 NAND production is already sold out, suggesting strong demand for their advanced flash products. If they partner with a CXL IP provider, they could emerge as a Tier 1 competitor.
5. The Supply Chain: Picks and Shovels
HBF is extraordinarily complex to build. The following supply chain layers are critical.
A. Connectivity — The Highway
Astera Labs (ALAB) — The Critical Link
HBF relies on CXL to communicate with GPUs. Astera Labs dominates the market for Retimers — chips that maintain signal integrity over high-speed connections. Without Astera, the "High Bandwidth" in HBF is physically impossible over server distances. They are the toll road for every HBF packet.
As CXL 3.0 enables memory pooling (sharing HBF across multiple GPUs in a rack), Astera's TAM expands from per-socket to per-rack.
B. Controllers — The Brain
Silicon Motion (SIMO) & Marvell (MRVL)
HBF requires complex error correction (ECC), wear leveling, and logic management far beyond standard SSD controllers.
- SIMO: Primary beneficiary if HBF adoption spreads to mid-range enterprise. Controller IP already in 70%+ of client SSDs.
- Marvell: Go-to for custom ASIC controllers for hyperscalers (Google/AWS) building custom HBF rack architectures.
C. Inspection & Packaging — The Safety Net
Camtek (CAMT) — Zero Tolerance
Stacking 300+ NAND layers with TSVs requires rigorous inspection at every level. If one layer fails, the entire stack is waste. Camtek's 2D/3D inspection equipment is mandatory for yield management. The harder HBF is to build, the more indispensable Camtek becomes.
BE Semiconductor (BESI) — The Bonder
Leader in hybrid bonding equipment — the technology required to connect ultra-dense NAND layers to logic dies at sub-micron pitch.
D. Systems — The Integrator
Pure Storage (PSTG) — The Enterprise Bridge
Using "DirectFlash" technology that eliminates the traditional SSD form factor, Pure Storage is positioned to integrate HBF components into turnkey "AI Data Lakes" for enterprise clients. They bridge the gap between raw components and usable infrastructure.
6. Investment Framework
The Thesis in Three Sentences
- SK Hynix's H3 paper validates that HBM+HBF hybrid achieves 2.69x better throughput-per-watt for LLM inference.
- SanDisk and SK Hynix are standardizing HBF with samples in H2 2026, making this a 12-18 month investment horizon.
- The supply chain winners are identifiable today, before commercialization.
Strategy Matrix
| Approach | Ticker(s) | Rationale |
|---|---|---|
| Aggressive | SK Hynix (000660.KS), SanDisk (SNDK) | Direct HBF innovators with first-mover advantage |
| Infrastructure | Astera Labs (ALAB) | Indispensable CXL connectivity — toll road for HBF |
| Safety | Camtek (CAMT) | Complexity tax — the harder HBF is to build, the more they earn |
| Scale | Samsung (005930.KS), Micron (MU) | Vertical integration and cost advantages once standards set |
| Integration | Pure Storage (PSTG) | Enterprise adoption layer for HBF-based AI data lakes |
Key Catalysts to Watch
- Broad CXL 3.x device availability (switches and controllers shipping through late 2026) — the interface foundation for HBF
- SK Hynix / SanDisk HBF samples (H2 2026) — first physical validation
- First AI inference systems with HBF (early 2027) — proof of commercial viability
- Hyperscaler procurement signals — watch for CXL memory references in MSFT/GOOG/AMZN/META CapEx calls
- Samsung HBF product announcement — validates market size and competitive intensity
Risk Factors
- Hardware roadmaps could slip — 2026-2027 timeline is aggressive
- CXL adoption could fragment if competing standards emerge
- HBM cost reductions (HBM4 at lower price points) could narrow the TCO gap
- NAND oversupply cycles could compress margins for flash-heavy players
- China export controls may create fragmented standards
Disclaimer: This analysis implies a technological forecast for 2026-2027. Hardware roadmaps are subject to change. CrazyRich Agents provides AI-generated research for informational purposes only — not investment advice.