Curated topic hubs

Read the blog by engineering cluster, not just by publish date

These hubs are small, curated archive pages built from the current article metadata. Some are broad system lanes, and others are narrower entity hubs for GB10, Modal, Mamba3, Megatron layout work, or TPU sparse attention. Each one starts with the shortest useful reading order rather than a raw tag dump.

Entity Hub
GB10 and Blackwell Bring-Up
A curated GB10 and Blackwell reading path: consumer-versus-datacenter tensor paths, driver-visible false positives, arch-patch repros, and the serving or precision choices that survived contact with the hardware.
9 curated articles | Starts with: Training the MegaCpp SLM Ensemble on GB10: a Grace Blackwell war story | Latest article date: 2026-04-20
Best if you care about what GB10 actually proved, where tcgen05 evidence stops, and how those hardware limits changed the rest of the MegaCpp stack.
Entity Hub
Modal Training and Benchmark Operations
A curated Modal reading path: when the rented-GPU surface was useful, what broke on multi-GPU launches, how receipts were recorded, and how we kept the lane debuggable.
7 curated articles | Starts with: Modal Training Platform Overview | Latest article date: 2026-04-18
Best if you need the MegaCpp Modal lane as an engineering surface, not a marketing comparison.
Entity Hub
Mamba3 Architecture, Kernels, and Runtime Tradeoffs
A curated Mamba3 reading path: why MegaCpp kept a hybrid stack, how the kernels evolved across CUDA, TileLang, and TPU, and where the runtime wins actually held.
11 curated articles | Starts with: Mamba 3 + Transformers: Why MegaCpp Uses a Hybrid Stack for C++ | Latest article date: 2026-04-19
Best if you want the Mamba3 lane as one connected engineering story instead of scattered kernel notes.
Topic Hub
MLA Integration, Dispatch, and Weight Absorption
A curated MLA reading path: the weight-absorption contract, Megatron-safe integration boundaries, dispatch and FP8 edges, and the adapter surfaces that keep MLA connected to the rest of the stack.
9 curated articles | Starts with: MLA weight absorption: what we kept and what we dropped for the C++ specialists | Latest article date: 2026-04-19
Best if you want MLA as a real system boundary with concrete implementation tradeoffs, not just as a glossary term.
Topic Hub
Evaluation, Benchmarks, and Verifier Loops
A curated evaluation reading path: verifier-first harnesses, ablation structure, benchmark receipts, and the evidence rules that keep comparisons from collapsing into anecdotes.
12 curated articles | Starts with: How We Evaluate the MegaCpp SLM Ensemble on Real C++ Work | Latest article date: 2026-04-18
Best if you want to separate trustworthy evidence from vague score reporting across the MegaCpp archive.
Entity Hub
Megatron Parallelism and Layout Boundaries
A curated Megatron reading path: the parallelism map, what actually splits, how NVIDIA and TPU wrappers differ, and the migration surfaces around NAM56R-style layouts.
9 curated articles | Starts with: EP, PP, TP, CP, SP, DP: The Parallelism Map We Actually Use | Latest article date: 2026-04-19
Best if you keep hitting TP, PP, EP, FSDP2, or Megatron wrapper terms across the blog and want one stable reading path.
Entity Hub
TPU Sparse Attention and Pallas Kernels
A curated TPU sparse-attention reading path: block-sparse contracts, Pallas kernel choices, SPMD sharding, and the runtime surfaces that keep long-context TPU work stable.
9 curated articles | Starts with: Block-sparse attention on TPU v6e: block masks, MXU-friendly tiles, and stable contracts | Latest article date: 2026-04-19
Best if you care specifically about sparse attention, Pallas, and long-context TPU kernel work rather than the TPU stack as a whole.
Topic Hub
H200 Training and Kernel Bring-Up
A curated path through the H200 lane: operator bring-up, step-time anatomy, memory pressure, and the NVIDIA kernel surfaces that actually moved the stack.
15 curated articles | Starts with: H200 Bringup and Naming: What Had to Be Made Explicit | Latest article date: 2026-04-18
Best if you care about real multi-GPU bring-up, memory cliffs, and which Hopper-era optimizations survived contact with production runs.
Topic Hub
TPU v6e and XLA Runtime Surfaces
A curated reading order for TPU work: bring-up, PJRT and Torch/XLA boundaries, SPMD sharding, and the kernel/runtime traps that made TPU performance non-obvious.
15 curated articles | Starts with: TPU v6e Host Bringup | Latest article date: 2026-04-19
Best if you want the TPU lane as an engineering system rather than a benchmark screenshot.
Topic Hub
C++ Data Pipelines and Corpus Packaging
A curated archive for the C++ data path: corpus selection, semantic enrichment, packaging into training artifacts, and the file-level durability choices that keep the pipeline sane.
14 curated articles | Starts with: Building the C++ Training Data Pipeline: What Worked, What Broke | Latest article date: 2026-04-19
Best if you want to understand where the C++ training rows come from and why the pipeline is intentionally shard-heavy.
Topic Hub
MoE, Routing, and Distributed Model Splits
A curated path through the expert stack: what the specialist path changed, how routing works, and how the parallelism map constrains the model layout.
7 curated articles | Starts with: Specialists: What the Expert Path Actually Changed in the Stack | Latest article date: 2026-04-18
Best if you are trying to connect expert routing decisions to real distributed-training and Megatron boundaries.