Entity Hub

Megatron Parallelism and Layout Boundaries

A curated Megatron reading path: the parallelism map, what actually splits, how NVIDIA and TPU wrappers differ, and the migration surfaces around NAM56R-style layouts.

This hub is for readers who need the execution order of Megatron concepts, not just a glossary. Start with the parallelism map, then move into the boundary documents and finally the migration or recipe surfaces that turn those ideas into launchable layouts.

megatron

tensor-parallel

pipeline-parallel

sequence-parallel

context-parallel

expert-parallel

nam56r

nemotron

Curated set

Articles in reading order

Why this hub

Best if you keep hitting TP, PP, EP, FSDP2, or Megatron wrapper terms across the blog and want one stable reading path.

Parallelism Map

Build the vocabulary and application order before touching wrappers or recipes.

01
April 18, 2026 • 10 min read • Engineering Team
EP, PP, TP, CP, SP, DP: The Parallelism Map We Actually Use
What data, tensor, sequence, context, pipeline, and expert parallelism each own, how they compose, and where the real integration risks still live.
The shortest accurate map of the parallel axes MegaCpp actually uses.
Distributed Training
Expert Parallel
Pipeline Parallel
Tensor Parallel
02
April 18, 2026 • 9 min read • MegaCpp Engineering
Tensor Parallel and Sharding: What Actually Splits, What Still Stays Global
A code- and doc-grounded walkthrough of tensor parallelism in public hybrid recipes, including where TP helps, where it does not, and how it fits into hybrid NAM52 and NAM56R workloads.
What really splits under tensor parallelism, what stays global, and why the difference matters later.
Tensor Parallel
Sharding
Distributed Training
Sequence Parallel
03
April 18, 2026 • 6 min read • Engineering Team
Context Parallel and Sequence Parallel: Similar Names, Different Jobs
An explanation of SP versus CP using TP-aware helpers, long-context bring-up patterns, and hybrid model design.
The cleanest companion read once TP terminology starts colliding with long-context layout terms.
Context Parallel
Sequence Parallel
Long Context
Tensor Parallel

Boundaries in the Real Stack

These explain where Megatron stops, where wrappers start, and how the layout lands on actual hardware.

04
April 18, 2026 • 10 min read • Engineering Team
What Megatron Can and Cannot Split
A grounded look at split-friendly and split-hostile model surfaces: TP, SP, PP, EP, recurrent state, side embeddings, and why some boundaries remain architectural rather than automatic.
The boundary document for what the Megatron lane can express without custom seams.
Megatron
Tensor Parallel
Pipeline Parallel
MoE
05
April 18, 2026 • 12 min read • David Gornshtein
DualPipe and 3D Parallelism on H200 and GB10
How MegaCpp lays out the TP × PP × DP × EP cube on H200 multi-node systems and GB10, integrates DualPipe / DualPipeV with our hybrid layer pattern, accounts for pipeline bubbles, and launches the deployment training job.
How the parallel map behaves once the layout has to survive real H200 and GB10 execution.
Pipeline Parallelism
Tensor Parallelism
DualPipe
H200
06
April 18, 2026 • 7 min read • David Gornshtein
Hybrid FSDP/DDP on NVIDIA: Megatron DDP plus FSDP2 for the ensemble
How MegaCpp combines Megatron-Core DistributedDataParallel with PyTorch FSDP2 across H200 and GB10, the gradient-bucket sizing rules we ship, the freeze plan for the eight specialists, and the failure modes that defined the contract.
The NVIDIA-side wrapper and sharding story once Megatron is not the only layout owner anymore.
FSDP2
Megatron
Distributed Training
H200

Migration and Recipe Translation

These finish the picture by showing how layouts are expressed and ported.

07
April 19, 2026 • 2 min read • David Gornshtein
Migration policy: native Megatron vs narrow custom seams
Why MegaCpp ports only what Megatron or Nemotron do not already provide, and why ambiguous mappings should fail closed instead of being reinterpreted silently.
The decision framework for staying native versus carrying narrow custom seams around Megatron.
Migration
Megatron
Nemotron
Porting Policy
08
April 19, 2026 • 3 min read • David Gornshtein
NAM56R Megatron translation
Why translating NAM56R into Megatron-native syntax is a fail-closed planning step, not a blind string rewrite.
The concrete translation story for a MegaCpp hybrid model into Megatron-native terms.
NAM56R
Megatron
Translation
Hybrid
09
April 19, 2026 • 3 min read • David Gornshtein
How to express a Nemotron-style recipe as pure Megatron CLI
Why MegaCpp keeps high-level recipe objects and then lowers them into a smaller native Megatron flag surface instead of treating one giant launcher as the source of truth.
The recipe-level companion piece when the conceptual map has to become real launch arguments.
Megatron
Nemotron
Recipes
Launchers

Keep exploring

Adjacent topic hubs

These hubs cover nearby parts of the blog without turning the archive into a giant taxonomy.

GB10 and Blackwell Bring-Up

Modal Training and Benchmark Operations

Mamba3 Architecture, Kernels, and Runtime Tradeoffs

MLA Integration, Dispatch, and Weight Absorption

Evaluation, Benchmarks, and Verifier Loops

TPU Sparse Attention and Pallas Kernels

H200 Training and Kernel Bring-Up

TPU v6e and XLA Runtime Surfaces

C++ Data Pipelines and Corpus Packaging

MoE, Routing, and Distributed Model Splits