Entity Hub

Megatron Parallelism and Layout Boundaries

A curated Megatron reading path: the parallelism map, what actually splits, how NVIDIA and TPU wrappers differ, and the migration surfaces around NAM56R-style layouts.

This hub is for readers who need the execution order of Megatron concepts, not just a glossary. Start with the parallelism map, then move into the boundary documents, and finish with the migration and recipe surfaces that turn those ideas into launchable layouts.

megatron
tensor-parallel
pipeline-parallel
sequence-parallel
context-parallel
expert-parallel
nam56r
nemotron
Curated set: 9 articles in reading order
Why this hub

Best if you keep hitting TP, PP, EP, FSDP2, or Megatron wrapper terms across the blog and want one stable reading path.

Parallelism Map

Build the vocabulary and application order before touching wrappers or recipes.

  1. 01
    April 18, 2026 · 10 min read · Engineering Team

    EP, PP, TP, CP, SP, DP: The Parallelism Map We Actually Use

    What data, tensor, sequence, context, pipeline, and expert parallelism each own, how they compose, and where the real integration risks still live.

    The shortest accurate map of the parallel axes MegaCpp actually uses.

    Distributed Training
    Expert Parallel
    Pipeline Parallel
    Tensor Parallel
  2. 02
    April 18, 2026 · 9 min read · MegaCpp Engineering

    Tensor Parallel and Sharding: What Actually Splits, What Still Stays Global

    A code- and doc-grounded walkthrough of tensor parallelism in public hybrid recipes, including where TP helps, where it does not, and how it fits into hybrid NAM52 and NAM56R workloads.

    What really splits under tensor parallelism, what stays global, and why the difference matters later.

    Tensor Parallel
    Sharding
    Distributed Training
    Sequence Parallel

Boundaries in the Real Stack

These explain where Megatron stops, where wrappers start, and how the layout lands on actual hardware.

  1. 04
    April 18, 2026 · 10 min read · Engineering Team

    What Megatron Can and Cannot Split

    A grounded look at split-friendly and split-hostile model surfaces: TP, SP, PP, EP, recurrent state, side embeddings, and why some boundaries remain architectural rather than automatic.

    The boundary document for what the Megatron lane can express without custom seams.

    Megatron
    Tensor Parallel
    Pipeline Parallel
    MoE
  2. 05
    April 18, 2026 · 12 min read · David Gornshtein

    DualPipe and 3D Parallelism on H200 and GB10

    How MegaCpp lays out the TP × PP × DP × EP cube on H200 multi-node systems and GB10, integrates DualPipe / DualPipeV with our hybrid layer pattern, accounts for pipeline bubbles, and launches the deployment training job.

    How the parallel map behaves once the layout has to survive real H200 and GB10 execution.

    Pipeline Parallelism
    Tensor Parallelism
    DualPipe
    H200
  3. 06
    April 18, 2026 · 7 min read · David Gornshtein

    Hybrid FSDP/DDP on NVIDIA: Megatron DDP plus FSDP2 for the ensemble

    How MegaCpp combines Megatron-Core DistributedDataParallel with PyTorch FSDP2 across H200 and GB10, the gradient-bucket sizing rules we ship, the freeze plan for the eight specialists, and the failure modes that defined the contract.

    The NVIDIA-side wrapper and sharding story once Megatron is not the only layout owner anymore.

    FSDP2
    Megatron
    Distributed Training
    H200
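
For readers who want a concrete handle on the TP × PP × DP × EP cube mentioned above before opening the articles, the arithmetic can be sketched as a small check. This is only an illustration under stated assumptions: the `Layout` class and its field names are hypothetical, and the sketch assumes expert parallelism is carved out of the data-parallel ranks instead of adding its own factor to the world size. The linked articles describe the layouts MegaCpp actually runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical layout record; names are illustrative, not a MegaCpp or Megatron API."""

    tp: int      # tensor-parallel size
    pp: int      # pipeline-parallel size
    dp: int      # data-parallel size
    ep: int = 1  # expert-parallel size (assumed to reuse data-parallel ranks)

    def world_size(self) -> int:
        # Assumption: EP subdivides the DP ranks, so it does not multiply the total.
        return self.tp * self.pp * self.dp

    def validate(self, num_gpus: int) -> None:
        """Raise ValueError if the layout cannot tile the given GPU count."""
        if min(self.tp, self.pp, self.dp, self.ep) < 1:
            raise ValueError("every parallel size must be at least 1")
        if self.world_size() != num_gpus:
            raise ValueError(
                f"tp*pp*dp = {self.world_size()} does not match {num_gpus} GPUs"
            )
        if self.dp % self.ep != 0:
            raise ValueError(
                f"ep={self.ep} must divide dp={self.dp} under this sketch's assumption"
            )


def check(layout: Layout, num_gpus: int) -> bool:
    """Return True when the layout validates, False otherwise."""
    try:
        layout.validate(num_gpus)
    except ValueError:
        return False
    return True
```

Under these assumptions, `check(Layout(tp=2, pp=2, dp=4, ep=2), 16)` passes, while the same layout on 8 GPUs fails closed instead of being silently reinterpreted.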

Migration and Recipe Translation

These finish the picture by showing how layouts are expressed and ported.

  1. 07
    April 19, 2026 · 2 min read · David Gornshtein

    Migration policy: native Megatron vs narrow custom seams

    Why MegaCpp ports only what Megatron or Nemotron do not already provide, and why ambiguous mappings should fail closed instead of being reinterpreted silently.

    The decision framework for staying native versus carrying narrow custom seams around Megatron.

    Migration
    Megatron
    Nemotron
    Porting Policy
  2. 08
    April 19, 2026 · 3 min read · David Gornshtein

    NAM56R Megatron translation

    Why translating NAM56R into Megatron-native syntax is a fail-closed planning step, not a blind string rewrite.

    The concrete translation story for a MegaCpp hybrid model into Megatron-native terms.

    NAM56R
    Megatron
    Translation
    Hybrid
  3. 09
    April 19, 2026 · 3 min read · David Gornshtein

    How to express a Nemotron-style recipe as pure Megatron CLI

    Why MegaCpp keeps high-level recipe objects and then lowers them into a smaller native Megatron flag surface instead of treating one giant launcher as the source of truth.

    The recipe-level companion piece when the conceptual map has to become real launch arguments.

    Megatron
    Nemotron
    Recipes
    Launchers

Keep exploring

Adjacent topic hubs

These hubs cover nearby parts of the blog without turning the archive into a giant taxonomy.