Entity Hub

Modal Training and Benchmark Operations

A curated Modal reading path: when the rented-GPU surface was useful, what broke on multi-GPU launches, how receipts were recorded, and how we kept the lane debuggable.

This hub starts with the operator view of why Modal was in the stack at all, then narrows to the multi-GPU runtime issues, the benchmark evidence, and the receipts and cold-start details that keep the lane reproducible.

modal
benchmarks
multi-gpu
debugging
cold-start
receipts
Curated set: 7 articles in reading order
Why this hub

Best if you need the MegaCpp Modal lane as an engineering surface, not a marketing comparison.

Why This Surface Exists

Start here if you need the big-picture reason Modal stayed in the stack.

  1. 01
    April 18, 2026 | 8 min read | MegaCpp Engineering

    Modal Training Platform Overview

    Why we use Modal for ad-hoc training and benchmark jobs, how the image, GPU, volume, and secret model is wired, and when Modal wins against reserved H200 or TPU capacity.

    The broad platform overview and the runtime assumptions that mattered in practice.

    Modal
    Training
    Benchmarks
    Infrastructure
  2. 02
    April 18, 2026 | 4 min read | David Gornshtein

    Modal vs Owned H200:8 vs TPU: Which Surface We Use and Why

    How we decide between Modal, reserved H200:8 hosts, and TPU slices based on operator overhead, latency to first useful step, benchmark hygiene, and failure isolation.

    The cleanest comparison of where Modal helped and where owned H200 or TPU lanes were the better surface.

    Modal
    H200
    TPU
    Infrastructure
  3. 03
    April 18, 2026 | 5 min read | David Gornshtein

    Modal Multi-GPU Pain and the Fixes That Actually Landed

    NCCL topology, GPU isolation, eviction and OOM-kill behavior, observability gaps, and the guide we follow when a Modal multi-GPU job hangs on the first forward pass.

    The article to read before trusting a multi-GPU launch recipe on Modal.

    Modal
    Multi-GPU
    NCCL
    FSDP2

Benchmark and Evidence Layer

These are the receipts that keep the Modal lane grounded instead of anecdotal.

  1. 04
    April 18, 2026 | 5 min read | David Gornshtein

    Benchmarking the MegaCpp stack on Modal: multi-GPU lessons from rented boxes

    What we learned running the training stack on rented H100, H200, and B200 boxes through Modal: three benchmark lanes, an 8-GPU FSDP2 hang, and the bookkeeping that lets the numbers survive a week.

    The multi-GPU benchmark readback once the launch and warmup path were stable enough to compare.

    Modal
    Benchmarks
    Multi-GPU
    FSDP
  2. 05
    April 18, 2026 | 12 min read | MegaCpp Engineering

    Modal Benchmark Receipts: What Counted as Evidence and What Did Not

    A grounded guide to benchmark receipts using compile posture, backend identity, and narrow evidence records rather than headline throughput claims.

    What counted as evidence, what did not, and how the benchmark receipt surface stayed honest.

    Modal
    Benchmarks
    Receipts
    Throughput
  3. 06
    April 18, 2026 | 9 min read | MegaCpp Engineering

    Modal Debugging Guide for Training and Benchmark Failures

    A grounded guide for debugging Modal failures in MegaCpp: cold starts, multi-GPU hangs, image drift, detached collector issues, and volume or output-state bugs.

    The shortest useful debugging path once the lane is failing instead of benchmarking.

    Modal
    Debugging
    Benchmarks
    Training

Image and Runtime Friction

These explain the operational tax that sits underneath the cleaner benchmark graphs.

  1. 07
    April 18, 2026 | 5 min read | MegaCpp Engineering

    Modal image construction and the cold-start tax we actually pay

    How we layer the Modal training image, why every wheel is pinned to the training stack, how persistent volumes absorb the inductor-cache hit, and the 30-90 second startup tax we accept as the price of burst compute.

    The image-build and cold-start costs that shaped the rest of the Modal workflow.

    Modal
    Docker
    Cold Start
    Inductor Cache
