Entity Hub

GB10 and Blackwell Bring-Up

A curated GB10 and Blackwell reading path: consumer-versus-datacenter tensor paths, driver-visible false positives, arch-patch repros, and the serving or precision choices that survived contact with the hardware.

This hub is for readers who want the GB10 lane in the right order. Start with the broad war story and the tensor-path proof summary, then move into the gate-by-gate repro pieces and finally the stack, serving, and precision follow-through.

GB10
tcgen05
driver-research
sm121a
libcuda
nvfp4
Curated set: 9 articles in reading order
Why this hub

Best if you care about what GB10 actually proved, where tcgen05 evidence stops, and how those hardware limits changed the rest of the MegaCpp stack.

Start Here

Build the hardware and runtime picture before drilling into the patch lanes.

  1. 01
    April 18, 2026 · 17 min read · David Gornshtein

    Training the MegaCpp SLM Ensemble on GB10: a Grace Blackwell war story

    Field notes from bringing the MegaCpp SLM Ensemble up on NVIDIA GB10 and DGX Spark: silicon surprises, NaN bisects that ate days, regressions caused by our own patches, and the software-stack choices that held.

    The broad GB10 war story: what was attempted, what held up, and what turned out to be wishful thinking.

    GB10
    Blackwell
    SM121A
    NVFP4
  2. 02
    April 20, 2026 · 11 min read · David Gornshtein

    What Our GB10 Experiments Actually Prove About Blackwell Consumer vs Datacenter Tensor Paths

    Our GB10 tests show that some Blackwell datacenter-targeted SASS can be accepted and executed on consumer silicon, but they do not prove that the Blackwell Tensor Core Generation 5 matrix-instruction path (tcgen05.mma) physically executes on GB10. Older, stronger claims overstate what the evidence supports.

    The shortest accurate summary of which Blackwell tensor-path claims are backed by public evidence and which are still missing.

    GB10
    Blackwell
    CUDA
    Tensor Core
  3. 03
    April 20, 2026 · 9 min read · David Gornshtein

    Why Driver-Visible Paths Can Look Like Hardware Support on GB10, Even When Silicon Proof Is Missing

    A field report on GB10 reverse engineering: how libcuda tables, helper cubins, and signed capability metadata can make tcgen05 look reachable from software while still falling short of proving that the underlying silicon really exposes the same path as B200 or GB100.

    Read this before trusting any driver-visible capability bit as proof of real hardware execution.

    GB10
    Blackwell
    CUDA
    Driver Research

Gate Walk and Patch Lanes

These are the concrete repros and gate-by-gate explanations once the top-level claim is clear.

  1. 04
    April 20, 2026 · 8 min read · David Gornshtein

    Reproducing the sm_100a -> sm_121a Cubin Patch on GB10: CUDA/C++ Code, ELF Edits, and the Exact Point Where tcgen05 Stops

    A practical GB10 reproduction guide for the narrow result we can defend publicly: a patched sm_100a baseline cubin executes on GB10, while tcgen05-oriented probes stop at later driver-side gates rather than producing a publication-grade tcgen05 proof.

    The public arch-field repro from sm_100a to sm_121a, including the exact point where tcgen05 stops.

    GB10
    Blackwell
    CUDA
    C++
  2. 05
    April 20, 2026 · 9 min read · David Gornshtein

    Inside the GB10 Driver Patch Lane: libcuda Tables, Helper Cubins, Linux Hooks, and Why Deeper Patching Still Is Not tcgen05 Proof

    A public-safe walkthrough of the deeper GB10 driver research lane: what was patched in libcuda, what changed in the cubin and toolchain path, where Linux- and loader-level hooks entered the picture, and why that deeper progress still stops short of publication-grade tcgen05 proof.

    The deeper driver patch lane and why even aggressive patching still does not count as clean tensor-path proof.

    GB10
    Blackwell
    CUDA
    libcuda

Serving and Precision Follow-Through

Once the bring-up and gate story is understood, these explain the downstream execution choices.

  1. 08
    April 18, 2026 · 4 min read · David Gornshtein

    NVFP4 Inference for the MegaCpp SLM Ensemble

    Why we train in FP16/BF16 and ship in NVFP4, what Blackwell and GB10 actually give us, and which kernels survive the trip from B200 to DGX Spark.

    The precision-policy readback for the GB10 inference lane once tensor-path assumptions were narrowed.

    NVFP4
    Blackwell
    GB10
    Inference
