# GB10 Repro Bundle

This directory is the GitHub-ready reproduction pack for the public GB10
reverse-engineering story.

It keeps two lanes separate on purpose:

- the public-safe lane that anyone can repeat from the command line with CUDA,
  a patched cubin, and exact Driver API receipts;
- the deeper `driver_patch_lane/` that explores copied-`libcuda` patching and
  helper routing, but is **not** presented as silicon proof.

## What this bundle proves cleanly

- a baseline `sm_100a` cubin can be patched to `sm_121a` by rewriting only the
  ELF `e_flags` architecture field;
- that patched baseline cubin loads, launches, synchronizes, and returns the
  expected arithmetic output on GB10;
- `tcgen05`-oriented probes hit additional software and metadata gates after
  the baseline arch check;
- a path that is merely visible to the driver, or reachable only through the
  deeper patch lane, is not the same as a clean end-to-end execute receipt.
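The first claim above is narrow: only the architecture bits of the ELF header's `e_flags` word change. Below is a minimal Python sketch of that kind of rewrite. It is an illustration, not `patch_elf.py` itself. It assumes a little-endian ELF image and uses the standard ELF header offsets for `e_flags` (0x24 for ELF32, 0x30 for ELF64). The sketch does not reproduce the exact CUDA encoding of `sm_100a` and `sm_121a`, so the arch-field `mask` and the `old`/`new` values are caller-supplied placeholders.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELFCLASS32, ELFCLASS64 = 1, 2
# Offset of the 4-byte e_flags field in the ELF header, per the ELF spec.
E_FLAGS_OFFSET = {ELFCLASS32: 0x24, ELFCLASS64: 0x30}


def read_e_flags(image: bytes) -> int:
    """Return e_flags from an ELF image (little-endian only in this sketch)."""
    if image[:4] != ELF_MAGIC:
        raise ValueError("not an ELF image")
    ei_class = image[4]
    if ei_class not in E_FLAGS_OFFSET:
        raise ValueError(f"unknown ELF class {ei_class}")
    if image[5] != 1:  # EI_DATA == 1 means little-endian
        raise ValueError("only little-endian ELF is handled in this sketch")
    (flags,) = struct.unpack_from("<I", image, E_FLAGS_OFFSET[ei_class])
    return flags


def patch_arch_field(image: bytes, mask: int, old: int, new: int) -> bytes:
    """Rewrite only the e_flags bits selected by `mask`; all other bytes stay as-is.

    `mask`, `old`, and `new` are placeholders supplied by the caller; this
    sketch does not assume any particular CUDA arch-field encoding.
    """
    flags = read_e_flags(image)
    if (flags & mask) != (old & mask):
        raise ValueError(f"arch field is {flags & mask:#x}, expected {old & mask:#x}")
    patched = (flags & ~mask) | (new & mask)
    out = bytearray(image)
    struct.pack_into("<I", out, E_FLAGS_OFFSET[image[4]], patched)
    return bytes(out)
```

Checking `old` before writing means the sketch refuses to patch an image whose arch field is not the one you expected. A byte-level diff of input and output should then show changes only inside the `e_flags` word.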

## What this bundle does not claim

- proven `tcgen05.mma` parity with B200 or GB100;
- proven TMEM availability on GB10;
- that the copied-`libcuda` patch lane is the same thing as clean shipping
  support.

## Start here

Use this order for the cleanest handoff from article to walkthrough to raw
files:

1. Read
   [`What our GB10 experiments actually prove about Blackwell tensor paths`](/blog/gb10-blackwell-tensor-paths-what-we-actually-proved/)
   first for the claim boundary.
2. Read
   [`Reproducing the sm_100a to sm_121a cubin patch on GB10`](/blog/gb10-sm100a-cubin-patch-repro/)
   next for the staged repro story that matches this bundle most directly.
3. Read
   [`Why driver-visible paths can look like hardware support on GB10`](/blog/gb10-driver-gates-and-false-capability-signals/)
   third if you need the warning-story explanation for the layered gates.
4. Read [the local command walkthrough](./README_walkthrough.md) for the exact local
   command sequence.
5. Keep [the gate matrix](./README_gates.md) and
   [the public wording guardrail](./public_claims.md) nearby as compact references
   while you run or cite the bundle.
6. Open [the one-command runner](./run.sh), [the ELF arch-field patch helper](./patch_elf.py),
   [the reserved-SMEM symbol patch helper](./patch_symbols.py), [the capability-record strip helper](./patch_nvinfo.py),
   [the minimal alloc probe](./kernel_alloc_only.cu), and
   [the exploratory tensor-path probe](./kernel_sm100a.cu) only after the article and
   walkthrough lanes are clear.
7. Leave the deeper copied-driver research lane for last:
   [`Inside the GB10 driver patch lane`](/blog/gb10-libcuda-driver-patch-lane-and-why-it-still-is-not-silicon-proof/),
   [the copied-driver lane overview](./driver_patch_lane/README.md), then
   [the copied-driver patch helper](./driver_patch_lane/patch_libcuda.py).

If you are blocked on vocabulary rather than procedure, use
[`MegaCpp model glossary`](/blog/megacpp-model-glossary/) for
`sm_121a`, `tcgen05`, `reservedSmem`, `.nv.info.<kernel>`, `.nv.capmerc`, and
`.nv.merc.rela.*`, then return to step 1.

## File map

- [The build recipe](./Makefile): builds the baseline and gate-walk artifacts
- [The one-command runner](./run.sh): public-safe walkthrough
- [The Driver API loader](./loader.cpp): CUDA Driver API loader with load / launch /
  synchronize receipts
- [The host attribute reader](./query_attrs.cpp): host-side readback of device attributes
- [The baseline arithmetic kernel](./kernel_baseline.cu): the narrow positive proof lane
- [The minimal alloc probe](./kernel_alloc_only.cu): minimal `tcgen05.alloc`
  probe
- [The exploratory tensor-path probe](./kernel_sm100a.cu): fuller exploratory probe surface
- [The ELF arch-field patch helper](./patch_elf.py): rewrites only the ELF `e_flags` arch field
- [The reserved-SMEM symbol patch helper](./patch_symbols.py): rewrites weak undefined
  reserved-SMEM symbols
- [The capability-record strip helper](./patch_nvinfo.py): strips selected `.nv.info` capability
  records
- [The compact gate matrix](./README_gates.md): which gate each lane hits,
  including the positive baseline receipt
- [The command walkthrough](./README_walkthrough.md): exact commands plus
  interpretation
- [The public wording guardrail](./public_claims.md): wording limits for citing
  the bundle elsewhere
- [The copied-driver lane overview](./driver_patch_lane/README.md): cautionary
  explanation of the deeper lane
- [The copied-driver patch helper](./driver_patch_lane/patch_libcuda.py):
  copied-`libcuda` patch script with public-safe warning header
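The capability-record strip helper and the glossary both refer to per-kernel metadata sections: `.nv.info.<kernel>`, `.nv.capmerc`, and `.nv.merc.rela.*`. You can check which of them a cubin carries, before or after a patch, with a plain section-header walk. Below is a minimal read-only sketch. It assumes a little-endian ELF64 image with no extended section numbering, and it is not how `patch_nvinfo.py` is implemented. The prefix list comes from the section names mentioned in this README.

```python
import struct
from typing import List


def section_names(image: bytes) -> List[str]:
    """List section names from a little-endian ELF64 image (read-only)."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("this sketch handles little-endian ELF64 only")
    # e_shoff (8 bytes) sits at 0x28; e_shentsize, e_shnum, e_shstrndx at 0x3A.
    (shoff,) = struct.unpack_from("<Q", image, 0x28)
    shentsize, shnum, shstrndx = struct.unpack_from("<HHH", image, 0x3A)

    def header(index: int):
        # Section header: sh_name at +0x00, sh_offset and sh_size at +0x18.
        base = shoff + index * shentsize
        (name_off,) = struct.unpack_from("<I", image, base)
        offset, size = struct.unpack_from("<QQ", image, base + 0x18)
        return name_off, offset, size

    # The section-name string table is the section at index e_shstrndx.
    _, str_off, str_size = header(shstrndx)
    strtab = image[str_off:str_off + str_size]
    names = []
    for i in range(shnum):
        name_off, _, _ = header(i)
        end = strtab.index(b"\x00", name_off)  # names are NUL-terminated
        names.append(strtab[name_off:end].decode("ascii", errors="replace"))
    return names


# Metadata-section prefixes named in this bundle's glossary pointers.
GATE_PREFIXES = (".nv.info", ".nv.capmerc", ".nv.merc.rela")


def gate_sections(image: bytes) -> List[str]:
    """Return only the sections whose names match the metadata prefixes."""
    return [n for n in section_names(image) if n.startswith(GATE_PREFIXES)]
```

Reading a cubin with `open(path, "rb").read()` and comparing `gate_sections(...)` across the original and stripped files shows which records changed. The comparison does not show what the driver does with them.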

## Quick start

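```bash
make all                # build the baseline and gate-walk artifacts
./query_attrs           # host-side device-attribute readback
make run-baseline       # the narrow positive baseline lane
make probe-alloc-gates  # walk the minimal tcgen05.alloc gate sequence
```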

For the guided sequence:

```bash
./run.sh
```

## Step-by-step summary

1. Build the baseline and probe cubins with [the build recipe](./Makefile).
2. Use [the ELF arch-field patch helper](./patch_elf.py) to rewrite `sm_100a -> sm_121a`.
3. Run [the Driver API loader](./loader.cpp) against the patched baseline cubin.
4. Confirm the positive receipt in [the gate matrix](./README_gates.md).
5. Walk the minimal `tcgen05.alloc` lane with [the reserved-SMEM symbol patch helper](./patch_symbols.py)
   and [the capability-record strip helper](./patch_nvinfo.py).
6. Stop the public claim at the point described in
   [the public wording guardrail](./public_claims.md).
7. If you intentionally want the deeper user-space driver lane, move into
   [the copied-driver lane](./driver_patch_lane/).
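Steps 3 and 4 depend on separating "the driver accepted something" from "the kernel ran end to end and produced the right answer". One way to make that split explicit is a small classifier over per-stage results. This is a hypothetical sketch. The stage names, the `Receipt` type, and the input format are illustrative and are not what `loader.cpp` prints. The only CUDA fact it relies on is that `CUDA_SUCCESS` is `0`.

```python
from dataclasses import dataclass
from typing import Dict, Optional

CUDA_SUCCESS = 0  # CUresult value for success in the CUDA Driver API

# Hypothetical stage names, in the order a loader would reach them.
STAGES = ("load", "launch", "synchronize")


@dataclass
class Receipt:
    furthest_stage: Optional[str]  # last stage that returned CUDA_SUCCESS
    failed_stage: Optional[str]    # first stage that failed or has no result
    failure_code: Optional[int]    # CUresult of the failing stage, if recorded
    clean: bool                    # True only for a full end-to-end pass


def classify(results: Dict[str, int], output_matches: bool) -> Receipt:
    """Reduce per-stage CUresult codes plus an output check to one receipt."""
    furthest = None
    for stage in STAGES:
        code = results.get(stage)
        if code is None:
            # No result recorded for this stage: treat it as not reached.
            return Receipt(furthest, stage, None, False)
        if code != CUDA_SUCCESS:
            return Receipt(furthest, stage, code, False)
        furthest = stage
    if not output_matches:
        # Every driver call succeeded, but the arithmetic result was wrong.
        return Receipt(furthest, "output_check", None, False)
    return Receipt(furthest, None, None, True)
```

With this split, a cubin that loads but fails at launch is recorded as reaching `load` only. That outcome is visible to the driver, but it is not a clean receipt. Only the all-success case with matching output sets `clean`.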

## Practical scope

The safest way to use this bundle is:

- treat `k_baseline` as the narrow positive proof;
- treat the `k_tcgen05_alloc` walk as evidence of layered software gates;
- treat `kernel_sm100a.cu` as exploratory surface area, not public proof;
- treat [the copied-driver patch helper](./driver_patch_lane/patch_libcuda.py)
  as research tooling for copied drivers, not a shipping-support claim.

## Related public articles

- [`What our GB10 experiments actually prove about Blackwell tensor paths`](/blog/gb10-blackwell-tensor-paths-what-we-actually-proved/):
  the clean claim boundary for what the public receipts do and do not prove.
- [`Reproducing the sm_100a to sm_121a cubin patch on GB10`](/blog/gb10-sm100a-cubin-patch-repro/):
  the step-by-step walkthrough that matches this bundle most directly.
- [`Why driver-visible paths can look like hardware support on GB10`](/blog/gb10-driver-gates-and-false-capability-signals/):
  the warning-story explainer for why layered gates and helper paths are easy to
  over-read.
- [`Inside the GB10 driver patch lane`](/blog/gb10-libcuda-driver-patch-lane-and-why-it-still-is-not-silicon-proof/):
  the deeper copied-`libcuda` lane, kept intentionally separate from the
  baseline proof path.
- [`MegaCpp model glossary`](/blog/megacpp-model-glossary/):
  the term-decoding lane for `sm_121a`, `tcgen05`, `reservedSmem`,
  `.nv.info.<kernel>`, `.nv.capmerc`, and `.nv.merc.rela.*` before you drop
  into the local walkthrough or raw files.
- [Local command walkthrough](./README_walkthrough.md):
  the compact local bridge from the public articles back into the exact command
  sequence in this bundle.
- [Public wording guardrail](./public_claims.md):
  the local wording guardrail to keep the evidence hierarchy intact when this
  bundle is cited elsewhere.