SLM Ensemble Architecture

This page documents the public MegaCpp model lane, NAM56R: a Mamba 3 plus Transformer hybrid for C++ engineering with 16 routed experts, top-4 activation, about 4.73B total parameters, and about 3.03B active parameters per token.

NAM56R

Model lane

Public architecture lane for C++ work

Mamba 3 + Transformer

Hybrid core

State-space and attention in one stack

16 / top-4

Expert routing

Routed experts with four active per token

4.73B / 3.03B

Scale

Total params / active params
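As a rough check on the scale figures above, the short Python sketch below computes the share of weights active per token and, under a loudly stated assumption, a split between always-active and routed parameters. The assumption that all 16 experts are the same size and that everything outside the experts is always active is ours, not something this page states; treat the derived split as illustrative only.

```python
# Figures quoted on this page (approximate, in billions of parameters).
TOTAL_B = 4.73
ACTIVE_B = 3.03
NUM_EXPERTS = 16
TOP_K = 4

# Share of the weights used for any single token.
active_fraction = ACTIVE_B / TOTAL_B

# ASSUMPTION (not stated in the source): experts are uniform in size and all
# non-expert parameters are always active, so that
#   total  = shared + NUM_EXPERTS * per_expert
#   active = shared + TOP_K * per_expert
# Here "per_expert" is one expert slot summed over every routed layer.
per_expert_b = (TOTAL_B - ACTIVE_B) / (NUM_EXPERTS - TOP_K)
shared_b = ACTIVE_B - TOP_K * per_expert_b

print(f"active fraction: {active_fraction:.1%}")
print(f"per expert slot (assumed uniform): {per_expert_b:.3f}B")
print(f"always-active remainder: {shared_b:.3f}B")
```

Under that assumption, about 64% of the weights are active per token, with roughly 0.14B parameters per expert slot and about 2.46B always active. A different expert layout would change the split but not the active fraction.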

Architecture Overview

Sequence backbone

NAM56R combines Mamba 3 state-space layers and Transformer attention in a single stack, so long-context state tracking and precise structural reasoning are handled by the same public model.

Routed compute

The public routing configuration is 16 experts with top-4 activation: each token is sent to four of the 16 routed experts. Committing to this one configuration keeps the model description concrete without presenting every internal experiment as a shipped public feature.
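To make 16-expert, top-4 routing concrete, here is a minimal pure-Python sketch of per-token expert selection. Only the numbers 16 and 4 come from this page. The function name `route_token`, the softmax over the selected logits, and the tie-breaking rule are assumptions for illustration; the page does not describe how NAM56R's router scores or normalizes experts.

```python
import math

NUM_EXPERTS = 16  # from this page
TOP_K = 4         # from this page


def route_token(router_logits):
    """Return [(expert_index, weight), ...] for the TOP_K best-scoring experts.

    ASSUMPTION: mixing weights are a softmax over the selected logits only.
    This is a common choice, not a documented NAM56R detail.
    """
    if len(router_logits) != NUM_EXPERTS:
        raise ValueError(f"expected {NUM_EXPERTS} logits, got {len(router_logits)}")
    # Indices of the TOP_K largest logits; ties go to the lower index.
    top = sorted(range(NUM_EXPERTS), key=lambda i: (-router_logits[i], i))[:TOP_K]
    # Numerically stable softmax restricted to the selected experts.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(top, exps)]


# Usage with made-up router scores for a single token.
logits = [0.1 * i for i in range(NUM_EXPERTS)]
chosen = route_token(logits)
for expert, weight in chosen:
    print(f"expert {expert:2d} weight {weight:.3f}")
```

The token's output would then be the weighted sum of the four selected experts' outputs, while the other twelve experts do no work for that token. That is why the active parameter count is lower than the total.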

Precision policy

We describe NVFP4 inference, FP16 training, and Muon optimization as public engineering choices because they are part of the operating envelope readers can reason about.
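One way to keep these three choices consistent across docs and tooling is to record them in a single immutable structure. The sketch below is hypothetical: `PrecisionPolicy` and `format_for` are illustrative names, not part of any published MegaCpp interface, and only the three values come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrecisionPolicy:
    """Hypothetical record of the public precision choices (not a real API)."""

    inference_format: str = "NVFP4"  # from this page
    training_format: str = "FP16"    # from this page
    optimizer: str = "Muon"          # from this page

    def format_for(self, phase: str) -> str:
        """Return the numeric format for 'inference' or 'training'."""
        if phase == "inference":
            return self.inference_format
        if phase == "training":
            return self.training_format
        raise ValueError(f"unknown phase: {phase!r}")


policy = PrecisionPolicy()
print(policy.format_for("inference"), policy.format_for("training"), policy.optimizer)
```

Freezing the dataclass means the policy cannot be changed by accident at runtime, which fits the idea of a stable public operating envelope.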

Public Boundaries

What this documentation commits to

One public product: SLM Ensemble.
One public model lane: NAM56R.
One domain focus: C++ engineering tasks, codebases, and build-aware workflows.
Hardware-specific measurements, ablations, and performance details belong in technical posts, not in architecture slogans.

What stays outside this page

Private datasets, internal directory names, and operational host details.
Unpublished benchmark tables or invented product matrices.
Claims that depend on unpublished implementation surfaces or unstable examples.
Research branches that have not been stabilized into the public MegaCpp story.

How we frame the C++ product surface

Repository-grounded code editing and review.
Compiler and build-system context rather than generic chat-only prompting.
Cross-file reasoning over code, docs, and developer workflows.
A narrow public scope that can stay consistent across docs, product pages, and engineering writing.

Implementation Notes

Model identity

Public pages should refer to the product as SLM Ensemble and the architecture lane as NAM56R. This replaces the older, invented framing of separate public specialist SKUs.

Data and training narrative

Data preparation, masking, augmentation, TPU and NVIDIA training lessons, and kernel-level performance notes are documented as scoped engineering articles so they can carry citations and code references responsibly.

Deployment boundary

The product page and docs page define the stable public surface. The blog carries the narrower stories about memory pressure, sharding, kernels, and framework integration tradeoffs.

Next reading

Go deeper where the claims can stay scoped

For TPU bring-up, NVIDIA memory tradeoffs, distributed sharding, data preparation, kernels, and ablation notes, use the blog and the supporting documentation pages. This documentation page stays focused on the stable public model story.

We keep expanding the public documentation, examples, and technical articles as each stable surface is ready to describe cleanly.
