SLM Ensemble Architecture
This page documents the public MegaCpp model lane: NAM56R, a Mamba 3 plus Transformer hybrid for C++ engineering. It has 16 routed experts with top-4 activation, about 4.73B total parameters, and about 3.03B active parameters.
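The relationship between the total and active parameter counts can be checked with a little arithmetic. The sketch below is illustrative only: it assumes all 16 experts are the same size and that every non-expert parameter is always active. Neither assumption is stated on this page, and the helper name `split_params` is hypothetical.

```python
# Rough parameter split for a routed-expert model.
# ASSUMPTIONS (not stated in the docs): experts are equal-sized, and all
# non-expert parameters are active for every token.

TOTAL_PARAMS_B = 4.73   # from the text: about 4.73B total parameters
ACTIVE_PARAMS_B = 3.03  # from the text: about 3.03B active parameters
NUM_EXPERTS = 16        # from the text: 16 routed experts
TOP_K = 4               # from the text: top-4 activation


def split_params(total_b, active_b, num_experts, top_k):
    """Solve total = shared + E * e and active = shared + k * e for (shared, e)."""
    if not 0 < top_k < num_experts:
        raise ValueError("top_k must be between 1 and num_experts - 1")
    per_expert = (total_b - active_b) / (num_experts - top_k)
    shared = total_b - num_experts * per_expert
    return shared, per_expert


shared_b, per_expert_b = split_params(
    TOTAL_PARAMS_B, ACTIVE_PARAMS_B, NUM_EXPERTS, TOP_K
)
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"shared (always active): ~{shared_b:.2f}B")
print(f"per expert:             ~{per_expert_b:.2f}B")
print(f"active fraction:        ~{active_fraction:.0%}")
```

Under these assumptions each expert comes out near 0.14B parameters and the always-active remainder near 2.46B, so roughly 64% of the weights participate in any single forward pass even though only a quarter of the experts are selected.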
Architecture Overview
Sequence backbone
NAM56R uses a Mamba 3 plus Transformer hybrid so that long-context state tracking and precise structural reasoning are served by the same public model.
Routed compute
The public configuration is a 16-expert design with top-4 activation: each routing decision selects 4 of the 16 experts. Stating it this way keeps the description concrete without presenting internal experiments as shipped public features.
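As a rough illustration of what top-4 activation over 16 experts means, here is a minimal routing sketch in plain Python. It is not the NAM56R router: the per-token granularity, the softmax over only the selected logits, and the names `route_token` and `moe_forward` are all assumptions made for the example.

```python
import math

NUM_EXPERTS = 16  # from the text: 16 routed experts
TOP_K = 4         # from the text: top-4 activation


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token.

    Returns a list of (expert_index, weight) pairs, highest logit first.
    ASSUMPTION: weights are a softmax over the selected logits only.
    """
    if top_k > len(router_logits):
        raise ValueError("top_k cannot exceed the number of experts")
    order = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )
    chosen = order[:top_k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, router_logits, experts):
    """Weighted sum of the selected experts' outputs; the rest are never called."""
    return sum(w * experts[i](x) for i, w in route_token(router_logits))


# Toy usage: expert i simply scales its input by i.
toy_experts = [lambda x, scale=i: scale * x for i in range(NUM_EXPERTS)]
toy_logits = [float(i) for i in range(NUM_EXPERTS)]
print(route_token(toy_logits))
print(moe_forward(2.0, toy_logits, toy_experts))
```

The point of the sketch is the cost model: only the four selected experts run for a given input, which is why the active parameter count is lower than the total.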
Precision policy
We describe NVFP4 inference, FP16 training, and Muon optimization as public engineering choices because they are part of the operating envelope readers can reason about.
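One way to picture the precision policy is as a small, immutable configuration record. The sketch below is hypothetical: the class name `PrecisionPolicy`, its field names, and the string values are illustrative labels for the three choices named above, not an actual MegaCpp configuration schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrecisionPolicy:
    """Hypothetical record of the publicly stated precision choices."""

    inference_format: str = "nvfp4"  # from the text: NVFP4 inference
    training_dtype: str = "fp16"     # from the text: FP16 training
    optimizer: str = "muon"          # from the text: Muon optimization

    def describe(self) -> str:
        """One-line summary suitable for logs or docs."""
        return (
            f"train={self.training_dtype}, "
            f"optimizer={self.optimizer}, "
            f"infer={self.inference_format}"
        )


policy = PrecisionPolicy()
print(policy.describe())
```

Freezing the dataclass reflects the idea that these are stable public choices rather than per-run tuning knobs.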
Public Boundaries
What this documentation commits to
- One public product: SLM Ensemble.
- One public model lane: NAM56R.
- One domain focus: C++ engineering tasks, codebases, and build-aware workflows.
- Hardware-specific measurements, ablations, and performance details belong in technical posts, not in architecture summaries.
What stays outside this page
- Private datasets, internal directory names, and operational host details.
- Unpublished benchmark tables or invented product matrices.
- Claims that depend on unpublished implementation surfaces or unstable examples.
- Research branches that have not been stabilized into the public MegaCpp story.
How we frame the C++ product surface
- Repository-grounded code editing and review.
- Compiler and build-system context rather than generic chat-only prompting.
- Cross-file reasoning over code, docs, and developer workflows.
- A narrow public scope that can stay consistent across docs, product pages, and engineering writing.
Implementation Notes
Model identity
Public pages should refer to the product as SLM Ensemble and the architecture lane as NAM56R. This replaces the older, inaccurate description of separate public specialist SKUs.
Data and training narrative
Data preparation, masking, augmentation, TPU and NVIDIA training lessons, and kernel-level performance notes are documented as scoped engineering articles so they can carry citations and code references responsibly.
Deployment boundary
The product page and docs page define the stable public surface. The blog carries the narrower stories about memory pressure, sharding, kernels, and framework integration tradeoffs.
Go deeper where the claims can stay scoped
For TPU bring-up, NVIDIA memory tradeoffs, distributed sharding, data preparation, kernels, and ablation notes, use the blog and the supporting documentation pages. This documentation page stays focused on the stable public model story.
We expand the public documentation, examples, and technical articles as each stable surface becomes ready to document.