Tags: slm, ensemble, specialists, cpp, architecture

Meet the Eight: Inside the MegaCpp Specialist Ensemble

Profiles of the eight specialist SLMs in the MegaCpp ensemble — what each one is good at, what it was trained on, and what to never ask it.

6 min read · David Gornshtein

The scaling hypothesis is dead in our corner of the world. Instead of one 70B generalist that knows a little about everything, we run eight 4B-8B specialists that each know one slice of C++ engineering deeply. Every specialist is a sparse MoE model: 4B-8B total parameters, 0.8B-1.6B active per token (~10% activation ratio), NVFP4 at inference, trained on 100-200B tokens drawn from our curriculum-mapped C++ corpus (v2_simple through v6_enriched with structure-aware parquet metadata — AST node types, call edges, type edges, chunk boundaries). Each specialist is trained through the same four-phase curriculum (4K syntax, 16K file-local, 64K repo graph, structure-aware) but on a domain-skewed data mix. Below: what each one does, what it is not for, and the data that made it.

1. Algo-SLM — Algorithms and Data Structures

Best at: Pseudocode-to-C++ translation, complexity analysis, choosing the right container, implementing classic algorithms (graphs, DP, string algorithms, numerical routines) in idiomatic modern C++.
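To make that territory concrete, here is a hand-written sketch (not model output) of the kind of tight, self-contained function with clear pre/post conditions that the Phase 1 data emphasizes: Dijkstra over an adjacency list in idiomatic modern C++.

```cpp
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Classic Dijkstra over a non-negative-weight adjacency list.
// adj[u] = {(v, w), ...}; returns shortest distances from src,
// with numeric_limits<long long>::max() marking unreachable nodes.
std::vector<long long> dijkstra(
    const std::vector<std::vector<std::pair<int, long long>>>& adj, int src) {
  constexpr long long kInf = std::numeric_limits<long long>::max();
  std::vector<long long> dist(adj.size(), kInf);
  using Item = std::pair<long long, int>;  // (distance, node)
  std::priority_queue<Item, std::vector<Item>, std::greater<>> pq;  // min-heap
  dist[src] = 0;
  pq.emplace(0, src);
  while (!pq.empty()) {
    auto [d, u] = pq.top();
    pq.pop();
    if (d > dist[u]) continue;  // stale heap entry: a shorter path won already
    for (auto [v, w] : adj[u]) {
      if (d + w < dist[v]) {
        dist[v] = d + w;
        pq.emplace(dist[v], v);  // lazy decrease-key: push a fresh entry
      }
    }
  }
  return dist;
}
```

The stale-entry check replaces decrease-key, which std::priority_queue does not support; that container-shaped tradeoff is exactly the reasoning this specialist is trained to surface.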

Training mix: Heavy weight on CP-style repositories, absl/algorithm, Boost.Graph, EASTL, numeric libraries (Eigen cores, GSL), LeetCode-shaped reference solutions, and algorithm textbook transcriptions. Phase 1 (v3_simple 4K) dominates early training — tight self-contained functions with clear pre/post conditions. Phase 2 adds v4_context_graph shards that pair algorithms with their callers so the model learns when to reach for a flat_hash_map vs. btree_map.

Not for: Build systems, linker errors, platform-specific syscalls, long-running async pipelines, or anything where the answer depends on repository-level state beyond ~16K tokens. It will happily hand you a beautiful Dijkstra that ignores your existing graph abstraction. Route architectural decisions and repo-scale changes through the Orchestrator.

Size: 7B total / 1.4B active.

2. Template-SLM — Templates and Metaprogramming

Best at: SFINAE, concepts, CRTP, variadic templates, if constexpr ladders, expression templates, tag dispatch, and reading the kind of compiler error that starts with note: candidate template ignored and continues for three screens.
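A small hand-written sketch of the "if constexpr ladder" pattern (hypothetical helper, not model output): dispatch on type traits per branch, with a dependent static_assert that turns the three-screen candidate-ignored error into one readable line.

```cpp
#include <string>
#include <type_traits>

// Hypothetical illustration: an `if constexpr` ladder that picks a
// formatting strategy per type. The static_assert condition is
// type-dependent, so it only fires when an unsupported T is instantiated.
template <typename T>
std::string describe(const T& value) {
  if constexpr (std::is_arithmetic_v<T>) {
    return std::to_string(value);       // numbers: format directly
  } else if constexpr (std::is_convertible_v<T, std::string>) {
    return std::string(value);          // string-like: convert
  } else {
    static_assert(std::is_arithmetic_v<T> ||
                      std::is_convertible_v<T, std::string>,
                  "describe() needs a number or something string-like");
    return {};  // unreachable: the static_assert above already fired
  }
}
```

The C++20 equivalent replaces the traits with concepts and a requires-clause, which is where the structure-aware phase (knowing where that clause legally lives) earns its keep.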

Training mix: Boost (MPL, Hana, Fusion, Mp11), range-v3, fmt, spdlog internals, Eigen expression templates, Abseil type traits, libc++/libstdc++ headers, and every C++20/23 concepts-heavy library we could index. Structure-aware Phase 4 is critical here: the structure_ids column lets the model distinguish class_decl from func_body from typedef, which matters enormously when reasoning about where a requires-clause legally lives.

Not for: Runtime performance tuning, I/O, system calls, or "just make it work" production patches. Template-SLM will refactor your one-line fix into a five-concept constrained template because that is what its world looks like. It is also not a linker — symbol visibility and ODR issues route to Build-SLM.

Size: 8B total / 1.6B active.

3. Memory-SLM — RAII, Allocators, Smart Pointers

Best at: Ownership modeling, lifetime analysis, custom allocators, arena/pool designs, unique_ptr/shared_ptr/weak_ptr tradeoffs, move semantics, and catching dangling references and aliasing bugs before they reach ASan.
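For flavor, a toy monotonic arena in the spirit of LLVM's BumpPtrAllocator, hand-written for illustration rather than lifted from the corpus: every allocation bumps an offset in one owned buffer, and everything is released together when the arena is destroyed.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical minimal arena: no per-object free, no reuse, just a bump
// pointer. Ownership of the whole buffer lives in one unique_ptr, so the
// arena's destructor is the single point where memory is reclaimed.
class Arena {
 public:
  explicit Arena(std::size_t capacity)
      : buf_(std::make_unique<std::byte[]>(capacity)), capacity_(capacity) {}

  // Returns max-aligned storage, or nullptr when the arena is exhausted.
  void* allocate(std::size_t size) {
    std::size_t aligned = (used_ + alignof(std::max_align_t) - 1) &
                          ~(alignof(std::max_align_t) - 1);
    if (aligned + size > capacity_) return nullptr;
    used_ = aligned + size;
    return buf_.get() + aligned;
  }

  std::size_t used() const { return used_; }

 private:
  std::unique_ptr<std::byte[]> buf_;  // sole owner of the backing storage
  std::size_t capacity_;
  std::size_t used_ = 0;
};
```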

Training mix: EASTL allocators, Abseil memory internals, folly Arena/F14, mimalloc, jemalloc, tcmalloc, Chromium base/memory, LLVM BumpPtrAllocator, and kernel slab code. We deliberately over-sample diffs that change ownership (constructor/destructor edits, std::move insertions) using the v4_context_graph packing so the model sees the caller of a moved-from object in the same window as the move itself.

Not for: Algorithmic correctness, template metaprogramming edge cases, or build configuration. It will propose an arena allocator before asking whether the hot path is even allocation-bound. Pair it with Algo-SLM for complexity reasoning and Debug-SLM for actual leak traces.

Size: 7B total / 1.4B active.

4. Concurrency-SLM — Parallelism and Synchronization

Best at: std::atomic memory orderings, lock-free queues, thread pools, coroutines (co_await/co_return), executors, std::jthread/stop tokens, TBB, OpenMP pragmas, and the tricky business of not introducing data races while fixing a data race.
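The acquire/release reasoning above can be sketched in a single-producer, single-consumer ring buffer (a hypothetical, hand-written example, not production lock-free code): the release store of the head index happens-before the consumer's acquire load, so the slot write is visible before the slot is claimed.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical SPSC ring. Exactly one thread calls push(), exactly one
// calls pop(); the release/acquire pair on each index establishes the
// happens-before edge between writing a slot and reading it.
template <typename T, std::size_t N>
class SpscQueue {
  static_assert((N & (N - 1)) == 0, "N must be a power of two");

 public:
  bool push(T value) {
    std::size_t head = head_.load(std::memory_order_relaxed);  // own index
    std::size_t tail = tail_.load(std::memory_order_acquire);
    if (head - tail == N) return false;  // full
    slots_[head & (N - 1)] = std::move(value);
    head_.store(head + 1, std::memory_order_release);  // publish the slot
    return true;
  }

  std::optional<T> pop() {
    std::size_t tail = tail_.load(std::memory_order_relaxed);  // own index
    std::size_t head = head_.load(std::memory_order_acquire);
    if (head == tail) return std::nullopt;  // empty
    T value = std::move(slots_[tail & (N - 1)]);
    tail_.store(tail + 1, std::memory_order_release);  // free the slot
    return value;
  }

 private:
  std::array<T, N> slots_{};
  std::atomic<std::size_t> head_{0};  // written only by the producer
  std::atomic<std::size_t> tail_{0};  // written only by the consumer
};
```

Note what the sketch does not do: pad indices onto separate cache lines. That is the cache-line-level concern the specialist will not raise unless asked.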

Training mix: folly (MPMCQueue, ProducerConsumerQueue, Futures), TBB, Intel OneAPI, libcds, Seastar, Boost.Asio, Abseil synchronization primitives, and curated ThreadSanitizer reports paired with their fixes. Phase 3 (64K repo graph) matters here because races almost always live across files — the model needs to see the producer and consumer in the same window.

Not for: GPU kernels, SIMD micro-optimization, distributed systems, or algorithmic design. Concurrency-SLM thinks in terms of happens-before; it does not think in terms of cache lines unless you explicitly ask. Distributed coordination, Raft, gossip protocols — those are out of scope; route to a human.

Size: 8B total / 1.6B active.

5. Systems-SLM — Low-Level, OS, Syscalls

Best at: POSIX and Win32 syscalls, epoll/kqueue/io_uring, signal handling, /proc inspection, ELF/Mach-O layout, dynamic linking, page tables conceptually, and kernel-module-adjacent userspace code. Knows why your fork() plus threads just corrupted state.

Training mix: Linux kernel selftests and userspace helpers, liburing, musl, glibc, FreeBSD libc, LLVM libunwind, Chromium sandbox code, gVisor, DPDK, and strace/ltrace/perf output paired with the code that produced it. We bias heavily toward Phase 3 context because syscalls only make sense with their surrounding control flow and error handling.

Not for: Templates, higher-level architecture, or anything where the answer is "use a library". Systems-SLM will reach for mmap when std::vector is fine. It is also not a security reviewer — it knows how syscalls work, not whether your use of them is safe against an adversary.

Size: 8B total / 1.6B active.

6. Build-SLM — CMake, Bazel, Clang Tooling

Best at: CMakeLists.txt authoring, target-based design, FetchContent/find_package, toolchain files, cross-compilation, Bazel BUILD files, rules_cc, Ninja, clang-tidy configuration, sanitizer flags, and decoding undefined reference and multiple definition errors.
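One concrete slice of that error-decoding skill, sketched by hand with hypothetical names: the classic multiple definition link error from a global defined in a header included by several translation units, and the C++17 inline-variable fix that tells the linker to merge the copies.

```cpp
// config.h (sketch): safe to include from many .cpp files.
#include <string>

// Pre-C++17, defining this in a header produced one definition per
// translation unit and a "multiple definition of `kBuildFlavor'" link
// error; the fix was an extern declaration here plus exactly one
// definition in a single .cpp. C++17 `inline` variables merge instead.
inline const std::string kBuildFlavor = "release";

inline int build_counter = 0;  // mutable globals work the same way
```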

Training mix: Large-scale open-source build trees (LLVM, Chromium BUILD.gn, Abseil, gRPC, Envoy, Bazel's own repo), vcpkg/Conan recipes, compile_commands.json examples, and paired before/after diffs of build-system refactors. The preamble and namespace structure categories in v6_enriched give it a strong prior on include and visibility layout.

Not for: Actual C++ logic, algorithms, or runtime behavior. Build-SLM reasons about what compiles and links, not what runs correctly. Do not ask it to design an API. Do not ask it to optimize a hot loop. It will, however, gladly tell you why your template-heavy header just blew out your compile cache.

Size: 6B total / 1.2B active.

7. Debug-SLM — GDB, Sanitizers, Ground-Truth Integration

Best at: Reading stack traces, interpreting ASan/UBSan/TSan/MSan output, writing GDB Python scripts, navigating core dumps, bisecting regressions, and — most importantly — grounding its answers in live debugger state rather than hallucinating a variable's value.
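One bug class that shows up constantly in those sanitizer reports, sketched by hand with hypothetical names: a std::string_view that points into a temporary, and the owning-string fix an ASan heap-use-after-free report leads to.

```cpp
#include <string>

// BUGGY (sketch, kept as a comment): the concatenation builds a temporary
// std::string, the returned string_view points into it, and the temporary
// is destroyed at the end of the full expression. ASan reports the later
// read as heap-use-after-free.
//
// std::string_view greet_bad(const std::string& name) {
//   return "hello, " + name;  // dangling view into a dead temporary
// }

// FIXED: return an owning string so the caller controls the lifetime.
std::string greet(const std::string& name) {
  return "hello, " + name;
}
```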

Training mix: Curated GDB/LLDB sessions with annotated transcripts, sanitizer reports paired with their root-cause patches, LLVM compiler-rt internals, rr replay traces, kernel BUG: reports, and Valgrind logs. Debug-SLM is the only specialist with a ground-truth tool channel: at inference time it can query a live GDB/LLDB bridge for actual register, memory, and backtrace values, and its training includes the tool-use trajectories that teach it to ask before guessing.

Not for: Green-field implementation, design, or algorithmic work. Debug-SLM assumes something is already broken. Hand it working code and it will invent a bug to diagnose. Use it strictly as a reactive, evidence-driven specialist.

Size: 7B total / 1.4B active.

8. STL-SLM — Standard Library Fluency

Best at: Picking the right standard container and algorithm, idiomatic <ranges>, <algorithm>, <numeric>, iterator categories, <chrono>, <filesystem>, <format>, <expected>, std::string_view vs. std::string tradeoffs, and knowing which <execution> policy is safe.
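The "replace raw loop with algorithm" rewrite class can be sketched by hand (hypothetical functions, not corpus diffs): an index loop summing squares becomes one std::transform_reduce expression, and an erase loop becomes std::copy_if.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

// Before: `long long s = 0; for (size_t i = 0; i < v.size(); ++i) ...`
// After: the intent ("map then fold") stated in one expression.
long long sum_of_squares(const std::vector<int>& v) {
  return std::transform_reduce(
      v.begin(), v.end(), 0LL, std::plus<>{},
      [](int x) { return static_cast<long long>(x) * x; });
}

// Before: an iterator-invalidating erase loop. After: copy what survives.
// (With C++20 <ranges> this is views::filter plus a materializing copy.)
std::vector<int> evens(const std::vector<int>& v) {
  std::vector<int> out;
  std::copy_if(v.begin(), v.end(), std::back_inserter(out),
               [](int x) { return x % 2 == 0; });
  return out;
}
```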

Training mix: libc++ and libstdc++ source (both implementation and test suites), MSVC STL where licensing allows, cppreference-derived examples, Abseil's STL-compatible containers, range-v3, and a curated stream of "replace raw loop with algorithm" diffs. Structure-aware training is especially valuable here: the model learns that a func_body consisting of a raw for-loop over v.begin()/v.end() almost always has a <ranges> or <algorithm> rewrite.

Not for: Third-party libraries, custom allocator design, template metaprogramming beyond what the standard requires, or concurrency primitives beyond std::atomic basics. STL-SLM is deliberately narrow. When the answer is "reach for Boost" or "write a custom container", route elsewhere.

Size: 6B total / 1.2B active.


Why eight, and why these eight

Each specialist is cheap enough to keep resident alongside its peers — the full ensemble fits in roughly 32 GB of NVFP4 VRAM, less than a single 70B generalist in FP16. The Orchestrator (documented separately) routes each request to one or more specialists based on the structural signature of the prompt: template-heavy tokens bias toward Template-SLM, stack traces bias toward Debug-SLM, CMakeLists.txt tokens bias toward Build-SLM, and so on. Specialists disagree often, and that disagreement is the signal — when Algo-SLM and Memory-SLM both weigh in on a hot-path container choice, the ensemble's answer is almost always better than either alone. That is the whole bet: narrow models, wide coverage, honest handoffs.
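A deliberately tiny sketch of what signature-based routing could look like, hypothetical code rather than the Orchestrator's actual implementation: score each specialist by surface cues in the prompt and take the argmax, with a default when nothing matches.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

struct RouteScore {
  std::string_view specialist;
  int score;
};

// Count non-overlapping occurrences of needle in text.
inline int count_occurrences(std::string_view text, std::string_view needle) {
  int n = 0;
  for (std::size_t pos = text.find(needle); pos != std::string_view::npos;
       pos = text.find(needle, pos + needle.size()))
    ++n;
  return n;
}

// Hypothetical router: three of the eight specialists, scored by the
// surface cues named in the text above, with Algo-SLM as a stand-in default.
std::string_view route(std::string_view prompt) {
  const std::array<RouteScore, 3> scores = {{
      {"Template-SLM", count_occurrences(prompt, "template") +
                           count_occurrences(prompt, "requires")},
      {"Build-SLM", count_occurrences(prompt, "CMakeLists.txt") +
                        count_occurrences(prompt, "undefined reference")},
      {"Debug-SLM", count_occurrences(prompt, "#0 ") +  // stack-frame marker
                        count_occurrences(prompt, "AddressSanitizer")},
  }};
  const RouteScore* best = &scores[0];
  for (const auto& s : scores)
    if (s.score > best->score) best = &s;
  return best->score > 0 ? best->specialist : "Algo-SLM";  // fallback
}
```

The real router presumably works on structural features rather than substrings, but the shape is the same: cheap per-specialist evidence, then a dispatch decision.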

David Gornshtein • Datasunrise OÜ