MegaCpp by Datasunrise OÜ

C++ AI Built Around a Single Public Product

Datasunrise OÜ is building SLM Ensemble, a focused AI product for C++ engineering published under the MegaCpp brand and built around our NAM56R lane: a Mamba 3 plus Transformer hybrid with 16 routed experts, top-4 activation, about 4.73B total parameters, and about 3.03B active parameters.
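
The four figures above are enough for a rough parameter budget. The sketch below is a back-of-the-envelope estimate, not a published breakdown: it assumes the 16 routed experts are all the same size and that the only difference between total and active parameters is the 12 experts left idle for a given token. `MoeBudget` and `solve_budget` are hypothetical names introduced for this illustration.

```cpp
#include <cassert>

// Values are in billions of parameters.
struct MoeBudget {
    double per_expert;  // size of one routed expert (assumed equal for all experts)
    double always_on;   // parameters used for every token regardless of routing
};

// Assumed model (not confirmed by the page):
//   total  = always_on + n_experts * per_expert
//   active = always_on + top_k     * per_expert
// Subtracting the two equations isolates per_expert.
MoeBudget solve_budget(double total, double active, int n_experts, int top_k) {
    assert(n_experts > top_k && top_k > 0);
    const double per_expert = (total - active) / (n_experts - top_k);
    const double always_on = total - n_experts * per_expert;
    return MoeBudget{per_expert, always_on};
}
```

Under those assumptions, `solve_budget(4.73, 3.03, 16, 4)` gives roughly 0.142B parameters per routed expert and roughly 2.46B parameters that run on every token.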

  • Public model lane: NAM56R
  • Expert routing: 16 / top-4
  • Total params: 4.73B
  • Active params: 3.03B

Thesis

The long-context compression war misses the point for code.

Every recent architecture is fighting the same war through different forms of compression. Mamba 2 squashes the entire context into a single state vector. M2RNN compresses into a matrix, but parallelizes poorly. The Mamba-adjacent models in the vLLM ecosystem — Kimi Delta Attention among them — run their own state-tracking tricks. DeepSeek 4 took a different route with CSA and HCA: it doesn't change the underlying math, it just squashes 4 prompt-token embeddings into 1 (up to 128 on a block-wise basis) and runs attention over those flattened super-tokens.
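
To make the stakes of token merging concrete, here is a small counting sketch. It takes the 4-to-1 and 128-to-1 ratios from the paragraph above at face value and assumes plain full attention over the merged sequence, which is a simplification; the function names are hypothetical and do not describe any model's actual implementation.

```cpp
#include <cassert>
#include <cstdint>

// Number of positions left when every `ratio` tokens are merged into one.
std::uint64_t compressed_length(std::uint64_t tokens, std::uint64_t ratio) {
    assert(ratio > 0);
    return (tokens + ratio - 1) / ratio;  // ceiling division
}

// Query-key score entries for full (non-causal, non-sparse) attention.
std::uint64_t attention_scores(std::uint64_t positions) {
    return positions * positions;
}
```

For a 65,536-token prompt, a ratio of 4 leaves 16,384 positions and 16 times fewer score entries; a ratio of 128 leaves 512 positions and 16,384 times fewer. The saving grows with the square of the ratio, which is why this line of work is attractive for long natural-language contexts.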

All of this is cute for natural language. For code, it is digging in the wrong direction.

In code, the model does not need to guess which tokens depend on which. The AST (abstract syntax tree) gives the syntactic relationships exactly. If the code compiles and we are hunting a bug, the tree tells us how the pieces relate. If the code is incomplete and we are filling a gap with FIM (fill-in-the-middle) / IFIM or writing new code from scratch, the structural connections remain predictable: an error-tolerant parser like Tree-sitter still builds a tree for the incomplete input and maps those connections cleanly, without loss.

The grammar is already given. Training a model to statistically guess it is not the right optimisation when the exact syntax structure is already known.

Product

SLM Ensemble

One focused product for C++ engineering. The rest of the site exists to explain how we build it, measure it, and where we are still experimenting.

Language Models

SLM Ensemble

A focused AI product for C++ workflows, published by Datasunrise OÜ under the MegaCpp brand and built around the NAM56R family: a Mamba 3 plus Transformer hybrid with specialist routing, with shared tooling and evaluation organized around one public product.

  • NAM56R hybrid stack
  • 16 routed experts, top-4 active
  • Published notes and samples
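
For readers unfamiliar with "16 routed experts, top-4 active", the sketch below shows generic top-k softmax gating: score every expert, keep the k best, and normalize their weights. This is the textbook pattern, offered as an illustration only; the page does not say how the NAM56R router is actually implemented, and `route_top_k` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Returns (expert index, weight) pairs for the k highest-scoring experts,
// ordered from highest to lowest logit. Weights are a softmax over the
// selected logits only, so they sum to 1.
std::vector<std::pair<std::size_t, double>>
route_top_k(const std::vector<double>& logits, std::size_t k) {
    assert(k <= logits.size());
    std::vector<std::pair<std::size_t, double>> picked;
    if (k == 0) return picked;

    std::vector<std::size_t> order(logits.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // Bring the k largest logits to the front, sorted in descending order.
    std::partial_sort(order.begin(),
                      order.begin() + static_cast<std::ptrdiff_t>(k),
                      order.end(),
                      [&](std::size_t a, std::size_t b) { return logits[a] > logits[b]; });

    const double max_logit = logits[order[0]];  // subtracted for numerical stability
    double denom = 0.0;
    for (std::size_t i = 0; i < k; ++i) {
        const double w = std::exp(logits[order[i]] - max_logit);
        picked.emplace_back(order[i], w);
        denom += w;
    }
    for (auto& p : picked) p.second /= denom;
    return picked;
}
```

With 16 logits and `k = 4`, only four experts receive non-zero weight for a given token, which is what keeps the active parameter count below the total.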
How We Work

Engineering Before Slogans

We prefer careful scope and explicit evidence. Public pages describe the direction, while the blog and sample repos carry the deeper implementation detail.

Hybrid Model Research

We explore hybrid state-space and attention architectures for long-context C++ work and publish the engineering tradeoffs behind those experiments.

Practical Deployment Discipline

Training, inference, and evaluation are treated as one system. We document what is measured, what is still experimental, and where the boundaries are.

Workflow Grounding

MegaCpp is designed around real C++ repositories, compiler-aware context, and code-review workflows rather than generic chatbot prompts.

C++ Native

The public story stays narrow: one product, one domain, and one audience. Modern C++, legacy C++, build systems, and review flows all live inside that scope.

Public Position

Narrow Scope, Better Claims

Datasunrise OÜ is not trying to present MegaCpp as a universal AI company. We are building one public product for C++ engineering, and we would rather under-claim than publish benchmarks or hardware promises without public receipts.

That means the homepage stays simple. Product details, model architecture, training notes, and evaluation methodology belong on deeper pages where they can be scoped properly and linked to supporting material.

If you want the implementation story, read the technical articles and sample repository. If you want the company story, it is even simpler: Datasunrise OÜ operates the MegaCpp brand, David Gornshtein and Boris Tamarkin are the founders, and SLM Ensemble is the public product.

Want the technical details?

The blog covers architecture choices, evaluation design, data preparation, and deployment notes in much more detail than the homepage should.