We're MegaCpp
A small team operating out of Tallinn, Estonia under Datasunrise OÜ, with an unreasonable obsession with making C++ development less painful and more intelligent.
Why We Started This Journey
It started, as many things do, with frustration. We were watching the AI revolution unfold, seeing transformative tools emerge for Python, JavaScript, even COBOL—but C++ developers were left holding the short end of the stick.
The problem wasn't that AI couldn't generate C++ code. It could. The problem was that it generated code like someone who'd read the standard but never actually compiled anything. It would allocate with new and never call delete, confuse const with constexpr, and produce template errors that would make your compiler weep.
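To make two of those failure modes concrete, here is a minimal sketch. The function names (square, runtime_square, make_buffer, make_counter) are ours, invented for illustration; they are not taken from any model output or from our codebase.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// constexpr: the value can be computed by the compiler.
constexpr int square(int x) { return x * x; }

constexpr int kTileSize = square(8);
static_assert(kTileSize == 64, "kTileSize is a compile-time constant");

// const: read-only after initialization, but here the value is only
// known at run time, so it cannot be used where a constant expression
// is required (for example as an array bound or template argument).
int runtime_square(int x) {
    const int result = square(x);
    return result;
}

// The leaky pattern looks like this:
//     int* buf = new int[n];
//     ...
//     return;            // no delete[]: the allocation is never freed
// An owning container avoids the problem: the vector releases its
// storage automatically when it goes out of scope.
std::vector<int> make_buffer(int n) {
    return std::vector<int>(static_cast<std::size_t>(n), 0);
}

// When a single heap object is needed, std::unique_ptr ties the
// delete to the owner's lifetime.
std::unique_ptr<int> make_counter(int start) {
    return std::make_unique<int>(start);
}
```

Calling runtime_square(5) yields 25 just as square(5) would; the difference is only in when the value is known, which is exactly the distinction that tends to get blurred.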
So we asked ourselves: what would it take to build AI that actually understands C++? Not "has seen C++ in training data" but genuinely comprehends memory management, template instantiation, and the subtle difference between undefined behavior and implementation-defined behavior.
The answer turned out to be: a lot. We needed specialized tokenizers that treat std::vector as a single concept. We needed training data curated specifically for C++, with comments that explain the why, not just the what. We needed integration with actual debuggers so our models could see code running, not just sitting in text files.
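As a rough illustration of the tokenizer idea, the sketch below splits source text into pieces while keeping a qualified name such as std::vector together. It is a toy under stated assumptions, not our production tokenizer: pretokenize and is_ident_char are hypothetical names, and the splitting rule (identifiers joined across "::", every other non-space character on its own) is a simplification chosen for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true for characters allowed inside an identifier.
inline bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Hypothetical pre-tokenizer: whitespace is skipped, identifiers are
// grouped, and "::" is absorbed into the current token only when
// another identifier character follows it. Every other character
// becomes a one-character token.
std::vector<std::string> pretokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;
            continue;
        }
        if (is_ident_char(c)) {
            std::size_t j = i;
            while (j < src.size()) {
                if (is_ident_char(src[j])) {
                    ++j;
                } else if (j + 2 < src.size() && src[j] == ':' &&
                           src[j + 1] == ':' && is_ident_char(src[j + 2])) {
                    j += 2;  // keep the scope operator inside the token
                } else {
                    break;
                }
            }
            tokens.push_back(src.substr(i, j - i));
            i = j;
        } else {
            tokens.push_back(std::string(1, c));
            ++i;
        }
    }
    return tokens;
}
```

Given the input "std::vector<int> v;", this produces six pieces: std::vector, <, int, >, v and ;. A generic tokenizer would typically break std::vector into several fragments, which is the behavior the specialized approach is meant to avoid.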
And we needed a real training stack, not a notebook. Our nanochat POC proved out the architecture, then cppmega became the production training base: Megatron integration, GB10 and B200 support, FP16/BF16 training, NVFP4 inference, reproducible pinned runs.
One intense year later, here we are. A small company with big ambitions, building the infrastructure for the next generation of C++ development.
What We Believe
Specialization Over Scale
We believe in doing one thing exceptionally well rather than doing everything adequately. C++ deserves dedicated tools, not afterthoughts.
Ground Truth Matters
Our models don't guess what code does—they observe it running. Debugger integration isn't a feature, it's a philosophy.
Developer Experience First
If it takes 47 seconds to get a response, it's not helping. We obsess over latency because your flow state is sacred.
Build in the Open
We're not hiding behind 'proprietary' labels. Our architectures, training recipes, and eval methodology are documented. Science advances by sharing.
How We Got Here
The Frustration
After years of watching GPT-4 confidently generate incorrect template metaprogramming, we decided enough was enough.
nanochat POC
We built the nanochat POC: Mamba 3 + Transformers hybrid, Muon optimizer, doc masking. It convinced us that small specialist models were the right bet for C++.
cppmega Stack
We promoted the POC to cppmega: the real training stack behind the ensemble. Megatron integration, GB10 and B200 support, NVFP4 inference path, reproducible runs.
SLM Ensemble
Our specialist language models hit production. The scaling hypothesis is officially dead in our corner of the world.
Datasunrise OÜ
Legal form
Private limited company
Register code
16466545 (registered 23.03.2022)
Status
Entered into the register, 23.03.2022
Address
Vesivärava tn 50-201, Kesklinna linnaosa, Tallinn, Harju maakond, 10152, Estonia
Want to Join the Mission?
We're always looking for people who share our obsession with making C++ development better. Check out our team or get in touch.