About Us

We're MegaCpp

A small team operating out of Tallinn, Estonia under Datasunrise OÜ, with an unreasonable obsession with making C++ development less painful and more intelligent.

Vesivärava tn 50-201, Kesklinna, Tallinn, Estonia

Our Story

Why We Started This Journey

It started, as many things do, with frustration. We were watching the AI revolution unfold, seeing transformative tools emerge for Python, JavaScript, even COBOL, while C++ developers kept getting the short end of the stick.

The problem wasn't that AI couldn't generate C++ code. It could. The problem was that it generated code like someone who'd read the standard but never actually compiled anything. It would suggest new without delete, mix up const and constexpr, and produce template errors that would make your compiler weep.
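To make two of those mistakes concrete, here is a minimal sketch. It is an illustration written for this page, not actual model output, and the helper names (make_buffer, square, scaled) are hypothetical.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper names, for illustration only.

// Leak-prone pattern: a raw new[] whose matching delete[] is easy to forget.
// Left commented out so the sketch itself stays leak-free.
// int* make_buffer_raw(std::size_t n) { return new int[n]; }

// Safer equivalent: ownership lives in the return type, so the buffer is
// released automatically when the unique_ptr goes out of scope.
std::unique_ptr<int[]> make_buffer(std::size_t n) {
    return std::make_unique<int[]>(n);  // elements are value-initialized to 0
}

// constexpr: can be evaluated at compile time when given constant arguments.
constexpr int square(int x) { return x * x; }

// const: "not modified after initialization"; the initializer may still be
// a value that is only known at run time.
int scaled(int runtime_value) {
    const int factor = runtime_value;      // OK: const, but not a compile-time constant
    // constexpr int bad = runtime_value;  // error: not a constant expression
    constexpr int nine = square(3);        // OK: computed by the compiler
    static_assert(nine == 9, "square(3) is evaluated at compile time");
    return factor * nine;
}
```

Calling scaled(2) multiplies a run-time value by a compile-time constant, and make_buffer(4) hands back a zeroed buffer that cleans up after itself. Both are small things, but they are exactly the distinctions a C++ assistant has to get right every time.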

So we asked ourselves: what would it take to build AI that actually understands C++? Not "has seen C++ in training data" but genuinely comprehends memory management, template instantiation, and the subtle difference between undefined behavior and implementation-defined behavior.
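As a rough illustration of that last distinction, consider the sketch below. The names (char_is_signed, bits_per_int, checked_add) are hypothetical and exist only for this example.

```cpp
#include <climits>
#include <limits>

// Implementation-defined: each compiler/platform must pick a documented
// answer, but the answer may differ from one platform to the next.
constexpr bool char_is_signed = std::numeric_limits<char>::is_signed;
constexpr int bits_per_int = static_cast<int>(sizeof(int) * CHAR_BIT);

// Undefined behavior: the standard places no requirements at all on the
// program. Signed integer overflow is the classic case:
// int broken = INT_MAX + 1;  // UB, so it stays commented out

// Hypothetical helper: checks the operands first, so the addition below can
// never overflow. Returns false (and leaves `out` untouched) on overflow.
bool checked_add(int a, int b, int& out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    out = a + b;
    return true;
}
```

Code that depends on bits_per_int is merely non-portable. Code that overflows a signed int is broken everywhere, even when it appears to work.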

The answer turned out to be: a lot. We needed specialized tokenizers that treat std::vector as a single concept. We needed training data curated specifically for C++, with comments that explain the why, not just the what. We needed integration with actual debuggers so our models could see code running, not just sitting in text files.

And we needed a real training stack, not a notebook. Our nanochat POC proved out the architecture; cppmega then became the production training base: Megatron integration, GB10 and Blackwell support, FP16/BF16 training, NVFP4 inference, reproducible pinned runs.

One intense year later, here we are. A small company with big ambitions, building the infrastructure for the next generation of C++ development.

Our Values

What We Believe

Specialization Over Scale

We believe in doing one thing exceptionally well rather than doing everything adequately. C++ deserves dedicated tools, not afterthoughts.

Ground Truth Matters

Our models don't guess what code does—they observe it running. Debugger integration isn't a feature, it's a philosophy.

Developer Experience First

If it takes 47 seconds to get a response, it's not helping. We obsess over latency because your flow state is sacred.

Build in the Open

We're not hiding behind 'proprietary' labels. Our architectures, training recipes, and eval methodology are documented. Science advances by sharing.

Timeline

How We Got Here

Q1 2026

The Frustration

After years of watching GPT-4 confidently generate incorrect template metaprogramming, we decided enough was enough.

Q1 2026

nanochat POC

We built the nanochat POC: Mamba 3 + Transformers hybrid, Muon optimizer, doc masking. It convinced us that small specialist models were the right bet for C++.

Q2 2026

cppmega Stack

We promoted the POC to cppmega: the real training stack behind the ensemble. Megatron integration, GB10 and B200 support, NVFP4 inference path, reproducible runs.

Q2 2026

SLM Ensemble

Our specialist language models hit production. The scaling hypothesis is officially dead in our corner of the world.

Legal Entity

Datasunrise OÜ

Legal form

Private limited company

Register code

16466545 (registered 23.03.2022)

Status

Entered into the register, 23.03.2022

Address

Vesivärava tn 50-201, Kesklinna linnaosa, Tallinn, Harju maakond, 10152, Estonia

Want to Join the Mission?

We're always looking for people who share our obsession with making C++ development better. Check out our team or get in touch.