Contact

Talk to MegaCpp

Datasunrise OÜ, based in Tallinn, Estonia, is the company behind the MegaCpp brand. Use this page for company, product, or collaboration inquiries.

Company contact

If you want to discuss SLM Ensemble, company details, or public technical materials, email us directly.

Public technical materials
Browse the MegaCpp blog

Registered company

The public company record behind the MegaCpp brand is Datasunrise OÜ, registered in Estonia.

Datasunrise OÜ
Private limited company
Registry code 16466545
Address
Vesivärava tn 50-201
10152 Tallinn, Estonia


Open-source port in progress

cppmega: our framework, ported onto NVIDIA Megatron

github.com/DatasunriseOU/cppmega is the NVIDIA Megatron-based port of our internal monster of a training framework. Our long-term plan is simple: once the public version is genuinely solid for other people to run, we will open-source the whole thing, including the framework, data, and checkpoints.

Right now we are porting it onto Megatron piece by piece, layering in our own improvements, and moving pre-training and post-training fully onto FP8 and NVFP4. The goal is concrete: once the public release is ready, we want enthusiasts to be able to reproduce our pre-training and post-training on a pair of H200:8 nodes (two nodes with eight H200 GPUs each), not on a fleet of ten machines.

Until then, the repo is a moving target. As soon as that version is real, we will hand the working pipeline, data, and checkpoints over to the community. In parallel, we are starting training of our production model ensemble now; the public release will be the version we are confident other people can pick up and run.

Fun toy, not the real thing

Tinkering on a pair of Mac M4 Max machines with MLX

On the side, we play around with training on a pair of Mac M4 Max machines using Apple's MLX stack. To be clear: this is a fun toy, not a serious training setup. You cannot meaningfully train modern model architectures on two Mac M4 Max machines, on ten of them, or even on ten or thirty GB10 boxes: the memory geometry, interconnect, and compute budgets are simply not the right shape. For the long version of why, see the engineering blog, where we walk through memory budgets, parallelism, communication cost, and the GB10 stack honestly.

But if you enjoy small models, kernels, and training scripts that fit on a desk, you can follow along with the MLX experiments: small architectures, small datasets, and a lot of “what happens if we try this on Apple silicon?” energy.