Talk to MegaCpp
Datasunrise OÜ, based in Tallinn, Estonia, is the company behind the MegaCpp brand. Use this page for company, product, or collaboration inquiries.
Company contact
If you want to discuss SLM Ensemble, company details, or public technical materials, email us directly.
Registered company
The public company record behind the MegaCpp brand is Datasunrise OÜ, registered in Estonia.
Datasunrise OÜ - Kesklinna linnaosa, Tallinn, Estonia
cppmega — our framework, ported onto NVIDIA Megatron
github.com/DatasunriseOU/cppmega is the NVIDIA Megatron-based port of our internal monster of a training framework. Our long-term plan is simple: we will open-source the whole thing — framework, data, and checkpoints — once the public version is genuinely solid for other people to run.
Right now we are porting it onto Megatron piece by piece, with our own improvements layered in, and moving pre-training and post-training fully onto FP8 and NVFP4. The goal of that effort is concrete: once the public release is ready, we want enthusiasts to be able to reproduce our pre-training and post-training runs on a pair of 8×H200 nodes (two nodes with eight H200 GPUs each), not on a fleet of ten machines.
Until then, the repo is a moving target. We will hand over the working pipeline, data, and checkpoints to the community as soon as that version is ready. In parallel, we are starting to train our production model ensemble now; the public release will follow once we have a version we are confident other people can pick up and run.
Tinkering on a pair of Mac M4 Max boxes with MLX
On the side, we play around with training on a pair of Mac M4 Max boxes using Apple's MLX stack. To be clear: this is a fun toy, not a serious training setup. You cannot do meaningful training of modern model architectures on two Mac M4 Max machines, on ten of them, or even on ten or thirty GB10 boxes — the memory geometry, interconnect, and compute budgets are simply not in the right shape. If you want the long version of why, the engineering blog is the place: we walk through memory budgets, parallelism, comms cost, and the GB10 stack honestly.
But if you enjoy reading small models, kernels, and training scripts that fit on a desk, you can follow along with the MLX experiments — small architectures, small datasets, and a lot of “what happens if we try this on Apple silicon?” energy.